Move AI Gen 2 vs Xsens: Traditional Mocap killer or just hype? Part 1
- Sergey Vereschagin
- Mar 19
- 19 min read
Updated: Mar 25

Move AI Gen 2: technical characteristics and principle of operation
Move AI Gen 2 is an updated markerless motion capture system that Move AI introduced in March 2025. It combines computer vision, AI, and physical models to determine a person's pose from video. The main advantage is the ability to work with ordinary cameras, such as smartphones, without special suits or sensors. Gen 2 ships several new models for motion processing: s2 and s2-light for single-camera recording, m2 and m2-xl for multi-camera recording, and enhance for improving and correcting motion data.
Move AI's technology relies on markerless tracking: AI algorithms analyze video, locate the joints, and reconstruct 3D movement. Gen 2 brings improved spatial perception and more stable animation. For example, the s2 model (single camera) has become much more accurate than s1: it estimates depth better, which reduces "foot slip" and improves foot contact with the ground. Another important improvement is that the system can now track multiple people even from a single camera.
The m2 (multi-camera) model is designed to capture movement accurately from several cameras. According to the developers, it delivers the same quality as optical mocap systems at a much lower cost. The m2-xl version extends multi-camera tracking to very large venues, up to stadium scale. This model can track more than 20 people at a time and copes well even when people partially occlude each other.
Another important innovation in Gen 2 is the enhance model. It uses generative neural networks to restore and improve motion when the original data is corrupted or partially lost. For example, the system can automatically remove artifacts, refine movement during occlusions, and correct tracking errors.
Technically, Move AI Gen 2 combines multi-camera neural reconstruction (similar to neural rendering) with temporal stability models that smooth trajectories and make movements look more natural. This addresses the classic problems of markerless systems: lost tracking during occlusions, jerks, and noise in the animation. As a result, Gen 2 output comes very close to traditional mocap.
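Move AI's internal pipeline is not public, but the core idea behind multi-camera reconstruction can be illustrated with classic triangulation: the same joint is detected in two calibrated views, and its 3D position is recovered from the pair of 2D observations. A minimal sketch using OpenCV (the camera matrices and pixel coordinates here are placeholders, not Move AI data):

```python
# Illustrative sketch only: Move AI's actual pipeline is not public.
# Shows the classic idea behind multi-camera 3D reconstruction --
# triangulating a joint from its 2D detections in two calibrated views.
import numpy as np
import cv2

def triangulate_joint(P1, P2, uv1, uv2):
    """Recover a 3D joint position from two camera views.

    P1, P2 : 3x4 projection matrices of the calibrated cameras.
    uv1, uv2 : (x, y) pixel coordinates of the same joint in each view.
    """
    pts1 = np.asarray(uv1, dtype=np.float64).reshape(2, 1)
    pts2 = np.asarray(uv2, dtype=np.float64).reshape(2, 1)
    point_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous 4x1
    return (point_h[:3] / point_h[3]).ravel()            # -> (x, y, z)
```

With more than two cameras, each extra view adds constraints on the same joint, which is exactly why occlusions hurt single-camera capture so much more.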
Limitations and possible problems of Move AI Gen 2
Despite its advanced features, Move AI Gen 2 still has the limitations typical of markerless systems. Capture conditions are key: reliable tracking requires good cameras (typically 2-6 cameras or iPhones) and a well-lit, spacious room. When using a single camera, the system has to reconstruct depth and 3D coordinates from a single projection, which complicates the task and can compromise accuracy. Gen 2 partially compensates for this with improved algorithms, but the lack of stereo data is a physical limitation that cannot be eliminated entirely.
In multi-camera setups, it is important to properly calibrate the cameras, especially in large venues. The system must accurately understand their locations, otherwise motion capture accuracy will suffer.
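Calibration itself usually comes down to showing each camera a known pattern and solving for its optics and position. Move AI handles this step its own way, but a minimal sketch of the standard OpenCV checkerboard workflow shows what the system needs to learn about each camera (file names and board dimensions below are placeholders):

```python
# Generic per-camera calibration sketch using OpenCV's checkerboard
# workflow. Move AI's own calibration procedure differs; this only
# illustrates the concept behind it.
import cv2
import numpy as np

PATTERN = (9, 6)        # inner corners of the printed checkerboard
SQUARE_SIZE_M = 0.025   # physical square size in meters

# 3D coordinates of the checkerboard corners in the board's own frame
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_SIZE_M

obj_points, img_points = [], []
for path in ["cam1_frame1.jpg", "cam1_frame2.jpg"]:  # calibration shots
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN, None)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Solve for intrinsics (focal length, principal point) and lens distortion
err, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection error:", err)  # lower is better; large venues amplify it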
Occlusions (overlaps) remain a major problem. If a person covers their face with their hands or falls to the floor partially out of camera range, tracking accuracy drops. In complex cases, such as tumbling, where limbs are hidden from view, the system can lose tracking or produce erroneous data. Gen 2 handles occlusions better, especially in the m2-xl model, which is designed for complex scenes with many interacting subjects. Algorithms for inferring hidden joints have also been added, but errors are still hard to eliminate completely.
Fast movements are another problem. Sharp turns, kicks, jumps - anything that causes motion blur in the video or abrupt pose changes between frames can break the reconstruction. High-frequency movements have traditionally been difficult for computer vision and AI, leading to accuracy loss or delays in pose detection. Gen 2 tries to compensate with higher-frequency processing and physics, but artifacts cannot be removed entirely: extremely fast movements may still cause jerking or limb shaking. In such cases, the data has to be cleaned up, either with the built-in enhance model or manually in post-processing.
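What does that manual cleanup look like in practice? One common approach is to run a smoothing filter over each joint's trajectory. This is a generic sketch, not Move AI's built-in enhance model:

```python
# Generic post-processing sketch: clean jitter from fast movements with
# a Savitzky-Golay filter over each joint's trajectory.
import numpy as np
from scipy.signal import savgol_filter

def smooth_trajectories(joints, window=9, order=3):
    """joints: array of shape (frames, num_joints, 3) with XYZ positions.

    Smooths along the time axis; a short window keeps fast motion intact
    while removing frame-to-frame shaking.
    """
    return savgol_filter(joints, window_length=window, polyorder=order, axis=0)

# Example: 300 captured frames, 24 joints (stand-in random-walk data)
capture = np.random.randn(300, 24, 3).cumsum(axis=0)
cleaned = smooth_trajectories(capture)
```

The trade-off is always the same: a wider window removes more shaking but starts eating genuine fast motion, which is why kicks and jumps need a lighter touch than idle poses.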
Multiple actors and interactions. Gen 2 is able to track multiple people at once, even from a single camera, but in practice this doesn't always work perfectly. The more people in the frame, the higher the risk of overlap and confusion. Without markers, the system can confuse skeletons, especially if people are in close contact - for example, in fight scenes or dancing. In such cases, algorithms can either lose individual characters or “glue” their movements together, creating tracking errors.
Computer vision still struggles to handle dense interactions, especially when participants often overlap. The more occlusions, the more accuracy drops, and error correction requires more manual work. In lab tests, Gen 2 shows tracking of dozens of people on a large set, but in a confined space where actors are constantly interacting, the system can still fail, so additional validation and data correction is needed.
Delayed results and dependence on computational resources. Unlike inertial systems that transmit movement in real time, markerless technologies require processing, so delays are inevitable. Move AI works on a record-first principle: you shoot the video, then upload it to the cloud for processing. That means time passes between filming and receiving the animation. Users report that a minute of video can take about 20 minutes to process, and without a stable internet connection, even uploading the files can be a problem.
The company is developing a Move Live solution for real-time motion streaming, but so far the system is designed for offline processing. That means there's no instant feedback during filming. If something goes wrong, the error will only become apparent after processing, and the scene will have to be re-shot blindly. Unlike Move AI, traditional systems like Xsens allow you to see the results immediately and correct tracking on the fly. Therefore, when working with Move AI, it is important to allow time for post-processing and, if possible, shoot additional takes.
Cost and licensing. With Move AI, you don't need a million-dollar investment at the start - just cameras (e.g., smartphones) and a subscription to the service. But if you use the system regularly, the costs can accumulate. Move AI has different pricing plans: according to users, a Pro license for two actors costs about $7,000 per year, and early versions of the pricing model included a price of about $4,000 per hour of processed data.
In other words, if you shoot infrequently, Move AI is indeed cheaper than traditional mocap systems. But with constant use, the subscription cost can match the price of a good inertial suit. In addition, the basic packages may cap the amount of animation footage - for example, up to 30 minutes of data can be processed without additional fees, after which you need to buy extensions.
Beyond the subscription, there are additional costs: several high-quality cameras, tripods, lighting, and a powerful PC for local video work. This equipment is not tied to Move AI and can be reused for other tasks, but it still belongs in the budget. As a result, the system is convenient for those who need mocap without complex equipment, but for regular work the financial model should be calculated in advance.
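To make "calculate in advance" concrete, here is a rough break-even sketch. The $7,000/year Pro figure is the user-reported price quoted above; the suit price is purely an assumed number for illustration, not a quoted figure:

```python
# Rough break-even sketch for the figures quoted in this article.
MOVE_AI_PRO_PER_YEAR = 7_000      # USD, two actors (user-reported price)
INERTIAL_SUITS_ONE_TIME = 14_000  # USD, ASSUMED price for two mid-range suits

years_to_break_even = INERTIAL_SUITS_ONE_TIME / MOVE_AI_PRO_PER_YEAR
print(f"Subscription matches suit cost after ~{years_to_break_even:.1f} years")
# -> under these assumptions, after about two years the subscription has
#    cost as much as owning the hardware outright.
```

The real comparison also has to count camera gear on one side and suit maintenance and replacement sensors on the other, so treat this only as a template.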
Conclusions on the limitations
Move AI Gen 2 has pushed markerless technology seriously forward: shaking, foot sliding, and pose errors have all improved noticeably thanks to the new AI models. But the video-analysis approach itself still has limits. Lighting, camera placement, and take planning all matter for quality results. In complex scenes with full occlusions or extreme movements, errors are possible, so the system cannot be treated as error-free.
Professionals working with Move AI usually develop their own workarounds: they run test recordings, add cameras to cover all angles, place markers in the room for calibration, and then refine the animation manually - filtering trajectories, correcting tracking errors, filling in lost data. With a competent approach you can get very accurate and natural animation, but it is important to understand the limits of the technology and be ready for additional processing.
Move AI Gen 2 data accuracy and quality compared to traditional mocap systems
The main question is how closely Move AI Gen 2 can compete with classic marker-based (optical) and inertial mocap systems. Judging by tests and reviews, in basic scenarios (walking, running, turns, gestures without complex occlusions) the accuracy of the markerless technology is comparable to professional suits.
For example, a report by Han Yang (Unreal Han) shows that simple movements - walking, breathing, body turns, hand gestures - look natural and smooth when processed in Move AI. Moreover, at SIGGRAPH 2022, Electronic Arts ran an experiment: the same actor was recorded two ways, through Move AI and through a traditional mocap system. The difference in detail and accuracy turned out to be minimal, which speaks to how mature the algorithms have become. Move AI CEO Tino Millar states that their technology makes it possible to capture "super high-quality" movement using only conventional video cameras. The Gen 2 multi-camera mode (m2), specifically, is positioned as an alternative to optical systems at a lower cost and without complex installation.
However, under extreme loads, traditional mocap systems still win. If the scene contains many obstacles, the actors interact closely, or they move outside the capture area, Move AI loses accuracy. For example, during occlusions (the actor tumbles, lies on the floor, or leaves the cameras' field of view) the markerless system can fail, while inertial suits like Xsens keep working steadily.
AI tracking also struggles with high-speed movement: sharp kicks, jumps, and rapid pose changes sometimes result in dropped or smeared poses. In contrast, inertial systems capture movement at high frequency and without visual artifacts.
Conclusion: in standard conditions Move AI Gen 2 almost catches up with classic mocap systems, but in complex scenes (heavy occlusions, contact with surfaces, extreme dynamics) marker-based and inertial technologies are still more reliable.
Optical marker systems (e.g. Vicon) are still the leaders in terms of absolute accuracy. They record marker positions with millimeter accuracy, whereas markerless technologies, including Move AI, work on the basis of statistical skeletal models and produce approximate poses. In most cases, the difference is subtle, but if the task requires scientific precision (e.g., biomechanical analysis or medical research), calibrated systems with sensors remain more reliable.
That said, markerless animation sometimes wins on realism. Creators point out that movements derived from video contain micro-nuances that make them more lifelike. For example, director Ilya Nodia believes the key advantage of Move AI is the naturalness of micro-movements, which can be lost when using suits. One expert who compared Move AI with Xsens noted that after minimal processing, the AI animations looked even better, including finger detail.
In the movie and gaming industry, the quality of Move AI Gen 2 is already so high that a viewer won't be able to tell if the capture was done via a markerless system or traditional mocap. The difference only becomes noticeable in extreme conditions or when technically analyzing the data.
It turns out that Move AI Gen 2 is capable of providing studio-quality mocap and can already compete with traditional systems. For complex tasks (acrobatics, multiple subjects, shoots requiring instant feedback), however, marker-based and inertial technologies remain more stable. More and more studios are opting for a combined approach: markerless for quick iterations and creative exploration, suits for mission-critical shoots. The gap is closing, and Move AI is no longer just an interesting experiment but a full-fledged tool in professional production.
Applications of Move AI Gen 2 in the industry
Move AI Gen 2 is already used extensively in movies, games, advertising, and virtual events. Here are a few examples of how markerless technology is changing the approach to motion capture.
Virtual concerts and events
One of the most striking cases was the collaboration between Sony Music and Move AI to create a virtual concert for Fortnite. Singer Myles Smith performed digitally, and his movements were captured with Move Pro (multi-camera capture). The Racquet Studios team working on the project achieved smooth, accurate animation without any traditional mocap hardware.
The key advantage is speed and flexibility. Traditional mocap would have required weeks of preparation: calibration, suit setup, data processing. Move AI made it possible to record, process, and integrate the animation much faster, which is critical for projects with tight deadlines. This case showed that, properly organized, markerless capture can fully replace traditional mocap even in big-budget productions.
Advertising and sports content
Move AI Gen 2 is also actively used in advertising and sports projects. Nike used markerless mocap in its Dri-FIT ADV campaign. Although the details have not been revealed, the fact that a major brand trusted Move AI technology for work with athletes confirms its quality. Most likely, the system helped quickly capture the athletes' dynamic movements to create impactful visuals.
Music videos have also become a field for experimenting with markerless technology. Move AI is known to have been used in music videos, including projects with Grimes. Electronic Arts has been testing the system to speed up animation production in games. This is an important signal for the entire industry, as EA is one of the biggest players in the market using traditional mocap systems.
In the entertainment industry, Move AI is seen as a way to “democratize” motion capture, making it accessible not only to large studios with expensive equipment, but also to small teams that previously couldn't afford AAA-level mocap.
Independent filmmakers and studios
Move AI Gen 2 is becoming an important tool for independent filmmakers, allowing them to create quality mocap without multi-million-dollar investments. Ilya Nodia, the director and animator mentioned above, is one of the professionals who switched to Move AI for creative freedom. Initially he used off-the-shelf animation libraries, but at some point he needed unique moves that fit his script. Buying traditional mocap equipment was too expensive, and Move AI became a real "breakthrough" for him: now he can record complex stunts himself and immediately apply them to his characters.
Ilya compares the arrival of Move AI to Unreal Engine, which made 3D graphics accessible to small teams. Now high-quality mocap doesn't require huge budgets, which opens up new opportunities for animation and film. What he values most is the accuracy of micro-motion. Markerless technology captures the smallest nuances - shifts in balance, limb movements, natural body sway - which makes the characters on screen feel more alive.
Another convenience is the direct PC workflow. You can record through the iPhone app and then immediately sync the result with the scene on your PC, which speeds up the process and lets you focus on directing rather than technical complexity.
Game developers are also actively testing Move AI Gen 2 and sharing their experiences on Unreal Engine and Reddit forums. One of the experts used the system for a short movie: he installed 6 GoPro cameras (4K@60fps) around an actor, processed the data for 20 minutes, and then imported the animation into Unreal Engine and applied it to MetaHuman. He was satisfied with the result - minimal editing, only light smoothing to remove small shakes. The result was better than the animation from his Xsens suit with gloves. After that, he decided to sell the Xsens and switch completely to Move AI.
During his tests he tried complex scenes: two actors at once, interaction with objects (stairs, stools), even a small fight. The system coped - not perfectly, but well enough. The animation was a bit noisy during the fight, but that was easily fixed with smoothing, and overall the result was workable. This is notable, because markerless technologies were previously not considered suitable for complex interactions, yet here the system handled them quite well.
Of course, he used a powerful rig with 6 cameras and a paid Pro license (about $7,000 a year for two actors), but for a small production it's still cheaper than a few mocap suits.
Reviews and impressions
Professionals who have tried Move AI note that the system significantly lowers the barriers to motion capture. You can now record actors anywhere, without putting suits or sensors on them. This opens up new possibilities - for example, you can film a dancer right on the street or on an actual set simply by arranging smartphones.
Experts also talk about speeding things up. At Sony Music and Racquet Studios, abandoning marker-based mocap has reduced the time from recording to implementing animation in a project from weeks to a few days. For creators, flexibility is especially important: you can spend an evening shooting, and the next day start editing in Unreal or Unity.
Of course, there are nuances. Beginners may need time to get used to camera settings and angles - the first takes may not turn out perfectly. But after a little practice, the process becomes stable.
As for the cost, most users consider it justified. The subscription isn't cheap, but it's still less than renting a mocap studio or buying several suits, especially for small projects.
Critics point out that Move AI has yet to replace traditional systems in the most complex scenarios. Vicon and Xsens are still the standard for stunt scenes and tasks that require maximum precision. However, the trend is changing: more and more professionals are considering AI capture as a real alternative or additional tool.
The conclusion from the reviews is simple: if the scene is relatively simple or moderately complex, Move AI can handle it and save time and money. If the movements are complex and accuracy is critical, you should either combine methods or keep traditional mocap as a backup.
Move AI Gen 2 stability during complex movements
One of the main improvements of Move AI Gen 2 is increased stability in complex movements: sharp rotations, jumps, working with props and interaction of several actors. The developers have emphasized these scenarios by refining the algorithms in the m2 model - now the system better tracks the kinematics of the spine and shoulder girdle (6DOF for shoulders) and gives smoother and more stable animation even in dynamic scenes.
The performance in high occlusion conditions has been particularly refined. In the m2-xl version the algorithms are adapted for scenes where many people or objects overlap each other. This is probably due to improved neural network processing of multi-camera data: if one camera “loses” a joint, the system uses information from other angles and predicts movement based on previous frames.
In practice, the possibilities have already been tested. For example, in an experiment involving grappling (a style of close combat in wrestling) of two people, Move AI was able to capture the movement, although a little post-processing was required to smooth out the jitters. Nevertheless, the very ability to record a markerless fight scene is a major step forward, as such movements have traditionally been considered unsuitable for markerless systems.
It is important to realize that, for now, the system works best under relatively controlled conditions. If the scene includes chaotic, ultra-fast movements (e.g., sharp punches or intense ground fighting in MMA), failures are possible. But the progress is evident: where markerless mocap was previously not considered for fight scenes at all, it can now record them with acceptable quality, albeit with some correction.
Sharp turns and spins. For a single camera, sharp turns, especially when the character's back is turned, pose a serious problem. In such cases, the system can face ambiguity - it's difficult to determine the exact position of joints if the face or chest temporarily disappears from view.
In Gen 2, this has improved thanks to more advanced algorithms trained on larger data sets. A multi-camera setup also helps: if cameras are arranged in a circle, there is always at least one that sees the front of the body, reducing the likelihood of errors.
Users note that torso rotations and flips are handled stably if there is good camera coverage and lighting. However, for very fast rotations (such as dance spins or breakdancing elements), small jerks may appear. This is because jerky movements can blur frames and make it harder for algorithms to accurately detect pose.
To minimize such errors, Gen 2 uses interpolation and physical constraints. If a pose is blurred over several frames, the system prevents the body from “teleporting” and smooths out the motion trajectory. As a result, the animation subjectively looks smoother and more natural than in Gen 1, where such movements were much less stable.
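The documentation doesn't spell out what those constraints look like, but the "no teleporting" idea can be illustrated with a toy rule: cap how far a joint may move between consecutive frames, pulling implausible jumps back toward the previous position. A sketch under that assumption:

```python
# Toy illustration of a per-frame displacement constraint. Gen 2's real
# physical constraints are not public; this only conveys the idea.
import numpy as np

def clamp_jumps(traj, max_step=0.15):
    """traj: (frames, 3) positions of one joint; max_step in meters/frame."""
    out = traj.copy()
    for t in range(1, len(out)):
        step = out[t] - out[t - 1]
        dist = np.linalg.norm(step)
        if dist > max_step:  # implausible jump: likely a blurred/bad frame
            out[t] = out[t - 1] + step * (max_step / dist)
    return out
```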
Jumping and airborne phases are another challenge for markerless capture. When the feet leave the ground, the system loses its anchor to the surface and errors become more likely. Unlike inertial suits like Xsens, which measure accelerations directly with accelerometers, Move AI only sees a picture - a person in the air. If the legs are blurred by speed or a shadow gives a false cue, the algorithm can get things slightly wrong.
In Gen 2, jumps are tracked better, probably due to the use of a physics model where the system predicts the trajectory of the center of mass. Normal jumps (up, forward) are handled decently, but if the movement becomes more complex - such as a very high jump or a flip - inaccuracies can occur. Sometimes the system incorrectly determines which leg touched the ground first, or the bending angle of the knees may be slightly jittery when landing.
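A center-of-mass prior of this kind is easy to picture: once the actor is airborne, the body's center of mass must follow a ballistic arc. Here is a sketch of that prediction using simple projectile physics (the actual Gen 2 model is not documented publicly):

```python
# Sketch of predicting the center of mass during flight with projectile
# physics -- the kind of prior the article suggests Gen 2 applies to jumps.
import numpy as np

G = np.array([0.0, -9.81, 0.0])  # gravity, m/s^2 (Y-up convention)

def predict_airborne_com(p0, v0, dt, steps):
    """Predict center-of-mass positions for `steps` frames after takeoff.

    p0: position at takeoff, v0: velocity at takeoff (both length-3),
    dt: frame interval in seconds (e.g. 1/60).
    """
    t = (np.arange(1, steps + 1) * dt)[:, None]
    return p0 + v0 * t + 0.5 * G * t**2

# Example: takeoff at the origin, jumping up and forward, at 60 fps
path = predict_airborne_com(np.zeros(3), np.array([1.5, 3.0, 0.0]), 1 / 60, 30)
```

A detected pose that strays far from this arc mid-flight is almost certainly a tracking error, which is exactly what such a prior lets the system suppress.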
For game character animations, these errors can be easily corrected manually by tweaking keyframes. And the built-in Gen 2 Enhance tool can potentially correct such moments itself, analyzing the physics of the fall and compensating for small artifacts.
Props and objects are one of the unique advantages of Move AI. Unlike traditional mocap systems that only record the actor's movement, here the algorithms can analyze the environment as well. For example, if a character is holding a ball, the system is able to track its position. Of course, this works better with simple objects - a ball, a box, large objects like a chair or a weapon.
But for sudden movements with props, such as swinging a sword or throwing an object, you can't rely 100% on automatic tracking yet. Small details can get lost, and it's easier to animate such objects separately. However, the actor's movements with a large object are recorded normally, which gives flexibility in post-processing.
Another nuance is occlusions due to objects. If the actor holds a shield in front of him, hiding his torso and face, the system may lose accuracy in pose detection. In such cases, an additional camera from a different angle or subsequent manual correction of the data helps. But the approach itself is convenient because there are no limitations: Move AI simply captures everything it sees, and then the data can be used depending on the project's objectives.
Stability of postures and contacts is an important aspect of markerless technology. Previously, one of the main problems was the “sliding” of the feet on the floor. Since the system does not sense the physical resistance of the surface, the feet could move even if the character should stand still.
In Gen 2, this point has been significantly improved. Algorithms began to better detect foot planting (the moment of stable contact with the ground), and users confirm that the problem of slipping has been noticeably reduced. However, in some cases, small artifacts may still appear - for example, if the character is leaning heavily or resting his hands on the wall.
These errors are corrected by Move AI's built-in post-processing, which applies IK corrections, or by the engine itself. For example, when importing animation into Unity, it is recommended to enable Foot IK so the system additionally pins the feet in place. As a result, the character stands firmly on the ground and the scene looks natural.
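A simple heuristic captures the spirit of foot-plant detection: a foot counts as planted when it is close to the floor and barely moving, and those are the frames an engine's IK can lock. A generic sketch (the thresholds are assumptions to tune per capture):

```python
# Generic foot-plant detection heuristic, in the spirit of the fix
# described above. Thresholds are assumed values, tune per capture.
import numpy as np

def detect_foot_plants(foot_pos, dt, height_thresh=0.05, speed_thresh=0.3):
    """foot_pos: (frames, 3) positions of one foot (meters, Y-up).

    Returns a boolean array: True where the foot should be IK-locked.
    """
    speed = np.linalg.norm(np.diff(foot_pos, axis=0), axis=1) / dt
    speed = np.append(speed, speed[-1])        # pad to the original length
    grounded = foot_pos[:, 1] < height_thresh  # close to the floor
    slow = speed < speed_thresh                # effectively stationary
    return grounded & slow
```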
Overall, Gen 2 is much more reliable in complex scenes compared to previous markerless solutions. It better holds tracking during fast movements, overlaps and object interactions, thanks to multi-camera data and motion prediction algorithms.
But the physical limitations haven't gone anywhere: if the cameras don't physically see the relevant joints (e.g., during a sudden jerk or a full-body occlusion), the system is forced to guess, and that can lead to inaccuracies. So far, no AI system can replace body-mounted sensors in conditions of complete chaos. For example, if an actor falls into water or the scene is shrouded in smoke, Move AI may lose tracking, whereas an inertial suit will keep recording without issue.
But for typical complex movements - jumps, quick turns, working with props - Gen 2 already provides production-level stability. The small errors that occur are not critical and are treated as a normal part of the animation pipeline, much like the way data from mocap suits is corrected.
Move AI Gen 2 integration with Unreal Engine, Unity and other engines
One of the main advantages of Move AI is convenient integration with popular engines and DCC packages. After video processing, the user receives animation data as FBX (with baked skeletal motion) or as skeletal transformation data in JSON/CSV for biomechanical analysis. Move AI offers several import paths and provides instructions and plugins to make the process as easy as possible.
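For the JSON/CSV route, consuming the data is plain file parsing. The sketch below assumes a hypothetical column layout (frame, joint name, XYZ) - check Move AI's actual export format before relying on it:

```python
# Hypothetical sketch of consuming per-frame joint data from a CSV
# export. The real column layout of Move AI's CSV/JSON output may
# differ -- inspect an actual export first.
import csv

def load_joint_csv(path):
    """Assumed layout: frame, joint_name, x, y, z -- one row per joint."""
    frames = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            frames.setdefault(int(row["frame"]), {})[row["joint_name"]] = (
                float(row["x"]), float(row["y"]), float(row["z"]))
    return frames

# e.g. the hips trajectory across a take (assuming a "hips" joint name):
# data = load_joint_csv("take_001.csv")
# hips = [data[i]["hips"] for i in sorted(data)]
```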
In Unreal Engine the workflow is very clear: import the FBX with the animation, bind it to the character skeleton, and retarget to MetaHuman or any other rig. Move AI adapts the animation to the Humanoid standard in advance, which simplifies retargeting. In Unreal, you just load the FBX, create an Animation Sequence, and assign it to a skeletal mesh. If the bones need adjusting, the Retarget Manager or IK Retargeter is used, especially when the animation is headed for a MetaHuman. Judging by user reviews, importing goes without problems: download, import, retarget - the character moves without any extra fiddling. In one example, the animation mapped onto a MetaHuman perfectly; the author only swapped the meshes for armor, and everything worked. This shows that Move AI is fully compatible with Epic Games' skeleton system.
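The same import can also be scripted with Unreal's built-in Python API, which is handy when batch-importing many takes. A hedged sketch - the asset paths are placeholders, and property names should be verified against your engine version:

```python
# Batch-import sketch using Unreal's editor Python API. Paths below are
# placeholders; verify property names against your Unreal version.
import unreal

options = unreal.FbxImportUI()
options.import_as_skeletal = True
options.import_animations = True
options.skeleton = unreal.load_asset("/Game/Characters/Hero/Hero_Skeleton")
options.mesh_type_to_import = unreal.FBXImportType.FBXIT_ANIMATION

task = unreal.AssetImportTask()
task.filename = "C:/mocap/take_001.fbx"     # FBX exported from Move AI
task.destination_path = "/Game/Animations"  # where the sequence lands
task.automated = True                        # no import dialog
task.options = options

unreal.AssetToolsHelpers.get_asset_tools().import_asset_tasks([task])
```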
In Unity the process is similar: import the FBX, set Animation Type = Humanoid, and apply it to the character via an Animator Controller. The Move AI documentation has useful tips on Unity nuances - for example, if the engine drops the clip from 60 to 30 FPS on import, you can simply double the Speed on the AnimationClip to restore normal playback. It is also recommended to enable Foot IK in the Animator to prevent foot sliding, and to disable excessive animation compression if it produces artifacts. In terms of integration, Move AI animations are no different from those created with Xsens or Mixamo - they are ordinary bone transformations that can be edited, blended with other animations, and used in game mechanics.
Other platforms. In addition to Unreal Engine and Unity, Move AI has prepared instructions for Maya, Blender, Cinema 4D, NVIDIA Omniverse and other DCC packages. For example, Blender has a special plugin that allows you to export animation directly from the browser to a character without additional steps. Maya supports HumanIK retargeting, which makes it possible to quickly adapt movements to the animated character's rig.
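In Blender, the same FBX can also be brought in with a few lines of bpy - essentially what the dedicated plugin automates under the hood. A minimal sketch (the file path is a placeholder):

```python
# Minimal Blender import sketch; the Move AI plugin automates this flow.
import bpy

# Import the Move AI take; the armature arrives with baked keyframes
bpy.ops.import_scene.fbx(filepath="/mocap/take_001.fbx")

# Newly imported objects are selected after the operator runs
armature = next(o for o in bpy.context.selected_objects if o.type == 'ARMATURE')
# Assuming the take contains animation, an action is attached to it:
print("Imported action:", armature.animation_data.action.name)
```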
Move AI even integrates with virtual production and visualization platforms. Mocap animations can be imported into NVIDIA Omniverse, making them available for work with digital twins and simulations.
Overall, the company is clearly aiming for maximum compatibility: no matter what program or engine an animation is needed in, it can be imported and adapted seamlessly.
Real-time (Live). While Move AI is mostly offline, the company is also developing Move Live 2.0, which allows for real-time motion capture. Judging by the documentation, this requires high frame rates: 60fps for calibration and 110fps for stable capture. This means you need high-speed cameras and probably a powerful local server with a GPU to process the data.
Move Live is able to stream animations directly into Unreal and Unity, similar to how Xsens Live works through a plugin. Move AI also has a Move API that allows developers to embed single-camera capture into their apps.
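What embedding capture through such an API might look like is sketched below. To be clear: the endpoint, fields, and flow here are invented for illustration and are not Move AI's documented API - consult the actual Move API reference before building anything:

```python
# Purely HYPOTHETICAL sketch of a capture-service client. The base URL,
# endpoint, and response fields are invented for illustration and do
# not correspond to Move AI's documented API.
import requests

API_BASE = "https://api.example-mocap.com/v1"  # placeholder, not a real URL
API_KEY = "YOUR_KEY"

def submit_take(video_path):
    """Upload a single-camera clip and start processing (hypothetical)."""
    with open(video_path, "rb") as f:
        resp = requests.post(
            f"{API_BASE}/takes",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"video": f},
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()["take_id"]  # poll this id later for the FBX result
```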
So far, the live mode is less mature, but its development shows that Move AI intends to compete with Xsens, Vicon, and other systems that already offer live capture. This is especially promising for broadcast, VR/AR, and interactive applications where instantaneous motion transfer into digital environments matters.
Practical examples already confirm that the integration of Move AI with engines works without problems. For example, in the virtual concert for Fortnite (described above), character animations were made in Unreal Engine based on Move AI data. MetaHuman animations have also been successfully tested by enthusiasts, and in some projects Move AI is used for pre-visualization: a director can quickly record a scene with an actor, process the data and upload it to Unreal for rough editing, without wasting time booking a mocap studio. This gives flexibility and speeds up production considerably.
From a technical standpoint, Move AI is compatible with standard skeletons (HumanIK, Epic Skeleton, etc.), which means minimal problems with importing into game engines. Most user comments are about fixing minor defects - for example, adjusting IK for legs or adjusting frame rates - but all of this is easily corrected during the import phase.
The barrier to entry is very low. Even beginners on a Move AI trial can import their animations into Unity or Unreal within an hour by following the instructions. This is a stark contrast to older mocap systems, which required special plugins and complex format conversion. Move AI was designed from the start to be as end-user friendly as possible.
In the next installment, we'll move on to a direct comparison between Move AI Gen 2 and Xsens, one of the leaders in the traditional mocap market. We'll look at the strengths and weaknesses of each solution, the areas where Xsens still has no real competition, and how it can maintain its position in the AI era. Stay tuned for updates.