ModSkill: Physical Character Skill Modularization

¹University of Pennsylvania, ²The University of Hong Kong

We propose a modularized skill learning framework, ModSkill, which incorporates body-part-level inductive bias for motor skill acquisition. ModSkill decouples full-body motion into skill embeddings for controlling individual body parts. Learned from large-scale motion datasets, these modular skills can be combined to control a simulated character to perform diverse motions and seamlessly reused for various downstream tasks.

Abstract

Human motion is highly diverse and dynamic, posing challenges for imitation learning algorithms that aim to generalize motor skills for controlling simulated characters. Prior methods typically rely on a universal full-body controller for tracking reference motion (tracking-based model) or a unified full-body skill embedding space (skill embedding). However, these approaches often struggle to generalize and scale to larger motion datasets. In this work, we introduce a novel skill learning framework, ModSkill, that decouples complex full-body skills into compositional, modular skills for independent body parts, leveraging body structure-inspired inductive bias to enhance skill learning performance. Our framework features a skill modularization attention mechanism that processes policy observations into modular skill embeddings that guide low-level controllers for each body part. We further propose an Active Skill Learning approach with Generative Adaptive Sampling, using large motion generation models to adaptively enhance policy learning in challenging tracking scenarios. Results show that this modularized skill learning framework, enhanced by generative sampling, outperforms existing methods in precise full-body motion tracking and enables reusable skill embeddings for diverse goal-driven tasks.
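To make the skill modularization idea concrete, the sketch below shows one way such an attention step could look: a query per body part attends over tokens derived from the policy observation, and the resulting per-part embedding is passed to that part's low-level controller. This is an illustrative sketch only, not the authors' implementation. The five-part body partition, the token and embedding sizes, the linear controllers, and the names `BODY_PARTS`, `modular_skill_embeddings`, and `part_actions` are all assumptions introduced here.

```python
import numpy as np

# Hypothetical body partition; the actual split used by ModSkill is not given on this page.
BODY_PARTS = ["left_leg", "right_leg", "torso", "left_arm", "right_arm"]


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def modular_skill_embeddings(obs_tokens, part_queries, w_k, w_v):
    """Cross-attention from body-part queries to observation tokens.

    obs_tokens:   (T, D) tokens derived from the policy observation
    part_queries: (P, H) one query vector per body part (learned in practice)
    w_k, w_v:     (D, H) key and value projections
    Returns (skills, attn): skills is (P, H), one embedding per body part;
    attn is (P, T), each row a distribution over observation tokens.
    """
    keys = obs_tokens @ w_k                                    # (T, H)
    values = obs_tokens @ w_v                                  # (T, H)
    scores = part_queries @ keys.T / np.sqrt(keys.shape[-1])   # (P, T)
    attn = softmax(scores, axis=-1)
    skills = attn @ values                                     # (P, H)
    return skills, attn


def part_actions(skills, proprio, controllers):
    """Run one (assumed linear + tanh) low-level controller per body part.

    Each controller maps [skill embedding; proprioception] to that part's
    joint targets, so every part is driven only by its own skill embedding.
    """
    actions = {}
    for name, z, w in zip(BODY_PARTS, skills, controllers):
        actions[name] = np.tanh(np.concatenate([z, proprio]) @ w)
    return actions


# Toy usage with random weights (sizes are arbitrary).
rng = np.random.default_rng(0)
T, D, H, S, A = 6, 16, 8, 4, 3
skills, attn = modular_skill_embeddings(
    rng.normal(size=(T, D)),
    rng.normal(size=(len(BODY_PARTS), H)),
    rng.normal(size=(D, H)),
    rng.normal(size=(D, H)),
)
controllers = [rng.normal(size=(H + S, A)) for _ in BODY_PARTS]
actions = part_actions(skills, rng.normal(size=S), controllers)
```

Keeping one embedding and one controller per body part is what makes the skills compositional in this sketch: any part's embedding can be swapped or supplied by another policy without touching the others.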

Method

Left: We extract modular skills from a large-scale motion dataset using a motion imitation objective, enabling low-level controllers to control various body parts of a physically simulated character. Active skill learning, through adaptive sampling from an off-the-shelf motion generation model, further enhances policy performance. Right: The learned modular skills can be transferred to downstream tasks by freezing the low-level controllers and training a high-level policy with task-specific rewards.
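The adaptive sampling loop can be illustrated with a small sketch: clips the policy currently fails to track are sampled more often, and the hardest ones are supplemented with generated variants. This is a simplified illustration under stated assumptions, not the procedure from the paper. The failure-rate weighting, the `floor` and `hard_threshold` parameters, and the `generate_variants` stub (a placeholder standing in for an off-the-shelf motion generation model) are all hypothetical.

```python
import random


def sampling_weights(failure_rates, floor=0.05):
    """Convert per-clip tracking failure rates into sampling probabilities.

    The floor (an assumed hyperparameter) keeps already-mastered clips in
    rotation so they are not forgotten.
    """
    raw = {clip: max(rate, floor) for clip, rate in failure_rates.items()}
    total = sum(raw.values())
    return {clip: w / total for clip, w in raw.items()}


def generate_variants(clip, n):
    """Placeholder for a motion generation model (hypothetical).

    A real system would condition a generative model on the hard clip;
    here we only return labeled stand-in identifiers.
    """
    return [f"{clip}#gen{i}" for i in range(n)]


def build_batch(failure_rates, batch_size, hard_threshold=0.5,
                variants_per_hard=2, rng=None):
    """Sample a training batch biased toward hard clips, plus generated variants."""
    rng = rng or random.Random(0)
    probs = sampling_weights(failure_rates)
    clips = list(probs)
    picked = rng.choices(clips, weights=[probs[c] for c in clips], k=batch_size)
    batch = []
    for clip in picked:
        batch.append(clip)
        # Only clips the policy still struggles with receive generated variants.
        if failure_rates[clip] >= hard_threshold:
            batch.extend(generate_variants(clip, variants_per_hard))
    return batch


# Toy usage: failure rates would come from evaluating the current policy.
rates = {"walk": 0.0, "jump": 0.2, "cartwheel": 0.8}
batch = build_batch(rates, batch_size=8)
```

In a training loop, the failure rates would be re-estimated periodically so that the sampling distribution follows the policy's current weaknesses.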

BibTeX

@misc{huang2025modskillphysicalcharacterskill,
  title={ModSkill: Physical Character Skill Modularization},
  author={Yiming Huang and Zhiyang Dou and Lingjie Liu},
  year={2025},
  eprint={2502.14140},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.14140},
}