Synergetic Reconstruction from 2D Pose and 3D Motion for Wide-Space Multi-Person Video Motion Capture in the Wild

Yosuke Ikegami²

¹NTT DOCOMO

²The University of Tokyo

Abstract

Although many studies have investigated markerless motion capture, the technology has not been applied to real sports or concerts. In this paper, we propose a markerless motion capture method that achieves spatiotemporal accuracy and smoothness from multiple cameras in wide-space, multi-person environments. The proposed method predicts each person's 3D pose and determines a sufficiently small bounding box in each camera image. This prediction, together with spatiotemporal filtering based on a human skeletal model, enables 3D reconstruction of the person with high accuracy. The accurate 3D reconstruction is then used to predict the bounding box in each camera image in the next frame. This is feedback from the 3D motion to the 2D pose, and it provides a synergetic effect on the overall performance of video motion capture. We evaluated the proposed method using various datasets and a real sports field. The experimental results demonstrate that the mean per joint position error (MPJPE) is 31.5 mm and the percentage of correct parts (PCP) is 99.5% for five people dynamically moving while satisfying the range of motion (RoM).
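For readers unfamiliar with the MPJPE metric quoted in the abstract, the following is a minimal sketch of how it is commonly computed: the mean Euclidean distance between estimated and ground-truth joint positions. The function name `mpjpe` and the three-joint sample data are hypothetical and for illustration only; the sketch does not reproduce the evaluation protocol used in the paper (joint set, alignment, and frame selection are not specified here).

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between corresponding joints.

    Both arguments are sequences of (x, y, z) tuples in the same unit;
    the result is in that unit as well.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("joint lists must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Hypothetical three-joint example, coordinates in millimetres.
pred = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 203.0, 4.0)]
gt = [(0.0, 0.0, 0.0), (103.0, 4.0, 0.0), (0.0, 200.0, 0.0)]
error_mm = mpjpe(pred, gt)
print(f"MPJPE: {error_mm:.3f} mm")
```

In practice the per-frame value would be averaged over all evaluated frames and subjects.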

Paper

[Journal] [arXiv]

Dataset

YNL-MP dataset, Version 1.1 [12 Oct, 2020]
Copyright 2020, The University of Tokyo

Latest Update (12 Oct, 2020)

Added test9 (hug motion)

Changed the unit of the calibration parameters from [mm] to [m]

This dataset contains eight tests. In each test, 1-5 subjects were recorded by 8 cameras (4 viewpoints) at 60 Hz. For the 3D ground truth, 1-2 of the subjects were simultaneously measured by an optical motion capture system at 200 Hz. In addition, to allow our work to be evaluated, we share our results. The results include joint positions (trc), joint angles (bvh), and joint positions computed without considering the range of motion.
The dataset is shared only for research purposes. Redistribution of the dataset or its modified version is strictly prohibited without written permission from the copyright holder.
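Because the video (60 Hz) and the optical motion capture ground truth (200 Hz) are recorded at different rates, a comparison needs both streams on a common time base. The sketch below shows one possible approach, linear interpolation of a single scalar channel; it assumes both streams start at the same instant and that the rates are integers, neither of which is stated for this dataset. The function name `resample_linear` is hypothetical.

```python
def resample_linear(samples, src_hz, dst_hz):
    """Linearly interpolate scalar samples from src_hz to dst_hz.

    Assumes integer rates and that sample 0 of the input and output
    correspond to the same instant (an assumption, not a dataset fact).
    """
    if not samples:
        return []
    last = len(samples) - 1
    # Number of output samples that fit inside the input duration.
    n_out = int(last * dst_hz // src_hz) + 1
    out = []
    for k in range(n_out):
        pos = k * src_hz / dst_hz      # fractional index into the input
        i = min(int(pos), last)        # sample at or before pos
        j = min(i + 1, last)           # sample after pos (clamped)
        frac = pos - i
        out.append(samples[i] * (1.0 - frac) + samples[j] * frac)
    return out


# Hypothetical channel: 21 samples at 200 Hz (0.1 s), value equals index.
mocap_channel = [float(n) for n in range(21)]
video_rate_channel = resample_linear(mocap_channel, 200, 60)
print(len(video_rate_channel), video_rate_channel[3])
```

The same function can be applied per coordinate of each joint; a real pipeline would also need to handle any time offset between the two recording systems.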

Download YNL-MP dataset (18.9GB)

Media

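The University of Tokyo and NTT DOCOMO (Press release): Multi-person video motion capture technology developed: easy motion capture at sports venues and similar sites, free from equipment and location constraints (Japanese), Jan. 2020 [Link] [Link]
Nikkei: NTT DOCOMO measures bone and muscle movement from video, developed with the University of Tokyo (Japanese), Jan. 2020 [Link]
Impress Watch: "Motion capture" of an entire match venue. The University of Tokyo and DOCOMO (Japanese), Jan. 2020 [Link]
ITmedia NEWS (Seamless): Simultaneous detection of multiple people's movements at the skeletal and muscle level from camera footage: the University of Tokyo and DOCOMO announce motion capture technology (Japanese), Jan. 2020 [Link]
Kodomo no Kagaku, April issue: Multi-person motion capture using camera footage alone!? (Japanese), Mar. 2020 [Link]
Nikkei: Universities as sports laboratories: improving athletes' abilities and developing sportswear (Japanese), Mar. 2020 [Link]
NTT DOCOMO Technical Journal vol. 28: "VMocap", 3D digitization of human movement with cameras (Japanese), Apr. 2020 [Link]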

Related publications

Takuya Ohashi, Yosuke Ikegami, Kazuki Yamamoto, Wataru Takano, Yoshihiko Nakamura. Video Motion Capture from the Part Confidence Maps of Multi-Camera Images by Spatiotemporal Filtering Using the Human Skeletal Model. IROS. 2018.

[Project page]

[Paper]

[Video]

Citation

@article{ohashi20vmocap,
  title = {{Synergetic Reconstruction from 2D Pose and 3D Motion for Wide-Space Multi-Person Video Motion Capture in the Wild}},
  author = {Takuya Ohashi and Yosuke Ikegami and Yoshihiko Nakamura},
  journal = {Image and Vision Computing},
  volume = {104},
  pages = {104028},
  year = {2020}
}

Contact

Email: {ohashi,ikegami,nakamura}@ynl.t.u-tokyo.ac.jp
Nakamura and Yamamoto Lab