Classification-based Multi-task Learning for Efficient Pose Estimation Network

강동오 (Kakao Enterprise), 노명철 (Kakao Enterprise), 김한샘 (Kakao Enterprise), 김용현 (Gauss Labs), 이성환 (Korea University)

International Conference on Pattern Recognition (ICPR)



Human pose estimation is an interesting and fundamental topic in various fields such as action recognition and human-computer interaction. Although many methods have been developed recently, they still fall short of achieving high accuracy and fast inference at the same time. In this paper, we propose a Classification-based Pose Estimation Network with Multi-task Learning (CPENML), which operates on a low-resolution feature map to improve accuracy and inference time simultaneously. CPENML is built on two ideas. First, the proposed classification-based keypoint and offset estimation tasks achieve better performance than their regression-based counterparts. Second, the proposed Multi-Scale Network (MSN) produces robust feature maps and balances the keypoint and offset tasks to maximize performance. To demonstrate the effectiveness of each proposed idea, we conduct ablation studies on the COCO dataset. Compared with benchmark methods, our method shows superior inference time and accuracy on the COCO dataset.
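The abstract does not spell out how the classification outputs are turned into coordinates. The following is a hypothetical sketch of one common way to combine a keypoint classification and an offset classification on a low-resolution feature map, not the paper's exact formulation. It assumes the keypoint cell is chosen by argmax over the H x W feature-map cells and that the sub-cell offset along each axis is predicted as a class over K equal-width bins spanning one stride; the names `decode_keypoint`, `stride`, and the bin layout are illustrative assumptions.

```python
"""Hypothetical sketch: decode one keypoint from classification outputs on a
low-resolution feature map. Not the CPENML authors' implementation; the
argmax-cell + K-bin offset scheme is an assumption for illustration."""
from typing import List, Tuple

Grid2D = List[List[float]]
Grid3D = List[List[List[float]]]


def argmax_2d(grid: Grid2D) -> Tuple[int, int]:
    """Return (row, col) of the highest score; first occurrence wins on ties."""
    best_r, best_c = 0, 0
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value > grid[best_r][best_c]:
                best_r, best_c = r, c
    return best_r, best_c


def argmax_1d(scores: List[float]) -> int:
    """Return the index of the highest score; first occurrence wins on ties."""
    return max(range(len(scores)), key=lambda i: scores[i])


def decode_keypoint(heatmap: Grid2D, offset_x_logits: Grid3D,
                    offset_y_logits: Grid3D, stride: int) -> Tuple[float, float]:
    """Map classification outputs to (x, y) in input-image pixels.

    heatmap:          H x W scores, a classification over feature-map cells.
    offset_x_logits:  H x W x K scores over K hypothetical offset bins (x axis).
    offset_y_logits:  H x W x K scores over K hypothetical offset bins (y axis).
    stride:           downsampling factor between the input and the feature map.
    """
    # Step 1: the coarse location is the cell with the highest keypoint score.
    r, c = argmax_2d(heatmap)
    # Step 2: each offset bin covers stride / K pixels inside that cell.
    k = len(offset_x_logits[r][c])
    bin_width = stride / k
    bx = argmax_1d(offset_x_logits[r][c])
    by = argmax_1d(offset_y_logits[r][c])
    # Step 3: cell origin plus the center of the selected offset bin.
    x = c * stride + (bx + 0.5) * bin_width
    y = r * stride + (by + 0.5) * bin_width
    return x, y


# Toy example: 4 x 4 feature map, stride 8, K = 4 offset bins (2 px each).
H, W, K, STRIDE = 4, 4, 4, 8
heatmap = [[0.0] * W for _ in range(H)]
heatmap[1][2] = 0.9                      # keypoint cell: row 1, col 2
ox = [[[0.0] * K for _ in range(W)] for _ in range(H)]
oy = [[[0.0] * K for _ in range(W)] for _ in range(H)]
ox[1][2][3] = 1.0                        # x offset bin 3 -> 7.0 px into the cell
oy[1][2][0] = 1.0                        # y offset bin 0 -> 1.0 px into the cell
print(decode_keypoint(heatmap, ox, oy, STRIDE))  # (23.0, 9.0)
```

Treating the offset as a class over bins, rather than a regressed real value, keeps both heads as classification problems; the bin width (`stride / K`) then sets the quantization error of the decoded coordinate.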