We propose pose-guided multiplane image (MPI) synthesis, which can render an animatable character in real scenes with photorealistic quality. We use a portable camera rig to capture multi-view images along with the driving signal for the moving subject. Our method generalizes the image-to-image translation paradigm: it translates the human pose into a 3D scene representation (MPIs) that can be rendered from free viewpoints, using the multi-view captures as supervision. To fully exploit the potential of MPIs, we propose a depth-adaptive MPI that can be learned from variable-exposure images while remaining robust to inaccurate camera registration. Our method demonstrates superior novel-view synthesis quality over state-of-the-art approaches for characters with challenging motions. Moreover, it generalizes to novel combinations of training poses and can be explicitly controlled. Our method achieves such expressive and animatable character rendering entirely in real time, making it a promising solution for practical applications.
We aim to render an animatable character that can be controlled by a driving input. To this end, we devise a portable data capture setup that eases multi-view capture in open scenes. Once data capture for the moving character is finished, we train a neural network conditioned on character poses to predict multiplane images that explain the multi-view observations. During inference, we can render realistic characters given a driving input.
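Once the network has predicted the MPI planes, rendering reduces to back-to-front alpha compositing with the standard "over" operator. The following is a minimal NumPy sketch of that compositing step; the plane count, array layout, and function name are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def composite_mpi(rgb_planes, alpha_planes):
    """Composite multiplane images back-to-front with the 'over' operator.

    rgb_planes:   (D, H, W, 3) color for each of D fronto-parallel planes,
                  ordered back (index 0) to front (index D-1).
    alpha_planes: (D, H, W, 1) per-plane opacity in [0, 1].
    Returns an (H, W, 3) rendered image.
    """
    out = np.zeros(rgb_planes.shape[1:], dtype=np.float32)
    for rgb, alpha in zip(rgb_planes, alpha_planes):
        # Each nearer plane occludes the accumulated image by its opacity.
        out = rgb * alpha + out * (1.0 - alpha)
    return out
```

Because compositing is a fixed, differentiable operation, the multi-view photometric loss can be backpropagated through it to the pose-conditioned network, and rendering at inference time is cheap enough to run in real time.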
You can interpolate between two poses using the pose-guided MPI.
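Since the MPI network is conditioned on a pose, interpolating the driving signal itself yields intermediate renderings. A minimal sketch, assuming poses are represented as joint-coordinate arrays (the actual pose format and network are not shown here):

```python
import numpy as np

def interpolate_poses(pose_a, pose_b, t):
    """Linearly blend two driving poses.

    pose_a, pose_b: (J, 3) joint arrays for the two poses.
    t: blend weight; 0 returns pose_a, 1 returns pose_b, and values
       outside [0, 1] extrapolate beyond either pose.
    The blended pose is fed to the pose-conditioned MPI network.
    """
    return (1.0 - t) * pose_a + t * pose_b
```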
You can also combine parts of two poses (e.g., head and hand poses) to generate novel poses. In the first example, we combine the left hand of the first pose with the right hand of the second pose. In the second example, we combine the head of the first pose with the body of the second pose.
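Part-wise recombination can be sketched as swapping a subset of joints between two pose arrays before conditioning the network. The joint-index groups below are hypothetical; the real indices depend on the skeleton layout used during capture.

```python
import numpy as np

# Hypothetical joint-index groups for an assumed 17-joint skeleton.
HEAD = [0, 1, 2]        # e.g., nose and eyes
LEFT_HAND = [7, 9]      # e.g., left elbow and wrist

def combine_pose_parts(base_pose, donor_pose, part_indices):
    """Build a novel pose by swapping a subset of joints.

    Copies the joints listed in part_indices from donor_pose into a
    copy of base_pose, e.g., taking one pose's head or left hand while
    keeping the rest of the body unchanged.
    """
    pose = base_pose.copy()
    pose[part_indices] = donor_pose[part_indices]
    return pose
```

The resulting pose is a combination never seen during training, which is where the method's generalization to novel pose combinations comes into play.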
You can also extrapolate new poses and generate the corresponding novel MPIs.