We augment a single 2D image of an object (left) with depth by transporting information from a 3D shape deformation subspace learned by analyzing a network of related but different shapes (middle). For visualization, we color code the estimated depth, with values increasing from red to blue (right).
Estimating Image Depth Using Shape Collections
Hao Su, Qixing Huang, Niloy J. Mitra, Yangyan Li, Leonidas Guibas
Siggraph 2014


Images, while easy to acquire, view, publish, and share, lack critical depth information. This poses a serious bottleneck for many image manipulation, editing, and retrieval tasks. In this paper we consider the problem of adding depth to an image of an object, effectively ‘lifting’ it back to 3D, by exploiting a collection of aligned 3D models of related shapes. Our key insight is that, even when the imaged object is not contained in the shape collection, the network of shapes implicitly characterizes a shape-specific deformation subspace that regularizes the problem and enables robust diffusion of depth information from the shape collection to the input image. We evaluate our fully automatic approach on diverse and challenging input images, validate the results against Kinect depth readings, and demonstrate several imaging applications including depth-enhanced image editing and image relighting.


We reconstruct a 3D point cloud from an image object by utilizing a collection of related shapes. In the preprocessing stage, we jointly align the input shape collection and learn structure-preserving deformation models for the shapes. Then, in the reconstruction stage, we lift a single image to 3D in three steps. The first step initializes the camera pose in the coordinate system associated with the aligned shapes and extracts a set of similar shapes. The second step performs image-image matching to build dense correspondences between the image object and the similar shapes, and generates an initial 3D point cloud. The final step jointly aligns the initial point cloud and the selected shapes by simultaneously optimizing the depth of each image pixel, the camera pose, and the shape deformations.
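Once per-pixel depths and a camera pose are estimated, turning image pixels into a 3D point cloud is standard pinhole back-projection. The sketch below is a minimal illustration of that lifting step, not the paper's implementation; the intrinsics fx, fy, cx, cy and the toy depth map are assumed for the example.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H x W) to an (H*W, 3) point cloud via pinhole back-projection."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx   # pixel column -> camera-space x
    y = (v - cy) * depth / fy   # pixel row    -> camera-space y
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy example: a flat 4x4 depth map 2 units from the camera.
depth = np.full((4, 4), 2.0)
pts = backproject(depth, fx=500.0, fy=500.0, cx=2.0, cy=2.0)
print(pts.shape)  # (16, 3)
```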

We have evaluated our approach on five categories of objects. This figure shows representative results in each category. For every object we show the input 2D image, the extracted similar shapes, the reconstructed point cloud, and finally the ground-truth Kinect scan, shown in the right column for reference.

Top: point clouds viewed from different angles. Bottom: surface reconstruction results with symmetry-based completion.
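Symmetry-based completion exploits the bilateral symmetry common to man-made objects: points reconstructed on the visible side are mirrored across the symmetry plane to fill in the occluded side. A minimal sketch of that reflection step, assuming the symmetry plane is given (the paper's surface reconstruction itself is not reproduced here):

```python
import numpy as np

def complete_by_symmetry(points, plane_normal, plane_point):
    """Mirror an (N, 3) point cloud across a plane and append the reflections."""
    n = np.asarray(plane_normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Signed distance of each point to the plane.
    d = (points - np.asarray(plane_point, dtype=float)) @ n
    mirrored = points - 2.0 * np.outer(d, n)
    return np.vstack([points, mirrored])

# Toy example: two points on one side of the plane x = 0.
pts = np.array([[1.0, 0.0, 0.0], [2.0, 1.0, 0.0]])
full = complete_by_symmetry(pts, plane_normal=[1, 0, 0], plane_point=[0, 0, 0])
print(full.shape)  # (4, 3): originals plus mirrored copies at x = -1 and x = -2
```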

Using the inferred depth information, we can build the normal map and simulate different lighting conditions. The leftmost column is the input image; the three columns to the right show the simulated illumination. A directional light source moves from left to right (top row) or from top to bottom (bottom row).
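The relighting idea can be sketched in a few lines: estimate normals from the depth map by finite differences, then shade each pixel with a Lambertian model under a directional light. This is a simplified stand-in for the paper's pipeline; the toy depth ramp and light direction are assumptions for illustration.

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate per-pixel unit normals from a depth map via finite differences."""
    dzdx = np.gradient(depth, axis=1)
    dzdy = np.gradient(depth, axis=0)
    n = np.dstack([-dzdx, -dzdy, np.ones_like(depth)])
    return n / np.linalg.norm(n, axis=2, keepdims=True)

def relight(depth, light_dir):
    """Lambertian shading of a depth map under a directional light."""
    l = np.asarray(light_dir, dtype=float)
    l = l / np.linalg.norm(l)
    shading = normals_from_depth(depth) @ l   # per-pixel dot(n, l)
    return np.clip(shading, 0.0, 1.0)

# Toy example: a depth ramp lit head-on.
depth = np.tile(np.linspace(1.0, 2.0, 8), (8, 1))
img = relight(depth, light_dir=[0.0, 0.0, 1.0])
print(img.shape)  # (8, 8)
```

Moving the light (e.g. `light_dir=[1, 0, 1]`) changes the shading exactly as in the figure's left-to-right sweep.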

Simulated images by rotating cameras around the y-axis (up-direction).
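Rotating the camera about the up direction is equivalent to rotating the reconstructed point cloud about the y-axis before re-projection. A minimal sketch of that rotation (the rendering step itself is omitted):

```python
import numpy as np

def rotate_y(points, angle_deg):
    """Rotate an (N, 3) point cloud about the y-axis (up direction)."""
    a = np.radians(angle_deg)
    R = np.array([[ np.cos(a), 0.0, np.sin(a)],
                  [ 0.0,       1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])
    return points @ R.T

# Toy example: a 90-degree rotation takes (1, 0, 0) to approximately (0, 0, -1).
pts = np.array([[1.0, 0.0, 0.0]])
out = rotate_y(pts, 90.0)
```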


Code, data:

Please visit the webpage maintained by Hao.


We thank the reviewers for their comments and suggestions on the paper. This work was supported in part by NSF grants IIS 1016324 and DMS 1228304, AFOSR grant FA9550-12-1-0372, NSFC grant 61202221, the Max Planck Center for Visual Computing and Communications, Google and Motorola research awards, a gift from HTC Corporation, the Marie Curie Career Integration Grant 303541, the ERC Starting Grant SmartGeometry (StG-2013-335373), and gifts from Adobe.


AUTHOR = "Hao Su and Qixing Huang and Niloy J. Mitra and Yangyan Li and Leonidas Guibas",
TITLE = "Estimating Image Depth Using Shape Collections",
JOURNAL = "ACM Transactions on Graphics (Special issue of SIGGRAPH 2014)",
YEAR = "2014",
NUMPAGES = "11",

paper (43MB) paper (13MB)