Code now available as rxkinfu.

KinectFusion is an impressive new algorithm for real-time dense 3D mapping using the Kinect [IKH+11] [NDI+11]. It is geared towards games and augmented reality, but could also be of great use for robot perception. However, the algorithm is currently limited to a relatively small volume fixed in the world at startup (typically a ~3m cube), which limits its usefulness for perception.

We are developing moving volume KinectFusion with additional algorithms that allow the camera to roam freely. Our aim is to use this in perception for rough-terrain robot locomotion—a walking robot needs to know about the ground under its feet, but the legs and feet themselves would obstruct downward-facing cameras. The system would also be useful in other applications, including free-roaming games and awareness aids for hazardous environments or the visually impaired.


Moving Volume KinectFusion reconstruction from data collected walking on natural rocks with a hand-held Kinect. More videos below.

Our approach allows the algorithm to handle a volume that moves arbitrarily online. The figure below shows how a volume remapping is applied to hold the sensor pose fixed relative to the volume. Raycast images before and after a remapping show a third step coming into view as the volume moves forward. A reconstruction of the volume and camera poses shows that the volume-to-volume transform is calculated to maintain the camera at the rear center of the volume.
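The core bookkeeping here can be illustrated with a small sketch (our actual implementation is C++/CUDA in kinfu; the function name and the translation-only policy below are simplifications for illustration). Given the current volume pose and camera pose in the world, it slides the volume so the camera returns to a target point such as the rear center of the cube, and returns the volume-to-volume transform used for remapping:

```python
import numpy as np

def volume_to_volume_transform(T_wv, T_wc, p_target):
    """Compute a new volume pose T_wv_new that places the camera at
    p_target (e.g. the rear center of the cube) in the volume frame.
    This sketch only translates the volume; the full method can also
    rotate it. Returns (T_wv_new, T_remap), where T_remap maps
    old-volume coordinates into the new volume frame."""
    # Camera position expressed in the current volume frame.
    cam_in_vol = (np.linalg.inv(T_wv) @ T_wc)[:3, 3]
    # How far the camera has drifted from where we want it.
    shift = cam_in_vol - p_target
    # Slide the volume by that offset, expressed in world coordinates.
    T_wv_new = T_wv.copy()
    T_wv_new[:3, 3] += T_wv[:3, :3] @ shift
    # Volume-to-volume transform: old volume frame -> new volume frame.
    T_remap = np.linalg.inv(T_wv_new) @ T_wv
    return T_wv_new, T_remap
```

After this transform, re-expressing the camera pose in the new volume frame puts it exactly at `p_target`, so the camera stays at the rear center of the cube as it moves forward.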

We based our implementation on the open-source kinfu code that has recently been added to the Point Cloud Library (PCL) from Willow Garage, and we have submitted our code for inclusion there as well.

Moving volume KinectFusion simultaneously tracks global camera motion and builds a spatial map of the local surroundings. However, it is not a true SLAM algorithm: it does not explicitly close large-scale loops and will inevitably incur drift over time. Rather, it can be considered a 6D visual odometry approach that tracks relative camera motion. The significant additional benefit beyond visual odometry alone is that a map of local environment surfaces is also always available.
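The visual odometry view of the tracker amounts to composing per-frame relative transforms into a global pose; small per-frame errors compound, which is the drift mentioned above. A minimal sketch (the function name is illustrative, not part of our code):

```python
import numpy as np

def global_pose(relative_poses):
    """Compose per-frame relative camera transforms (4x4 homogeneous
    matrices) into a single global pose. Each frame's small tracking
    error is folded into all subsequent poses -- the source of drift."""
    T = np.eye(4)
    for T_rel in relative_poses:
        T = T @ T_rel
    return T
```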

Remapping—sometimes called reslicing for the 3D case—has been studied for medical images [HSS+95], but speed is often sacrificed for accuracy. Efforts have been made to improve the speed [FR04], but generally reslicing has not been done in real time. Here we require a fast parallel algorithm which is tuned for common-case KinectFusion data.

Our CUDA algorithm is hybridized in two ways. First, if the rotation component of the transform is smaller than a threshold we use a fast and exact memory shift algorithm. Otherwise we use a more traditional resampling based on trilinear interpolation. Second, during resampling we take advantage of the fact that in the common case much of the volume is either uninitialized or marked “empty”: we do a nearest-neighbor lookup first, and only if that is within the truncation band do we continue with a more expensive interpolation.
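The two-way hybridization can be sketched on the CPU as follows (the real version is a CUDA kernel over a 512³ grid; the constants, array sizes, and function names here are illustrative, and the TSDF is assumed normalized so that ±1.0 marks empty/uninitialized space):

```python
import numpy as np

N = 4            # voxels per side (tiny for illustration)
VOXEL = 1.0      # voxel size in meters (illustrative)
EMPTY = 1.0      # normalized TSDF value for empty/uninitialized voxels
ROT_EPS = 1e-6   # rotation threshold below which the shift path is used

def trilerp(vol, g):
    """Trilinear interpolation at fractional voxel coordinates g."""
    i0 = np.floor(g).astype(int)
    f = g - i0
    i0 = np.clip(i0, 0, N - 1)
    i1 = np.clip(i0 + 1, 0, N - 1)   # clamp at the volume border
    v = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                idx = (i1[0] if dx else i0[0],
                       i1[1] if dy else i0[1],
                       i1[2] if dz else i0[2])
                w = ((f[0] if dx else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dz else 1 - f[2]))
                v += w * vol[idx]
    return v

def remap_tsdf(vol, T_remap):
    """Hybrid remap: exact memory shift when the rotation is negligible
    and the translation snaps to whole voxels; otherwise nearest-neighbor
    lookup first, with trilinear refinement only near the surface."""
    R, t = T_remap[:3, :3], T_remap[:3, 3]
    out = np.full_like(vol, EMPTY)
    if np.abs(R - np.eye(3)).max() < ROT_EPS:
        # Case 1: pure whole-voxel translation -> exact memory shift.
        s = np.round(t / VOXEL).astype(int)
        dst = tuple(slice(max(0, -si), N - max(0, si)) for si in s)
        src = tuple(slice(max(0, si), N - max(0, -si)) for si in s)
        out[dst] = vol[src]
        return out
    # Case 2: general rigid transform -> per-voxel resampling.
    for i in range(N):
        for j in range(N):
            for k in range(N):
                c = (np.array([i, j, k]) + 0.5) * VOXEL  # new voxel center
                g = (R @ c + t) / VOXEL - 0.5            # old voxel coords
                nn = np.round(g).astype(int)
                if np.any(nn < 0) or np.any(nn >= N):
                    continue                             # stays EMPTY
                v = vol[tuple(nn)]
                if abs(v) >= EMPTY:
                    out[i, j, k] = v                     # cheap NN suffices
                else:
                    out[i, j, k] = trilerp(vol, g)       # in truncation band
    return out
```

In the common case most voxels take the cheap nearest-neighbor branch, so the expensive trilinear interpolation runs only for the minority of voxels inside the truncation band around surfaces.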

Using a battery-powered Kinect we collected several indoor and 18 rocky terrain datasets comprising an estimated 662m path length. (Though the Kinect cannot cope with direct sunlight, it does work outdoors on a reasonably overcast day.) The richness of 3D depth features makes our approach work well on rocky terrain—no camera tracking failures were incurred, and reconstructed surfaces appear to be high quality. In our BMVC 2012 paper we present performance and tracking accuracy measurements for our algorithm on 6 datasets, comparing it with the original kinfu implementation and with ground truth and reference results for Engelhard et al.'s RGB-D SLAM [EEH+11] where applicable.


Hill ogv|mp4|webm

A hike up a rocky slope (~25.1m). A subset of the moving volumes is illustrated below.

Stairs ogv|mp4|webm

A climb up two staircases in a hallway (~11.3m). A subset of the moving volumes is illustrated below.

Hallway ogv|mp4|webm

A stroll forward through a hallway (~11.2m). A subset of the moving volumes is illustrated below (only the camera path and volume frame bases are shown here to reduce clutter).

Mobile Data Collection

The Kinect is normally constrained by a tether to wall power. Though some projects have used it on mobile robots, we are not aware of others who have adapted it for free-roaming handheld use. To this end we developed a simple system where the Kinect is powered by a lithium polymer battery, a tablet is attached to its back to provide a heads-up display, and a closed-lid laptop computer carried in a shoulder bag runs data collection software that stores all RGB and depth images to an SSD. This setup allows one person to conveniently hold and aim the Kinect, simultaneously control and monitor the data capture with the tablet, and walk freely without the constraint of any power cord.

Our system allows the collection of data in environments and scenarios that have usually been beyond the reach of the Kinect. As our group is especially interested in bipedal locomotion on rocky terrain, capturing data outdoors was a priority. While it is generally understood that the Kinect does not work well in outdoor sunlight, we have found that it works fine on a moderately overcast day.

Related Publications

Henry Roth, Marsette Vona. Moving Volume KinectFusion, British Machine Vision Conference (BMVC), September, 2012.

Research Context

Two other groups are also now developing approaches to translate the KinectFusion volume [HF12] [WDK+12]. A key distinction of our method is the ability to rotate the volume in addition to translating it. Since the volume is rectilinear this can be useful to control its orientation, e.g. to maximize overlap with the camera frustum or to align the volume with task-relevant directions, such as the average ground surface normal in locomotion.

Online terrain modeling for robot locomotion has been studied in the past for wheeled robots for applications including planetary exploration [KK92] [GMM02] [OMW+07], and more recently for some legged robots [GFF08] [Howard08] [CTS+09] [WMB+10] [NCK11] [SHG12]. In most cases the map is based on a 2.5D grid or height map, and is used for traversability analysis but not detailed reasoning about foot contact on irregular 3D surfaces.


IKH+11 Shahram Izadi, David Kim, Otmar Hilliges, David Molyneaux, Richard Newcombe, Pushmeet Kohli, Jamie Shotton, Steve Hodges, Dustin Freeman, Andrew Davison, and Andrew Fitzgibbon. KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera. ACM symposium on User interface software and technology (UIST), 2011.
NDI+11 Richard A. Newcombe, Andrew J. Davison, Shahram Izadi, Pushmeet Kohli, Otmar Hilliges, Jamie Shotton, David Molyneaux, Steve Hodges, David Kim, and Andrew Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2011.
HSS+95 Joseph V. Hajnal, Nadeem Saeed, Elaine J. Soar, Angela Oatridge, Ian R. Young, and Graeme M. Bydder. A Registration and Interpolation Procedure for Subvoxel Matching of Serially Acquired MR Images. Journal of Computer Assisted Tomography, vol. 19, no. 2, 1995.
FR04 J. Fischer and A. del Río. A fast method for applying rigid transformations to volume data. International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG), 2004.
EEH+11 Nikolas Engelhard, Felix Endres, Juergen Hess, Juergen Sturm, and Wolfram Burgard. Real-time 3D visual SLAM with a hand-held RGB-D camera. RGB-D Workshop on 3D Perception in Robotics at the European Robotics Forum, 2011.
HF12 Francisco Heredia and Raphael Favier. Kinfu Large Scale. module in PCL, 2012.
WDK+12 Thomas Whelan, John McDonald, Michael Kaess, Maurice Fallon, Hordur Johannsson, and John J. Leonard. Kintinuous: Spatially Extended KinectFusion. RGB-D Workshop at RSS, 2012.
KK92 In So Kweon and Takeo Kanade. High-Resolution Terrain Map from Multiple Sensor Data. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, 1992.
GMM02 Steven B. Goldberg, Mark W. Maimone, and Larry Matthies. Stereo Vision and Rover Navigation Software for Planetary Exploration. IEEE Aerospace Conference, 2002.
OMW+07 Clark F. Olson, Larry H. Matthies, John R. Wright, Rongxing Li, and Kaichang Di. Visual terrain mapping for Mars exploration. Computer Vision and Image Understanding, vol. 105, 2007.
Howard08 Andrew Howard. Real-time stereo visual odometry for autonomous ground vehicles. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2008.
WMB+10 David Wooden, Matthew Malchano, Kevin Blankespoor, Andrew Howard, Alfred A. Rizzi, and Marc Raibert. Autonomous Navigation for BigDog. IEEE International Conference on Robotics and Automation (ICRA), 2010.
SHG12 Annett Stelzer, Heiko Hirschmüller, and Martin Görner. Stereo-vision-based navigation of a six-legged walking robot in unknown rough terrain. The International Journal of Robotics Research, 2012.
CTS+09 Joel Chestnutt, Yutaka Takaoka, Keisuke Suga, Koichi Nishiwaki, James Kuffner, Satoshi Kagami. Biped Navigation in Rough Environments using On-board Sensing. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009.
NCK11 Koichi Nishiwaki, Joel Chestnutt, and Satoshi Kagami. Autonomous Navigation of a Humanoid Robot on Unknown Rough Terrain. International Symposium on Robotics Research (ISRR), 2011.
GFF08 Jens-Steffen Gutmann, Masaki Fukuchi, and Masahiro Fujita. 3D Perception and Environment Map Generation for Humanoid Robot Navigation. International Journal of Robotics Research, vol. 27, issue 10, 2008.