Change Detection Dataset – Description of the Dataset
Summary
This dataset contains image sequences of city streets captured by a vehicle-mounted camera at two different time points. We make them publicly available to researchers interested in the problem of image-based detection of temporal changes in 3D scene structures. Although we own the copyright, you may freely use the data for research purposes. We request that you cite the following paper if you publish research results utilizing these data:
Ken Sakurada, Takayuki Okatani, Koichiro Deguchi, Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-mounted Camera, Proc. Computer Vision and Pattern Recognition, 2013.
@inproceedings{sakurada2013detecting,
title={Detecting changes in 3D structure of a scene from multi-view images captured by a vehicle-mounted camera},
author={Sakurada, Ken and Okatani, Takayuki and Deguchi, Koichiro},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={137--144},
year={2013}
}
Description
The dataset currently contains data for two city streets, Kamaishi and Takata (these are the names of the cities). Each street dataset consists of two image sequences, t0 and t1, captured at two different times about three months apart.
Each image sequence contains cylindrical panoramic images (5000 x 2500 pixels) along with their camera poses. The panoramic images are named ‘panorama/*.jpg’ in the corresponding directory. These images were rendered with the Ladybug SDK 1.5 Release 7 for Windows (64-bit), using ladybugRenderOffScreenImage(…, LADYBUG_PANORAMIC, …), and are produced by the equirectangular projection. Please refer to [1] for the details of the transformation between the image coordinates and the ladybug camera coordinates. The camera poses were obtained from these panoramic images by our SfM code. They are stored in the text file ‘panorama/cam_detail.txt’ in the following format:
r11^1 r12^1 r13^1 t1^1
r21^1 r22^1 r23^1 t2^1
r31^1 r32^1 r33^1 t3^1
0 0 0 1
r11^2 r12^2 r13^2 t1^2
r21^2 r22^2 r23^2 t2^2
r31^2 r32^2 r33^2 t3^2
0 0 0 1...
where rij^k is the (i,j) component of the rotation matrix, ti^k is the i-th entry of the translation vector, and the superscript ^k indicates the parameters of the k-th viewpoint. A point X in the global coordinates is transformed to the local coordinates of the k-th viewpoint by X^k = R^k X + t^k.
These camera poses are computed independently for each of t0 and t1. To compare t0 with t1, we need camera poses registered in a single coordinate system. These are obtained by performing an additional bundle adjustment over t0 and t1 and are stored in ‘panorama/T0.txt’ for t0 and ‘panorama/T1.txt’ for t1. Their format is the same as that of ‘cam_detail.txt.’
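For reference, here is a minimal Python example (not included in the dataset; the function names are only illustrative) that reads a pose file in this format into 4x4 matrices and applies X^k = R^k X + t^k. It assumes the file is plain whitespace-separated text with four rows per viewpoint, as described above:

import numpy as np

def load_poses(path):
    # Read a pose file (cam_detail.txt, T0.txt, or T1.txt): whitespace-separated
    # numbers, four rows of four values per viewpoint, as described above.
    values = np.loadtxt(path)
    return values.reshape(-1, 4, 4)          # shape: (num_viewpoints, 4, 4)

def global_to_local(pose, X_global):
    # Apply X^k = R^k X + t^k for one viewpoint.
    R = pose[:3, :3]
    t = pose[:3, 3]
    return R @ np.asarray(X_global, dtype=float) + t

# Example usage (the path is illustrative):
# poses_t0 = load_poses('Kamaishi/t0/panorama/T0.txt')
# X_local = global_to_local(poses_t0[0], [1.0, 2.0, 3.0])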
The dataset also contains perspective images (640 x 480 pixels) cropped from these panoramic images. (The results shown in our CVPR paper were obtained using some of them.) There are two image sets for each street at each time: one looks at the left side of the street and the other at the right side. Thus, there are four image sets in total for each street, i.e., t0-left, t0-right, t1-left, and t1-right; changes are detected by comparing t0-* with t1-*.
The internal camera parameters are identical for all of these perspective images and are given in ‘intrinsic_param.txt’ in the following format:
f 0 cx
0 f cy
0 0 1.
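For example, the following small Python snippet (not included in the dataset) loads this matrix with numpy and projects a 3D point given in a perspective camera's coordinates to pixel coordinates; it assumes the file contains just the nine numbers above:

import numpy as np

# Load K = [[f, 0, cx], [0, f, cy], [0, 0, 1]] from intrinsic_param.txt.
K = np.loadtxt('intrinsic_param.txt')

def project(K, X_cam):
    # Project a 3D point given in camera coordinates to pixel coordinates (u, v).
    x = K @ np.asarray(X_cam, dtype=float)
    return x[:2] / x[2]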
The external camera parameters for the four image sets (t0-left, t0-right, t1-left, t1-right) are stored in ‘t0/perspective_left/T0_left.txt,’ ‘t0/perspective_right/T0_right.txt,’ ‘t1/perspective_left/T1_left.txt,’ and ‘t1/perspective_right/T1_right.txt’ in the same format as ‘cam_detail.txt.’ They were computed from the camera poses T0 and T1 of the panoramic images as follows:
T0_l^k = T_l T0^k
T0_r^k = T_r T0^k
T1_l^k = T_l T1^k
T1_r^k = T_r T1^k
where T_l and T_r are the transformation matrices from the ladybug camera coordinates to the left and right perspective camera coordinates, respectively, and are given by
T_l =
1 0 0 0
0 0 -1 0
0 1 0 0
0 0 0 1
and
T_r =
-1 0 0 0
0 0 -1 0
0 -1 0 0
0 0 0 1
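The composition above can be written directly in code. The following Python snippet (not included in the dataset; the function name is only illustrative) builds T_l and T_r as given and multiplies them with a panoramic camera pose to obtain the corresponding perspective camera poses:

import numpy as np

# Transforms from the ladybug (panoramic) camera coordinates to the left and
# right perspective camera coordinates, as given above.
T_l = np.array([[ 1,  0,  0, 0],
                [ 0,  0, -1, 0],
                [ 0,  1,  0, 0],
                [ 0,  0,  0, 1]], dtype=float)

T_r = np.array([[-1,  0,  0, 0],
                [ 0,  0, -1, 0],
                [ 0, -1,  0, 0],
                [ 0,  0,  0, 1]], dtype=float)

def perspective_poses(T_panorama):
    # Given a 4x4 panoramic camera pose (e.g., T0^k), return the corresponding
    # left and right perspective camera poses (T0_l^k, T0_r^k).
    return T_l @ T_panorama, T_r @ T_panorama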
Ground truth
Some of the perspective images have ground truth of temporal changes, which we annotated manually. The ground truth is stored in "gt_mask_*.jpg."
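For reference, the following Python snippet (not included in the dataset) shows one way to compare a predicted binary change map against such a mask; it assumes, as our own convention here, that bright (non-zero) pixels in gt_mask_*.jpg mark changed regions, which should be verified against the data:

import numpy as np
from PIL import Image

def evaluate_change_mask(pred_mask, gt_mask_path, threshold=128):
    # Pixel-wise precision and recall of a boolean prediction against a
    # gt_mask_*.jpg file. Assumes (our convention here) that pixels brighter
    # than 'threshold' denote changed regions.
    gt = np.array(Image.open(gt_mask_path).convert('L')) >= threshold
    pred = np.asarray(pred_mask, dtype=bool)
    tp = np.logical_and(pred, gt).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    return precision, recall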
Directory structure
Change_detection_dataset
|-README.txt
|-intrinsic_param.txt
|--Kamaishi
|  |--t0
|  |  |--panorama            // *.jpg, cam_detail.txt, T0.txt
|  |  |--perspective_left    // *.jpg, T0_left.txt
|  |   --perspective_right   // *.jpg, T0_right.txt
|  |--t1
|  |  |--panorama            // *.jpg, cam_detail.txt, T1.txt
|  |  |--perspective_left    // *.jpg, T1_left.txt
|  |   --perspective_right   // *.jpg, T1_right.txt
|  |
|   --ground_truth           // gt_*.jpg, gt_mask_*.jpg
|--Takata
   |--t0
   |  |--panorama            // *.jpg, cam_detail.txt, T0.txt
   |  |--perspective_left    // *.jpg, T0_left.txt
   |   --perspective_right   // *.jpg, T0_right.txt
   |--t1
   |  |--panorama            // *.jpg, cam_detail.txt, T1.txt
   |  |--perspective_left    // *.jpg, T1_left.txt
   |   --perspective_right   // *.jpg, T1_right.txt
   |
    --ground_truth           // gt_*.jpg, gt_mask_*.jpg
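For orientation, the following Python snippet (not included in the dataset) shows how the perspective images of one street and side can be enumerated from this tree; note that the correspondence between t0 and t1 images is established through the registered camera poses, not through file names:

from pathlib import Path

def list_perspective_images(root, street, side='left'):
    # Return sorted lists of the t0 and t1 perspective images for one street
    # and one side ('left' or 'right'), following the directory tree above.
    base = Path(root) / street
    t0 = sorted((base / 't0' / ('perspective_' + side)).glob('*.jpg'))
    t1 = sorted((base / 't1' / ('perspective_' + side)).glob('*.jpg'))
    return t0, t1

# Example usage:
# t0_images, t1_images = list_perspective_images('Change_detection_dataset', 'Kamaishi')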
We welcome your questions, comments, and suggestions. Please send them to sakurada@vision.is.tohoku.ac.jp or okatani@vision.is.tohoku.ac.jp.
Ken Sakurada and Takayuki Okatani
Tohoku University, Japan
June 2013
Reference
[1] Akihiko Torii, Michal Havlena, and Tomas Pajdla, From Google Street View to 3D City Models, Proc. ICCV Workshops, 2009.