1.Data Collection

2.Data Annotation

3.Dataset Format

4.Evaluation Metrics

1.Data Collection

Equipment
LiDAR(Velodyne 128p)
- 10Hz capture frequency
- 100° Horizontal FOV, 40° Vertical FOV
- 245m maximum detection range
- <=3cm detection distance accuracy
Camera
- JPEG images in RGB format at 1920x1080 resolution
Calibration
To make this a high-quality multi-sensor dataset, we provide the intrinsic and extrinsic matrices for each sensor. They refer to the following coordinate systems.
- LiDAR Coordinate System
The origin of the LiDAR Coordinate System is at the center of the LiDAR sensor. The x-axis points forward, the y-axis points to the left, and the z-axis points upward.
- Camera Coordinate System
The origin of the Camera Coordinate System is at the center of the lens. The x-y plane is parallel to the image plane, and the z-axis points forward.
- Image Coordinate System
The Image Coordinate System is a 2D coordinate system whose origin is at the top-left corner of the image. The x-axis runs along the image width and the y-axis along the image height.
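
The chain LiDAR -> Camera -> Image described above can be sketched as follows. This is a minimal illustration, not the dataset's own tooling: the function name lidar_to_pixel, the parameter names, and the example matrices are all hypothetical, and the sketch assumes an ideal pinhole camera (no lens distortion, no skew) whose x-axis points right and y-axis points down.

```python
from typing import Optional, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def lidar_to_pixel(
    point_lidar: Sequence[float],
    rotation: Matrix3,
    translation: Sequence[float],
    intrinsic: Matrix3,
) -> Optional[Tuple[float, float]]:
    """Project one LiDAR point (x, y, z) in meters to a pixel (u, v).

    rotation (3x3) and translation (3,) are assumed to map LiDAR
    coordinates to camera coordinates: p_cam = R * p_lidar + t.
    intrinsic is assumed to be [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    Returns None when the point lies behind the camera.
    """
    # Extrinsic step: LiDAR Coordinate System -> Camera Coordinate System.
    x_c, y_c, z_c = (
        sum(rotation[i][j] * point_lidar[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z_c <= 0:
        return None  # not in front of the lens, so it cannot be imaged

    # Intrinsic step: perspective division, then focal length and principal point.
    fx, cx = intrinsic[0][0], intrinsic[0][2]
    fy, cy = intrinsic[1][1], intrinsic[1][2]
    return (fx * x_c / z_c + cx, fy * y_c / z_c + cy)


# Made-up example values (NOT the dataset's calibration):
# LiDAR x forward, y left, z up  ->  camera z forward, x right, y down.
EXAMPLE_R = [[0.0, -1.0, 0.0], [0.0, 0.0, -1.0], [1.0, 0.0, 0.0]]
EXAMPLE_T = [0.0, 0.0, 0.0]
EXAMPLE_K = [[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    # A point 10 m straight ahead lands on the principal point.
    print(lidar_to_pixel([10.0, 0.0, 0.0], EXAMPLE_R, EXAMPLE_T, EXAMPLE_K))
```

With real data, the rotation, translation, and intrinsic values would come from the provided calibration matrices; a projected pixel should additionally be checked against the 1920x1080 image bounds.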

2.Data Annotation

Data Sampling
We select 22325 images and 22325 point clouds from the data collected by the vehicle-side sensors.
3D Annotation
We provide the 2D and 3D bounding boxes of the obstacle objects, together with their category, occlusion state, and truncation state. Note that the 3D bounding boxes are expressed in the Virtual LiDAR Coordinate System. The annotation format is as follows.
- Type. 15 object classes, including Car, Pedestrian, Cyclist, etc. PedestrianIgnore marks pedestrians that cover fewer than 15*15 pixels or are more than 4/5 occluded. CarIgnore and OtherIgnore apply the same criteria to cars and to other objects, respectively.
- Truncated. Integer (0, 1, 2) indicating truncation state. 0 = non-truncated, 1 = transversely truncated, 2 = longitudinally truncated.
- Occluded. Integer (0, 1, 2) indicating occlusion state. 0 = fully visible, 1 = partly occluded, 2 = largely occluded.
- Alpha. Observation angle of the object, in the range [-pi, pi].
- 2D box. 2D bounding box of the object in the image.
- 3D box. 3D bounding box of the object in the Virtual LiDAR Coordinate System, given as (height, width, length, x_loc, y_loc, z_loc) in meters, plus rotation_y, the angle by which the object is rotated about the y-axis of the Virtual LiDAR Coordinate System.
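
The fields listed above could be held in a small record type like the one below. This is only a sketch: the class name Annotation, the attribute names other than those quoted from the list, the order of the four 2D box values, and the validate and summary helpers are assumptions made for illustration, since the on-disk layout is not specified here.

```python
import math
from dataclasses import dataclass
from typing import Tuple

# State codes as listed in the annotation format.
TRUNCATED_STATES = {
    0: "non-truncated",
    1: "transversely truncated",
    2: "longitudinally truncated",
}
OCCLUDED_STATES = {
    0: "fully visible",
    1: "partly occluded",
    2: "largely occluded",
}


@dataclass
class Annotation:
    obj_type: str       # e.g. "Car", "Pedestrian", "Cyclist"
    truncated: int      # key of TRUNCATED_STATES
    occluded: int       # key of OCCLUDED_STATES
    alpha: float        # observation angle in radians, [-pi, pi]
    box2d: Tuple[float, float, float, float]  # pixels; value order is an assumption
    height: float       # meters
    width: float        # meters
    length: float       # meters
    x_loc: float        # meters, Virtual LiDAR Coordinate System
    y_loc: float
    z_loc: float
    rotation_y: float   # radians

    def validate(self) -> None:
        """Raise ValueError if a field is outside its documented range."""
        if self.truncated not in TRUNCATED_STATES:
            raise ValueError(f"unknown truncated state: {self.truncated}")
        if self.occluded not in OCCLUDED_STATES:
            raise ValueError(f"unknown occluded state: {self.occluded}")
        if not -math.pi <= self.alpha <= math.pi:
            raise ValueError(f"alpha out of range: {self.alpha}")
        if min(self.height, self.width, self.length) <= 0:
            raise ValueError("box dimensions must be positive")

    def summary(self) -> str:
        """Human-readable description of the visibility flags."""
        return (
            f"{self.obj_type}: {OCCLUDED_STATES[self.occluded]}, "
            f"{TRUNCATED_STATES[self.truncated]}"
        )


if __name__ == "__main__":
    # Made-up values for illustration only.
    ann = Annotation("Car", 0, 1, 0.5, (100.0, 200.0, 180.0, 260.0),
                     1.5, 1.8, 4.2, 12.0, -3.0, -1.0, 0.1)
    ann.validate()
    print(ann.summary())
```

Validating each record on load catches state codes outside (0, 1, 2) and angles outside [-pi, pi] before they reach training or evaluation code.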

3.Dataset Format

4.Evaluation Metrics

We evaluate 3D object detection performance using mean average precision (mAP) based on IoU. The mAP is computed from the 3D object detector outputs, which include the size, position, and confidence of each predicted 3D bounding box.
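
As a rough illustration of IoU-based mAP, the sketch below computes a 3D IoU and an average precision from scored detections. It is not the official evaluation code, and it makes several simplifying assumptions: boxes are treated as axis-aligned (the annotated boxes carry a rotation, which real evaluation must respect), detections are assumed to be already matched to ground truth as true or false positives at some IoU threshold, and the precision-recall curve is integrated with all-point interpolation. The IoU thresholds and interpolation scheme actually used are not stated here, and all function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# (x_min, y_min, z_min, x_max, y_max, z_max) in meters; axis-aligned simplification.
Box = Tuple[float, float, float, float, float, float]


def iou_3d_axis_aligned(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for i in range(3):
        lo = max(a[i], b[i])
        hi = min(a[i + 3], b[i + 3])
        if hi <= lo:
            return 0.0  # no overlap along this axis
        inter *= hi - lo
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    return inter / (vol_a + vol_b - inter)


def average_precision(
    scores: Sequence[float],
    is_true_positive: Sequence[bool],
    num_ground_truth: int,
) -> float:
    """Area under the precision-recall curve for one class."""
    if num_ground_truth == 0:
        return 0.0
    # Walk through detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Replace each precision by the best precision at any higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas between consecutive recall values.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """Unweighted mean of per-class AP values."""
    if not ap_per_class:
        return 0.0
    return sum(ap_per_class.values()) / len(ap_per_class)


if __name__ == "__main__":
    # Made-up detections: three predictions, two ground-truth objects.
    ap_car = average_precision([0.9, 0.8, 0.7], [True, False, True], 2)
    print(ap_car, mean_average_precision({"Car": ap_car}))
```

In a full pipeline, each detection would first be matched to an unclaimed ground-truth box of the same class by IoU, and the resulting true-positive flags would feed average_precision per class.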