1. Data Collection

2. Data Annotation

3. Dataset Format

4. Evaluation Metrics

1. Data Collection

Equipment
The data acquisition infrastructure is equipped with one 300-beam LiDAR and one camera.
LiDAR (300p):
- 10Hz capture frequency
- 100° Horizontal FOV, 40° Vertical FOV
- 280m maximum detection range
- <=3cm detection distance accuracy
Camera:
- 1'' CMOS sensor with global exposure and 4096x2160 resolution
- 25Hz capture frequency
- JPEG images in RGB format at 1920x1080 resolution
Calibration
To achieve a high-quality multi-sensor dataset, we provide intrinsic and extrinsic matrices for each sensor. They are defined with respect to the following coordinate systems.
- Virtual LiDAR Coordinate System
The origin of the Virtual LiDAR Coordinate System is located at the center of the LiDAR sensor. The x-y plane is parallel to the ground plane, and the z-axis is positive upwards.
- Camera Coordinate System
The origin of the Camera Coordinate System is placed at the center of the lens, the x-y plane is parallel to the image plane, and the z-axis is positive forwards.
- Image Coordinate System
The Image Coordinate System is a 2D coordinate system where the origin is at the top-left of the image, and the x-axis and the y-axis are along the image width and height respectively.
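The three coordinate systems above are typically chained as LiDAR to camera (extrinsics) and camera to image (intrinsics). The following is a minimal sketch of that chain for a single point, assuming a plain pinhole model without lens distortion. The function name `lidar_to_pixel`, the example matrices, and the axis convention of the example (LiDAR x forwards, y left) are illustrative assumptions and are not taken from the dataset.

```python
def lidar_to_pixel(point_lidar, rotation, translation, intrinsic):
    """Project one 3D point from the Virtual LiDAR frame to image pixels.

    rotation (3x3) and translation (3,) are hypothetical LiDAR-to-camera
    extrinsics; intrinsic is a 3x3 matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    Returns (u, v), or None if the point is not in front of the camera.
    """
    # Rigid transform into the Camera Coordinate System: p_cam = R * p + t
    x, y, z = [
        sum(rotation[i][j] * point_lidar[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if z <= 0:  # z is positive forwards, so z <= 0 lies behind the lens
        return None
    # Pinhole projection; origin at the top-left, u along width, v along height
    u = intrinsic[0][0] * x / z + intrinsic[0][2]
    v = intrinsic[1][1] * y / z + intrinsic[1][2]
    return (u, v)


# Made-up calibration values, for illustration only.
R = [[0, -1, 0], [0, 0, -1], [1, 0, 0]]  # assumes LiDAR x forwards, y left, z up
t = [0.0, 0.0, 0.0]
K = [[1000.0, 0.0, 960.0], [0.0, 1000.0, 540.0], [0.0, 0.0, 1.0]]

pixel = lidar_to_pixel((10.0, 1.0, -0.5), R, t, K)  # (860.0, 590.0)
```

In practice the rotation, translation, and intrinsic values should be read from the calibration files shipped with the dataset rather than hard-coded.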

2. Data Annotation

Data Sampling
We select 10084 images and 10084 point clouds from the data collected by the infrastructure-side sensors.
3D Annotation

We provide the 2D and 3D bounding boxes of the obstacle objects, together with their category, occlusion, and truncation attributes. Note that the 3D bounding boxes are located in the Virtual LiDAR Coordinate System. The format of the annotation is as follows.
- Type. 15 object classes, including Car, Pedestrian, Cyclist, etc. PedestrianIgnore, CarIgnore, and OtherIgnore mark pedestrian, car, and other objects, respectively, that are smaller than 15*15 pixels or more than 4/5 occluded.
- Truncated. Integer (0, 1, 2) indicating truncated state. 0 = non-truncated, 1 = transversely truncated, 2 = longitudinally truncated.
- Occluded. Integer (0, 1, 2) indicating occlusion state. 0 = fully visible, 1 = partly occluded, 2 = largely occluded.
- Alpha. Observation angle of object, ranging [-pi, pi].
- 2D box. 2D bounding box of object in the image.
- 3D box. 3D bounding box of object in the Virtual LiDAR Coordinate System, given as (height, width, length, x_loc, y_loc, z_loc) in meters, plus rotation_y, the angle by which the object is rotated about the y-axis in the Virtual LiDAR Coordinate System.
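To make the annotation fields concrete, here is a minimal sketch of one annotation held as a Python record, with a sanity check against the value ranges listed above. The class name `Annotation`, its attribute names, the (xmin, ymin, xmax, ymax) layout of the 2D box, and the example values are assumptions made for illustration; the on-disk file format is not described here.

```python
import math
from dataclasses import dataclass

TRUNCATED = {0: "non-truncated", 1: "transversely truncated", 2: "longitudinally truncated"}
OCCLUDED = {0: "fully visible", 1: "partly occluded", 2: "largely occluded"}


@dataclass
class Annotation:
    obj_type: str      # e.g. "Car", "Pedestrian", "Cyclist", "CarIgnore"
    truncated: int     # key of TRUNCATED
    occluded: int      # key of OCCLUDED
    alpha: float       # observation angle in radians, within [-pi, pi]
    box_2d: tuple      # assumed layout: (xmin, ymin, xmax, ymax) in pixels
    dimensions: tuple  # (height, width, length) in meters
    location: tuple    # (x_loc, y_loc, z_loc) in the Virtual LiDAR frame, meters
    rotation_y: float  # rotation angle in radians


def validate(ann):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if ann.truncated not in TRUNCATED:
        problems.append(f"truncated must be 0, 1, or 2, got {ann.truncated}")
    if ann.occluded not in OCCLUDED:
        problems.append(f"occluded must be 0, 1, or 2, got {ann.occluded}")
    if not -math.pi <= ann.alpha <= math.pi:
        problems.append(f"alpha must lie in [-pi, pi], got {ann.alpha}")
    if any(d <= 0 for d in ann.dimensions):
        problems.append(f"dimensions must be positive, got {ann.dimensions}")
    return problems


# Made-up example values, not taken from the dataset.
example = Annotation("Car", 0, 1, -1.2, (100.0, 200.0, 180.0, 260.0),
                     (1.5, 1.8, 4.3), (35.0, -2.0, -1.0), 0.1)
```

Calling `validate(example)` returns an empty list, while a record with `truncated=3` or `alpha=4.0` would be reported.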

3. Dataset Format

4. Evaluation Metrics

We evaluate 3D object detection performance using mean average precision (mAP) based on IoU. The mAP is computed from the output of the 3D object detector, which includes the size, position, and confidence of the 3D bounding box for each target.
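As a rough illustration of an IoU-based AP computation, the sketch below matches detections to ground-truth boxes in descending confidence order and integrates the interpolated precision-recall curve. It is a simplified stand-in, not the official evaluation code: boxes are treated as axis-aligned (rotation is ignored), the box layout (x, y, z, size_x, size_y, size_z) with a center position, the 0.5 IoU threshold, and the all-point interpolation are all assumptions. mAP would then be the mean of the per-class AP values.

```python
def iou_3d_axis_aligned(a, b):
    """IoU of two boxes (x, y, z, size_x, size_y, size_z); rotation is ignored."""
    inter = 1.0
    for axis in range(3):
        lo = max(a[axis] - a[axis + 3] / 2, b[axis] - b[axis + 3] / 2)
        hi = min(a[axis] + a[axis + 3] / 2, b[axis] + b[axis + 3] / 2)
        if hi <= lo:  # no overlap along this axis
            return 0.0
        inter *= hi - lo
    vol_a = a[3] * a[4] * a[5]
    vol_b = b[3] * b[4] * b[5]
    return inter / (vol_a + vol_b - inter)


def average_precision(detections, ground_truths, iou_threshold=0.5):
    """detections: list of (confidence, box); ground_truths: list of boxes."""
    if not ground_truths:
        return 0.0
    matched, hits = set(), []
    # Greedy matching: highest-confidence detections claim ground truths first
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            iou = iou_3d_axis_aligned(box, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        is_hit = best_idx is not None and best_iou >= iou_threshold
        if is_hit:
            matched.add(best_idx)
        hits.append(is_hit)
    # Precision and recall after each ranked detection
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / len(ground_truths))
    # Interpolate: precision at a recall level is the best precision to its right
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

For example, with two ground-truth boxes and three detections ranked as hit, miss, hit, this sketch yields an AP of 5/6. A faithful evaluation would additionally need rotated-box IoU, since the annotated 3D boxes carry a rotation angle.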