Description
I downloaded some clips from nvidia/PhysicalAI-Autonomous-Vehicle-Cosmos-Drive-Dreams and ran Cosmos-Drive-Dreams/cosmos-drive-dreams-toolkits/render_from_rds_hq.py with the camera set to ftheta. Applying hdmap and lidar control, I get a synthesized clip like the one below. After undistortion with Cosmos-Drive-Dreams/cosmos-drive-dreams-toolkits/rectify_ftheta_to_pinhole.py, there is still visible distortion, such as curved traffic signs.
ClipID="0079aad7-0fc5-4722-804d-e7c8c1b84263_570745200000_570765200000"
hdmap and lidar control:
{
    "hdmap": {
        "control_weight": 0.5,
        "input_control": "placeholder",
        "ckpt_path": "checkpoints/nvidia/Cosmos-Transfer1-7B-Sample-AV/hdmap_control.pt"
    },
    "lidar": {
        "control_weight": 0.5,
        "input_control": "placeholder",
        "ckpt_path": "checkpoints/nvidia/Cosmos-Transfer1-7B-Sample-AV/lidar_control.pt"
    }
}
hdmap control video:
0079aad7-0fc5-4722-804d-e7c8c1b84263_570745200000_570765200000_0.mp4
lidar control video:
0079aad7-0fc5-4722-804d-e7c8c1b84263_570745200000_570765200000_0.mp4
the caption is:
{
    "Sunny": "The video shows a daytime highway scene from inside a vehicle. The road is brightly illuminated by the sun, with clear visibility of multiple lanes marked by white lines. Several vehicles ahead are visible, their brake lights glowing red. A large green road sign overhead indicates directions to various destinations such as Osnabr\u00fcck, Oldenburg, Bremen-Centrum, and Hanover. The surroundings are well-lit and open, with no pedestrians or other significant objects visible outside the vehicle. The environment appears calm, with minimal activity aside from the vehicles on the road."
}
I followed the Cosmos-Drive-Dreams SDG Pipeline (scripts/generate_video_single_view.py) and got the results below:
raw generation:
undistorted (rectify_ftheta_to_pinhole.py):
I have two questions:
- Does the open-sourced model only support generating raw (distorted) video?
- Does the distortion in the second image above come from inaccurate intrinsic parameters, or from training on raw, distorted dash cam images?
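
To make the second question concrete, here is a minimal sketch of how I understand f-theta to pinhole rectification to work, and how an error in the intrinsics would show up. It is not the code in rectify_ftheta_to_pinhole.py: the function names, the polynomial convention (distorted radius in pixels as a polynomial in the ray angle theta), the shared principal point, and all numeric values are assumptions made for illustration only.

```python
import numpy as np


def ftheta_radius(theta, poly):
    """Radius in pixels for ray angle theta (radians): sum_k poly[k] * theta**k.

    Assumed convention; the toolkit may store the polynomial differently.
    """
    return np.polynomial.polynomial.polyval(theta, poly)


def pinhole_to_ftheta_map(width, height, focal, cx, cy, poly):
    """For each pixel of the target pinhole image, return the (x, y)
    coordinates to sample from the f-theta source image.

    Assumes both images share the same size and principal point (cx, cy).
    """
    u, v = np.meshgrid(np.arange(width, dtype=np.float64),
                       np.arange(height, dtype=np.float64))
    # Normalized pinhole coordinates of each target pixel.
    x = (u - cx) / focal
    y = (v - cy) / focal
    r_norm = np.hypot(x, y)
    # Angle between the viewing ray and the optical axis.
    theta = np.arctan(r_norm)
    # Where that ray lands in the f-theta image, as a radius in pixels.
    r_src = ftheta_radius(theta, poly)
    # Scale the direction (x, y) to the source radius; at the center the
    # direction is zero, so the scale value there does not matter.
    scale = np.divide(r_src, r_norm, out=np.zeros_like(r_src), where=r_norm > 0)
    return cx + x * scale, cy + y * scale


def residual_pixels(width, height, focal, cx, cy, poly_true, poly_used):
    """Per-pixel sampling error (in pixels) when rectifying with poly_used
    while the image actually follows poly_true."""
    tx, ty = pinhole_to_ftheta_map(width, height, focal, cx, cy, poly_true)
    ux, uy = pinhole_to_ftheta_map(width, height, focal, cx, cy, poly_used)
    return np.hypot(tx - ux, ty - uy)


if __name__ == "__main__":
    # Hypothetical intrinsics, chosen only to illustrate the effect.
    err = residual_pixels(1920, 1080, 1000.0, 960.0, 540.0,
                          poly_true=[0.0, 1000.0, 0.0, 20.0],
                          poly_used=[0.0, 1000.0, 0.0, 0.0])
    print("center error:", err[540, 960])
    print("corner error:", err[0, 0])
```

Under these assumptions the sampling error is zero at the image center and grows toward the borders, so straight edges far from the center would stay curved after rectification. If the curvature in the undistorted image follows that pattern, inaccurate intrinsics would be a plausible cause; if it looks the same everywhere, the generated content itself may be the source.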