ml-hypersim - Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

Created at: 2020-11-17 05:46:50
Language: Python
License: NOASSERTION

The Hypersim Dataset

For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge by introducing Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding. To create our dataset, we leverage a large repository of synthetic scenes created by professional artists, and we generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry. Our dataset: (1) relies exclusively on publicly available 3D assets; (2) includes complete scene geometry, material information, and lighting information for every scene; (3) includes dense per-pixel semantic instance segmentations and complete camera information for every image; and (4) factors every image into diffuse reflectance, diffuse illumination, and a non-diffuse residual term.

The Hypersim Dataset is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.

 

Citation

If you find the Hypersim Dataset or the Hypersim Toolkit useful in your research, please cite the following paper:

@inproceedings{roberts:2021,
    author    = {Mike Roberts AND Jason Ramapuram AND Anurag Ranjan AND Atulit Kumar AND
                 Miguel Angel Bautista AND Nathan Paczan AND Russ Webb AND Joshua M. Susskind},
    title     = {{Hypersim}: {A} Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding},
    booktitle = {International Conference on Computer Vision (ICCV) 2021},
    year      = {2021}
}

 

Downloading the Hypersim Dataset

To obtain our image dataset, you can run the following download script. On Windows, you will need to modify the script so it does not depend on the curl and unzip command-line utilities.

python code/python/tools/dataset_download_images.py --downloads_dir /Volumes/portable_hard_drive/downloads --decompress_dir /Volumes/portable_hard_drive/evermotion_dataset/scenes

Note that our dataset is approximately 1.9TB. We have divided the dataset into hundreds of individual ZIP files, where each ZIP file is between 1GB and 20GB. Our download script contains the URL for each ZIP file. Thomas Germer has generously contributed an alternative download script that can be used to download a subset of the files from each ZIP archive.

Also note that we have manually excluded images containing people and prominent logos from our public release, so our public release contains 74,619 images, rather than 77,400 images. We list all the images we manually excluded in ml-hypersim/evermotion_dataset/analysis/metadata_images.csv.
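
A minimal pandas sketch for checking which images are in the public release (the column names below are assumptions; verify them against the CSV header):

# Minimal sketch: count and list publicly released images from metadata_images.csv.
# Column names (scene_name, camera_name, frame_id, included_in_public_release) are assumptions.
import pandas as pd

df = pd.read_csv("ml-hypersim/evermotion_dataset/analysis/metadata_images.csv")
public = df[df["included_in_public_release"] == True]
print(len(public), "images in the public release")
print(public[["scene_name", "camera_name", "frame_id"]].head())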

To obtain the ground truth triangle mesh for each scene, you must purchase the asset files here.

 

Using the Hypersim Dataset

The Hypersim Dataset consists of a collection of synthetic scenes. Each scene has a name of the form ai_VVV_NNN, where VVV is the volume number and NNN is the scene number within the volume. For each scene, there are one or more camera trajectories named {cam_00, cam_01, ...}. Each camera trajectory has one or more images named {frame.0000, frame.0001, ...}. Each scene is stored in its own ZIP file according to the following data layout:

ai_VVV_NNN
├── _detail
│   ├── metadata_cameras.csv                     # list of all the camera trajectories for this scene
│   ├── metadata_node_strings.csv                # all human-readable strings in the definition of each V-Ray node
│   ├── metadata_nodes.csv                       # establishes a correspondence between the object names in an exported OBJ file, and the V-Ray node IDs that are stored in our render_entity_id images
│   ├── metadata_scene.csv                       # includes the scale factor to convert asset units into meters
│   ├── cam_XX                                   # camera trajectory information
│   │   ├── camera_keyframe_orientations.hdf5    # camera orientations
│   │   └── camera_keyframe_positions.hdf5       # camera positions (in asset coordinates)
│   ├── ...
│   └── mesh                                                                            # mesh information
│       ├── mesh_objects_si.hdf5                                                        # NYU40 semantic label for each object ID (available in our public code repository)
│       ├── mesh_objects_sii.hdf5                                                       # semantic instance ID for each object ID (available in our public code repository)
│       ├── metadata_objects.csv                                                        # object name for each object ID (available in our public code repository)
│       ├── metadata_scene_annotation_tool.log                                          # log of the time spent annotating each scene (available in our public code repository)
│       ├── metadata_semantic_instance_bounding_box_object_aligned_2d_extents.hdf5      # length (in asset units) of each dimension of the 3D bounding box for each semantic instance ID
│       ├── metadata_semantic_instance_bounding_box_object_aligned_2d_orientations.hdf5 # orientation of the 3D bounding box for each semantic instance ID
│       └── metadata_semantic_instance_bounding_box_object_aligned_2d_positions.hdf5    # position (in asset coordinates) of the 3D bounding box for each semantic instance ID
└── images
    ├── scene_cam_XX_final_hdf5                  # lossless HDR image data that requires accurate shading
    │   ├── frame.IIII.color.hdf5                # color image before any tone mapping has been applied
    │   ├── frame.IIII.diffuse_illumination.hdf5 # diffuse illumination
    │   ├── frame.IIII.diffuse_reflectance.hdf5  # diffuse reflectance (many authors refer to this modality as "albedo")
    │   ├── frame.IIII.residual.hdf5             # non-diffuse residual
    │   └── ...
    ├── scene_cam_XX_final_preview               # preview images
    |   └── ...
    ├── scene_cam_XX_geometry_hdf5               # lossless HDR image data that does not require accurate shading
    │   ├── frame.IIII.depth_meters.hdf5         # Euclidean distances (in meters) to the optical center of the camera
    │   ├── frame.IIII.position.hdf5             # world-space positions (in asset coordinates)
    │   ├── frame.IIII.normal_cam.hdf5           # surface normals in camera-space (ignores bump mapping)
    │   ├── frame.IIII.normal_world.hdf5         # surface normals in world-space (ignores bump mapping)
    │   ├── frame.IIII.normal_bump_cam.hdf5      # surface normals in camera-space (takes bump mapping into account)
    │   ├── frame.IIII.normal_bump_world.hdf5    # surface normals in world-space (takes bump mapping into account)
    │   ├── frame.IIII.render_entity_id.hdf5     # fine-grained segmentation where each V-Ray node has a unique ID
    │   ├── frame.IIII.semantic.hdf5             # NYU40 semantic labels
    │   ├── frame.IIII.semantic_instance.hdf5    # semantic instance IDs
    │   ├── frame.IIII.tex_coord.hdf5            # texture coordinates
    │   └── ...
    ├── scene_cam_XX_geometry_preview            # preview images
    |   └── ...
    └── ...
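
The per-frame HDF5 files above can be read directly with h5py. A minimal sketch, assuming each file stores its image under an HDF5 dataset named "dataset" (an assumption worth verifying by inspecting the file keys):

# Minimal sketch: load one HDR color image and its semantic labels with h5py.
# The HDF5 dataset key "dataset" and the example scene/camera names are assumptions.
import h5py

scene_dir = "ai_001_001"  # hypothetical scene directory
with h5py.File(scene_dir + "/images/scene_cam_00_final_hdf5/frame.0000.color.hdf5", "r") as f:
    color = f["dataset"][:]      # HxWx3 float HDR radiance, no tone mapping applied
with h5py.File(scene_dir + "/images/scene_cam_00_geometry_hdf5/frame.0000.semantic.hdf5", "r") as f:
    semantic = f["dataset"][:]   # HxW NYU40 semantic labels

print(color.shape, color.dtype, semantic.shape, semantic.dtype)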

Dataset splits

We include a standard train/val/test split of the dataset in ml-hypersim/evermotion_dataset/analysis/metadata_images_split_scene_v1.csv. We refer to this as the v1 split of the dataset. We generated this split by randomly partitioning the data at the granularity of scenes, rather than images or camera trajectories, in order to minimize the chance that very similar images end up in different partitions. To maximize reproducibility, we only include publicly released images in our split.
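
A minimal sketch for grouping publicly released images by split partition (the column names are assumptions; check the CSV header):

# Minimal sketch: group publicly released images by v1 split partition.
# Column names (included_in_public_release, split_partition_name, scene_name) are assumptions.
import pandas as pd

df = pd.read_csv("ml-hypersim/evermotion_dataset/analysis/metadata_images_split_scene_v1.csv")
df = df[df["included_in_public_release"] == True]
for partition, group in df.groupby("split_partition_name"):
    print(partition, group["scene_name"].nunique(), "scenes,", len(group), "images")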

In the Hypersim Dataset, there is a small amount of asset reuse across scenes. This asset reuse is difficult to detect by analyzing the metadata in the original scene assets, but it is obvious when manually browsing our rendered images. We do not attempt to address this issue when generating our split. As a result, individual objects from our training images occasionally also appear in our validation and test images.

Coordinate conventions

Unless explicitly stated otherwise, we store positions in asset coordinates (and lengths in asset units). By asset coordinates, we mean the world-space coordinate system defined by the artists when they originally created the asset. In general, asset units are not the same as meters. To convert a distance expressed in asset units into meters, use the meters_per_asset_unit scale factor defined in ai_VVV_NNN/_detail/metadata_scene.csv.
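
For example, a minimal sketch of this conversion (the parameter_name/parameter_value column layout of metadata_scene.csv is an assumption; check the file header):

# Minimal sketch: read the meters_per_asset_unit scale factor and convert a
# distance from asset units to meters. The CSV column names are assumptions.
import pandas as pd

meta = pd.read_csv("ai_001_001/_detail/metadata_scene.csv")  # hypothetical scene
row = meta[meta["parameter_name"] == "meters_per_asset_unit"]
meters_per_asset_unit = float(row["parameter_value"].iloc[0])

distance_asset_units = 100.0
distance_meters = distance_asset_units * meters_per_asset_unit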

We store orientations as 3x3 rotation matrices that map points from object space to world space, assuming points are stored as [x,y,z] column vectors. Our convention for storing camera orientations is that the camera's positive x-axis points right, the positive y-axis points up, and the positive z-axis points away from where the camera is looking.

Lossless high-dynamic-range images

The images for each camera trajectory are stored as lossless high-dynamic-range HDF5 files in ai_VVV_NNN/images/scene_cam_XX_final_hdf5 and ai_VVV_NNN/images/scene_cam_XX_geometry_hdf5.

Our depth_meters images contain Euclidean distances (in meters) to the optical center of the camera (a better name for these images might be distance_from_camera_meters). In other words, these images do not contain planar depth values, i.e., negative z-coordinates in camera space. Simon Niklaus has generously contributed a standalone code snippet for converting our depth_meters images into planar depth images. Because our depth_meters images contain distances in meters, but our camera positions are stored in asset coordinates, you need to convert our depth_meters images into asset units before performing computations that involve our camera positions (or any other positions stored in asset coordinates, e.g., bounding boxes).
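
For example, a minimal sketch of converting a depth_meters image into asset units (a simple unit conversion, not the contributed planar-depth snippet):

# Minimal sketch: convert a depth_meters image into asset units so it can be
# compared against camera positions (or other positions stored in asset coordinates).
# The HDF5 dataset key "dataset" is an assumption; the scale factor comes from
# metadata_scene.csv as described above.
import h5py

meters_per_asset_unit = 0.0254  # hypothetical value; read the real one from metadata_scene.csv

with h5py.File("frame.0000.depth_meters.hdf5", "r") as f:
    depth_meters = f["dataset"][:]

depth_asset_units = depth_meters / meters_per_asset_unit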

Our position images contain world-space positions specified in asset coordinates.

Our color, diffuse_illumination, diffuse_reflectance, and residual images obey the following equation to within a very small error tolerance:

color == (diffuse_reflectance * diffuse_illumination) + residual

Note that our color, diffuse_illumination, diffuse_reflectance, and residual images do not have any tone mapping applied to them. To use these images in downstream learning tasks, we recommend applying your own tone mapping operator. We implement a simple tone mapping operator in ml-hypersim/code/python/tools/scene_generate_images_tonemap.py.
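
If you only need something quick for visualization, here is a minimal sketch of a scale-and-gamma tone map; it is not the operator implemented in scene_generate_images_tonemap.py, and the exposure heuristic and HDF5 dataset key are assumptions:

# Minimal sketch: a simple scale-and-gamma tone map for visualizing an HDR color image.
# Not the operator from scene_generate_images_tonemap.py; "dataset" key is an assumption.
import h5py
import numpy as np
from PIL import Image

with h5py.File("frame.0000.color.hdf5", "r") as f:
    rgb = f["dataset"][:].astype(np.float32)

scale = 1.0 / max(np.percentile(rgb, 90), 1e-6)   # crude exposure heuristic
ldr = np.clip(np.power(np.clip(rgb * scale, 0.0, None), 1.0 / 2.2), 0.0, 1.0)
Image.fromarray((ldr * 255.0).astype(np.uint8)).save("frame.0000.color_tonemapped.png")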

Lossy preview images

We include lossy preview images in ai_VVV_NNN/images/scene_cam_XX_final_preview and ai_VVV_NNN/images/scene_cam_XX_geometry_preview. We do not recommend using these images for downstream learning tasks, but they are useful for debugging and manually browsing the data.

Camera trajectories

Each camera trajectory is stored as a dense list of camera poses in the following files in ai_VVV_NNN/_detail/cam_XX.

camera_keyframe_orientations.hdf5
contains an Nx3x3 array of camera orientations, where N is the number of frames in the trajectory, and each orientation is represented as a 3x3 rotation matrix that maps points from camera space to world space, assuming points are stored as [x,y,z] column vectors. The convention in the Hypersim Dataset is that the camera's positive x-axis points right, the positive y-axis points up, and the positive z-axis points away from where the camera is looking.

camera_keyframe_positions.hdf5
contains an Nx3 array of camera positions, where N is the number of frames in the trajectory, and each position is stored in [x,y,z] order. These positions are specified in asset coordinates.

The camera intrinsics for our images (i.e., an idealized pinhole camera with a 60-degree horizontal field of view and square pixels) are defined globally in ml-hypersim/evermotion_dataset/_vray_user_params.py.

We recommend browsing through ml-hypersim/code/python/tools/scene_generate_images_bounding_box.py to better understand our camera pose conventions. In this file, we generate an image with per-instance 3D bounding boxes overlaid on top of a previously rendered image. This procedure involves loading a previously rendered image, loading the appropriate camera pose for that image, forming an appropriate projection matrix, and projecting the world-space corners of each bounding box into the image.
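
As a complement to that file, here is a minimal sketch of the core projection step under the conventions above. The image dimensions, the 60-degree pinhole model, and the HDF5 dataset key "dataset" are assumptions, and this is a simplified stand-in rather than the projection code from scene_generate_images_bounding_box.py:

# Minimal sketch: project a world-space point (in asset coordinates) into an image,
# using the camera pose conventions described above. Simplified stand-in only.
import h5py
import numpy as np

frame_id, width, height = 0, 1024, 768          # hypothetical image dimensions
fov_x = np.deg2rad(60.0)                        # horizontal field of view

with h5py.File("ai_001_001/_detail/cam_00/camera_keyframe_orientations.hdf5", "r") as f:
    R_cam_to_world = f["dataset"][frame_id]     # 3x3: camera space -> world space
with h5py.File("ai_001_001/_detail/cam_00/camera_keyframe_positions.hdf5", "r") as f:
    t_world = f["dataset"][frame_id]            # camera position (asset coordinates)

def project(p_world):
    # World space -> camera space (the camera looks down its negative z-axis).
    p_cam = R_cam_to_world.T @ (np.asarray(p_world) - t_world)
    f_px = (width / 2.0) / np.tan(fov_x / 2.0)          # focal length in pixels
    u = (width / 2.0) + f_px * (p_cam[0] / -p_cam[2])   # +x points right
    v = (height / 2.0) - f_px * (p_cam[1] / -p_cam[2])  # +y points up, image rows go down
    return u, v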

3D bounding boxes

We include 3D bounding boxes for each semantic instance in ai_VVV_NNN/_detail/mesh. We represent each bounding box as a position, a rotation matrix, and the length of each bounding box dimension. We store this information in the following files.

metadata_semantic_instance_bounding_box_object_aligned_2d_extents.hdf5
contains an Nx3 array, where N is the number of semantic instances, and each row represents the length of each bounding box dimension stored in [x,y,z] order. These lengths are specified in asset units.

metadata_semantic_instance_bounding_box_object_aligned_2d_orientations.hdf5
contains an Nx3x3 array of orientations, where N is the number of semantic instances, and each orientation is represented as a 3x3 rotation matrix that maps points from object space to world space, assuming points are stored as [x,y,z] column vectors.

metadata_semantic_instance_bounding_box_object_aligned_2d_positions.hdf5
contains an Nx3 array of bounding box center positions, where N is the number of semantic instances, and each position is stored in [x,y,z] order. These positions are specified in asset coordinates.

We compute the rotation matrix for each bounding box according to the following algorithm. We always set the positive z-axis of the rotation matrix to point up, i.e., to be aligned with the world-space gravity vector. We then compute a 2D minimum-area bounding box in the world-space xy-plane. Once we have computed this minimum-area bounding box, there are 4 possible choices for the positive x-axis of our rotation matrix. To make this choice, we consider the vector from the geometric center of the bounding box to the centroid of the points that were used to compute the bounding box. We choose the direction (out of our 4 possible choices) that most closely matches this vector as the positive x-axis of our rotation matrix. Finally, we set the positive y-axis to be the positive x-axis rotated by +90 degrees in the world-space xy-plane (i.e., so that our rotation matrix has determinant 1). This algorithm encourages similar objects with semantically similar orientations to be assigned similar rotation matrices (i.e., the differences between their rotation matrices will have small matrix norms).
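
A minimal numpy/scipy sketch of this algorithm (the function name and the brute-force search over convex hull edges are our own illustration, not the authors' implementation):

# Minimal sketch of the bounding box rotation described above: z-axis up, 2D
# minimum-area box in the world-space xy-plane, x-axis chosen from 4 candidates
# using the centroid direction. Illustration only, not the authors' implementation.
import numpy as np
from scipy.spatial import ConvexHull

def oriented_bbox_rotation(points_world):
    xy = np.asarray(points_world, dtype=np.float64)[:, :2]
    hull_pts = xy[ConvexHull(xy).vertices]

    # Minimum-area 2D box: try aligning the box with each convex hull edge.
    best = None
    for i in range(len(hull_pts)):
        edge = hull_pts[(i + 1) % len(hull_pts)] - hull_pts[i]
        theta = np.arctan2(edge[1], edge[0])
        c, s = np.cos(theta), np.sin(theta)
        R2 = np.array([[c, s], [-s, c]])               # rotate points by -theta
        rotated = xy @ R2.T
        mins, maxs = rotated.min(axis=0), rotated.max(axis=0)
        area = np.prod(maxs - mins)
        if best is None or area < best[0]:
            center_xy = R2.T @ ((mins + maxs) / 2.0)   # box center back in world xy
            best = (area, theta, center_xy)
    _, theta, center_xy = best

    # 4 candidate x-axes, 90 degrees apart; pick the one closest to the direction
    # from the box center to the centroid of the points.
    to_centroid = xy.mean(axis=0) - center_xy
    theta_x = max((theta + k * np.pi / 2.0 for k in range(4)),
                  key=lambda t: np.dot([np.cos(t), np.sin(t)], to_centroid))

    x_axis = np.array([np.cos(theta_x), np.sin(theta_x), 0.0])
    z_axis = np.array([0.0, 0.0, 1.0])                 # up, aligned with gravity
    y_axis = np.cross(z_axis, x_axis)                  # x rotated +90 degrees in the xy-plane
    return np.column_stack([x_axis, y_axis, z_axis])   # maps object space to world space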

Our code can be used to compute other types of bounding boxes (e.g., axis-aligned in world space, minimum volume), but these other types of bounding boxes are not included in our public release.

We recommend browsing through ml-hypersim/code/python/tools/scene_generate_images_bounding_box.py to better understand our bounding box conventions. In this file, we generate an image with per-instance 3D bounding boxes overlaid on top of a previously rendered image. This procedure involves loading a previously rendered image, loading the appropriate bounding boxes for that image, and projecting the world-space corners of each bounding box into the image.

Mesh annotations

We include our mesh annotations in ml-hypersim/evermotion_dataset/scenes/ai_VVV_NNN/_detail/mesh. The exported OBJ file for each scene (which can be obtained by purchasing the original scene assets) divides each scene into a flat list of low-level "objects". We manually grouped these low-level objects into semantically meaningful instances, and we assigned an NYU40 semantic label to each instance using our custom scene annotation tool. We store our mesh annotation information in the following files.

mesh_objects_si.hdf5
contains an array of length N, where N is the number of low-level objects in the exported OBJ file, and mesh_objects_si[i] is the NYU40 semantic label for the object with object_id == i.

mesh_objects_sii.hdf5
contains an array of length N, where N is the number of low-level objects in the exported OBJ file, and mesh_objects_sii[i] is the semantic instance ID for the object with object_id == i.

metadata_objects.csv
contains N text entries, where N is the number of low-level objects in the exported OBJ file, and metadata_objects[i] is the object name for the object with object_id == i. This file establishes a correspondence between the object names in the exported OBJ file and the object IDs that are used as indices into mesh_objects_si.hdf5 and mesh_objects_sii.hdf5.

metadata_scene_annotation_tool.log
contains a log of the time spent annotating each scene.
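
A minimal sketch that reads these annotation files together (the HDF5 dataset key "dataset" and the exact CSV layout are assumptions):

# Minimal sketch: read the per-object mesh annotations for one scene.
# The HDF5 dataset key "dataset" and the metadata_objects.csv layout are assumptions.
import h5py
import pandas as pd

mesh_dir = "ai_001_001/_detail/mesh"  # hypothetical scene
with h5py.File(mesh_dir + "/mesh_objects_si.hdf5", "r") as f:
    nyu40_labels = f["dataset"][:]    # NYU40 semantic label per object_id
with h5py.File(mesh_dir + "/mesh_objects_sii.hdf5", "r") as f:
    instance_ids = f["dataset"][:]    # semantic instance ID per object_id

object_names = pd.read_csv(mesh_dir + "/metadata_objects.csv")
print(len(object_names), "objects; example labels:", nyu40_labels[:5], instance_ids[:5])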

Rendering cost

We include the cost of rendering each image in our dataset in ml-hypersim/evermotion_dataset/analysis/metadata_rendering_tasks.csv. We include this rendering metadata so that the marginal value and marginal cost of each image can be analyzed jointly in downstream applications.

In our pipeline, we divide the rendering of each image into 3 passes. Each rendering pass, for each image, in each camera trajectory, corresponds to a particular rendering "task", and metadata_rendering_tasks.csv specifies the cost of each task. To compute the total cost of rendering the image frame.IIII in the camera trajectory cam_XX in the scene ai_VVV_NNN, we sum the vray_cost_dollars and cloud_cost_dollars columns for the rows where job_name is in {ai_VVV_NNN@scene_cam_XX_geometry, ai_VVV_NNN@scene_cam_XX_pre, ai_VVV_NNN@scene_cam_XX_final} and task_id == IIII.
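
A minimal pandas sketch of this computation (the job_name strings follow the pattern described above; verify against the CSV before relying on it):

# Minimal sketch: total cost of rendering one image, summed over its 3 rendering passes.
# The scene/camera/frame values are hypothetical; column names come from the text above.
import pandas as pd

tasks = pd.read_csv("ml-hypersim/evermotion_dataset/analysis/metadata_rendering_tasks.csv")

scene, cam, frame = "ai_001_001", "cam_00", 0
job_names = [scene + "@scene_" + cam + suffix for suffix in ("_geometry", "_pre", "_final")]
rows = tasks[tasks["job_name"].isin(job_names) & (tasks["task_id"] == frame)]
total_cost_dollars = rows["vray_cost_dollars"].sum() + rows["cloud_cost_dollars"].sum()
print("total cost: $%.4f" % total_cost_dollars)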

 

The Hypersim Toolkit

The Hypersim Toolkit is a set of tools for generating photorealistic synthetic datasets from V-Ray scenes. By building on top of V-Ray, datasets generated with the Hypersim Toolkit can leverage advanced rendering effects (e.g., rolling shutter, motion and defocus blur, chromatic aberration), as well as the large quantity of high-quality 3D content available from online marketplaces.

The Hypersim Toolkit consists of tools that operate at two distinct levels of abstraction. The Hypersim Low-Level Toolkit is concerned with manipulating individual V-Ray scene files. The Hypersim High-Level Toolkit is concerned with manipulating collections of scenes. You can use the Hypersim Low-Level Toolkit to output richly annotated ground truth labels, to programmatically specify camera trajectories and custom lens distortion models, and to programmatically insert geometry into a scene. You can use the Hypersim High-Level Toolkit to generate collision-free camera trajectories that are biased towards salient parts of a scene, and to interactively apply semantic labels to a scene.

 

Disclaimer

This software depends on several open-source projects. Some of the dependent projects have portions that are licensed under the GPL, but this software does not depend on those GPL-licensed portions. The GPL-licensed portions may be omitted from builds of those dependent projects.

V-Ray Standalone and the V-Ray AppSDK are available here under their own terms. The authors of this software are not responsible for the contents of third-party websites.

 

Installing prerequisite applications, tools, and libraries

Anaconda Python quick start

If you're using Anaconda, you can install all the required Python libraries using our requirements.txt file.

conda create --name hypersim-env --file requirements.txt
conda activate hypersim-env

Optional Python libraries (see below) can be installed separately. For example,

pip install mayavi
conda install -c conda-forge opencv
conda install -c anaconda pillow

Hypersim Low-Level Toolkit

Hypersim High-Level Toolkit

Optional components

The following components are optional, so you only need to install these prerequisites if you intend to use a particular component.

Configuring the Hypersim Python tools for your system

You need to rename

ml-hypersim/code/python/_system_config.py.example -> _system_config.py
, and modify the paths contained in this file for your system.

Installing V-Ray Standalone and the V-Ray AppSDK

Make sure the

bin
directory from V-Ray Standalone is in your
PATH
environment variable. Also make sure that the
bin
directory from the V-Ray AppSDK is in your
DYLD_LIBRARY_PATH
environment variable. For example, I add the following to my
~/.bash_profile
file.

export PATH=$PATH:/Applications/ChaosGroup/V-Ray/Standalone_for_mavericks_x64/bin
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/Applications/ChaosGroup/V-Ray/AppSDK/bin

Manually copy

vray.so
from the AppSDK directory so it is visible to your Python distribution.

Manually copy the following files and subdirectories from the AppSDK

bin
directory to the
ml-hypersim/code/python/tools
directory. For example,

cp /Applications/ChaosGroup/V-Ray/AppSDK/bin/libcgauth.dylib          /Users/mike/code/github/ml-hypersim/code/python/tools
cp /Applications/ChaosGroup/V-Ray/AppSDK/bin/libvray.dylib            /Users/mike/code/github/ml-hypersim/code/python/tools
cp /Applications/ChaosGroup/V-Ray/AppSDK/bin/libvrayopenimageio.dylib /Users/mike/code/github/ml-hypersim/code/python/tools
cp /Applications/ChaosGroup/V-Ray/AppSDK/bin/libvrayosl.dylib         /Users/mike/code/github/ml-hypersim/code/python/tools
cp /Applications/ChaosGroup/V-Ray/AppSDK/bin/libVRaySDKLibrary.dylib  /Users/mike/code/github/ml-hypersim/code/python/tools
cp -a /Applications/ChaosGroup/V-Ray/AppSDK/bin/plugins               /Users/mike/code/github/ml-hypersim/code/python/tools

You can verify that the V-Ray AppSDK is installed correctly by executing the following command-line tool.

python code/python/tools/check_vray_appsdk_install.py

If the V-Ray AppSDK is installed correctly, this tool will print out the following message.

[HYPERSIM: CHECK_VRAY_APPSDK_INSTALL] The V-Ray AppSDK is configured correctly on your system.

Building the Hypersim C++ tools

You need to rename

ml-hypersim/code/cpp/system_config.inc.example -> system_config.inc
, and modify the paths contained in this file for your system. Then you need to build the Hypersim C++ tools. The easiest way to do this is to use the top-level makefile in
ml-hypersim/code/cpp/tools
.

cd code/cpp/tools
make

If you intend to use the Hypersim Scene Annotation Tool, you need to build it separately.

cd code/cpp/tools/scene_annotation_tool
make

If you intend to compute bounding boxes around objects, you need to build the following tool separately.

cd code/cpp/tools/generate_oriented_bounding_boxes
make

 

Using the Hypersim Toolkit

The Hypersim Low-Level Toolkit consists of the following Python command-line tools.

  • ml-hypersim/code/python/tools/generate_*.py
  • ml-hypersim/code/python/tools/modify_vrscene_*.py

The Hypersim High-Level Toolkit consists of the following Python command-line tools.

  • ml-hypersim/code/python/tools/dataset_*.py
  • ml-hypersim/code/python/tools/scene_*.py
  • ml-hypersim/code/python/tools/visualize_*.py

The Hypersim High-Level Toolkit also includes the Hypersim Scene Annotation Tool executable, which is located in the

ml-hypersim/code/cpp/bin
directory, and can be launched from the command-line as follows.

cd code/cpp/bin
./scene_annotation_tool

The following tutorial examples demonstrate the functionality in the Hypersim Toolkit.

  • 00_empty_scene
    In this tutorial example, we use the Hypersim Low-Level Toolkit to add a camera trajectory and a collection of textured quads to a V-Ray scene.

  • 01_marketplace_dataset
    In this tutorial example, we use the Hypersim High-Level Toolkit to export and manipulate a scene downloaded from a content marketplace. We generate a collection of richly annotated ground truth images based on a random walk camera trajectory through the scene.

 

Generating the full Hypersim Dataset

We recommend completing the

00_empty_scene
and
01_marketplace_dataset
tutorial examples before attempting to generate the full Hypersim Dataset.

Downloading scenes

In order to generate the full Hypersim Dataset, we use Evermotion Archinteriors Volumes 1-55 excluding 20,25,40,49. All the Evermotion Archinteriors volumes are available for purchase here.

You need to create a

downloads
directory, and manually download the Evermotion Archinteriors RAR and 7z archives into it. Almost all the archives have clear filenames that include the volume number and scene number, and do not need to be renamed to avoid confusion. The exception to this rule is Evermotion Archinteriors Volume 11, whose archives are named {
01.rar
,
02.rar
, ...}. You need to manually rename these archives to {
AI11_01.rar
,
AI11_02.rar
, ...} in order to match the dataset configuration file (
_dataset_config.py
) we provide.

Running our pipeline on multiple operating systems

Some of our pipeline steps require Windows, and others require macOS or Linux. It is therefore desirable to specify an output directory for the various steps of our pipeline that is visible to both operating systems. Ideally, you would specify an output directory on a fast network drive with lots of storage space. However, our pipeline generates a lot of intermediate data, and disk I/O can become a significant bottleneck, even on relatively fast network drives. We therefore recommend the quick-and-dirty solution of generating the Hypersim Dataset on portable hard drives that you can read and write from Windows and macOS (or Linux).

You need to make sure that the absolute path to the dataset on Windows is consistent (i.e., always has the same drive letter) when executing the Windows-only steps of our pipeline. We recommend making a note of the absolute Windows path to the dataset, because you will need to supply it whenever a subsequent pipeline step requires the

dataset_dir_when_rendering
argument.

If you are generating data on portable hard drives, we recommend running our pipeline in batches of 10 volumes at a time (i.e., roughly 100 scenes at a time), and storing each batch on its own 4TB drive. If you attempt to run our pipeline in batches that are too large, the pipeline will eventually generate too much intermediate data, and you will run out of storage space. In our experience, the most straightforward way to run our pipeline in batches is to include the optional

scene_names
argument when executing each step of the pipeline.

The

scene_names
argument works in the following way. We give each scene in our dataset a unique name,
ai_VVV_NNN
, where
VVV
is the volume number, and
NNN
is the scene number within the volume (e.g., the name
ai_001_002
refers to Volume 1 Scene 2). Each step of our pipeline can process a particular scene (or scenes) by specifying the
scene_names
argument, which accepts wildcard expressions. For example,
ai_001_001
specifies Volume 1 Scene 1,
ai_001_*
specifies all scenes from Volume 1,
ai_00*
specifies all scenes from Volumes 1-9,
ai_01*
specifies all scenes from Volumes 10-19, and so on. We include the argument
--scene_names ai_00*
in our instructions below.

Handling scenes and camera trajectories that have been manually excluded

When preparing the Hypersim Dataset, we chose to manually exclude some scenes and automatically generated camera trajectories. Most of the scenes we excluded are simply commented out in our

_dataset_config.py
file, and therefore our pipeline never processes these scenes. However, for some scenes, we needed to run part of our pipeline in order to decide to exclude them. These scenes are not commented out in our
_dataset_config.py
file, and therefore our pipeline will process these scenes by default. There is no harm in running our pipeline for these scenes, but it is possible to save a bit of time and money by not rendering images for these manually excluded scenes and camera trajectories.

The camera trajectories we manually excluded from our dataset are listed in

ml-hypersim/evermotion_dataset/analysis/metadata_camera_trajectories.csv
. If the
Scene type
column is listed as
OUTSIDE VIEWING AREA (BAD INITIALIZATION)
or
OUTSIDE VIEWING AREA (BAD TRAJECTORY)
, then we consider that trajectory to be manually excluded from our dataset. If all the camera trajectories for a scene have been manually excluded, then we consider the scene to be manually excluded. We recommend excluding these scenes and camera trajectories in downstream learning applications for consistency with other publications, and to obtain the cleanest possible training data.
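
A minimal sketch for listing these excluded trajectories (the Scene type values come from the text above; the column identifying each trajectory is an assumption and should be checked against the CSV header):

# Minimal sketch: list manually excluded camera trajectories.
# The "Scene type" values are documented above; the "Animation" column name,
# which identifies each trajectory, is an assumption.
import pandas as pd

df = pd.read_csv("ml-hypersim/evermotion_dataset/analysis/metadata_camera_trajectories.csv")
excluded_types = ["OUTSIDE VIEWING AREA (BAD INITIALIZATION)", "OUTSIDE VIEWING AREA (BAD TRAJECTORY)"]
excluded = df[df["Scene type"].isin(excluded_types)]
print(len(excluded), "manually excluded camera trajectories")
print(excluded["Animation"].head())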

Using our mesh annotations

Our mesh annotations for each scene are checked in at ml-hypersim/evermotion_dataset/scenes/ai_VVV_NNN/_detail/mesh, where VVV is the volume number and NNN is the scene number within the volume. As a result, you can use our automated pipeline to generate instance-level semantic segmentation images without needing to manually annotate any scenes.

Running the full pipeline

To process the first batch of scenes in the Hypersim Dataset (Volumes 1-9), we execute the following pipeline steps. We process subsequent batches by repeatedly executing these steps, substituting the scene_names argument accordingly. See the 01_marketplace_dataset tutorial example for more details on each pipeline step.

You must substitute your own path for the dataset_dir_when_rendering argument when executing these pipeline steps, and it must be an absolute path. You must also substitute your own dataset_dir and downloads_dir, but these arguments do not need to be absolute paths. You must wait until each rendering pass is finished, and all data has been downloaded from the cloud, before moving on to the next pipeline step.

# pre-processing

# unpack scene data
python code/python/tools/dataset_initialize_scenes.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --downloads_dir downloads --dataset_dir_to_copy evermotion_dataset --scene_names "ai_00*"

# export scene data from native asset file into vrscene file (not provided)

# correct bad default export options
python code/python/tools/dataset_modify_vrscenes_normalize.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --platform_when_rendering windows --dataset_dir_when_rendering Z:\\evermotion_dataset --scene_names "ai_00*"

# generate a fast binary triangle mesh representation
python code/python/tools/dataset_generate_meshes.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --scene_names "ai_00*"
# generate an occupancy map (must be run on macOS or Linux)
python code/python/tools/dataset_generate_octomaps.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --scene_names "ai_00*"
# generate camera trajectories (must be run on macOS or Linux)
python code/python/tools/dataset_generate_camera_trajectories.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --scene_names "ai_00*"
# modify vrscene to render camera trajectories with appropriate ground truth layers
python code/python/tools/dataset_modify_vrscenes_for_hypersim_rendering.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --platform_when_rendering windows --dataset_dir_when_rendering Z:\\evermotion_dataset --scene_names "ai_00*"
# cloud rendering

# output rendering job description files for geometry pass
python code/python/tools/dataset_submit_rendering_jobs.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --render_pass geometry --scene_names "ai_00*"

# render geometry pass in the cloud (not provided)

# output rendering job description files for pre pass
python code/python/tools/dataset_submit_rendering_jobs.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --render_pass pre --scene_names "ai_00*"

# render pre pass in the cloud (not provided)

# merge per-image lighting data into per-scene lighting data
python code/python/tools/dataset_generate_merged_gi_cache_files.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --scene_names "ai_00*"

# output rendering job description files for final pass
python code/python/tools/dataset_submit_rendering_jobs.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --render_pass final --scene_names "ai_00*"

# render final pass in the cloud (not provided)
# post-processing

# generate tone-mapped images for visualization
python code/python/tools/dataset_generate_images_tonemap.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --scene_names "ai_00*"

# generate semantic segmentation images
python code/python/tools/dataset_generate_images_semantic_segmentation.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --scene_names "ai_00*"

# generate 3D bounding boxes (must be run on macOS or Linux)
python code/python/tools/dataset_generate_bounding_boxes.py --dataset_dir /Volumes/portable_hard_drive/evermotion_dataset --bounding_box_type object_aligned_2d --scene_names "ai_00*"