Verifying and Analyzing Perception Datasets with Dataset Insights
Unity's Dataset Insights is a Python package that provides a variety of tools for downloading, processing, and analyzing datasets generated using the Perception package. In addition to a Python library, the package comes with a sample Jupyter notebook that helps you load datasets and verify some of their most commonly needed statistics.
In this guide, we will go through the steps involved in opening Perception datasets and verifying them using the provided Jupyter notebook. This includes both datasets that are generated locally and those generated with Unity Simulation. To learn how to generate datasets locally, follow Phase 1 of the Perception Tutorial.
- 🟢 Action: Download and install Docker Desktop
Locally generated datasets
- 🟢 Action: Open a command-line interface (Command Prompt on Windows, Terminal on macOS, etc.) and type the following command to run the Dataset Insights Docker image:
docker run -p 8888:8888 -v <path to synthetic data>:/data -t unitytechnologies/datasetinsights:latest
Here, <path to synthetic data> is the dataset path we looked at earlier. You can copy this path using the Copy Path button in the Perception Camera UI.
ℹ️ If you get an error about the format of the command, try the command again with quotation marks around the folder mapping argument, i.e. "<path to synthetic data>:/data".
This will download a Docker image from Unity. If you get an error regarding the path to your dataset, make sure you have not included the enclosing < and > in the path and that the spaces are properly escaped.
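The quoting and escaping problems described above can be avoided by building the command as a list of arguments instead of a single string. The following is a minimal Python sketch and not part of Dataset Insights: the function name build_docker_command and the example path are hypothetical, and it assumes Docker is installed and available on your PATH.

```python
import shlex


def build_docker_command(dataset_path,
                         image="unitytechnologies/datasetinsights:latest",
                         port=8888):
    """Return the docker run arguments as a list, one argument per element."""
    # Catch the common mistake of leaving the placeholder brackets in the path.
    if dataset_path.startswith("<") or dataset_path.endswith(">"):
        raise ValueError("remove the enclosing < and > from the dataset path")
    # Keeping the whole folder mapping in one list element means spaces in
    # the path never need to be escaped by hand.
    volume = f"{dataset_path}:/data"
    return ["docker", "run", "-p", f"{port}:{port}", "-v", volume, "-t", image]


if __name__ == "__main__":
    # Hypothetical dataset location; replace it with your own path.
    args = build_docker_command("/Users/me/My Datasets/perception output")
    # shlex.join shows how the command would be quoted for a POSIX shell.
    print(shlex.join(args))
    # To start the container from Python: subprocess.run(args, check=True)
```

Passing the list to subprocess.run hands each argument to Docker unchanged, so the same code works whether or not the path contains spaces.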
- 🟢 Action: The image is now running on your computer. Open a web browser and navigate to http://localhost:8888 to open the Jupyter notebook.
- 🟢 Action: To make sure your data is properly mounted, navigate to the data folder. If you see the dataset's folders there, we are good to go.
- 🟢 Action: Navigate to the datasetinsights/notebooks folder and open Perception_Statistics.ipynb.
- 🟢 Action: Once in the notebook, remove the /<GUID> part of the data_root = /data/<GUID> path. Since the dataset root is already mapped to /data, you can use this path directly.
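If you prefer to confirm the mount from inside the notebook rather than through the file browser, a short check such as the one below may help. This is only a sketch: summarize_dataset_root is a hypothetical helper, and the assumption that capture metadata is stored in files named captures_*.json should be checked against your own dataset.

```python
from pathlib import Path


def summarize_dataset_root(data_root):
    """List the sub-folders of data_root and any captures JSON files in it."""
    root = Path(data_root)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a folder; check the -v mapping")
    folders = sorted(p.name for p in root.iterdir() if p.is_dir())
    # Assumption: capture metadata lives in files named captures_*.json.
    captures = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("captures_*.json")
    )
    return {"folders": folders, "captures": captures}


if __name__ == "__main__":
    data_root = "/data"  # the path the dataset was mapped to in the container
    try:
        print(summarize_dataset_root(data_root))
    except FileNotFoundError as err:
        print(err)
```

An empty folders list usually means the folder mapping in the docker run command points at the wrong location.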
This notebook contains a variety of functions for generating plots, tables, and bounding box images that help you analyze your generated dataset. Certain parts of this notebook are currently not of use to us, such as the code meant for downloading data generated through Unity Simulation (coming later in this tutorial).
To execute a code block in this notebook, click on it to select it, then click the Run button at the top of the notebook. While a code block is running, an asterisk (*) is shown to its left until the code finishes executing.
Below, you can see a sample plot generated by the Dataset Insights notebook, depicting the number of times each of the 10 foreground objects appeared in the dataset. As shown in the histogram, there is a high level of uniformity between the labels, which is a desirable outcome.
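Uniformity between labels can also be checked numerically. The sketch below is not taken from the notebook: it counts how often each label appears in a plain list and reports the ratio between the rarest and the most common label, where a value close to 1.0 indicates a uniform distribution. The function name and the label values are hypothetical.

```python
from collections import Counter


def label_uniformity(labels):
    """Return per-label counts and the min/max count ratio (1.0 = uniform)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    ratio = min(counts.values()) / max(counts.values())
    return counts, ratio


if __name__ == "__main__":
    # Hypothetical labels, one entry per object appearance in the dataset.
    labels = ["cereal", "soup", "cereal", "tea", "soup", "tea"]
    counts, ratio = label_uniformity(labels)
    print(dict(counts), ratio)
```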
- 🟢 Action: Follow the instructions laid out in the notebook and run each code block to view its outputs.
Datasets generated with Unity Simulation
For these datasets we recommend using a slightly different command to open the notebook, as we do not need to mount a specific dataset folder. Instead, we mount a folder which will hold our downloaded datasets.
- 🟢 Action: Open the notebook using the command below:
docker run -p 8888:8888 -v <download path>/data:/data -t unitytechnologies/datasetinsights:latest
In the above command, replace <download path> with the location on your computer in which you wish to download your data.
Once the Docker image is running, the rest of the workflow is quite similar to what we did for locally generated data. The only difference is that we need to uncomment certain lines of code from the notebook to download the dataset.
- 🟢 Action: Open a web browser and navigate to http://localhost:8888 to open the notebook.
- 🟢 Action: Navigate to the datasetinsights/notebooks folder and open Perception_Statistics.ipynb.
- 🟢 Action: In the data_root = /data/<GUID> line, the <GUID> part will be the location inside your <download path> where the data will be downloaded. Therefore, you can just remove it so as to have data downloaded directly to the path you previously specified.
- 🟢 Action: In the block of code titled "Unity Simulation [Optional]", uncomment the lines that assign values to variables, and insert the correct values, based on information from your Unity Simulation run.
We have previously learned how to obtain the run_execution_id and project_id. You can remove the value already present for annotation_definition_id and leave it blank. What's left is the access_token.
- 🟢 Action: Return to your command-line interface and run the usim inspect auth command.
macOS:
USimCLI/mac/usim inspect auth
If you receive errors regarding authentication, your token might have timed out. Repeat the login step (usim login auth) to log in again and fix this issue.
A sample output from usim inspect auth will look like below:
Protect your credentials. They may be used to impersonate your requests.
access token: Bearer 0CfQbhJ6gjYIHjC6BaP5gkYn1x5xtAp7ZA9I003fTNT1sFp
expires in: 2:00:05.236227
expired: False
refresh token: FW4c3YRD4IXi6qQHv3Y9W-rwg59K7k0Te9myKe7Zo6M003f.k4Dqo0tuoBdf-ncm003fX2RAHQ
updated: 2020-10-02 14:50:11.412979
The access_token you need for your Dataset Insights notebook is the access token shown by the above command, minus the 'Bearer ' part. So, in this case, we should copy 0CfQbhJ6gjYIHjC6BaP5gkYn1x5xtAp7ZA9I003fTNT1sFp into the notebook.
- 🟢 Action: Copy the access token excluding the 'Bearer ' part to the corresponding field in the notebook.
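If you would rather not trim the token by hand, the step above can be sketched in Python as follows. The helper extract_access_token is hypothetical, and it assumes the output of usim inspect auth keeps the "access token: Bearer ..." layout shown in the sample above.

```python
def extract_access_token(cli_output):
    """Return the access token from usim inspect auth output, without 'Bearer '."""
    prefix = "Bearer "
    for line in cli_output.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "access token":
            value = value.strip()
            if value.startswith(prefix):
                value = value[len(prefix):]
            return value.strip()
    raise ValueError("no 'access token:' line found in the output")


if __name__ == "__main__":
    # The sample output from this guide; paste your own output instead.
    sample = """Protect your credentials. They may be used to impersonate your requests.
access token: Bearer 0CfQbhJ6gjYIHjC6BaP5gkYn1x5xtAp7ZA9I003fTNT1sFp
expires in: 2:00:05.236227
expired: False"""
    print(extract_access_token(sample))
```

As the CLI output itself warns, treat the token as a credential and avoid saving it in a notebook that you share.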
Once you have entered all the information, the block of code should look like the screenshot below (the actual values you input will be different):
- 🟢 Action: Continue to the next code block and run it to download all the metadata files from the generated dataset. This includes JSON files and logs but does not include images (which will be downloaded later).
You will see a progress bar while the data downloads:
The next couple of code blocks (under "Load dataset metadata") analyze the downloaded metadata and display a table containing annotation-definition-ids for the various metrics defined in the dataset.
- 🟢 Action: Once you reach the code block titled "Built-in Statistics", make sure the value assigned to the field rendered_object_info_definition_id matches the id displayed for this metric in the table output by the code block immediately before it. The screenshot below demonstrates this (note that your ids might differ from the ones here):
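Matching the id by eye works, but the lookup can also be done in code. The sketch below does not use the Dataset Insights API: it assumes the definitions table has been converted to a list of dictionaries with "id" and "name" keys, and that the metric is named "rendered object info". The helper name, both key names, and the metric name are assumptions to verify against the table in your notebook.

```python
def find_definition_id(definitions, name):
    """Return the id of the single definition whose name matches exactly."""
    matches = [d["id"] for d in definitions if d.get("name") == name]
    if len(matches) != 1:
        raise LookupError(f"expected one definition named {name!r}, found {len(matches)}")
    return matches[0]


if __name__ == "__main__":
    # Hypothetical rows standing in for the table shown in the notebook.
    definitions = [
        {"id": "metric-id-1", "name": "object count"},
        {"id": "metric-id-2", "name": "rendered object info"},
    ]
    rendered_object_info_definition_id = find_definition_id(
        definitions, "rendered object info"
    )
    print(rendered_object_info_definition_id)
```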
Follow the rest of the steps inside the notebook to generate a variety of plots and stats.
Keep in mind that this notebook is provided just as an example, and you can modify and extend it according to your own needs using the tools provided by the Dataset Insights framework.