PolySpatial Hybrid apps on visionOS

Hybrid apps combine the capabilities of RealityKit and Metal apps. Hybrid apps can make use of the Metal volume camera mode, which can be used alongside Bounded and Unbounded mode to control whether rendering with Metal is active. When an active VolumeCamera is using a Metal output configuration, Unity will render the scene exactly the same way it would if the App Mode were set to Metal. Otherwise, RealityKit is used to render Unity content along with pass-through video, just as it would if the App Mode were set to RealityKit.

Like the Unbounded VolumeCamera in RealityKit with PolySpatial app mode, a VolumeCamera that has been set to Metal mode does not have an output size. Content is not clipped or limited in any way based on where it is located in relation to the VolumeCamera. In fact, the transform of a Metal volume camera has no effect on the content in any way. The CullingMask field of the VolumeCamera is also ignored, along with the Dimensions field.

Instead of using the VolumeCamera transform and properties, regular Unity Cameras are used for rendering. The camera's CullingMask, Transform, etc. all affect what users will see, just as they do in visionOS apps configured for Metal mode and apps built for other XR platforms. As with other XR platforms, Hybrid visionOS apps use the XROrigin component to determine how the real-world tracking space aligns with the Unity scene. Refer to Metal-based Apps on visionOS for more information.

The distinguishing feature of Hybrid apps is that they can switch between RealityKit and Metal rendering at runtime to display various kinds of content in the same Immersive Space. Setting any volume camera to the Metal configuration mode allows Unity to render directly to the GPU with compositor surfaces, while switching back to a Bounded or Unbounded configuration mode returns your app to PolySpatial and RealityKit-based rendering within the corresponding volume window(s).

Pass-through Video in Hybrid Mode

While pass-through video was not available when using Metal rendering on visionOS 1, visionOS 2 introduced the ability to use Metal rendering with the Mixed immersion style. When using Mixed immersion, set your Unity camera's ClearFlags to Solid Color and ensure that the alpha value of this color is 0 in order to display pass-through video. You can switch ClearFlags back to Skybox at any time to render the skybox instead of pass-through.

When your app activates Metal, the system presents a CompositorLayer, which Unity uses as a render surface. When the VolumeCamera is deactivated or its mode is changed to Bounded or Unbounded, the CompositorLayer is dismissed, and pass-through becomes visible again.

RealityKit content has the same constraints in Hybrid App Mode as it does in RealityKit App Mode. For example, custom shaders must be implemented using ShaderGraph, and ARKit features require the use of an Unbounded VolumeCamera. Refer to RealityKit Apps on visionOS for more information.

Hybrid Mode Constraints

Hybrid mode provides a unique opportunity to leverage the full capabilities of Unity on visionOS. Users don't need to compromise on Unity-specific rendering features that are not possible with RealityKit, and can still take advantage of RealityKit features like volumes.

Unfortunately, this flexibility comes with some constraints: some are due to the platform, and some are a result of how Unity works under the hood. We're always working on making Unity better, and we'll point out which of these constraints may be improved in future releases of Unity and PolySpatial. As a general rule, any capabilities or limitations that apply to the Metal or RealityKit App Modes apply to Hybrid apps when one or the other mode is active. For example, ARKit data is still not available unless your app is currently using an ImmersiveSpace, which means either Metal is active or an Unbounded VolumeCamera is in use.

Below are some constraints that may apply to Hybrid mode.

A VolumeCamera configured for Metal cannot be active at the same time as one configured for Unbounded RealityKit. This is a limitation imposed by the platform. Unbounded RealityKit content and Metal content both require an ImmersiveSpace on visionOS. Only one ImmersiveSpace can be used at a time, and RealityKit content cannot share an ImmersiveSpace with a CompositorLayer.
Metal rendering cannot be combined with the Stereo Render Target feature because they both require the use of an XR Display Subsystem. It may be possible to combine the use of these features in future package releases.
There can be performance overhead when operating in this mode. See Hybrid Mode Performance for more information.
Transitioning in and out of Metal rendering is not instant, and may not be done in a seamless manner. See Transitioning Between Modes for more information.
You may need to ensure that the output configuration of the first volume camera in the scene matches the volume camera configuration assigned to the Default Volume Configuration setting in Player Settings / PolySpatial. See Default Volume Configuration for more information.
You may need to duplicate a scene, so that there exists one version for Metal and one version for RealityKit. Using material swap sets may alleviate this issue. See Using the same scene for Metal and RealityKit for more information.

It is possible to display any combination of windows and volumes on top of a Metal space. Volumes and windows will always be drawn on top of Metal content, regardless of depth.
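The platform rule about which volume camera modes can be active together can be sketched as a small check. This is an illustrative model only: `Mode`, `NEEDS_IMMERSIVE_SPACE`, and `check_active_modes` are hypothetical names that do not exist in Unity or PolySpatial, and the code encodes nothing beyond the rules stated above.

```python
from enum import Enum


class Mode(Enum):
    """Output modes a VolumeCamera can be configured for (names from the text)."""
    BOUNDED = "Bounded"
    UNBOUNDED = "Unbounded"
    METAL = "Metal"


# Per the text, these modes each require an ImmersiveSpace on visionOS,
# and only one ImmersiveSpace can be used at a time.
NEEDS_IMMERSIVE_SPACE = {Mode.UNBOUNDED, Mode.METAL}


def check_active_modes(active_modes):
    """Return a list of problems for modes that are active at the same time.

    Hypothetical helper: it does not call any Unity or PolySpatial API.
    """
    immersive = set(active_modes) & NEEDS_IMMERSIVE_SPACE
    problems = []
    if len(immersive) > 1:
        # Metal and Unbounded RealityKit cannot share an ImmersiveSpace.
        names = " and ".join(sorted(mode.value for mode in immersive))
        problems.append(f"{names} cannot be active at the same time")
    return problems
```

Bounded volumes never conflict in this model, which matches the statement that windows and volumes can be displayed on top of a Metal space.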

Transitioning Between Modes

The transition in and out of Metal rendering is not instant, and can't always be done in a seamless manner. In practice, this means users will always see at least a second or two of pass-through in between Unbounded RealityKit and Metal mode. The easiest way to learn what the transitions look like is to just build the samples and try switching modes. Short of that, it will help to walk through a concrete example.

Here is what a user will see when using a Hybrid app built with Unity, which uses a volume as its start-up scene:

The user taps the app icon on the Home Screen to launch the app. The home screen disappears.
A square window appears while the app is loading, showing the app's icon in the center.
A volume appears with some 3D content, including an affordance to switch to Metal mode. The user activates this affordance.
The volume disappears and the user is prompted with a dialog asking them to be mindful of their surroundings, which they can dismiss by clicking "OK" or "Don't Show Again." After responding to the dialog, the user will see pass-through for a moment before the Metal scene fades into view. Clicking "Don't Show Again" will skip the dialog next time, but not the moment of pass-through before the Metal layer is visible. If any other apps were running, their windows and volumes disappear at the same time as the app's volume. It is possible for your app's volume to persist across the transition, although even if it does, it will disappear temporarily if/when the safety dialog is presented.
At this point, the app can show additional windows or volumes on top of the Metal layer, but it is not possible to render RealityKit objects in an immersive space (in other words, RealityKit objects cannot be rendered outside a volume). Shared mode apps will not be visible. ARKit data is available, and the system may prompt the user for additional authorizations. These prompts and other OS dialogs will appear on top of the Metal layer.
The user gazes at an affordance in the Metal scene and pinches their fingers to initiate a transition back to RealityKit. In this case, the app will transition to an Unbounded RealityKit scene. The Metal layer fades out and RealityKit content immediately fades into view. During the transition, pass-through video is visible, although depending on the scene content, virtual content may remain visible as a semi-transparent overlay throughout the transition, with its opacity reduced briefly to 0% and rising back to 100% over the course of about one second.
In this Unbounded RealityKit scene, ARKit features are available, and the app can open additional windows or volumes. During the transition, any open windows or volumes will remain visible and fixed to the same location as long as the safety dialog was previously dismissed with "Don't Show Again." The safety dialog will hide all windows and volumes while it is being presented. At this point, it would be possible to dismiss the RealityKit ImmersiveSpace by disabling its VolumeCamera or changing its output configuration, at which point apps in the shared space would become visible alongside any windows or volumes opened by the app.
If the user switches back into Metal mode again, a small window with the text "Loading..." will appear briefly. Unbounded RealityKit content will fade away and the Metal layer will fade back into view. If the user didn't click "Don't Show Again" earlier, the safety dialog is presented after the loading window appears. After the user dismisses the safety dialog, the loading window is visible again briefly on top of pass-through as the Metal layer fades into view. See below for more information on the loading window.
Finally, if the user switches back to RealityKit mode and activates another affordance to dismiss the Metal space, the Metal layer will fade out of view and other apps that were running in the shared space will be visible again.

Note

The loading window is required to "bridge the gap" between dismissing the RealityKit ImmersiveSpace and opening the Metal ImmersiveSpace. If no windows or volumes are visible during this transition, the app will be backgrounded and the Home Screen appears. This window is not required when transitioning from a volume into a fully immersive space, but is always required when no windows or volumes are open and the app transitions either from an ImmersiveSpace with one of the other ImmersionStyles (Mixed or Progressive) into a fully immersive space, or between fully immersive spaces.

At this point, we've covered all the different transitions that can occur. When transitioning from a mixed or progressive immersive space to a fully immersive space (or if no immersive space was previously open), the user will see a safety dialog, or in its absence a brief fade-out-and-then-in between virtual content and pass-through. If there was no RealityKit volume open prior to this transition, a loading window should be used to bridge the gap. When transitioning from Metal to Unbounded RealityKit, the user will also briefly see pass-through, even if the Unbounded RealityKit scene will completely block pass-through when it becomes visible.
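The summary above can be written down as a rough model of what the user sees in each direction. This is a simplified sketch of the walkthrough, not system behavior you can query: the function names and step labels are hypothetical, and it assumes an unmodified Xcode project as generated by Unity.

```python
def entering_metal(window_or_volume_open, safety_dialog_suppressed):
    """List, in order, what the user sees when a Metal space is opened.

    Hypothetical model of the walkthrough above; the step names are
    descriptive labels, not Unity or visionOS identifiers.
    """
    steps = []
    if not window_or_volume_open:
        # With nothing visible the app would be backgrounded, so a
        # loading window is used to bridge the gap.
        steps.append("loading window")
    if not safety_dialog_suppressed:
        # Shown unless the user previously chose "Don't Show Again".
        steps.append("safety dialog")
    # The moment of pass-through is never skipped.
    steps.append("pass-through")
    steps.append("Metal layer fades in")
    return steps


def leaving_metal_for_unbounded():
    """Metal to Unbounded RealityKit: pass-through is briefly visible."""
    return ["Metal layer fades out", "pass-through", "RealityKit content fades in"]
```

For example, `entering_metal(True, False)` corresponds to the first switch from the start-up volume in the walkthrough: the safety dialog, a moment of pass-through, then the Metal scene.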

Note

The scenario above assumes that you have not modified the Xcode project generated by Unity. Different settings for the RealityKit ImmersiveSpace ImmersionStyle or other modifications to the SwiftUI scene may result in different behavior. Unfortunately, there is no way to avoid fading in and out of pass-through and presenting the user with a safety dialog when using a fully immersive space.

Hybrid Mode Performance

Currently, the chief drawback of Hybrid mode is performance overhead. In the Metal and RealityKit app modes, we are able to conserve resources either by running Unity in batch mode (a.k.a. "headless" mode, which doesn't open windows or render to the screen) for RealityKit mode, or by disabling PolySpatial for Metal mode.

Hybrid mode apps require Unity's full rendering pipeline and PolySpatial. Even if your scene only has one VolumeCamera set to Metal mode, PolySpatial will still be tracking object changes and sending commands to the RealityKit backend. These commands result in a scene full of Entities that are not visible. These invisible entities add a small performance cost and moderate memory overhead that scales with the complexity of your scene.

Likewise, when Metal is not active, and your app is only using RealityKit for rendering, Unity is doing a small amount of extra work by sending commands to the GPU as if it were going to render to the screen. While this can be minimized by disabling any Cameras in your scene, even if there is a camera rendering, the bulk of the work is avoided because the GPU will refuse to execute those commands. In both cases, we are working on future improvements to reduce the overhead of PolySpatial when Unity is the active renderer, and to reduce the overhead of Unity's graphics pipeline when RealityKit is the active renderer.

Default Volume Configuration

As is the case when using the RealityKit App Mode, the Default Volume Configuration in Player Settings is responsible for determining the start-up scene for your app. This start-up scene can be a Bounded volume, an Unbounded RealityKit volume, or a Metal space.

It is recommended that the first active VolumeCamera in your app use the same Volume Camera Output Configuration asset that is assigned to the Default Volume Configuration setting in Player Settings / PolySpatial. When launching directly into Metal, users will be prompted with the safety dialog right away, whereas launching directly into either RealityKit mode leaves pass-through visible until the app transitions to Metal mode. Apps that use a Bounded volume as their start-up scene will display the typical square window with the app icon in the center, while apps that use an Unbounded volume will not display anything until Unity has finished loading the scene and sent commands to RealityKit to place objects in the scene.

Using the same scene for Metal and RealityKit

Depending on the content, it may make sense to create a separate version of your scene for RealityKit, and one for Metal rendering with Unity. However, this may not always be the case. Most of the default materials and shaders will work equally well in RealityKit as in Metal, and camera/lighting settings configured for Metal generally won't have an impact on rendering the scene in RealityKit. Things can get a little more complicated when using baked lighting or complex rendering features in Metal, but it's a good idea to at least see what your scene looks like by simply switching your VolumeCamera's mode to Metal or RealityKit. For simple material tweaks, material swap sets can be used to define one set of materials for Metal mode, and one set of materials for RealityKit mode.

You will also want to configure your input scripts to use both SpatialPointerDevice and VisionOSSpatialPointerDevice, as described in the Input section of the documentation.

Input in Hybrid Mode

The gaze/pinch gesture is available at all times in Hybrid mode. The limitations that apply to input device pose and selection ray data with a Bounded VolumeCamera also apply here. Both SpatialPointerDevice and VisionOSSpatialPointerDevice are used, as described in the Input section of the documentation. ARKit data is available as long as Metal is active, or an Unbounded VolumeCamera is in use.
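The ARKit availability rule reduces to a single condition on the active volume camera modes. The sketch below is a plain model with hypothetical names (`IMMERSIVE_MODES`, `arkit_data_available`); it is not how you query ARKit state in Unity.

```python
# Modes that, per the text, run inside an ImmersiveSpace.
IMMERSIVE_MODES = {"Metal", "Unbounded"}


def arkit_data_available(active_modes):
    """True when Metal is active or an Unbounded VolumeCamera is in use.

    Hypothetical helper: `active_modes` is any iterable of mode names
    for the volume cameras that are currently active.
    """
    return any(mode in IMMERSIVE_MODES for mode in active_modes)
```

An app showing only Bounded volumes gets `False` from this check, while the gaze/pinch gesture remains available regardless.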