3D Vision Technology Explained: Stereo vs ToF vs Structured Light


Depth cameras have become a standard component in robotics, automation, and computer vision development. But “depth camera” is not a single technology. It is a category that includes at least three fundamentally different approaches to measuring distance — each with distinct operating principles, performance characteristics, and trade-offs.

Choose the wrong technology for your application and you will encounter problems that no amount of software tuning can fix. The limitation is in the physics, not the code.

This guide explains how each major depth sensing technology works, where each one performs well, where it struggles, and how to choose based on your specific application requirements. Understanding 3D vision technology at this level is the foundation for making informed hardware decisions.

How Depth Cameras Work: The Shared Goal

Every depth camera has the same fundamental goal: produce a depth map — an image where each pixel contains a distance value rather than a brightness or color value. That depth map, combined with the camera’s intrinsic parameters, can be converted into a 3D point cloud representing the geometry of the scene.
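The conversion from depth map to point cloud can be sketched in a few lines. This is a minimal example assuming an undistorted pinhole camera model; the function name and the intrinsic values (fx, fy, cx, cy) are illustrative placeholders, not taken from any camera's calibration.

```python
import numpy as np

def depth_to_point_cloud(depth_m, fx, fy, cx, cy):
    """Back-project a depth map (metres) into an N x 3 point cloud.

    Assumes an undistorted pinhole camera. fx, fy, cx, cy are the
    intrinsics in pixels. Pixels with depth <= 0 are treated as voids.
    """
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row grids
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    points = np.stack([x, y, depth_m], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid pixels

# Illustrative 4 x 4 depth map: a flat surface at 2 m with one depth void.
depth = np.full((4, 4), 2.0)
depth[0, 0] = 0.0
cloud = depth_to_point_cloud(depth, fx=600.0, fy=600.0, cx=1.5, cy=1.5)
print(cloud.shape)
```

In a real pipeline the intrinsics would come from the camera's calibration data, and lens distortion would be corrected before back-projection.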

The three technologies — stereo vision, time-of-flight, and structured light — achieve this goal through different physical mechanisms. Those mechanisms determine accuracy, range, resolution, power requirements, computational cost, and susceptibility to environmental interference.

Stereo Vision

How It Works

Stereo vision mimics human binocular vision. Two cameras are mounted at a fixed horizontal distance — called the baseline — and capture simultaneous frames of the same scene. Because the cameras are at different positions, objects appear at slightly different locations in the two images. This offset is called disparity.

For any pixel in one image, the depth can be calculated from the disparity to the corresponding pixel in the other image:

Depth = (focal length × baseline) / disparity

This is the fundamental stereo equation. Depth is inversely proportional to disparity — objects closer to the camera produce larger disparity values. Objects far away produce small disparity, and beyond a certain range, disparity becomes too small to measure reliably.
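A short numerical sketch makes the inverse relationship concrete. The focal length in pixels is an assumed value for illustration; only the 50mm baseline corresponds to a figure quoted later in this guide.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres from disparity in pixels, for a rectified stereo pair."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px

FOCAL_PX = 640.0   # assumed focal length in pixels (illustrative)
BASELINE_M = 0.05  # 50 mm baseline

for d in (64.0, 16.0, 4.0, 1.0):
    z = depth_from_disparity(d, FOCAL_PX, BASELINE_M)
    print(f"disparity {d:5.1f} px -> depth {z:6.2f} m")
```

Under these assumptions, going from 4 px to 1 px of disparity moves the estimated depth from 8 m to 32 m, which is why a fraction of a pixel of matching error matters so much at long range.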

Passive Stereo

Passive stereo relies on natural image texture to find corresponding points between the two camera views. The stereo matching algorithm looks for distinctive features — edges, corners, texture patterns — and matches them across the two images.

This works well on textured scenes but fails on textureless surfaces. A blank white wall, a uniform metal surface, or any scene without distinctive local features produces matching failures and depth voids in the output.
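The failure mode can be reproduced with a toy one-dimensional block matcher. This is a simplified sketch using a sum-of-absolute-differences (SAD) cost along a single scanline, not the matcher used by any particular camera; the synthetic rows and the window size are assumptions chosen for illustration.

```python
import numpy as np

def match_costs(left_row, right_row, x, half_window, max_disparity):
    """SAD cost of matching the window centred at x in the left row
    against windows shifted by 0..max_disparity in the right row."""
    patch = left_row[x - half_window : x + half_window + 1]
    costs = []
    for d in range(max_disparity + 1):
        candidate = right_row[x - d - half_window : x - d + half_window + 1]
        costs.append(float(np.abs(patch - candidate).sum()))
    return np.array(costs)

TRUE_DISPARITY = 5
rng = np.random.default_rng(0)

# Textured scanline: the left view is the right view shifted by the disparity.
right_textured = rng.random(64)
left_textured = np.roll(right_textured, TRUE_DISPARITY)
textured_costs = match_costs(left_textured, right_textured, 40, 3, 10)

# Textureless scanline: every pixel has the same brightness.
flat = np.full(64, 0.5)
flat_costs = match_costs(flat, flat, 40, 3, 10)

print("textured best disparity:", int(np.argmin(textured_costs)))
print("flat costs all equal:", bool(np.all(flat_costs == flat_costs[0])))
```

On the textured row the cost has a single minimum at the true disparity. On the flat row every candidate disparity has the same cost, so the matcher has no basis for choosing one, which is what appears as a depth void in the output.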

Stereolabs ZED cameras use passive stereo with GPU-accelerated neural depth estimation. The GPU runs learned matching models that can partially compensate for textureless regions — but only when a GPU is available. This makes ZED cameras powerful but computationally dependent.

Active Stereo

Active stereo adds a projector — typically an IR laser — that illuminates the scene with a dot pattern, speckle pattern, or structured pattern. This gives the stereo matching algorithm reliable texture even on surfaces with no natural features.

The Orbbec Gemini 335 and Gemini 335L use active and passive stereo in combination. The MX6800 ASIC handles depth computation onboard, without requiring the host CPU or GPU for the stereo matching process.

  • Orbbec Gemini 335 key specifications:
      - Technology: Active + Passive Stereo IR
      - Baseline: 50mm
      - Depth range: 0.10m to 20m+
      - Depth accuracy (RMSE): <1.5% at 2m
      - Depth FoV: 90° H × 65° V
      - IP rating: IP5X
      - Depth engine: Orbbec MX6800 ASIC (onboard processing)
      - RGB resolution: 1920×1080
      - Multi-camera sync: Supported
  • Orbbec Gemini 335L key specifications:
      - Technology: Active + Passive Stereo IR
      - Baseline: 95mm (longer baseline = better accuracy at range)
      - Depth accuracy (RMSE): <0.8% at 2m, <1.6% at 4m
      - IP rating: IP65 (dust-tight, water-resistant; suited to industrial environments)
      - Multi-camera sync: Supported

The longer baseline of the Gemini 335L improves depth accuracy at medium range, which is why larger baseline cameras are preferred when accuracy at 2–6m matters more than compact form factor.
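The effect of baseline follows from differentiating the stereo equation: to first order, depth error grows as Z² × (disparity error) / (focal length × baseline). The sketch below compares the two baselines under that model. The focal length and the sub-pixel matching error are assumed values, so the absolute numbers are illustrative and are not datasheet figures.

```python
def depth_error_m(depth_m, focal_px, baseline_m, disparity_error_px):
    """First-order depth uncertainty: dZ ~= Z^2 * dd / (f * B)."""
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)

FOCAL_PX = 640.0          # assumed, identical for both cameras
DISPARITY_ERROR_PX = 0.1  # assumed sub-pixel matching error

for baseline_m in (0.050, 0.095):
    for depth_m in (2.0, 4.0, 6.0):
        err = depth_error_m(depth_m, FOCAL_PX, baseline_m, DISPARITY_ERROR_PX)
        print(f"baseline {baseline_m * 1000:3.0f} mm, depth {depth_m:.0f} m "
              f"-> error ~{err * 1000:5.1f} mm")
```

In this model the error scales inversely with baseline, so moving from 50mm to 95mm reduces it to roughly 0.53 of its original value at any given depth. That ratio is close to the ratio of the quoted 2m RMSE figures (0.8% versus 1.5%), although real accuracy also depends on optics, calibration, and matching quality.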

Where Stereo Vision Works Well

  • Indoor and outdoor applications (active stereo handles both)
  • Long-range sensing (0.10m to 20m+)
  • Mobile robots and AMRs where range is critical
  • Applications requiring ROS integration and on-host point cloud processing
  • Budget-conscious deployments — stereo is generally lower cost than ToF

Where Stereo Vision Struggles

  • Passive stereo fails on textureless surfaces
  • Depth noise increases at longer ranges (disparity resolution becomes limiting)
  • Narrow stereo baseline reduces accuracy at range
  • Multi-camera active stereo setups can experience IR interference between units without sync management

Time-of-Flight (ToF)

How It Works

Time-of-flight cameras determine depth by timing how long light takes to travel from the camera to a surface and back. Since the speed of light is constant, distance equals travel time multiplied by the speed of light, divided by two (for the round trip).

Distance = (speed of light × time of flight) / 2

In practice, direct measurement of nanosecond-scale light travel time requires specialized hardware. Modern ToF cameras use an indirect method called iToF (indirect Time-of-Flight): the camera emits amplitude-modulated infrared light and measures the phase shift between the emitted and received signals. Phase shift is proportional to distance.
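Both relationships are easy to check numerically. The sketch below uses the standard single-frequency iToF relation, distance = c × phase / (4π × modulation frequency). The 100 MHz modulation frequency is an assumed value for illustration and is not a specification of any camera discussed here.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def round_trip_time_s(distance_m):
    """Time for light to reach a surface and return."""
    return 2.0 * distance_m / C

def distance_to_phase_rad(distance_m, mod_freq_hz):
    """Phase shift seen by the sensor, wrapped to [0, 2*pi)."""
    return (4.0 * math.pi * mod_freq_hz * distance_m / C) % (2.0 * math.pi)

def phase_to_distance_m(phase_rad, mod_freq_hz):
    """Distance implied by a measured phase shift."""
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)

def unambiguous_range_m(mod_freq_hz):
    """Distance at which the phase wraps around to zero."""
    return C / (2.0 * mod_freq_hz)

MOD_FREQ_HZ = 100e6  # assumed modulation frequency (illustrative)

print(f"round trip for 1 m: {round_trip_time_s(1.0) * 1e9:.2f} ns")
print(f"unambiguous range:  {unambiguous_range_m(MOD_FREQ_HZ):.3f} m")
for true_d in (1.0, 2.0):
    phase = distance_to_phase_rad(true_d, MOD_FREQ_HZ)
    print(f"true {true_d:.1f} m -> recovered "
          f"{phase_to_distance_m(phase, MOD_FREQ_HZ):.3f} m")
```

The 2 m target lies beyond the unambiguous range at this single frequency, so the recovered distance wraps around to about 0.5 m. Phase is proportional to distance only within one wrap, which is why practical iToF systems commonly combine several modulation frequencies.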

The Orbbec Femto Bolt and Femto Mega use iToF technology developed in collaboration with Microsoft, implementing the same sensor technology as the Microsoft Azure Kinect. The IR illuminator operates at 850nm wavelength.

Depth Measurement Modes

ToF cameras typically offer multiple depth modes that trade field of view for resolution or range:

  • NFOV (Narrow Field of View): Higher depth resolution in a narrower scene. Better for accuracy-critical applications.
  • WFOV (Wide Field of View): Covers a larger area but with lower per-pixel resolution.
  • Binned modes: Combine adjacent pixels to increase frame rate at the cost of spatial resolution.
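The resolution cost of binning can be illustrated in software. This is a simplified sketch that averages 2×2 blocks of an already-computed frame; an actual sensor combines pixel signals on-chip before depth is computed, so treat this as an illustration of the trade-off rather than a model of the hardware.

```python
import numpy as np

def bin_2x2(frame):
    """Average each non-overlapping 2 x 2 block of a 2-D frame."""
    h, w = frame.shape
    h2, w2 = h - h % 2, w - w % 2  # drop a trailing odd row or column
    blocks = frame[:h2, :w2].reshape(h2 // 2, 2, w2 // 2, 2)
    return blocks.mean(axis=(1, 3))

small = np.arange(16, dtype=float).reshape(4, 4)
print(bin_2x2(small))

full = np.zeros((1024, 1024))
print(bin_2x2(full).shape)
```

A 1024×1024 frame becomes 512×512: one quarter of the pixels to read out and process, at half the spatial resolution in each direction.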

Orbbec Femto Bolt key specifications:
  - Technology: iToF (Microsoft-proven sensor technology)
  - Depth sensor: 1 Megapixel (up to 1024×1024)
  - Depth range: 0.25m to 5.46m (mode-dependent)
  - Depth FoV: 120° (WFOV), narrower in NFOV modes
  - Depth accuracy: <11mm + 0.1% of distance (systematic error)
  - Depth precision: ≤17mm (random error std. dev.)
  - RGB: 4K (3840×2160) with HDR
  - RGB frame rate: Up to 30fps
  - IMU: 6DoF integrated
  - Connectivity: USB-C 3.2 Gen 1
  - Form factor: 115mm × 40mm × 65mm
  - SDK: Azure Kinect SDK compatible + Orbbec SDK
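The accuracy figure combines a fixed offset with a distance-dependent term, so the bound changes across the working range. A small helper, with a hypothetical name and default values copied from the specification above, shows how to read it:

```python
def systematic_error_bound_mm(distance_m, offset_mm=11.0, fraction=0.001):
    """Upper bound on systematic error: offset + fraction of distance."""
    return offset_mm + fraction * distance_m * 1000.0

for distance_m in (0.25, 1.0, 2.0, 5.46):
    bound = systematic_error_bound_mm(distance_m)
    print(f"{distance_m:4.2f} m -> systematic error below {bound:5.2f} mm")
```

The fixed 11mm term dominates across the whole range, so the bound grows only from about 11mm at the near limit to about 16.5mm at 5.46m. Note that this is a systematic error bound and is not directly comparable to the RMSE percentages quoted for the stereo cameras.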

The Femto Bolt is Azure Kinect-compatible. Applications built on the Azure Kinect Sensor SDK can switch to the Femto Bolt using Orbbec’s provided SDK wrapper without significant code changes. This matters for the large installed base of Azure Kinect developers following Microsoft’s discontinuation of that product.

Multi-Path Interference

iToF cameras are susceptible to multi-path interference — a phenomenon where IR light reflects off multiple surfaces before reaching the sensor. The sensor receives a mixture of signals from different paths, producing a phase measurement that corresponds to an incorrect distance.

Multi-path interference is most common in corners, next to walls, and around highly reflective objects. It tends to produce depth values that are too large at concave scene features.

This is a fundamental limitation of the iToF operating principle and cannot be fully corrected in software.
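The bias can be reproduced by treating each light path as a phasor and summing them, which is what the sensor effectively measures. This is a two-path toy model under assumed values (100 MHz modulation, a secondary path carrying 30% of the direct amplitude); real scenes contain a continuum of paths.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s

def phasor(distance_m, amplitude, mod_freq_hz):
    """Complex return signal for a path of the given equivalent distance."""
    phase = 4.0 * math.pi * mod_freq_hz * distance_m / C
    return amplitude * cmath.exp(1j * phase)

def measured_distance_m(paths, mod_freq_hz):
    """Distance implied by the phase of the summed return signal."""
    total = sum(phasor(d, a, mod_freq_hz) for d, a in paths)
    phase = cmath.phase(total) % (2.0 * math.pi)
    return C * phase / (4.0 * math.pi * mod_freq_hz)

MOD_FREQ_HZ = 100e6    # assumed modulation frequency (illustrative)
direct = (1.00, 1.0)   # (equivalent distance in m, relative amplitude)
bounce = (1.30, 0.3)   # longer path via a second surface, weaker signal

clean = measured_distance_m([direct], MOD_FREQ_HZ)
biased = measured_distance_m([direct, bounce], MOD_FREQ_HZ)
print(f"direct path only: {clean:.3f} m")
print(f"with multi-path:  {biased:.3f} m")
```

Because the indirect path is always longer than the direct one, the mixed measurement lands between the two, which is why the error appears as depth that is too large. The sensor only sees the sum, so it cannot separate the contributions from a single-frequency measurement.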

Sunlight Sensitivity

ToF cameras emit IR light and must detect the returned signal against a background of ambient IR. Direct sunlight contains substantial IR radiation that can overwhelm the camera’s signal — particularly at the 850nm wavelength used by most iToF cameras.

This makes ToF cameras less suitable for outdoor applications in direct sunlight, and generally limits their use to indoor or semi-outdoor (indirect sunlight) conditions.

Where ToF Works Well

  • Fixed indoor installations with known scene geometry
  • Human-interaction applications: body tracking, gesture recognition, patient positioning
  • Applications requiring 1-megapixel depth resolution in a single shot
  • Systems already using Azure Kinect SDK — Femto Bolt is a drop-in replacement
  • Volumetric capture and spatial computing applications

Where ToF Struggles

  • Outdoor use in direct sunlight
  • Scenes with highly reflective surfaces (multi-path interference)
  • Long-range applications beyond 5–6m
  • Deployments where power budget is constrained (ToF illuminators consume more power than stereo)

Structured Light

How It Works

Structured light cameras project a known pattern — typically a grid, stripe pattern, or dot pattern — onto the scene using a projector. A separate camera observes how that pattern deforms as it falls on surfaces at different depths.

The deformation of the projected pattern is a function of the surface geometry. By comparing the observed pattern to the expected (flat) reference pattern, the system can calculate the 3D shape of the scene.
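One common way to formalize this comparison is a reference-plane triangulation model: the pattern is recorded once on a flat surface at a known distance, and the sideways shift of each pattern feature relative to that recording is converted to depth. The sketch below assumes that model. The function name, focal length, projector-camera baseline, and sign convention (positive shift means closer) are all illustrative assumptions.

```python
def depth_from_pattern_shift(shift_px, ref_depth_m, focal_px, baseline_m):
    """Depth from the shift of a pattern feature relative to its position
    in a reference image captured at ref_depth_m.

    Uses the triangulation relation disparity = focal * baseline / depth,
    with the observed shift added to the reference-plane disparity.
    """
    ref_disparity = focal_px * baseline_m / ref_depth_m
    disparity = ref_disparity + shift_px
    if disparity <= 0:
        raise ValueError("shift implies a point at or beyond infinity")
    return focal_px * baseline_m / disparity

FOCAL_PX = 580.0    # assumed camera focal length in pixels
BASELINE_M = 0.075  # assumed projector-to-camera baseline
REF_DEPTH_M = 1.0   # distance of the flat reference plane

for shift in (43.5, 0.0, -21.75):
    z = depth_from_pattern_shift(shift, REF_DEPTH_M, FOCAL_PX, BASELINE_M)
    print(f"shift {shift:7.2f} px -> depth {z:.2f} m")
```

A feature that has not moved lies on the reference plane. The geometry is the same triangulation used in stereo, with the projector taking the place of the second camera.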

The Orbbec Femto Bolt also projects infrared light, but it is an iToF camera, not a structured light camera. True structured light systems, such as the original Microsoft Kinect sensor or precision 3D scanners, project patterns that are analyzed geometrically. This is distinct from active stereo, which projects a pattern to assist texture matching between two cameras, and from iToF, which projects modulated light to measure phase.

Resolution and Accuracy Advantages

Structured light systems can achieve sub-millimeter accuracy at close range. High-end structured light scanners — used in metrology, quality control, and dental scanning — are among the most accurate 3D measurement tools available.

At close range with stable lighting conditions, structured light outperforms both stereo and ToF on absolute depth accuracy.

Range and Ambient Light Limitations

Structured light cameras have limited range compared to active stereo. The projected pattern must be visible against the ambient background. In bright ambient light, particularly outdoors, the projected pattern washes out and depth measurement becomes unreliable.

At longer ranges, the projected pattern becomes dimmer and harder to detect, and the triangulation geometry becomes less sensitive to depth changes, reducing effective range compared to stereo systems.

Near-Range Performance

For close-range applications — pick-and-place at 0.3–1.0m, 3D scanning, face recognition, gesture interfaces — structured light delivers excellent depth accuracy. This is why structured light was the technology of choice for early consumer depth cameras and continues to dominate precision scanning applications.

Where Structured Light Works Well

  • Close-range precision scanning and measurement
  • Quality inspection at sub-millimeter tolerances
  • Face recognition and gesture interfaces
  • 3D reconstruction in controlled lighting conditions
  • Medical and dental scanning

Where Structured Light Struggles

  • Outdoor use — projected pattern washes out in sunlight
  • Long-range applications — effective range typically under 3–4m for consumer/prosumer hardware
  • High-speed dynamic scenes — some structured light systems require multiple frames for a single depth measurement
  • Large FOV coverage — projector output limits scene size

Technology Comparison Matrix

| Characteristic | Active Stereo (Gemini 335/335L) | ToF (Femto Bolt) | Structured Light |
| --- | --- | --- | --- |
| Operating principle | Disparity from two cameras + IR projector | Phase shift of modulated IR light | Pattern deformation analysis |
| Typical range | 0.10m – 20m+ | 0.25m – 5.5m | 0.1m – 3m (consumer) |
| Depth accuracy | <0.8–1.5% RMSE at 2m | <11mm + 0.1% (systematic) | Sub-mm at close range |
| Outdoor use | Yes (active stereo handles sunlight) | Limited; sunlight saturates IR | No; pattern washes out |
| Textureless surfaces | Handled by active IR; passive fails | Not affected | Handles well |
| Reflective surfaces | Performs with active IR | Multi-path interference | Struggles |
| Computational load | Onboard ASIC (Gemini 335) | Onboard sensor | Variable |
| Depth resolution | High (1280×800 typical) | Up to 1024×1024 (1MP) | High (scanner-dependent) |
| Power consumption | Low–moderate | Moderate–high (IR illuminator) | Moderate |
| Cost | Low–moderate | Moderate | Moderate–high |
| Best for | Robotics, AMRs, outdoor, long range | Indoor body tracking, Azure Kinect migration | Precision scanning, quality control |

Choosing the Right Technology

The choice between stereo, ToF, and structured light is not about which is “best” — it is about which matches your application’s constraints.

  • Use active stereo (Gemini 335 / Gemini 335L) when:
      - Your application involves outdoor operation or variable lighting
      - Range beyond 3m is required
      - You need a compact, power-efficient solution for a mobile robot or AMR
      - Processing budget is limited and onboard depth computation matters
      - Cost is a factor in high-volume deployment
  • Use ToF (Femto Bolt / Femto Mega) when:
      - You are migrating from Microsoft Azure Kinect and need SDK compatibility
      - Your application requires 1-megapixel depth resolution in a single shot
      - The deployment is indoor with controlled lighting
      - Body tracking, gesture recognition, or volumetric capture is the use case
      - You need the widest possible depth FoV in a compact housing
  • Use structured light when:
      - Depth accuracy at sub-millimeter scale is required at close range
      - The application is a controlled indoor scanning or inspection environment
      - Range is under 2–3m and ambient lighting is controlled
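The checklist above can be expressed as a small rule-based helper for a first-pass screening. This is a deliberately coarse sketch: the class, field names, and rule ordering are hypothetical, the 3m threshold is taken from the list above, and a real selection should weigh every criterion in this guide rather than the few encoded here.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    max_range_m: float
    outdoor: bool = False
    needs_sub_mm_accuracy: bool = False
    azure_kinect_migration: bool = False
    body_tracking: bool = False

def suggest_technology(req):
    """First-pass suggestion based on the checklist; not a final decision."""
    if req.outdoor or req.max_range_m > 3.0:
        return "active stereo"
    if req.needs_sub_mm_accuracy:
        return "structured light"
    if req.azure_kinect_migration or req.body_tracking:
        return "iToF"
    return "active stereo"  # the general-purpose default for mobile robotics

print(suggest_technology(Requirements(max_range_m=8.0, outdoor=True)))
print(suggest_technology(Requirements(max_range_m=0.8, needs_sub_mm_accuracy=True)))
print(suggest_technology(Requirements(max_range_m=2.5, body_tracking=True)))
```

The rules are checked in order of how hard the constraint is: sunlight and range are physical limits, so they are tested before accuracy or SDK preferences.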

For most mobile robotics, AMR, and collaborative robot applications, active stereo covers the widest range of conditions — outdoor, variable lighting, long range — at the lowest computational and power cost.

Orbbec’s Technology Portfolio

Orbbec manufactures across all three depth sensing categories, which makes it possible to evaluate each technology directly from a single vendor with consistent SDK support.

The Gemini 330 series (Gemini 335, Gemini 335L) covers active stereo for robotics and outdoor applications. The Femto series (Femto Bolt, Femto Mega) covers iToF for indoor, body tracking, and Azure Kinect migration. Both lines run on the Orbbec SDK, with ROS and ROS 2 support and wrappers for common robotics frameworks.

For developers evaluating which depth technology fits a specific application, Orbbec’s documentation portal provides technical depth on operating principles, application examples, and integration guidance. It covers sensor operating principles, SDK integration, and application-specific recommendations across the full product range.

Conclusion

Stereo vision, time-of-flight, and structured light are distinct technologies with different physical operating principles. Each has scenarios where it excels and constraints that make it unsuitable for others.

Understanding these trade-offs is what allows an engineer to make a confident hardware selection before committing to a design. The wrong choice surfaces as a constraint that cannot be engineered around — whether that is a ToF camera overwhelmed by outdoor sunlight, a passive stereo camera unable to process a textureless bin, or a structured light scanner asked to operate beyond its practical range.

Map your application’s requirements — range, accuracy, lighting conditions, power budget, computational architecture, surface types — against the technology characteristics described here. The right choice should become clear.

For full technical documentation covering each technology in depth, visit Orbbec’s documentation.

Technical specifications for Orbbec Gemini 335, Gemini 335L, Femto Bolt, and Femto Mega cited from Orbbec product pages and published datasheets. Stereo disparity equation and ToF operating principles are standard references in the computer vision literature.