

Technology Overview

How NODAR's Ultra-Wide Baseline Stereo Vision Achieves 1,000-Meter 3D Sensing

The Science Behind Ultra-Wide Baseline Stereo Vision

NODAR's Hammerhead software enables standard stereo cameras to produce dense, accurate 3D point clouds at ranges up to 1,000 meters — far beyond the ~5-meter limit of conventional stereo vision. It does this by widening the distance between two cameras (the baseline) from the typical few centimeters to 0.5–3 meters or more, then solving the calibration problem that previously made wide-baseline stereo impractical for outdoor deployment. The result is a passive, software-defined 3D sensing system that runs on off-the-shelf cameras, requires no LiDAR, and automatically recalibrates every frame without human intervention. This page describes the technical principles behind Hammerhead and GridDetect, NODAR's two core software products.

Hammerhead

Hammerhead is NODAR's stereo vision software that converts raw images from two standard cameras into dense 3D point clouds at ranges up to 1,000 meters, using triangulation-based depth measurement rather than inferred or neural-network-based estimation.

Triangulation is a reliable, direct method for measuring range, like a tape measure: the range is not inferred from ancillary cues (as in monocular depth estimation) but can be expressed explicitly as a function of the angles from the two cameras and the distance between the cameras.

How Does Increasing the Stereo Baseline Improve Range at Long Distance?

Increasing the distance between two stereo cameras — the baseline — directly improves range resolution. This relationship is what allows NODAR's ultra-wide baseline stereo vision (0.5–3m+ baselines) to achieve accurate 3D sensing at distances exceeding 1,000 meters, far beyond the ~5-meter limit of standard stereo cameras.

In fact, the depth uncertainty (sometimes called range resolution) is inversely proportional to the baseline length: for the same camera resolution and optics, increasing the baseline by a factor of 10 decreases the depth uncertainty by a factor of 10. A photo of a 0.5-m-baseline stereo camera is shown next to a 0.05-m-baseline stereo camera in Fig. 1.


Fig. 1. Ultra-wide-baseline stereo camera with 0.5-m baseline vs. standard stereo camera with 0.05-m baseline.

The depth uncertainty, Δz​, for a stereo vision camera is derived in Fig. 2, and is equal to:

Δz = (IFOV · R² · δ) / B

Eq. 1

where IFOV is the instantaneous field of view, R is the range to the object, 𝛅 is the disparity measurement resolution (𝛅=0.1 pixels for Hammerhead software), and B is the baseline length. Fig. 2 shows graphically that as the baseline increases, the depth uncertainty improves.
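For reference, Eq. 1 follows in two lines from the stereo disparity relation (a sketch using the same symbols; f is the focal length in pixels, so that IFOV = 1/f, the substitution used for Fig. 3):

```latex
d = \frac{fB}{R}
\qquad\Longrightarrow\qquad
\left|\frac{\partial d}{\partial R}\right| = \frac{fB}{R^2}
\qquad\Longrightarrow\qquad
\Delta z = \left|\frac{\partial R}{\partial d}\right|\,\delta
         = \frac{R^2}{fB}\,\delta
         = \frac{\mathrm{IFOV}\cdot R^2\cdot\delta}{B}
```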


Fig. 2. Depth uncertainty improves as the baseline length increases.

The equation for depth uncertainty, Eq. 1, also shows that it increases as the square of the range, i.e., R², which is the main reason why previous stereo vision cameras were limited to ranges of less than about 5 meters. The depth uncertainty is plotted as a function of range in Fig. 3 for both LiDAR and stereo vision systems. Note that the substitution, IFOV = 1/f, was used. The range resolution is the distance between two objects before they blend together in range, i.e., how well the system can resolve two closely spaced objects in range. To be clear, range resolution is not signal-dependent and is not the same as range precision or range accuracy. Precision is derived from resolution by averaging multiple returns and depends on SNR, which is range-dependent.
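A quick numeric sketch of Eq. 1 makes the R² growth concrete. The optics below are illustrative, not values from a NODAR datasheet; only δ = 0.1 pixels comes from the text above:

```python
def depth_uncertainty(range_m, baseline_m, ifov_rad, delta_px=0.1):
    """Eq. 1: depth uncertainty = IFOV * R^2 * delta / B."""
    return ifov_rad * range_m ** 2 * delta_px / baseline_m

# Illustrative parameters (not from a specific NODAR datasheet):
IFOV = 2e-4  # rad per pixel, equivalent to f = 5000 pixels
B = 1.0      # baseline, meters

for R in (10, 100, 1000):
    print(f"R = {R:4d} m -> depth uncertainty = {depth_uncertainty(R, B, IFOV):g} m")
```

Doubling the range quadruples the uncertainty, which is why long-range stereo demands a wide baseline.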

The range resolution of LiDAR is the point spread function in range, that is, the spread of returns from a flat normal target, and is set by the bandwidth of the LiDAR transceiver. Typically, LiDARs have 4-ns pulses with a matching 250-MHz optoelectronic receiver, which corresponds to a range resolution of 60 cm. Fig. 3 shows a horizontal line for LiDAR (orange line): the range resolution is the same at all ranges because it depends only on the laser pulse width (more precisely, the LiDAR transceiver bandwidth).


Fig. 3. Range resolution vs. range.

The range resolution of stereo vision is given by the depth uncertainty equation, Eq. 1, with 𝛅 set equal to one pixel, the minimum feature size that can be discerned optically. Fig. 3 shows the range resolution of three stereo cameras with different focal lengths and baselines. Standard stereo vision with a 0.1-m baseline (yellow line) has better range resolution than LiDAR up to 5 m, ultra-wide stereo vision with a 1-m baseline (blue line) up to 55 m, and ultra-wide stereo vision with a 3-m baseline (green line) up to 140 m. Increasing the baseline of stereo vision cameras opens up long-range sensing applications.

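The crossover ranges quoted above can be sanity-checked by setting Eq. 1 (with δ = 1 pixel) equal to the LiDAR resolution of 60 cm and solving for range. A sketch with illustrative IFOV values (the exact focal lengths behind Fig. 3 are not stated here):

```python
import math

def crossover_range_m(baseline_m, ifov_rad, lidar_res_m=0.6, delta_px=1.0):
    """Range at which stereo resolution matches LiDAR resolution:
    solve IFOV * R^2 * delta / B = lidar_res for R."""
    return math.sqrt(lidar_res_m * baseline_m / (ifov_rad * delta_px))

# Illustrative optics; each configuration uses a different focal length:
print(crossover_range_m(0.1, 2.4e-3))  # standard stereo, ~5 m
print(crossover_range_m(1.0, 2.0e-4))  # ultra-wide, ~55 m
print(crossover_range_m(3.0, 9.2e-5))  # ultra-wide, ~140 m
```

Below the crossover range stereo resolves more finely; beyond it, LiDAR's fixed 60-cm resolution wins.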

Fig. 3.5. Example Hammerhead depth map and 3D point cloud outputs for a car driving on the highway with cameras in the upper left and right windshield: 1.1-m baseline, 30° FOV, 5.4 MP cameras (Sony IMX490).


How Does Hammerhead Calibrate Stereo Cameras Automatically Without Checkerboards?

Hammerhead solves the primary obstacle that previously made wide-baseline stereo vision impractical for outdoor deployment: automatic, frame-by-frame calibration without human intervention or calibration targets. Prior systems required daily manual recalibration using checkerboard patterns because vehicle vibration causes angular camera shifts that grow with the square of the baseline length — a 10x wider baseline produces 100x more angular disturbance from the same force. Stereo vision relies on measuring tiny angles to determine object locations, making it highly sensitive to disruptions. Even minute angular shifts of just 0.01° in the cameras could result in significant range errors or render the stereo matcher unable to find reliable correspondences, preventing it from reporting any range data.

In fact, the angular disturbance of cameras caused by environmental factors increases with the square of the baseline length, effectively limiting practical stereo vision systems to baseline lengths of just a few tens of centimeters. While longer baseline systems have been documented in the literature, they are predominantly confined to indoor, static applications and require frequent recalibration. Fig. 4 shows that the same force on the end of a narrow- and wide-baseline stereo camera causes the slope at the end of the wider beam to deflect 100 times more than the narrower beam. For more details, we provide an application note that explains the necessity of frame-to-frame calibration [here].
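Fig. 4's scaling argument follows from the standard cantilever-beam formula: the end slope under a tip load F is θ = FL²/(2EI), so the same force on a 10x longer beam of the same cross-section produces 100x the angular deflection. A textbook sketch (the material values are illustrative, not NODAR's mechanical model):

```python
def end_slope_rad(force_n, length_m, youngs_pa, second_moment_m4):
    """End slope of a cantilever under a tip load: theta = F * L^2 / (2 * E * I)."""
    return force_n * length_m ** 2 / (2 * youngs_pa * second_moment_m4)

# Same 1-N force on a short and an ultra-wide camera bar (same cross-section):
E, I = 70e9, 1e-8                         # aluminum, illustrative second moment
short = end_slope_rad(1.0, 0.1, E, I)     # 0.1-m baseline
wide = end_slope_rad(1.0, 1.0, E, I)      # 1.0-m baseline
print(wide / short)                       # the wide bar tilts ~100x more
```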


Fig. 4. An ultra-wide baseline stereo camera (bottom) and a short-baseline stereo camera (top) with 10x the baseline length. The angular displacement of the beam from the same force is 100x for the ultra-wide baseline, making it much more sensitive to calibration issues. Hammerhead compensates for large calibration errors in software.


The inconvenient truth is that previous vehicle-mounted stereo camera systems required daily calibration to address subtle shifts in camera positions caused by the shocks and vibrations of regular driving. This calibration process was time-consuming, typically involving capturing 20-40 images of checkerboards or calibration patterns positioned at various angles and distances relative to the cameras. Moreover, systems with ultra-wide baselines, particularly those using telephoto lenses, demanded calibration boards of impractically large size. Hammerhead solves these practical calibration issues and enables ultra-wide baseline stereo vision.

Hammerhead autocalibration uniquely solves three challenges needed for commercial wide-baseline stereo vision:

  1. Natural scenes. Products cannot be shipped with engineers and calibration checkerboards. The stereo camera must be able to automatically calibrate from natural scenes without human intervention.

  2. Bandwidth. Shock, and engine and road vibration are at higher frequencies (100-300 Hz) than the frame rate of the camera (~30 Hz), and hence every frame must be calibrated independently. This is necessary for on- and off-road robotic applications. In the past, automatic calibration every frame was computationally prohibitive, but NODAR’s breakthrough algorithms efficiently solve the calibration problem in real time.

  3. Accuracy. Hammerhead offers sub-pixel reprojection error for accurate and best-in-class ranging at long ranges.

Hammerhead achieves all three requirements simultaneously. Previous autocalibration algorithms use the “keypoint approach”: keypoints are found in the left and right images, and a rigid geometry assumption allows the extrinsic camera parameters to be estimated. Keypoint approaches fail for several reasons:

  1. Incorrect matches. Especially in urban environments, with repeated structures, such as windows, keypoints can be incorrectly matched. Even one incorrect match leads to unusable results.

  2. Incorrect locations. Rounded corners, which are common in natural scenes, lack sharp features; keypoint detectors (SIFT, ORB, etc.) select different parts of a rounded corner in the left and right images and match them incorrectly.

  3. Poor accuracy. Some keypoint types cannot be localized to sub-pixel accuracy and are inherently imprecise.

  4. Poor keypoint distribution. Keypoints are often clustered at the horizon or in certain spots of the image, and do not provide even sampling across the image, which reduces the estimation accuracy of the extrinsic camera parameters.

Hammerhead does not use keypoints to determine camera parameters and does not assume infinitely sharp points (like corners in a checkerboard), but rather looks at all pixels in the image to determine calibration.

How Does Hammerhead's Stereo Matcher Work?

Hammerhead's stereo matcher operates at full image resolution with ±0.1-pixel disparity error, processes frames seven times faster than HITnet (a leading KITTI benchmark algorithm), and produces no hallucinated range estimates — every output is a real measurement with an associated confidence value. It is optimized for outdoor robotic applications including adverse weather, low light, and high-dynamic-range conditions.

  1. Full resolution. Unlike most neural network approaches, it operates at the full resolution of the image without downsampling. This allows it to find “the needle in the haystack,” i.e., small objects in a large scene. Processing at full resolution gives extraordinary sharpness without the low-pass-filter effect often seen in neural network approaches.

  2. Fast. Hammerhead stereo matching is seven times faster than HITnet, one of the leading stereo matching algorithms on the KITTI leaderboard.

  3. Accurate. Hammerhead achieves best-in-class ±0.1-pixel disparity error (see datasheet).

  4. No hallucinations. Does not “make up” non-existent information like some neural networks. Every range estimate is real and has an associated confidence value.

  5. Low light. Excellent performance at night.

    1. Image processing, which uses robust feature matching, is highly sensitive and can measure distances to even the most subtle features. Additionally, averaging across pixels further enhances low-light performance.

    2. Superior calibration improves the matching of features in low SNR images. NODAR calibration makes it easier to find features that match in two camera images because (1) the correct windows are matched (see Fig. 4.5), and (2) it reduces the search space from 2D (CNNs) to 1D (along the epipole).

  6. Bad weather. Excellent performance in rain, snow, fog, and dust.

    1. Diffuse illumination from the sky is advantageous for passive stereo vision sensors compared to active LiDAR systems, which project light from the transceiver and suffer from blinding backscattering issues.

    2. Only in extreme rain, snow, and fog conditions does visibility drop below 100 meters. Excellent range estimates can be obtained as long as the visibility matches the range of interest. LiDAR, on the other hand, will often fail in light rain and fog conditions.
      (Check out the IEEE article here on NODAR's weather performance.)

  7. Dawn and Dusk. Excellent HDR performance even with the sun directly in the camera image.

    1. Hammerhead is compatible with rolling-shutter cameras, which offer the highest dynamic range of CMOS sensors (compared to global-shutter sensors, which are common in stereo vision cameras).
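The 2D-to-1D search reduction mentioned in point 5 can be illustrated with a toy block matcher on rectified rows: with correct calibration, a pixel's match lies on the same scanline, so only disparities along one row need to be scored. A toy sketch (not Hammerhead's matcher; all names here are ours):

```python
import numpy as np

def match_disparity_1d(left_row, right_row, x, win=3, max_disp=20):
    """Score disparities along one rectified scanline with a simple SSD block match."""
    patch = left_row[x - win:x + win + 1].astype(float)
    best_d, best_cost = 0, float("inf")
    for d in range(min(max_disp, x - win) + 1):
        cand = right_row[x - d - win:x - d + win + 1].astype(float)
        cost = float(np.sum((patch - cand) ** 2))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# Synthetic rectified pair: the right row is the left row shifted by 7 pixels,
# i.e. a true disparity of 7.
rng = np.random.default_rng(0)
left = rng.integers(0, 255, size=200)
right = np.roll(left, -7)
print(match_disparity_1d(left, right, x=100))
```

Miscalibration of even a fraction of a pixel vertically would move the true match off this scanline, which is why per-frame calibration matters so much for matching.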


Fig. 4.5. Correct calibration assures that the left and right matching windows are comparing the correct pixels, improving stereo matching performance.


Our stereo matching works not only with visible cameras, but also with LWIR (thermal), MWIR, SWIR, and NIR cameras. This video shows a 1-m baseline LWIR stereo camera while driving on the highway, using FLIR ADK cameras.

What Does the Hammerhead Processing Pipeline Look Like?

Hammerhead is a binary library that converts raw images from two cameras into colorized 3D point clouds. The library mainly uses GPU resources and is currently compiled for Nvidia GPUs. Fig. 5 shows a block diagram of the inputs, outputs, and intermediate computational blocks. Hammerhead does both automatic calibration and stereo matching.


Fig. 5. Hammerhead software block diagram.


This software library is called “Hammerhead” because the hammerhead shark has the widest distance between the left and right eye in the animal kingdom, with some species having a full meter between the eyes. It was discovered in 2009 that hammerhead sharks have binocular vision, giving them superior depth perception.

Fig. 5.5. Driving on a cobblestone road with standard stereo vision (left) and with Hammerhead (right).


GridDetect

NODAR's GridDetect is a software library that converts 3D point clouds into occupancy maps for real-time collision avoidance and drivable space detection. It accepts input from Hammerhead or any other 3D sensor — including LiDAR, radar, or sonar — and compresses high-density point cloud data (Hammerhead produces ~50 million points per second) into a lightweight 2.5D representation that can be transmitted and processed with minimal compute resources.


Fig. 6. GridDetect - input and output.

How Does GridDetect Convert a Point Cloud Into an Occupancy Map?

  1. Re-project 3D points into bird’s eye view (BEV) or top-down view.

  2. Bin points into cells that are typically 20 x 20 cm.

  3. Find cells that are occupied. A cell is occupied when:
      a. the object is too high, max(Y), and
      b. the slope is too steep, max(dY/dZ).
      The user defines max(Y) and max(dY/dZ) for the vehicle.

  4. Track occupied cells over time with a particle filter:
      a. Generate particles proportional to the density of 3D points, corrected by
          i. conversion from angle-angle-range (AAR) to XYZ space, and
          ii. the range-dependent range resolution, i.e., ΔR.
      b. Propagate particles in time to the next frame. Threshold the number of particles in cells to declare them occupied or not. Go to step (a) and repeat.
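The binning and thresholding steps above can be sketched in a few lines (a minimal sketch with 20 x 20 cm cells, Y up and Z forward; the threshold values and the handling of the two occupancy criteria are our illustrative choices, not GridDetect's):

```python
def occupancy_map(points_xyz, cell_m=0.2, max_y=0.3, max_slope=0.5):
    """Bin 3D points into BEV cells; flag a cell occupied if its contents
    exceed max(Y) or the within-cell slope exceeds max(dY/dZ)."""
    cells = {}
    for x, y, z in points_xyz:
        key = (int(x // cell_m), int(z // cell_m))
        cells.setdefault(key, []).append((y, z))
    occupied = set()
    for key, pts in cells.items():
        ys = [y for y, _ in pts]
        zs = [z for _, z in pts]
        too_high = max(ys) > max_y
        dz = max(zs) - min(zs)
        too_steep = dz > 1e-6 and (max(ys) - min(ys)) / dz > max_slope
        if too_high or too_steep:
            occupied.add(key)
    return occupied

# Flat ground produces no occupied cells; a 0.5-m-tall object at ~4 m does.
ground = [(x * 0.1, 0.0, z * 0.1) for x in range(10) for z in range(100)]
obstacle = [(0.1, 0.5, 4.05), (0.15, 0.5, 4.1)]
print(occupancy_map(ground))             # empty set
print(occupancy_map(ground + obstacle))  # the obstacle's cell is flagged
```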


Occupancy map showing two occupied cells.


Each particle has a random velocity (Vx and Vz). The old particles from the previous frames are shown in red.


Threshold number of particles in cells to declare as occupied or not.
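The generate/propagate/threshold loop illustrated above can be sketched as a minimal grid particle filter (illustrative parameters; the AAR-to-XYZ and range-dependent ΔR corrections from step 4a are omitted here):

```python
import random

def spawn(points_xz, per_point=5, vmax=2.0):
    """Generate particles proportional to local 3D point density,
    each with a random velocity (Vx, Vz)."""
    return [(x, z, random.uniform(-vmax, vmax), random.uniform(-vmax, vmax))
            for x, z in points_xz for _ in range(per_point)]

def propagate(particles, dt=1 / 30, survive=0.8):
    """Move particles by their velocities; let a fraction expire each frame."""
    return [(x + vx * dt, z + vz * dt, vx, vz)
            for x, z, vx, vz in particles if random.random() < survive]

def occupied_cells(particles, cell_m=0.2, threshold=8):
    """Threshold the particle count per cell to declare occupancy."""
    counts = {}
    for x, z, _, _ in particles:
        key = (int(x // cell_m), int(z // cell_m))
        counts[key] = counts.get(key, 0) + 1
    return {k for k, n in counts.items() if n >= threshold}

random.seed(1)
particles = []
for _ in range(3):  # a detection that persists for three frames
    particles = spawn([(1.0, 5.0)] * 3) + propagate(particles)
print(occupied_cells(particles))
```

A persistent detection accumulates particles and crosses the threshold; a single-frame artifact expires before it can.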

How Is GridDetect Different From Standard Occupancy Map Software?

GridDetect detects small objects at long range on uneven terrain — a capability standard occupancy map software lacks. Most occupancy map systems assume a flat ground plane, which fails beyond about 50 meters on both on- and off-road surfaces. GridDetect compensates for terrain variation, enabling reliable detection of objects as small as a brick on a road at distances where ground-plane-assumption systems produce false negatives.

  1. This example shows that a standard occupancy map misses a brick on the road. GridDetect does not.

  2. Hills are not a problem for GridDetect.

  3. Physics-based object detection. GridDetect detects objects based on size and volume rather than using a neural network that interprets pixel colors to decide whether an object is anomalous. We believe this is a more robust approach with fewer false negatives; for example, it is not fooled by zebra crossings painted to look 3D to slow down drivers.

  4. Fast. Processes large point clouds into occupancy maps in real time.

What Does the GridDetect Processing Pipeline Look Like?

GridDetect is a binary library that converts 3D point clouds into occupancy maps. Fig. 7 shows a block diagram of the inputs, outputs, and intermediate computational blocks.


Fig. 7. GridDetect software block diagram.

How Does NODAR's Stereo Vision Compare to LiDAR?

NODAR's ultra-wide baseline stereo vision outperforms LiDAR across five dimensions critical to autonomous system deployment: range, point cloud density, weather resilience, hardware longevity, and supply chain reliability.

Range: NODAR's Hammerhead software achieves effective 3D sensing at ranges exceeding 1,000 meters. Most commercial LiDAR systems operate at 50–200 meters.

Point cloud density: Hammerhead generates approximately 50 million 3D points per second — equivalent to ten 128-channel LiDAR units operating simultaneously. Typical LiDAR systems produce 0.3–5 million points per second.

Weather resilience: Stereo vision uses passive ambient light and degrades gradually in rain, fog, and dust. LiDAR emits active laser pulses that backscatter off precipitation particles, causing rapid signal degradation in even light rain or fog.

Hardware longevity: NODAR's software runs on standard off-the-shelf cameras with no moving parts, supporting operational lifetimes exceeding 30 years. Mechanical LiDAR systems typically require replacement within 1–3 years due to rotating components.

Supply chain: Because Hammerhead is compatible with any standard camera, customers have multiple redundant hardware suppliers. LiDAR procurement relies on a small number of specialized manufacturers with known production constraints.


| | LiDAR | NODAR stereo vision |
|---|---|---|
| Effective range | 50–200 m | 1,000+ m |
| Point cloud density | 0.3–5M pts/sec | ~50M pts/sec |
| Weather performance | Rapid degradation in light rain/fog | Gradual degradation; maintains range in moderate conditions |
| Sensing type | Active (laser emission) | Passive (ambient light) |
| Hardware lifespan | 1–3 years (mechanical) | 30+ years (no moving parts) |
| Data output | 3D (XYZ) | 6D (RGBXYZ) |


Hammerhead and GridDetect Together: A Complete Stereo Vision Perception Stack

Hammerhead and GridDetect form a complete stereo vision perception pipeline: Hammerhead takes raw images from two cameras and produces dense, calibrated 3D point clouds at ranges up to 1,000 meters; GridDetect takes those point clouds and outputs real-time occupancy maps for collision avoidance and path planning. Together, they replace LiDAR-based perception stacks with a software-defined solution that runs on standard off-the-shelf cameras — across rail, automotive, aviation, mining, maritime, and other demanding applications.


Fig. 8. Drop-in perception with Hammerhead and GridDetect.


🔍 A side-by-side comparison with our competition:

| Their 3D Sensor (LiDAR) | Our 3D Sensor (Ultra-wide Baseline Stereo Vision) |
|---|---|
| 📡 Active sensing. Broadcasts location to the enemy. | 📷 Passive sensing. Safer! |
| 💥 Mechanical failures in 1–3 years | 💪 Built to last a lifetime (160 years) |
| 🔍 Short range of 50–200 meters | 🎯 Long range of 1,000+ meters |
| 👎 Low density, 0.3–5M points/s | 👏 High density, 50M points/s |
| ❌ Rapid degradation in rain, fog, and dust | ✅ Gradual degradation in rain, fog, and dust |
| 🚫 Weak supply chain. Lidar companies often cannot provide the quantities needed, have production and quality problems, and most cannot support their product for 30+ years. | 🏭 Strong supply chain. Our software works with any camera, which provides multiple redundant suppliers and parts that can be maintained over a service life of 30+ years. |
| ⚠️ Performance limited by laser eye safety. Lidar cannot transmit more laser power than is safe for the human eye, setting an upper bound on performance. | 📈 Performance improves with larger-format CMOS cameras, faster processors, and better algorithms. |
| 💤 Only 3D data: XYZ | 🚀 6D data: RGBXYZ |

💤 Stick with the status quo for basic 3D data, or 🚀 upgrade to 6D data for enhanced RGBXYZ insights. Choose wisely for superior performance and longevity!

Evaluate Hammerhead and GridDetect in Your Environment

There are four ways to get started with NODAR's stereo vision software, depending on where you are in your evaluation:

  1. Try the software now — free. Download NODAR Viewer and explore public datasets to see Hammerhead point cloud and depth map outputs without any hardware.

  2. Get hardware fast. NODAR's Hardware Development Kit ships quickly, pairs with off-the-shelf cameras, and generates 3D point clouds out of the box.

  3. Send us your images. Have NODAR process your existing stereo camera data via NODAR Cloud — no local GPU required.

  4. License the SDK. Get an SDK license to install Hammerhead and GridDetect on your Nvidia Jetson Orin AGX and integrate with your own camera setup.

Not sure which option fits your use case? Talk to an engineer.