Ray marching is an alternative to analytic methods for discovering intersections between a light transport ray and the scene. The structure of the ray marching algorithm makes it amenable to rendering scenes constructed from operations on solid objects, amorphous surfaces, and participating media, which have no explicit surface.
Analytic intersection is extremely efficient. Renderers based on it can render simple scenes interactively on a CPU, which is great for learning and experimentation. Ray marching is less efficient. So, this chapter also introduces Graphics Processing Unit (GPU) programming to enable you to build a real-time ray marching renderer.
Most computational graphics rendering methods are based on ray-surface intersection. The three major ray-surface algorithms are:
Analytic ray intersection and ray marching use implicit surface definitions. For example, the unit-meter sphere is implicitly defined by the set of points satisfying
Analytic ray-surface intersection gives a closed-form solution by solving for the distance parameter $t$ for which the explicit equation of a ray,
Ray marching is an algorithm for numerical estimation of arbitrary ray-surface intersections. It works for arbitrary implicit surfaces (although most of the optimizations that make it practical for real-time use require a surface definition with certain properties).
The simplest form of ray marching begins at the ray origin and samples the implicit surface at regular intervals, terminating when the function crosses zero. This is of course both slow and innaccurate. Fortunately, any numerical root-finding strategy, such as the secant method or Newton-Raphson iteration, can be employed to accelerate it or improve its accuracy. For example, choosing the step size between successive samples based on the derivative of the surface gives faster convergence and more accuracy. Binary search near the zero crossing is a good way to identify the intersection with arbitrary accuracy for continuous implicit surface definitions.
Rasterization with a depth buffer is an amortization of analytic ray intersection over many rays that relies on both explicit and implicit surface definitions. It processes many rays for each surface, instead of many surfaces for each ray. Because of this, it relies on an auxillary image-space structure called the depth buffer to keep track of the currently-known closest surface location to the camera at each image sample. (There are also rasterization methods that perform sophisticated clipping of primitives against each other to avoid this structure, however their increased run time cost puts them out of favor today.)
Rasterization is extremely efficient for triangle meshes on massively parallel hardware, and algorithms are known for some other primitives such as lines, rectangles, and disks. However, because it requires both analytic implicit and explicit equations, rasterization is highly constrained to such primitives.
Consider an example of solving for the root of the equation that gives the intersection of a ray $P +\w t$ and a general sphere with center point $C$ and radius $r$,
The simplest implementation of ray marching iterates $t$ through the domain of function $g$ in fixed $\Delta t$ steps:
This “marches” through every point $X(t) = P + \w t$ on the ray until it finds a $t$ that yields an $X$ on the surface of the sphere.
Of course, this naive ray marching scheme is unlikely to exactly land on the exact value of $t$ for which $g(t) = 0\si{m}$, so the code will almost always loop forever. A better solution enforces a maximum distance bound and seeks the first value at which $g$ crosses from positive to nonpositive, at which point the ray must have passed through the surface:
We can further refine this approach using binary search within the final fixed interval $[t_{n-1}, t_{n}]$ in which $g$ crossed from positive to nonpositive values.
Ray marching with fixed steps has three advantages over analytic ray intersections:
Ray marching with fixed steps has two obvious disadvantages over
analytic ray tracing. The first is that it can miss the
intersection for a non-convex shape. The second is that it takes
at least linear time in the distance to the intersection for
rays that do intersect the surface and linear time in
maxDistance
We can mitigate both the performance disadvantage and the missed intersection disadvantage of fixed-step ray marching with a simple improvement that only slightly constrains the surface description.
We originally defined $g(t) = h(P + \w t) = 0\si{m}$ as the surface. The ray marching algorithm implicitly required only that $g(t) > 0\si{m}$ at the ray origin $P$ and $g$ was continuous. Now, let us require the surface to be an embedded manifold without boundary, i.e., the surface of a solid object, and refine our constraints so that $g$ and $h$ are signed distance estimators: require that $g(t) = h(P + \w t)$ is less than or equal to than the true distance to the surface from point $X(t) = P + \w t$.
Because we're working with surfaces in three dimensions and will wish to cast rays from many different origins (e.g., primary rays, shadow rays, mirror rays), it will be convenient to define a distance estimator on three-space instead of the ray parameter.
We now formally define function $h(X) = g(P + \w t)$ be a signed distance estimator for a surface. For the sphere, one function that satisfies this is simply the original implicit equation: $h_\mathrm{sphere}(X) = ||X - C|| - r$.
The $h(X) = 0\si{m}$ surface is sometimes called the isosurface, although technically it is the $h = 0\si{m}$ isosurface. Fluids are often modeled with a large number of particles, which are collectively rendered by forming a single distance estimator that is the sum of a large number of so-called blobby distance estimators. In this case, the term level set is sometimes used to describe the surface.
Given a signed distance estimator, $h(X)$, we now have an upper bound on how far is safe to step to avoid marching the ray over the intersection. Because $h$ returns a bound on the distance in any direction, we can move at least that far along the direction $\w$ of the ray. In other words, if we consider a sphere of radius $h(X)$ of possible locations to visit from point $X$, that sphere at most touches (i.e., is tangent to) the primitive (...which is also a sphere in our example) that we're tracing against, and may not even touch it.
That new virtual sphere of possible locations to visit can't possibly intersect the primitive more deeply because of the distance estimator guarantee. So, we can jump to testing the next point at the surface of the virtual sphere. This method for advancing the ray is called sphere tracing [Hart1996Sphere] . Note that the “sphere” in the name refers to the sphere within which the algorithm can safely march from its current location; the method can trace any primitive, and intersects the primitives with rays, not spheres.
Why is iteration necessary? Doesn't $h$ give the distance to
the surface immediately? If $h$ gave the exact distance to
the surface, then the while
The minStep
As before, we can extend this root finding method to perform binary search to refine the end hit point. Because $h$ is a signed distance estimator, we can even do so fairly efficiently. Once the ray passes through the surface we can re-approach it accurately from the inside.
For these definitions, let $\max(\vec{v}) = \max(x_v, y_v, z_v)$, $|\vec{v}|=(|x_v|, |y_v|, |z_v|)$, and let $\max(\vec{a}, \vec{b}) = (\max(x_a, x_b), \max(y_a, y_b), \max(z_a, z_b))$. The code samples are in GLSL using some common utilities and HLSL compatibility definitions:
The sphere about point $C$ with radius $r$:
The plane with point $C$ and normal $\n$:
The box with center $C$ and vector $\vec{b}$ from the center to the first-quadrant corner:
The hollow rounded box with center $C$, edge half-lengths in vector $\vec{b}$, and rounding radius $r$:
The torus about the $x$-axis with centroid $C$, minor radius $r$, and major radius $R$:
The “wheel” about the $x$-axis with centroid $C$, minor radius $r$, and major radius $R$ is the same as the torus above, but with a non-Euclidean definition of distance:
The cylinder with centroid $C$, radius $r$, and half-height $e$:
The gradient of any function on 3-space can be approximated by numerical differentiation:
If the distance estimator is not too conservative near the surface, or at least is conservative by the same factor in all directions, we can use this to find the surface normal.
For a point $X$ that is near but not on the surface, normalizing the gradient by dividing through by its length gives the surface normal vector to the nearby surface.
Placing $X$ exactly on the surface can be problematic for this computation, however. For example, for a sufficiently curvy surface, a particular $\epsilon$ value chosen may place all samples of $h$ at another location on the surface, for which $h = 0\si{m}$.
Reasonable four- and six-sample approximations of the normal are thus:
This section describes the ray casting portion of a simple GLSL ray marching tracer that runs on the GPU using the G3D Innovation Engine 10. See http://codeheartjs.com/examples/raytrace/ for a comparable CPU Javascript version that runs in a web browser, and http://codeheartjs.com/examples/fastraytrace/ for an optimized Javascript version. See my gpu-tracing-tutorial.zip for a working example of an analytic ray tracer in C++ and GLSL against which to compare this ray marcher's structure. You can inject the ray marcher directly into that program structure as well.
The GLSL ray marcher below uses rasterization to submit two giant
triangles that cover the entire viewport, under a 2D projection
matrix. It then launches the ray tracing shader kernel at each pixel,
passing it the 3D camera's orientation and projection matrix. The C++
syntax for setting up this call in OpenGL (from App::onGraphics3D
The GLSL code within the referenced trace.pix
The actual tracing code is below. This example defines the scene to be a single, yellow sphere at the origin with a radius of 1m. To make it easier to orient the viewer when investigating the scene with the default camera, I also placed a cube map skybox in the background.
For a full ray tracer, we'd shade the intersection ray and possibly spawn additional rays for reflection, refraction, and shadows.
This setup of using minimal OpenGL to launch a full-screen ray marcher is common in the modern demoscene. Many examples of artistically impressive demos and intros based around this technique can be found at pouet.net. Smaller single-shader examples with source code can be found at shadertoy.com and glslsandbox.com.
An advantage of modeling surfaces with distance estimators is that it is much easier to perform operations on whole shapes in that model than it is when they have explicit triangle representations or direct implicit definitions.
Operators on distance estimators are higher-order functions: they take functions as input and return functions as output. In a language such as Javascript or Scheme that has full support for closures, this can be implemented directly with functions. (As was done in the http://codeheartjs.com/examples/raytrace example from the previous section, which runs in a web browser.)
In a language such as Java or C++, classes and inheritance can mimic the design pattern. A simple substitution-compiled language like GLSL with provides none of these language features for abstracting computation. They can be emulated using macros and overloading or one can directly expand that code by hand, for example,
Union selects the closer of two surfaces. This is equivalent to a set sum on the points (but not a distance sum of the estimators).
Intersection selects the farther of two surfaces.
The set difference (subtraction) of two surfaces turns one inside out and then intersects its outside it with the other's inside.
Any distance function can be tiled across space with period $\vec{v}$ across the dimensions.
mod
%
Transforming a shape is equivalent (from the reference frame of the distance estimator) to inversely transforming the points tested against it. In doing so, we need to be careful about how we have scaled space, however. To transform a shape by an invertible $4\times4$ rotation-translation-dilation (dilation means a uniform scale along all axes) matrix $M$,
M
Because rotation, translation, and non-zero dilation all have trivial inverses and determinants it is not usually necessary to apply the full inverse and determinant operations. For example scaling a primitive by scalar factor $s > 0$ is simply:
Note that if a transformation of a simple primitive is desired, it may be more efficient to let $C = (0, 0,0)$ in the original primitive's definition so that it is centered at the origin and then perform a single transformation to the desired reference frame.
With some care to preserve the conservative property of distance estimation, shapes can be blended for a kind of smooth union. Hoffman and Hopcroft [1985] derive the general case of this, which is commonly referred to as a union using a “smooth min” (smin) function in place of a typical min function.
There are many smin functions, with a variety of performance and quality tradeoffs. Iñigo Quilez [2008] and I recommend this fast polynomial version:
Performance is always a concern in rendering. The performance of ray marching can be increased at the cost of image quality by increasing the minimum step size and accepted distance from the surface, and by decreasing the maximum iterations permitted and resolution. However, it can also be increased by algorithmic improvements that do not affect image quality. Instead, the cost is some of the elegant simplicity of the naive method. However, compared to analytic ray casting and rasterization, optimized ray marching remains remarkably straightforward and accessible to implement.
Keinert et al. [2014] make an observation that allows decreasing the number of iterations before binary search by up to a factor of two. Hart's sphere tracing uses the distance estimator $h$ to create a series of “unbounding” spheres as the ray approaches a surface. The $h(X)=0\si{m}$ isosurface cannot lie within any of these spheres. Therefore, if the unbounding spheres at $X$ and $Y$ overlap, the surface cannot intersect the line segment $XY$.
This observation allows speculatively advancing the ray by $\Delta t = k \cdot h(X)$ for $1 \leq k \leq 2$ under the following process. Let the new point be $Y = X + \hat{\omega}\Delta t$. If $h(Y) \geq k \cdot h(X)$, then there was no intersection on $XY$ and the speculative advancement was conservative and can be accepted. Otherwise the ray can still be advanced from $X$ by $\Delta t = h(X)$. Keinert et al. call this over-relaxed sphere tracing (figure~\ref{fig:relaxation}. Naive sphere tracing advances the ray with only $k=1$, often discovering that the next point redundantly covers the interval just traversed.
Over-relaxation is obviously most useful when the distance estimator in space is overly conservative for distance to the surface along the ray. There is a tradeoff between setting $k$ close to 2 to gain the largest possible step and setting $k$ close to 1 to reduce the number of times the speculative advancement fails. Because success doubles the step distance and failure doubles the number of evaluations of distance estimator $h$ per iteration, over-relaxed sphere tracing can double or halve the performance compared to naive sphere tracing. Keinert et al. observe that $k=1.2$ gives a net 25% reduction in tracing time for the complex urban environment shown in the figure below, which appeared in the Mercury demoscene production The Timeless.
When working with implicit surfaces, operations on distance estimators make it possible to build complex scenes by layering operations on simple primitives. Doing so offers great expressive power and programming elegance at a potentially high performance cost. This is because rays that pass nowhere near a surface must still evaluate the distance function for it.
The parse tree of a complicated set of operations on distance estimators inherently forms a scene graph tree. Performance can be increased by avoiding descending branches of that tree that do not affect the result. This kind of pruning is a common computer science operation and yields order-of-growth increases, unlike the kind of small constant factor that can be obtained by micro-optimizing within a single distance estimator.
The easiest way to prune the tree is to abort computation within a
distance estimator when an early, inexpensive test determines that
the ray is very far from the surface. A simple bounding sphere
accomplishes this and correctly returns a still-conservative
value. For example, let expensiveDE
C
r
X
X
Square root is one of the most expensive operations to perform on a
GPU. We can slightly increase the performance of the slow case of boundingDE
It is also a good idea to branch to the full, expensive distance
estimator if X
Introducing a false silhouette around each object through the bounding sphere can exacerbate this problem. We can separate the bounding radius from the test value to avoid that:
Geometric planes are are large surfaces that can cover a large amount of the screen. A ray caster following any ray near the plane will be constrained to march in steps no larger than the elevation of the ray above the plane, which may be quite small compared to the true distance to the next intersection with the scene.
We can obtain a good speedup for scenes that containh ground planes by performing the (efficient) analytic intersection with the plane and then using the ray marcher only for other surfaces. Spheres are a border case. The distance estimator is extremely fast and yields good estimates along the ray on the interior and far from the sphere. Yet, for rays that pass near the silhouette, an analytic solution is far faster than a ray marched one.
The figures below shows on top a rendering of a scene composed of implicit surfaces. On the bottom is a visualization of the number of invocations of the scene distance estimator at each pixel, where brighter pixels are more expensive. This scene uses the analytic ground plane and bounding sphere optimizations.
Note that in the above figures, silhouettes and surfaces seen at glancing angles are far more expensive than the interiors of objects, which are almost as efficient to trace as the ground plane itself. The screw-like shape and the heightfield are expensive because they have very conservative distance estimators. The silhouettes of shadows are also expensive--those are places where the ray for the visibility query to the light passes close to a surface and thus slows down as it approaches.
Soft shadows, ambient occlusion, and in some cases antialiased pixels and depth of field can all be approximated efficiently using distance estimators because they track how close a point is to all surfaces in the scene. Under analytic ray tracing, these instead must be estimated using tens or hundreds of rays per pixel.
Once objects are surrounded in bounding spheres, traditional spatial subdivision data structures can be applied to them. For example, it is simple to compute a bounding volume hierarchy on the bounding spheres.
However, using a data structure on a GPU in graphics mode is a bit tricky because GLSL executes function invocation by inlining. Thus there is no true recursion for traversing a tree in a natural way. This also makes it hard to implement reflection and refraction at the same surface under Whitted ray tracing, since that creates a tree of rays. (Path tracing, photon tracing, and one of reflection or refraction without the other can all be implemented using loops.)
There are two ways to build an explicit stack to work around the lack of recursion. One is to build the stack global memory using OpenGL 4 image buffer operations. This has awkward syntax and can be slow. Another is to build the stack in local memory. This can be accomplished for recursive rays on recent hardware using code like the listing below. A similar explicit stack can be used to traverse data structures.
Beware that local memory is significantly slower than register memory (where other local variables are stored), although significantly faster than global memory. Local memory is frequently implemented as a portion of the L1 cache. Also beware that on older hardware without local memory support, relative indexing into an array can compile to a chain of conditionals because there are no relative memory operations on arrays.
Some register-only tree traversal strategies developed the introduction of local and global memory write operations are appropriate for older hardware and may find renewed value for performance even on newer targets [Horn2007Stack] [Popov2007Stackless] .
Analytic and numerical roots can be mixed within a scene, as we did for the ground plane. Our simple ray tracer produces depth buffer values compatible with OpenGL's triangle rasterization, so implicit surfaces may be mixed freely with all primary ray strategies.
I wrote these three heavily-documented examples to demonstrate how to set up an analytic and a ray marching renderer. These are on the Shadertoy website, which uses WebGL, a limited version of the full OpenGL supported by G3D. So, the code is a little less efficient and clear than the best case, but has the advantage of running in most current web browsers.
These three examples are by Iñigo Quilez, a graphics professional who's worked at Pixar and Oculus Story Studio. He's long contributed to the demoscene and online graphics community. These are three of the most impressive demos on the Shadertoy website and good (if a bit hard for the novice to parse) examples of how to use the implicit surface techniques described in this document.
Implicit surfaces were first rendered using numerical root finding by Blinn [1982] . They've since been advanced significantly through research by many others. Gomes et al. give a good survey in their book [2009] .
The academic graphics community has largely worked in parallel with the demoscene community, which has been ray tracing these in real time with similar techniques since at least the early 2000's, but which has only recently begun to document and share their methods widely. Quilez has been a leading voice in spreading information about demoscene techniques through his personal website, the Shadertoy website, and recently some public presentations [Quilez20084k] and online videos [Quilez2014Video] . Swoboda's GDC 2012 presentation [2012] overviews the material from this document and then describes some of the performance issues on modern GPUs.
I've simplified some of the details of sphere tracing and distance estimators in this explanation. See Hart's SIGGRAPH course notes [1993] or Liktor's survey paper [2008] for more details on the mathematical constraints and ways of choosing optimal step sizes based on derivatives. See Keinert et al.'s [2014] paper for the current state of the art.
One of the most interesting applications of numerical methods for ray intersection is rendering fractals, which by definition have difficult surfaces to intersect (since they have unbounded area!) [Hart1989Fractal] .
An alternative approach for rendering surfaces that do not admit analytic intersection solutions is to convert them to an approximation that does, for example, a triangle mesh. A popular method for doing so is Lorensen's and Cline's marching cubes algorithm [1987] . Conversely, it is often desirable to convert an existing triangle mesh into a signed distance function for integration with a ray marcher (e.g., to achieve warping and smooth blending). Akleman and Chen [1999] give one implementation of such an operation. Swoboda [2012] discusses both triangle-to-implicit conversion and reverse in the context of modern GPUs.
The particular form of implicit surface defined by distance from a point set is very popular for modeling fluids. Fedkiw worked extensively with these and the book that he coauthored is a good reference [Osher2003LevelSet] . There's an entire subfield of rendering called point-based graphics that directly ray traces such representations, sometimes using implicit surface methods.
[Akleman1999Distance] |
Ergun Akleman and Jianer Chen in 1999 Shape Modeling International, p. 72-79, March1, 1999. Official URL: http://doi.ieeecomputersociety.org/10.1109/SMA.1999.749326 10.1109/SMA.1999.749326 |
[Blinn1982Blobs] |
James F. Blinn ACM Transactions on Graphics 1:3, p. 235-256, ACM, New York, NY, USA, July1982. Official URL: http://doi.acm.org/10.1145/357306.357310 10.1145/357306.357310 |
[Gomes2009Implicit] |
Abel Gomes, Irina Voiculescu, Joaquim Jorge, Brian Wyvill, and Callum Galbraith Springer Publishing Company, Incorporated, 2009. |
[Hart1989Fractal] |
J. C. Hart, D. J. Sandin, and L. H. Kauffman in Proceedings of the 16th Annual Conference on Computer Graphics and Interactive Techniques, p. 289-296, ACM, New York, NY, USA, 1989. Official URL: http://doi.acm.org/10.1145/74333.74363 Free URL: http://graphics.cs.illinois.edu/sites/default/files/rtqjs.pdf 10.1145/74333.74363 |
[Hart1993Course] |
John C. Hart p. 1-16, SIGGRAPH'93 Course Notes: Design, Visualization and Animation of Implicit Surfaces, 1993. Free URL: http://graphics.cs.illinois.edu/sites/default/files/rtis-tr.pdf |
[Hart1996Sphere] |
John C. Hart The Visual Computer 12:10, p. 527-545, Springer, 1996. Free URL: http://bit.ly/1KJcnsO |
[Hoffman1985Blend] |
C. Hoffman and J. Hopcroft The Visual Computer, p. 92-100, 1985. Free URL: http://www.cs.purdue.edu/homes/cmh/distribution/papers/Geometry/geo1.pdf |
[Horn2007Stack] |
Daniel Reiter Horn, Jeremy Sugerman, Mike Houston, and Pat Hanrahan in Proceedings of the 2007 Symposium on Interactive 3D Graphics and Games, p. 167-174, ACM, New York, NY, USA, 2007. Official URL: http://doi.acm.org/10.1145/1230100.1230129 10.1145/1230100.1230129 |
[Keinert2014Sphere] |
Benjamin Keinert, Henry Schäafer, Johann Korndörfer, Urs Ganse, and Marc Stamminger in Proceedings of Smart Tools & Apps for Graphics, Andrea Giachetti (ed.), p. 1-8, EuroGraphics, 2014. Official URL: http://diglib.eg.org/EG/DL/LocalChapterEvents/ItalChap/STAG2014/001-008.pdf Free URL: http://lgdv.cs.fau.de/get/2234 |
[Liktor2008Implicit] |
Gábor Liktor Computer Graphics and Geometry Journal 10:3, 2008. Free URL: http://www.cescg.org/CESCG-2008/papers/TUBudapest-Liktor-Gabor.pdf |
[Lorensen1987MarchingCubes] |
William E. Lorensen and Harvey E. Cline in Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques, p. 163-169, ACM, New York, NY, USA, 1987. Official URL: http://doi.acm.org/10.1145/37401.37422 Free URL: https://www.cs.virginia.edu/johntran/GLunch/marchingcubes.pdf 10.1145/37401.37422 |
[Osher2003LevelSet] |
Stanley Osher and Ronald P. Fedkiw Springer, New York, N.Y., 2003. Official URL: http://opac.inria.fr/record=b1099358 |
[Popov2007Stackless] |
Stefan Popov, Johannes Gunther, Hans-Peter Seidel, and Philipp Slusallek Computer Graphics Forum 26:3, p. 415-424, 2007. Official URL: http://dblp.uni-trier.de/db/journals/cgf/cgf26.html#PopovGSS07 |
[Quilez20084k] |
Iñigo Quilez Talk at NVScene, August22, 2008. Free URL: http://www.iquilezles.org/www/material/nvscene2008/rwwtt.pdf |
[Quilez2008Distance] |
Iñigo Quilez 2008. Official URL: http://www.iquilezles.org/www/articles/distfunctions/distfunctions.htm |
[Quilez2014Video] |
Iñigo Quilez YouTube Video, July14, 2014. Free URL: https://www.youtube.com/watch?v=0ifChJ0nJfM |
[Swoboda2012Procedural] |
Matt Swoboda Talk at the Game Developers Conference, 2012. Free URL: http://directtovideo.wordpress.com/2012/03/15/get-my-slides-from-gdc2012/ |