This is a continuation of my previous post on the graphics pipeline and some fundamental tricks.

Now that we’ve totally mastered how shaders work, let’s use them in some non-traditional ways.

### Reconstructing World Position in the Fragment

Previously we’ve learned these 2 relevant things:

• Getting a Screen space (0,1) uv to sample from a screen buffer (like color, depth, normals, your own RT etc.):

• vert:
`o.uv = ComputeScreenPos(o.position); //where o.position = mul(UNITY_MATRIX_MVP, v.vertex);`

• frag:
`float2 uv = i.uv.xy / i.uv.w;`

• Getting a Screen space (0,1) i.position for the geometry we’re currently drawing:

• vert:
`o.position = mul(UNITY_MATRIX_MVP, v.vertex);`

• frag:
`i.position.xy /= _ScreenParams.xy; //_ScreenParams.xy == width and height in px;`

Now we’ll learn how to make a ray that samples from the pre-existing rendered geometry, in View space. Then we’ll convert it easily to World space and even back to Object space:

The object space would be useful for example if you want to see if a pre-existing fragment would be masked by a cubemap that originates at your current geometry’s position: `float mask = texCUBE(_CubeTex, objPos).a;//(in the fragment)`. With this trick we pretend we have geometry with our current mesh’s centre, but the underlying geometry’s surface, in object space.

Since we also have our current geometry’s Screen space position, we can convert that as well to view space and world space and object space. So you can compare or blend with the equivalent from the underlying pre-existing geometry we sampled.
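To make the reconstruction concrete, here is a minimal CPU-side sketch in Python (not shader code, and not the original snippet from this post). It assumes an OpenGL-style projection where the camera looks down -z, that you already have a linear eye depth for the pixel, and that `p00` and `p11` are the `[0][0]` and `[1][1]` entries of your projection matrix. The helper names are made up for illustration.

```python
def view_pos_from_depth(screen_uv, eye_depth, p00, p11):
    """Rebuild a view-space position from a (0,1) screen uv and a linear eye depth.

    eye_depth is the positive distance along the camera's forward axis
    (assumption: the camera looks down -z, OpenGL style).
    p00 / p11 are the x and y scale terms of the projection matrix.
    """
    ndc_x = screen_uv[0] * 2.0 - 1.0  # (0,1) -> (-1,1)
    ndc_y = screen_uv[1] * 2.0 - 1.0
    # Undo the perspective divide (multiply by depth), then undo the projection scale.
    return (ndc_x * eye_depth / p00, ndc_y * eye_depth / p11, -eye_depth)


def mul_point(m, p):
    """Multiply a 4x4 row-major matrix by a point (w = 1) and drop w."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[r][c] * v[c] for c in range(4)) for r in range(3))
```

Once you have the view-space position, `mul_point` with a camera-to-world matrix gives the world position, and a further world-to-object matrix gives the object-space one.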

### Solid Texturing

With the worldspace position (of either the pre-existing geometry’s fragment or the current one) we can do what is called Triplanar Texture Projection or Solid Texturing. This is what you most often find on procedural meshes you obtain from voxels, which don’t have sensible UV coordinates, so you just say “everything on top is grass, everything on the sides is ground”.

Here’s how I did that:

Now profit! You can similarly sample from a bumpmap and warp the normals for the lighting step.
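For reference, the core of a triplanar blend can be sketched on the CPU like this. It is a hedged Python illustration, not the shader from this post: `sample_side` and `sample_top` are hypothetical callables standing in for texture lookups, `sharpness` is an assumed tuning knob, and the normal is assumed to be non-zero.

```python
def triplanar_weights(normal, sharpness=4.0):
    """Blend weights per axis: the more a surface faces an axis, the more that projection counts."""
    ax, ay, az = (abs(c) ** sharpness for c in normal)
    total = ax + ay + az  # assumes a non-zero normal
    return (ax / total, ay / total, az / total)


def triplanar_sample(sample_side, sample_top, world_pos, normal, scale=1.0):
    """Project the world position onto the three axis planes and blend the three samples."""
    x, y, z = (c * scale for c in world_pos)
    wx, wy, wz = triplanar_weights(normal)
    col_x = sample_side(z, y)  # projection along X uses the ZY plane
    col_y = sample_top(x, z)   # projection along Y (top/bottom) uses the XZ plane
    col_z = sample_side(x, y)  # projection along Z uses the XY plane
    return tuple(wx * a + wy * b + wz * c for a, b, c in zip(col_x, col_y, col_z))
```

A surface pointing straight up ends up using only `sample_top` (the "grass"), while walls blend between the two side projections (the "ground").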

You can find the Simplex Noise (and other noises) on GitHub.

## Command Buffers

Before we look at more advanced stuff like deferred lighting and post FX, let’s briefly look at the `CommandBuffer`. Unity made plenty of documentation and examples for it, so I’ll just explain what the key points mean.

Note: Notice that a CommandBuffer sends a program to the GPU that runs every frame, so unless you need to change the program or update the data structures from the C# side, you shouldn’t clear and run the Draw or Blit commands every frame from C#.

The Command Buffer allows you to insert render materials and geometry at any stage of Unity’s rendering pipeline. You set that with a `CameraEvent` value such as `CameraEvent.BeforeForwardAlpha`. To see what’s going on, go to Window > Frame Debugger and visualise your changes as well as see the default pipeline at work.

Note: Unity will hijack some (parts) of these render targets at certain points in the pipeline, and you won’t have access. For example, if you want any of your custom deferred shaders to write to the built-in emission buffer, and do your own lights, you must use the `CameraEvent.AfterLighting` stage.

### Draw a mesh with the CB:

This is also one of the ways to implement your own lighting. Draw a cube or a sphere, for each light, with your deferred lighting material. This is what unity does internally, and it draws spheres. It ain’t the most efficient way to do deferred lighting, but it’s easy because the mesh defines the volume and provides the matrix.

### Blit a texture with the CB:

The `Blit` works by sending a fullscreen quad to the GPU, so that in the fragment you’ll have one pixel for every Screen space pixel, so you can do post processing or deferred processing.

That `_FullscreenRT` is a texture defined in the shader of your `fullscreenMaterial`:

This would also be the pro way of doing deferred lighting. You’d have a list of lights, you’d traverse these pixels once, and light them. A common AAA method is to use a “Tiled Frustum” where you divide this texture in a grid, and have each tile reference only the lights that affect it. And you also do this grid depth wise, so lights that are far away are drawn with simpler math.
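The tiling idea can be sketched in a few lines. This is a simplified Python illustration under my own assumptions, not an engine implementation: lights are given as hypothetical `(center_x, center_y, radius)` tuples already projected to pixels, the overlap test is a conservative bounding box, and the depth-wise slicing is left out.

```python
def bin_lights_into_tiles(lights, screen_w, screen_h, tile_size):
    """Return tiles[ty][tx] = list of light indices whose screen bounds touch that tile."""
    tiles_x = (screen_w + tile_size - 1) // tile_size  # round up so the edges are covered
    tiles_y = (screen_h + tile_size - 1) // tile_size
    tiles = [[[] for _ in range(tiles_x)] for _ in range(tiles_y)]
    for index, (cx, cy, radius) in enumerate(lights):
        # Conservative test: every tile overlapped by the light's bounding square.
        min_tx = max(int((cx - radius) // tile_size), 0)
        max_tx = min(int((cx + radius) // tile_size), tiles_x - 1)
        min_ty = max(int((cy - radius) // tile_size), 0)
        max_ty = min(int((cy + radius) // tile_size), tiles_y - 1)
        for ty in range(min_ty, max_ty + 1):
            for tx in range(min_tx, max_tx + 1):
                tiles[ty][tx].append(index)
    return tiles
```

When shading a pixel you would then only loop over the lights listed in its tile instead of over every light in the scene.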

### DrawProcedural with the CB:

This method draws custom data buffers with the vertex and triangles paradigm, or just with glPoints.

Remember back when I said you can have custom DirectX11 vertex shaders that run on data buffers? (`FragInput vert (uint id : SV_VertexID, uint inst : SV_InstanceID)`) This is how you run them in Unity with C#. In this case I defined a quad with 6 independent vertices, and the vertex shader will run 6 times for each particle in `particleCount`.
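One possible way to map a vertex index onto "which particle, which quad corner" is sketched below in Python. This is an assumed indexing scheme for illustration only: it treats the draw as one flat run of `particleCount * 6` vertices, and the corner order follows the quad example from the mesh section. With instancing, `SV_InstanceID` could give you the particle index directly instead.

```python
# Corner offsets for one quad, written out as two triangles with no shared vertices.
QUAD_CORNERS = [(-1, 1), (1, 1), (1, -1), (1, -1), (-1, -1), (-1, 1)]


def decode_vertex_id(vertex_id):
    """Split a flat vertex id into (particle index, quad corner offset)."""
    particle_index = vertex_id // 6       # 6 vertices per particle
    corner = QUAD_CORNERS[vertex_id % 6]  # which of the 6 quad vertices this is
    return particle_index, corner
```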

Alright, that’s enough for now. I’ll make a separate post explaining more advanced stuff like implementing lighting, and running Compute Shaders and managing memory and data structures.

# Graphics Pipeline fundamentals (Unity, OpenGL)

In my graphics programming internets travels I realized a lot of people find it hard either to understand or to clearly explain the graphics pipeline and some of the tricks you can do.

The general high level theory is simple, but the API naming or hidden math makes it tough to get in practice. It’s confusing or incomplete even in academic material or nvidia’s GPU Gems etc.

It’s gonna take a minute, but I’m going to explain what the ominous “they” don’t tell you and what they’re confusing you with.

In here and the next post, I’ll walk you through for-realsies how a mesh (or a data buffer) gets converted throughout the graphics pipeline. And I’ll give some sample code for stuff like procedural geometry, reconstructing worldspace position, or using custom data buffers.

This tutorial has some advanced topics but is still accessible to newbs. It however assumes you’ve tinkered with shader code before, and know of basic concepts like how renderers have triangle interpolation.

## The Mesh

First off, a Mesh is a class or structure that stores various coordinate arrays in Object Space:

• An array of vertices.
• ex: A quad needs at least 4 verts: `{-1,1,0.5},{1,1,0.5},{1,-1,0.5},{-1,-1,0.5}` (clockwise notation, from top left)
• An array of triangles that holds index values from the vertex array. Every 3 values represent a triangle.

• in our ex with 4 verts you’ll reference each vertex once or twice: `{0,1,2,2,3,0}`

• Note: If you need to split your quad mesh’s triangles apart in the shader (move them independently) for some vfx, you will have to create 6 verts instead of 4.
• An array of normals. Each vertex has a corresponding normal (e.g. `Vector3(-1,1,-1)`).

• For our quad example we will have only 1 normal per corner!

• Note: Same as for triangles, if you need to process your mesh normals to tweak its smoothing, you will need more vertices (to double, triple etc. the normals)! Meshes you get from artists will (should) have this covered. Depending on the mesh format you can get multiple normals referencing the same vertex. (similar to how the triangles array works)
• An array of UVs (texture coords). In Unity these start at (0,0) in the bottom left and end at (1,1) in the top right (Direct3D-style conventions put (0,0) at the top left instead).

• Minor Note: If you didn’t know how to get your shader to respond to the Unity Inspector’s material texture Scale and Offset values, you need to declare a `float4 _MainTex_ST;` next to `sampler2D _MainTex`. The `_MainTex_ST.xy` is Scale.xy, and `_MainTex_ST.zw` is Offset.xy.
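Here is a small Python sketch of the "4 shared verts vs. 6 independent verts" point from the notes above. The data is the quad from the example, and `unshare_vertices` is a hypothetical helper name, not a Unity API.

```python
# The quad from the example: 4 shared vertices, 2 triangles indexing into them.
vertices = [(-1, 1, 0.5), (1, 1, 0.5), (1, -1, 0.5), (-1, -1, 0.5)]
triangles = [0, 1, 2, 2, 3, 0]


def unshare_vertices(vertices, triangles):
    """Give every triangle corner its own vertex so triangles can be moved independently."""
    new_vertices = [vertices[i] for i in triangles]  # duplicate shared vertices
    new_triangles = list(range(len(new_vertices)))   # indices become 0, 1, 2, ...
    return new_vertices, new_triangles


split_vertices, split_triangles = unshare_vertices(vertices, triangles)
```

The same duplication is what you need for hard edges: once each corner has its own vertex it can also carry its own normal.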

## The Graphics Pipeline

Terminology here is loose with many synonyms. I’ll throw all of them in and clarify.

A mesh goes from Object Space (or Model Space) to World Space to Camera Space (or View Space or Eye Space) to Projection Space (Clip Space) to NDC Space (Normalized Device Coordinates) to Screen Space (or Window Space).

You probably heard some new terms just now. The main reasons the pipeline is more confusing than the big picture concepts suggest are that some math operations make more sense when split up or are easier in certain coordinate systems (for gpu matrix multiplication), and that splitting them up makes the pipeline more flexible.

This is how virtually any usual graphics pipeline works, but I’m specifically writing examples in CG[1] terminology, with the UnityCG library in particular:

Note: the matrices I’m listing are used starting with the Vertex program, and commonly some are merged in game engines (as a math shortcut); in Unity it’s the first 3 that are merged into `UNITY_MATRIX_MVP`.

• 1) Model->World is the World matrix (rotates, translates, scales vertices to their (Unity) world position). Unity and OpenGL call it the Model matrix and it’s merged in `UNITY_MATRIX_MVP`… But a less sadistic way to name it, for newcomers, would have been `MATRIX_WVP` not `_MVP`.

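• 2) World -> Camera is the View matrix. This just converts the vert’s coords so they are relative to the camera: the camera sits at (0,0,0) and, in the OpenGL convention that Unity follows for view space, looks down the negative z axis. The values are still in world units here; they only get squeezed into the -1 to 1 range later, in NDC space.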

• 3) Camera -> Clip space is the Perspective (or Orthographic (isometric)) Projection Matrix. The projection matrix doesn’t really do the projection divide here though. It builds up the vertex for the next step (frustum culling, and then perspective projection divide).

Here’s how the Projection Matrix multiplication looks in Unity (it’s the OpenGL standard):

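$$
\begin{bmatrix}
\text{focalLength} & 0 & 0 & 0 \\
0 & \dfrac{\text{focalLength}}{\text{aspectRatio}} & 0 & 0 \\
0 & 0 & -\dfrac{\text{farPlane} + \text{nearPlane}}{\text{farPlane} - \text{nearPlane}} & -\dfrac{2 \cdot \text{farPlane} \cdot \text{nearPlane}}{\text{farPlane} - \text{nearPlane}} \\
0 & 0 & -1 & 0
\end{bmatrix}
\cdot
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$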

The Projection Matrix above incorporates FoV, and the near and far planes.
`focalLength = 1 / tan(FoV/2);`
`aspectRatio = windowHeight / windowWidth;`
Check out slide #6 here for a nice visual representation.

So the `o.pos` in your vertex function does not hold values between 0 and 1 or -1 and 1. It’s actually the result of the above matrix multiplication. [0,0] is at the centre of the camera (unless you have a fancy off-centre perspective matrix), but the values beyond that depend on the near/far plane and the camera size / aspect ratio.
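To see what kind of numbers come out, here is a small Python sketch that builds the projection matrix exactly as written above and multiplies a view-space vertex by it. The function names and the sample values (FoV 90, 1920x1080, near 10, far 1000) are my own picks for illustration.

```python
import math


def projection_matrix(fov_deg, aspect_ratio, near, far):
    """Projection matrix laid out as in the article (aspect_ratio = height / width)."""
    focal = 1.0 / math.tan(math.radians(fov_deg) / 2.0)
    a = -(far + near) / (far - near)
    b = -(2.0 * far * near) / (far - near)
    return [
        [focal, 0.0, 0.0, 0.0],
        [0.0, focal / aspect_ratio, 0.0, 0.0],
        [0.0, 0.0, a, b],
        [0.0, 0.0, -1.0, 0.0],
    ]


def mul(m, v):
    """4x4 row-major matrix times a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


# A view-space vertex 350 units in front of the camera (the camera looks down -z).
clip = mul(projection_matrix(90.0, 1080 / 1920, 10.0, 1000.0), [20.0, 10.0, -350.0, 1.0])
# clip[3] (the w) now holds -z_view = 350, ready for the perspective divide later on.
```

Notice the result is nowhere near 0 to 1: x, y and z are still frustum-sized numbers, and w carries the depth you will divide by.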

Note:

• Unity will convert the above matrix to the appropriate API it's using when it compiles the shaders (e.g. DirectX).
• You can see or set the projection matrix with this editor script.
• If you set custom projection matrices from C#, use `GL.GetGPUProjectionMatrix`, which converts the projection matrix you give it to the appropriate Graphics API being used (e.g. DirectX or OpenGL).

Note: Z is actually converted to 1/z (the inverse, or reciprocal of z). In screen space z does not interpolate linearly but 1/z does (this allows linear interpolation, and helps precision). Here’s more on why 1/z is used, and on the depth precision.

Before we continue I must point out that up until now (Clip Space (or Projection Space)) all the coord spaces were in what is called Homogeneous space or Homogeneous coordinates. In “4D”: `vertex.xyzw`.

We need the `w` because GPUs work with matrix multiplications. Rotation and scale are multiplications, but translation is an addition. From our matrix math we know we can get addition out of matrix multiplication if we add a dimension:

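$$
\begin{bmatrix}
1 & 0 & 0 & \text{translateX} \\
0 & 1 & 0 & \text{translateY} \\
0 & 0 & 1 & \text{translateZ} \\
0 & 0 & 0 & 1
\end{bmatrix}
\cdot
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$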

Above we have a simple Translation matrix, multiplied by our `vertex.xyzw`. So normally `w` is 1 in a vector. Also, obviously you can combine multiple matrices into one (e.g. don’t just have 1s and 0s in our matrix above, maybe also include rotation and/or scale, or, say a `UNITY_MATRIX_MVP`).
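A quick Python sketch of that translation trick, with made-up helper names, shows how the `w = 1` turns the last column into an addition. It also shows the flip side: a vector with `w = 0` (a direction) ignores the translation.

```python
def translation(tx, ty, tz):
    """4x4 translation matrix, row-major, as in the matrix above."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def mul(m, v):
    """4x4 row-major matrix times a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


moved_point = mul(translation(5.0, -2.0, 3.0), [1.0, 1.0, 1.0, 1.0])      # w = 1: a position
same_direction = mul(translation(5.0, -2.0, 3.0), [1.0, 1.0, 1.0, 0.0])  # w = 0: a direction
```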

After the Projection, w will be used for the perspective divide. The w will become z with this kind of matrix multiplication:

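$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\cdot
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
$$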

And we’re back:

Note: The following 3 steps are something that happens automatically, on the GPU (OpenGL and DirectX), after the Vertex program.

• 4) Frustum Culling. This is not a matrix, it’s a test done on each vertex. Up above we got “from Camera Space to Clip space by using the Projection matrix”. This “Projection” matrix contained the camera aspect ratio and nearPlane and farPlane, which are for example 16:9 and [10,1000]. So the current vertex is discarded if any of its x, y, or z coordinates don’t fit the frustum, which in clip space means falling outside the -w to w range (frustum culling).

Note: if the vertex being culled is part of a triangle (or other primitive (quad)) the rest of the vertices of which are within the frustum, OpenGL reconstructs it as one or more primitives by adding vertices within the frustum.

• 5) Clip -> NDC space is the perspective divide (by `vertex.w`) and normalization of the frustum-based coordinates. This divide by w is what distorts the frustum into a box, and it is not an affine transform. After it we are in Normalized Device Coordinates.
• Even though we still have a w (and it’s normalized to 1), we’re not in homogeneous coords any more. If you were confused why the vertex2frag out structure’s `.pos` attribute is a vector4, it’s because the perspective divide happens just after the vertex program.
• In OpenGL, NDC space coordinates go between (-1,-1,-1) at bottom-left-back and (1,1,1) at top-right-forward. In Direct3D the z goes between 0 and 1 instead of -1 to 1.
• The Z coordinate here goes into the Depth buffer, and/or encoded into the DepthNormals buffer. The depth in buffers is [0,1].
• Since we’ve done our perspective projection, the Depth buffer is not linear. So you can’t just do linear interpolation to fetch a depth point (I’ll explain later when I get to ray examples).
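Steps 4 and 5 can be sketched on the CPU like this. It is a Python illustration under the OpenGL conventions used above (NDC z in -1 to 1, camera looking down -z); the function names are hypothetical, and a real GPU clips whole primitives rather than testing single vertices.

```python
def inside_frustum(clip):
    """Clip-space test: a vertex is inside when -w <= x, y, z <= w."""
    x, y, z, w = clip
    return all(-w <= c <= w for c in (x, y, z))


def perspective_divide(clip):
    """Clip space -> NDC: divide x, y, z by w."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


def ndc_depth(eye_depth, near, far):
    """NDC z for a positive distance in front of the camera."""
    a = -(far + near) / (far - near)
    b = -(2.0 * far * near) / (far - near)
    return -a + b / eye_depth  # the 1/eye_depth term is why depth is non-linear


def linear_eye_depth(z_ndc, near, far):
    """Inverse of ndc_depth: recover the linear distance from an NDC z."""
    a = -(far + near) / (far - near)
    b = -(2.0 * far * near) / (far - near)
    return b / (z_ndc + a)
```

With near 10 and far 1000, a point halfway through the frustum (distance 505) already lands at an NDC z of roughly 0.98, which is the non-linearity in action: most of the depth range is spent close to the camera.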

Note: Most literature would have garbled these last steps up into one and called it the “Projection matrix” or the “Projection step”, “where you map 3D coords to 2D”, which would have really confused you later when you’d be programming.

• 6) NDC space -> Screen space (Window space) (rasterization): Still after the Vertex and before the fragment, the GPU converts coords to viewport pixels. In other words, it transforms from Normalized Device Coordinates (or NDC) to Window Coordinates (raster pixels (fragments)). The pixel coordinates are relative to the lower-left corner of the screen [0,0], growing towards the upper-right [1920,1080].

The formula is:

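$$
x_{screen} = (x_{ndc} + 1) \cdot \frac{screenWidth}{2} + x, \qquad
y_{screen} = (y_{ndc} + 1) \cdot \frac{screenHeight}{2} + y
$$

Here x and y are the offset of the viewport’s lower-left corner (usually 0).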

The z is still between 0 and 1 from before.
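The viewport formula is short enough to try out directly. This Python sketch uses a made-up function name, and treats `x` and `y` as the viewport's lower-left offset, defaulting to 0.

```python
def ndc_to_screen(x_ndc, y_ndc, screen_width, screen_height, x=0.0, y=0.0):
    """Map NDC (-1..1) to window pixels, with (x, y) as the viewport's lower-left offset."""
    x_screen = (x_ndc + 1.0) * (screen_width / 2.0) + x
    y_screen = (y_ndc + 1.0) * (screen_height / 2.0) + y
    return (x_screen, y_screen)


center = ndc_to_screen(0.0, 0.0, 1920, 1080)  # the middle of NDC lands mid-screen
```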

So to clarify: the `o.pos` inside your vertex function is in Clip Space coordinates (e.g. {x:20, y:10, z:350}), and “the same” `i.pos` in your fragment function is in Screen coordinates (i.e. x and y in pixels and z in NDC).

Now if you want extra coords passed to the fragment in Screen space, you need to do the conversion to Screen space yourself in the vertex program (the auto conversion only applies to the `SV_POSITION`). Here’s an example:

At this point our `o.pos` is converted to Clip space by `MVP`. So `ComputeScreenPos` is a cross-platform UnityCG function that takes the Clip Space coordinates of your vertex and does all that automatic stuff described above that happens to `o.pos` (up until NDC), setting the right w param for you to perspective-divide with yourself once you get to the fragment. Then your uv will be within [0,1] in NDC space (0 is bottom left).

This is the divide in the fragment: `float2 screenUV = i.uv.xy / i.uv.w;`.

And if in the frag for some reason you would want `screenUV.xy` to be the same as `i.pos.xy` (in pixels), you’d also need to multiply `screenUV.xy` by the window width and height. But normally we do the opposite: we divide `i.pos.xy` by screen width and height to get (0,1) values.
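Roughly speaking, the vertex-side half of this is "scale and bias by w now, divide by w later". The Python sketch below mimics that idea on the CPU. It is a simplified stand-in, not the UnityCG source: it ignores the platform-specific y flip and stereo handling, and the function names are mine.

```python
def compute_screen_pos(clip):
    """Vertex side: remap x, y from (-w, w) to (0, w), keep z and w untouched."""
    x, y, z, w = clip
    return (x * 0.5 + w * 0.5, y * 0.5 + w * 0.5, z, w)


def screen_uv(screen_pos):
    """Fragment side: the perspective divide that turns it into a (0,1) uv."""
    return (screen_pos[0] / screen_pos[3], screen_pos[1] / screen_pos[3])


uv = screen_uv(compute_screen_pos((175.0, -87.5, 300.0, 350.0)))
```

Keeping w around until the fragment is the whole point: the divide has to happen per pixel, after interpolation.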

### The Tessellation and the Geometry Shaders

There’s actually more things between vert and frag (and before the interpolation that the frag is run on):

• the Hull program: is run once for every vertex and it requires as input the entire triangle or quad or line or point (whatever type of weird geometry data you use). It does some data structure conversions and some magic.
• the Tessellation program: here you write what technique the GPU should use to subdivide your geometry. The best is usually a fractal one, then to also fade it by distance to camera. This function does not actually subdivide anything, it calculates positive barycentric coordinates.
• Note: you can technically subdivide triangles in the Geometry function below, but that’s a more general purpose function that can’t process/subdivide as much and as fast as here.
• the Domain program: Domain means triangle (or quad etc). It takes the 3 vertices of your triangle, and the one barycentric coordinate for the tessellation. It is run once for each barycentric point and actually spits out new vertex data.
• the Geometry program. This optional step is where you can use affine transformations to manipulate vertices or create more vertices within a triangle.
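The heart of a Domain program for triangles is a weighted sum of the three corners. Here is that step as a tiny Python sketch (hypothetical function name; a real domain shader would interpolate normals, UVs etc. the same way).

```python
def domain_point(v0, v1, v2, bary):
    """New vertex = barycentric blend of the patch's three corner vertices."""
    b0, b1, b2 = bary  # the three weights are expected to sum to 1
    return tuple(b0 * a + b1 * b + b2 * c for a, b, c in zip(v0, v1, v2))


# Halfway along the edge between the first two corners.
edge_midpoint = domain_point((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.5, 0.5, 0.0))
```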

[UPDATE:] Luckily I don’t have to get any deeper into Tessellation because it seems since I wrote this article, Jasper Flick of CatlikeCoding has done a very nice patreon’d writeup. Check it out because there’s some magic to understand about how to link each stage so they all get what they expect.

With the geometry program you can also, for example, have a mesh with verts that are just points, and use a Geometry program to replace each of those points with 4 quad verts spawned around it, generating billboards. Here is an example of just that.

Here’s a subset of that shader. I’ll explain the key points.

In DirectX11 you can actually do the points-to-quads conversion trick directly in the vertex program by manipulating custom data buffers.

Here’s how I did that for my particle sculpter:

The geometry shader example was calculating the camera space directions and then creating the new offset vertex position based on those. What I did above was treat my `quad_vert` as if it was in camera space (instead of object space), thus automatically aligning it to the camera, and then convert it with the matrix from Camera (or View) to World space, and then from World to Object space (even though the C# struct was of a standard object space quad). So don’t be afraid to play with the space matrices, kids!
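Both approaches boil down to the same thing: offset the particle centre along the camera's right and up axes. A Python sketch of that idea follows; the names are illustrative, and it assumes the camera's right and up vectors are already available in world space.

```python
def billboard_corner(center_world, quad_vert, camera_right, camera_up, size=1.0):
    """Offset a particle centre along the camera's right/up axes to get one quad corner."""
    return tuple(
        c + (r * quad_vert[0] + u * quad_vert[1]) * size
        for c, r, u in zip(center_world, camera_right, camera_up)
    )


# Camera looking along +x: its right axis is +z and its up axis is +y in this example.
top_left = billboard_corner((10.0, 0.0, 5.0), (-1.0, 1.0), (0.0, 0.0, 1.0), (0.0, 1.0, 0.0))
```

Because the offsets only ever move along the camera's own axes, the quad always faces the camera no matter how the particle's object is rotated.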

Now we’ve finally entered the Fragment program before which the GPU has also done linear and perspective-correct interpolation on the vertices (and on texcoords, color etc.) to give us the pixel positions for the triangles.

Keep in mind that here you shouldn’t multiply any point by a matrix, unless you make it homogeneous (w=1 and also reverse the perspective divide (i.e. take it back to clip space)).

The classic fragment function is like this:
`fixed4 frag(v2f_struct i) : COLOR //or SV_Target`
This means this shader has 1 Render Target and it returns a rgba color.

The render target can be changed from C# with the Graphics or CommandBuffer API. (I’ll show that later)

In the deferred renderer, you can actually have (the option to output to) multiple render targets (MRT). The function structure changes, we use (multiple) out parameters:

Now let’s continue from my vertex shader snippet from further above where I wanted a screenspace ray and did: `o.uv = ComputeScreenPos(o.pos);`.
Since it didn’t get a perspective divide (because it couldn’t have been interpolated to the fragment if it had), we need to do that now.
Fragment:

This `_CameraGBufferTexture0` is from the deferred renderer, but you can set custom textures to a shader using the Graphics or CommandBuffer Blit function, even in the forward stage.

My next post will apply some of this knowledge in some common but unconventional uses for shaders.

After this hopefully everyone can start researching and understanding more fun stuff like clouds, atmospheric scattering, light absorption, glass caustics, distance fields, fluid simulations, grass, hair, skin etc.

PS: There’s quite a bit of in-depth pipeline stuff here. If anything’s unclear, or I happened to cock anything up, let me know.

1. CG = “C for Graphics” programming language.

# Gamedev homework from 2012

I found 3 of my 1st semester 1-week assignments, from 2012, and they still work! An engine, a pathfinding alg, and a renderer!

# It Lives!

Right now the About post is the only thing filled in, but I’ll get a postin’ asap!

Finally set up and compiled my Github, Jekyll, Markdown, Hpstr, Static dev blog with many bells and whistles!

It took a little effort to bend jekyll and the theme to my will, but now it’s awesome.