You probably know that you can assign and write to render textures, 3D textures and even custom data buffers like RWStructuredBuffers from Compute Shaders. These can hold spatial information: trees, distance fields, flow maps, points, meshes, etc. (For more reading, look up UAVs (Unordered Access Views), SRVs (Shader Resource Views), and shader resource binding registers.)

But with Shader Model 5.0 and D3D11 you can now do more or less the same in regular vertex/fragment shaders. This is great because it allows you to easily bake data onto mesh textures or atlases while they’re being rendered to screen anyway.

It’s got such a cool range of possibilities that you can even do dumb simple stuff like sample the shader’s color under your mouse cursor, from inside the shader, and write it from inside the shader to a struct that you can simultaneously read back in C#, with no need to iterate through pixels or have access to any textures or anything like that.

So I’m’a show you how to UAV in unity shaders.

Number 1: Constructing Render Targets

Your shader will have a RWTexture2D (or even 3D if you wanna get fancy and bake some point clouds):

CGINCLUDE
#ifdef _ALSO_USE_RENDER_TEXTURE
	#pragma target 5.0
	uniform RWTexture2D<half4> _MainTexInternal : register(u2);
	
	sampler2D sp_MainTexInternal_Sampler2D;
	float4 sp_MainTexInternal_Sampler2D_ST;
#endif
//... other stuff
ENDCG

The register(u2) specifies which internal GPU register (UAV slot) to bind the data structure to. You need to specify the same index in C#, and keep in mind this binding is global on the GPU.

Now you can use this _MainTexInternal as if it was a 2D array in your shader. Which means it takes ints as coords, like so: _MainTexInternal[int2(10,12)] - which means it won’t be filtered / smooth. However, from C# you can assign this same RenderTexture as a regular Sampler2D texture in the material/shader, with material.SetTexture as you would with any other texture, and then you can read from it with regular UVs.

So now let’s create that render texture in C# and assign it to the material. Do this in a ConstructRenderTargets() and call it from something like Start().

if(m_MaterialData.kw_ALSO_USE_RENDER_TEXTURE)
{
	m_paintAccumulationRT = new RenderTexture(rtWH_x, rtWH_y, 0, RenderTextureFormat.ARGB32, RenderTextureReadWrite.Linear);// Must be ARGB32 but will get automagically treated as float or float4 or int or half, from your shader code declaration.
	m_paintAccumulationRT.name = "_MainTexInternal";
	m_paintAccumulationRT.enableRandomWrite = true;
	m_paintAccumulationRT.Create();
	
	m_MaterialData.material.SetTexture(m_MaterialData.sp_MainTexInternal, m_paintAccumulationRT);
	m_MaterialData.material.SetTexture(m_MaterialData.sp_MainTexInternal_Sampler2D, m_paintAccumulationRT);
	Graphics.ClearRandomWriteTargets();
	Graphics.SetRandomWriteTarget(2, m_paintAccumulationRT);//with `, true);` it doesn't take RTs
}

On that last line above, note the number 2. That’s the register index from the shader. So register(u2) corresponds to 2 here.

Number 2: Constructing Data Buffers

Let’s just create an array of some arbitrary MyStruct, that will exist in both the shader and in C#.

CGINCLUDE
#ifdef _ALSO_USE_RW_STRUCTURED_BUFFER
	#pragma target 5.0 // no need to re-declare this directive if you already did it 
	
	struct MyCustomData
	{
		half3 something;
		half3 somethingElse;
	};
	uniform RWStructuredBuffer<MyCustomData> _MyCustomBuffer : register(u1);
#endif
//... other stuff
ENDCG

So RWStructuredBuffer<MyCustomData> is our buffer. It has some limits on what can go inside, and it’s not 100% the C standard. But it’s still really useful and can hold tons of entries or just a few (as much as a texture, or as much as memory allows).

Now let’s construct the Compute Buffer in C#. Do this in a ConstructDataBuffers() and call it from something like Start().

//Needs to be defined the same as in the shader.
public struct MyCustomData
{
	Vector3 something;
	Vector3 somethingElse;
}
MyCustomData[] m_MyCustomDataArr;

void ConstructDataBuffers()
{
	if(m_MaterialData.kw_ALSO_USE_RW_STRUCTURED_BUFFER)
	{
		int memalloc = 24;
		m_MyCustomComputeBuffer = new ComputeBuffer(bufferLength, memalloc);//stride == size of MyCustomData in bytes
		Graphics.SetRandomWriteTarget(1, m_MyCustomComputeBuffer, true);
		m_MaterialData.material.SetBuffer(m_MaterialData.sp_MyCustomBuffer, m_MyCustomComputeBuffer);
		
		m_MyCustomDataArr = new MyCustomData[bufferLength];
	}
}

If y’all know how to iterate through memory, you know what that memalloc value is for. It’s the stride: the size of one struct in bytes. A float3 is 12 bytes, and our struct holds two of them, so 24. On the shader side I declared half3s, but in a structured buffer a half is treated as a float anyway, so the layout matches; on the C# side we don’t have half3s, so we define the fields as Vector3 (which is a float3). If you’re worried about converting values yourself, there’s a Mathf.FloatToHalf() function.
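If you’d rather not hard-code the 24, you can ask the runtime for the struct size. A minimal sketch (the class and method names are mine):

using System.Runtime.InteropServices;
using UnityEngine;

public static class StrideUtil
{
	// Returns the byte size of a struct so the ComputeBuffer stride
	// stays correct if you add or remove fields later.
	public static int StrideOf<T>() where T : struct
	{
		return Marshal.SizeOf(typeof(T)); // for MyCustomData: 2 x Vector3 = 24
	}
}

// usage: new ComputeBuffer(bufferLength, StrideUtil.StrideOf<MyCustomData>());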

Now to do stuff with the data of this struct from C#. Do this in Update() if you want/need, it’s fine.

void ProcessData(){
	if(m_MaterialData.kw_ALSO_USE_RW_STRUCTURED_BUFFER)
	{
		//here's how to read back the data from the shader
		m_MyCustomComputeBuffer.GetData(m_MyCustomDataArr);//obviously this way you will lose all the values you had in the array beforehand
		
		m_MyCustomDataArr[10].something = new Vector3(1,0,1);
		
		//now set it back to the GPU
		m_MyCustomComputeBuffer.SetData(m_MyCustomDataArr);
	}

}
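One housekeeping detail these snippets skip: Unity will complain about leaked ComputeBuffers if you never release them, so it’s worth pairing the construction with something like this (a sketch):

void OnDestroy()
{
	if(m_MyCustomComputeBuffer != null)
	{
		m_MyCustomComputeBuffer.Release();//ComputeBuffers need an explicit release
		m_MyCustomComputeBuffer = null;
	}
	Graphics.ClearRandomWriteTargets();//unbind the UAV slots we claimed earlier
}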

Now to do stuff with this data buffer on the shader side:

// somewhere in vert or in frag or in a geometry function or wherever:
_MyCustomBuffer[my1DCoordinate].somethingElse = half3(0,1,0);

Done! Now go do cool stuff. And show me.


One of the things I did with this technique was a VR mesh painter, where I have a custom SDF (signed distance field) representing a 3D spray volume that I intersect with the world position of the fragment of the mesh I’m drawing on. You can also atlas your UVs so that your RT acts as a global atlas, letting you paint multiple objects to the same RT without overlaps.

You also need to realize that the objects are painted from the PoV of the camera, so it might not hit fragments/pixels that are at grazing angles if you use, say, a VR controller and you’re not aiming down the camera’s view direction. This results in sometimes grainy, incomplete results depending on the mesh and angle. But you can fix that by doing the rendering with a different camera mounted to your hand, so you can render the painting passes of your objects only, with ColorMask 0, invisibly, to that camera (or by just using compute shaders instead).

You can also do this whole thing with Command Buffers and DrawMesh instead of Graphics.Set… I’ve done this a couple of times using Compute Shaders, but with the 5.0 vert/frag shaders I had issues getting Unity’s API to cooperate last I tried.

So perhaps another blog post should be about how to set up command buffers and compute shaders, and how to do something cool like turn a mesh into a pointcloud and do cool ungodly things to it :)

It’s called Voice of God, and the player shapes the ground. We won 3 of the 5 awards at the ITU Copenhagen site!


Among the prizes we got was a ham and NGJ tickets!

The theme was WAVES this year and we used your voice to make waves in the ground. Roughly, your pitch (the frequency) is mapped across the screen from left to right (low pitch to high), and your loudness affects the amplitude.

You’re effectively controlling the rolling character via a winamp visualization. The cleaner your sound and the smoother your vocal range, the more effective you are.

We used Unity’s 2D tools, the Animator, a rigidbody pool, and the sound’s base tones and overtones merged into a world-position function.

Voice of God itch.io page here. Github repo with build here or here. And the original GGJ submission’s page is here.

VoG Zeus yelling
Gameplay gif

It was a lot of fun but punishing to our vocal cords after the 2 days of testing and development :)

A good chunk of the level. (4 screens wide)

I’ll briefly explain how shader programs are created and invoked under the hood, with shader variant compilation.

(with specifics for unity3d and Standard, Uber, Custom shaders)

Here’s a few high level scenarios you’d experience:

  • You have a material with like 5 textures; what happens if you remove some of them at runtime? Does it become more efficient?

  • You want to switch a shader from Opaque to Transparent at runtime.

  • You want to create a material with any specific shader variant, programmatically.

And here’s how things actually work:

If you examine StandardShaderGUI.cs (get it here), you see that a whole bunch of stuff happens when you change a material setting. Things to look for: SetKeyword, MaterialChanged, SetupMaterialWithBlendMode, Shader.EnableKeyword(), Shader.DisableKeyword().

Shaders are text files that are typically full of compile time directives like this:

CGPROGRAM
#pragma compile_type __ _NORMALMAP //<- you define the keyword here
// the "__" is a wildcard for "OFF by default"
// can also have #pragma compile_type _DO_STUFF_OFF _DO_STUFF_ON _DO_OTHER_STUFF
// Replace "compile_type" with either "shader_feature" or "multi_compile". More on that further down.

#if defined(_NORMALMAP)
	foo = tex2Dlod(_NormalMap, uvw);
	//or
	bar = tex2D(_NormalMap, uv);
	//etc.
#else
	//something else, or no else at all
#endif

//...
ENDCG

When you build the game, the engine goes through all your Materials, figures out what combination of settings they use, and parses and compiles shader programs for each combination necessary.

Note: Normally the HLSL or CG you write in an engine is interpreted and compiled through a bunch of internal engine compiler layers and frameworks before it becomes “real” HLSL or “real” GLSL, Vulkan etc., according to the platform the build is for (PC/console etc.). This is why, for example, Unity shaders have “Surface Shaders” (which is Latin for “fake-shader”). The CG in CGPROGRAM is real, but it’s used as an intermediate language.

This means that if in your scene you have a material with the Standard Shader with a _NormalMap texture, and one without, 2 shader variants get created and are available at runtime. But this is because the Editor GUI C# script has noticed you left the texture slot empty. If you write your own shader without a custom editor script, it will always include the calculations for the texture (and treat it as black if left empty).

When you’re working with the Standard or Uber shader and remove a texture at runtime from C#, it’s the same: the shader just uses values of 0 when it samples that texture. So you will also need to tell the Standard Shader or Uber Shader etc. itself to actually be swapped out on the GPU for another variant, via the keyword API: myMaterial.EnableKeyword("_NORMALMAP"); (or DisableKeyword to drop it), or globally for every material: Shader.EnableKeyword("_NORMALMAP");.

Same if you made a copy with newMat.CopyPropertiesFromMaterial(oldMat);, or created a newMat = new Material(Shader.Find("Standard")); programmatically. By default all keywords are disabled. Or rather, if you see multiple keywords on the #pragma, the first one is the one used by default, and usually that’s a wildcard __ or a _NORMALMAP_OFF, before the _NORMALMAP or _NORMALMAP_ON. If you’re dealing with an _OFF/_ON setup, then in addition to enabling the _ON keyword you should also DisableKeyword() the _OFF one.
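For example, here’s a rough sketch of toggling the Standard shader’s normal map variant at runtime (the component and method names are mine; _BumpMap and _NORMALMAP are the Standard shader’s texture property and keyword):

using UnityEngine;

public class NormalMapToggler : MonoBehaviour
{
	public Material myMaterial; // a material using the Standard shader
	public Texture2D normalMap;

	public void SetNormalMap(bool on)
	{
		if(on)
		{
			myMaterial.SetTexture("_BumpMap", normalMap);
			myMaterial.EnableKeyword("_NORMALMAP"); // swap to the variant that samples the normal map
		}
		else
		{
			myMaterial.SetTexture("_BumpMap", null);
			myMaterial.DisableKeyword("_NORMALMAP"); // back to the cheaper variant
		}
	}
}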

So how do you swap shader types at runtime? (transparent to opaque, tessellated to not, etc)

Solution 1:

So one not-so-great strat would be to include in your scene the material variations you’ll need in the build. A better but still not super maintainable one would be including your variant materials in the Resources folder, which always gets included in the build.
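As a sketch of that second option (from inside a MonoBehaviour; the asset path is made up):

// A pre-made transparent variant material placed under Assets/Resources/Materials/:
Material transparentVariant = Resources.Load<Material>("Materials/Wall_Transparent");
GetComponent<Renderer>().sharedMaterial = transparentVariant;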

Changing Shader Modes:

An artist once pointed out to me that she can animate the shader type of a material, from Opaque to Transparent. That’s funny, Unity, because you can’t do that (that easily)! Again, if you look at StandardShaderGUI.cs, there’s a lot of keywords and flags being set for switching between modes:

switch (blendMode)
{
	case BlendMode.Opaque:
		material.SetInt("_SrcBlend", (int)UnityEngine.Rendering.BlendMode.One);
		material.SetInt("_DstBlend", (int)UnityEngine.Rendering.BlendMode.Zero);
		material.SetInt("_ZWrite", 1);
		material.DisableKeyword("_ALPHATEST_ON");
		material.DisableKeyword("_ALPHABLEND_ON");
		material.DisableKeyword("_ALPHAPREMULTIPLY_ON");
		material.renderQueue = -1;
		break;
	case BlendMode.Cutout:
		material.SetInt("_SrcBlend", (int)UnityEngine.Rendering.BlendMode.One);
		material.SetInt("_DstBlend", (int)UnityEngine.Rendering.BlendMode.Zero);
		material.SetInt("_ZWrite", 1);
		material.EnableKeyword("_ALPHATEST_ON");
		material.DisableKeyword("_ALPHABLEND_ON");
		material.DisableKeyword("_ALPHAPREMULTIPLY_ON");
		material.renderQueue = 2450;
		break;
	case BlendMode.Fade:
		material.SetInt("_SrcBlend", (int)UnityEngine.Rendering.BlendMode.SrcAlpha);
		material.SetInt("_DstBlend", (int)UnityEngine.Rendering.BlendMode.OneMinusSrcAlpha);
		material.SetInt("_ZWrite", 0);
		material.DisableKeyword("_ALPHATEST_ON");
		material.EnableKeyword("_ALPHABLEND_ON");
		material.DisableKeyword("_ALPHAPREMULTIPLY_ON");
		material.renderQueue = 3000;
		break;
	case BlendMode.Transparent:
		material.SetInt("_SrcBlend", (int)UnityEngine.Rendering.BlendMode.One);
		material.SetInt("_DstBlend", (int)UnityEngine.Rendering.BlendMode.OneMinusSrcAlpha);
		material.SetInt("_ZWrite", 0);
		material.DisableKeyword("_ALPHATEST_ON");
		material.DisableKeyword("_ALPHABLEND_ON");
		material.EnableKeyword("_ALPHAPREMULTIPLY_ON");
		material.renderQueue = 3000;
		break;
}

So you’ll have to make a script with the above code in it, and provided you have the correct shader variants included at runtime, you’ll be able to switch a material from Opaque to Transparent or Cutout or Fade or whatever.

This code also works for the Uber shader BTW.
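A possible shape for that script (names are mine; the enum mirrors the one used by StandardShaderGUI.cs):

using UnityEngine;

public class BlendModeSwitcher : MonoBehaviour
{
	public enum BlendMode { Opaque, Cutout, Fade, Transparent }
	public Material material;

	public void SetBlendMode(BlendMode blendMode)
	{
		// paste the switch (blendMode) block from above in here
	}
}
// e.g. call SetBlendMode(BlendMode.Fade) when the player walks behind a wall,
// and SetBlendMode(BlendMode.Opaque) when they come back out.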


Jackie Chan Adventures - Uncle 'One more thing!'
One more thing!

There are actually 2 ways to compile different shader versions. When I wrote compile_type in #pragma compile_type __ _NORMALMAP, I lied. I meant:

  • #pragma shader_feature __ _NORMALMAP or

  • #pragma multi_compile __ _NORMALMAP.

(There’s actually more stuff (like multi_compile_fwdadd or UNITY_HARDWARE_TIER[X]) (read here)).

1). “Shader Feature” means that when you build the game, the engine will only include the contents of a #if _SOMETHING #endif if a material with that keyword enabled is used in the scene(s). Unity’s Standard Shader source uses shader_feature!

2). “Multi Compile” means that every possible combination of the keywords you defined in the #pragma will be generated as shader variant files and actually included in the build. This can explode into a big number & build time! (don’t use it unless on purpose, or for just a small custom shader you’ve made)

Solution 2:

The proper-ish way to include other variants for shader_feature shaders is to use the Graphics Settings, where in the Preloaded Shaders array you can drag in a shader variant collection asset (create it by right-clicking in the Project panel) containing the shader variant(s) you want.

I said “ish” because it’s not technically the proper way: it force-loads the variants in all scenes, but Unity isn’t clear on what else to do [details here] (they probably don’t have a proper solution implemented, it’s Unity after all ;) - the engine where one indie programmer can do a better job at an entire rendering pipeline in less time…).

Conclusion:

Use shader_feature, use the Resources folder or a Shader Preload Collection, and only use multi_compile sparingly.

Note: Protip: [ShaderVariantCollection] docs: “records which shader variants are actually used in each shader.”
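For instance, once you have a ShaderVariantCollection asset recorded, you can warm it up at load time so the variants are compiled before they’re first needed. A rough sketch (names are mine):

using UnityEngine;

public class VariantWarmup : MonoBehaviour
{
	// A .shadervariants asset (ShaderVariantCollection) you created/recorded beforehand.
	public ShaderVariantCollection variants;

	void Start()
	{
		if(variants != null && !variants.isWarmedUp)
			variants.WarmUp(); // compile/load all recorded variants up front instead of hitching mid-game
	}
}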

This is a continuation of my previous post on the graphics pipeline and some fundamental tricks.

Now that we’ve totally mastered how shaders work, let’s use them in some non-traditional ways.

Shader Shenanigans

Reconstructing World Position in the Fragment

Previously we’ve learned these 2 relevant things:

  • Getting a Screen space (0,1) uv to sample from a screen buffer (like color, depth, normals, your own RT etc.):

    • vert:
      o.uv = ComputeScreenPos(o.position); //where o.position = mul(UNITY_MATRIX_MVP, v.vertex);

    • frag:
      float2 uv = i.uv.xy / i.uv.w;

  • Getting a Screen space (0,1) i.position for the geometry we’re currently drawing:

    • vert:
      o.position = mul(UNITY_MATRIX_MVP, v.vertex);

    • frag:
      i.position.xy /= _ScreenParams.xy; //_ScreenParams.xy == width and height in px;

Now we’ll learn how to make a ray that samples from the pre-existing rendered geometry, in View space. Then we’ll convert it easily to World space and even back to Object space:

//Vertex shader
{
	//...
	o.ray_vs = float4(mul(UNITY_MATRIX_MV, v.vertex).xyz * float3(-1,-1,1),1);
	//float3(-1,-1,1) just flips the coordinates (the canvas).
	//...
}
//Fragment shader
{
	//...
	//ray in view space
	i.ray_vs = i.ray_vs * (_ProjectionParams.z / i.ray_vs.z);
	//_ProjectionParams.z is camera's far clip plane. E.g. 3000.
	//i.ray_vs.z is between -1 and 1.
	
	//our reconstructed View position for whatever is already on the screen under the fragment we're about to draw:
	float4 vpos = float4(i.ray_vs.xyz * depth,1);
	//depth is between 0 and 1.
	//the result is (-1,1) from bottom left front, with camera at centre (0,0,0).
	
	//and now we have our unity world coordinates for this fragment!
	float3 wpos = mul(unity_CameraToWorld, vpos).xyz;
	//you can even get the object space if you want.
	float3 opos = mul(unity_WorldToObject, float4(wpos,1)).xyz;
	//these are builtin unity matrices which are the inverse of the M and V matrices.
	
	//...
}

The object space would be useful for example if you want to see if a pre-existing fragment would be masked by a cubemap that originates at your current geometry’s position: float mask = texCUBE(_CubeTex, objPos).a;//(in the fragment). With this trick we pretend we have geometry centred on our current mesh, but with the underlying geometry’s surface, in object space.

Since we also have our current geometry’s Screen space position, we can convert that as well to view space and world space and object space. So you can compare or blend with the equivalent from the underlying pre-existing geometry we sampled.

Solid Texturing

With the worldspace position (of either the pre-existing geometry’s fragment or the current one) we can do what is called Triplanar Texture Projection or Solid Texturing. This is what you most often find on procedural meshes you obtain from voxels, which don’t have sensical UV coordinates, so you just say “everything on top is grass, everything on the sides is ground”.

procedural caves
My procedural caves with solid texturing (left), or on my deferred decals (right).

Here’s how I did that:

//Fragment
{
	//Get your world-space Normals either from the mesh, or from a GBuffer of the pre-existing stuff.
	half3 wnormal = ...
	
	//Get your world position. Again, either of your geometry or the pre-existing stuff in the buffer.
	float3 wpos = ...
	
	//Our UVs are the world position. _LocationBump is just a float or float3 offset. Helps further move the noise seed.
	half3 uv = wpos.xyz + _LocationBump;
	//To make the canyon stratification wavy texture, I used Curled Simplex Noise to distort the texture lookup (the uvs). 
	half3 uv_curled = uv + uv * normalize(get_Curl(uv / _Freq, _Seed)) / _Amplitude;
	
	//Each plane has its own texture. Texture sampling repeats (unless set to Clamp), so we're fine sampling a world position which is potentially way larger than 1.
	float2 coord1 = uv_curled.xy * _MainTexZ_ST.xy + _MainTexZ_ST.zw;
	float2 coord2 = uv.zx * _MainTexY_ST.xy + _MainTexY_ST.zw;// we don't want to curl the top
	float2 coord3 = uv_curled.zy * _MainTexX_ST.xy + _MainTexX_ST.zw;
	
	float3 blend_weights = abs(wnormal);
	//This is how you tweak the width of the transition zone;
	blend_weights = blend_weights -_BlendZone;
	//Make the weights sum up to a total of 1
	blend_weights /= ((blend_weights.x + blend_weights.y + blend_weights.z)).xxx;
	
	//Sample our 3 textures. Tex 1 and 3 can be the same texture (in my canyon case).
	fixed4 col1 = tex2D(_MainTexZ, coord1.xy);
	fixed4 col2 = tex2D(_MainTexY, coord2.xy);
	fixed4 col3 = tex2D(_MainTexX, coord3.xy);
	
	//Now blend the colours
	half3 blended_color = (
				col1.xyz * blend_weights.xxx +
				col3.xyz * blend_weights.zzz
				) 
				+ col2.xyz * blend_weights.yyy;
	//...
}

Now profit! You can similarly sample from a bumpmap and warp the normals for the lighting step.

You can find the Simplex Noise (and other noises) on GitHub.

Command Buffers

Before we look at more advanced stuff like deferred lighting and post FX, let’s briefly look at the CommandBuffer. Unity made plenty of documentation and examples for it, so I’ll just explain what the key points mean.

Note: A CommandBuffer sends a program to the GPU that runs every frame, so unless you need to change the program or update the data structures from the C# side, you shouldn’t clear and re-issue the Draw or Blit commands every frame from C#.

The Command Buffer allows you to insert render materials and geometry at any stage of Unity’s rendering pipeline. You set that with a CameraEvent (e.g. CameraEvent.BeforeForwardAlpha) when you attach the buffer to the camera. To see what’s going on, go to Window > Frame Debugger and visualise your changes as well as see the default pipeline at work.

Note: Unity will hijack some (parts) of these render targets at certain points in the pipeline, and you won’t have access. For example, if you want any of your custom deferred shaders to write to the built-in emission buffer and do your own lights, you must use the CameraEvent.AfterLighting stage.

Draw a mesh with the CB:

//This is the Model to World space matrix that will be applied to our mesh in the vertex shader:
Matrix4x4 transMatrix = Matrix4x4.TRS(m_GO.transform.position, m_GO.transform.rotation, m_GO.transform.localScale);
//Draw this mesh with our material
m_CB_arr[(int)CBs.BeforeAlpha].DrawMesh(m_CubeMesh, transMatrix, m_SomeMaterial, 0, 0);
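The snippet above assumes the command buffer already exists and has been attached to a camera somewhere. A minimal sketch of that setup (the names and the chosen CameraEvent are mine):

using UnityEngine;
using UnityEngine.Rendering;

public class MyCommandBufferSetup : MonoBehaviour
{
	public Mesh m_CubeMesh;
	public Material m_SomeMaterial;
	CommandBuffer m_CB;

	void OnEnable()
	{
		//Build the buffer once; the GPU replays it every frame without further C# work.
		m_CB = new CommandBuffer { name = "My custom pass" };
		Matrix4x4 transMatrix = Matrix4x4.TRS(transform.position, transform.rotation, transform.localScale);
		m_CB.DrawMesh(m_CubeMesh, transMatrix, m_SomeMaterial, 0, 0);
		Camera.main.AddCommandBuffer(CameraEvent.BeforeForwardAlpha, m_CB);
	}

	void OnDisable()
	{
		if(m_CB != null)
			Camera.main.RemoveCommandBuffer(CameraEvent.BeforeForwardAlpha, m_CB);
	}
}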

This is also one of the ways to implement your own lighting. Draw a cube or a sphere, for each light, with your deferred lighting material. This is what unity does internally, and it draws spheres. It ain’t the most efficient way to do deferred lighting, but it’s easy because the mesh defines the volume and provides the matrix.

Blit a texture with the CB:

The Blit works by sending a fullscreen quad to the GPU, so that in the fragment you’ll have one pixel for every Screen space pixel, so you can do post processing or deferred processing.

int fullscreenRT = Shader.PropertyToID("_FullscreenRT");
m_CB_arr[(int)CBs.BeforeAlpha].GetTemporaryRT(fullscreenRT, mainCam.pixelWidth, mainCam.pixelHeight, 0,FilterMode.Trilinear, RenderTextureFormat.ARGB32);
//This is where you run your material on that texture.
m_CB_arr[(int)CBs.BeforeAlpha].Blit(fullscreenRT, BuiltinRenderTextureType.CameraTarget, fullscreenMaterial);
m_CB_arr[(int)CBs.BeforeAlpha].ReleaseTemporaryRT(fullscreenRT);

That _FullscreenRT is a texture defined in the shader of your fullscreenMaterial:

sampler2D _FullscreenRT;
float4 _FullscreenRT_ST;

This would also be the pro way of doing deferred lighting. You’d have a list of lights, you’d traverse these pixels once, and light them. A common AAA method is to use a “Tiled Frustum” where you divide this texture in a grid, and have each tile reference only the lights that affect it. And you also do this grid depth wise, so lights that are far away are drawn with simpler math.

DrawProcedural with the CB:

This method draws custom data buffers with the vertex-and-triangles paradigm, or just as points (MeshTopology.Points).

m_CB_arr[(int)CBs.BeforeAlpha].DrawProcedural(Matrix4x4.identity, customBillboardMaterial, 0, MeshTopology.Triangles, 6, particleCount);

Remember back when I said you can have custom DirectX11 vertex shaders that run on data buffers? (FragInput vert (uint id : SV_VertexID, uint inst : SV_InstanceID)) This is how you run them in Unity with C#. In this case I defined a quad with 6 independent vertices, and the vertex shader will run 6 times for each particle in particleCount.



Alright, that’s enough for now. I’ll make a separate post explaining more advanced stuff like implementing lighting, and running Compute Shaders and managing memory and data structures.

In my graphics programming internets travels I realized a lot of people find it hard either to understand or to clearly explain the graphics pipeline and some of the tricks you can do.

The general high level theory is simple, but the API naming or hidden math makes it tough to get in practice. It’s confusing or incomplete even in academic material or nvidia’s GPU Gems etc.

It’s gonna take a minute, but I’m going to explain the ominous stuff they don’t tell you and the stuff they’re confusing you with.

In here and the next post, I’ll walk you through for-realsies how a mesh (or a data buffer) gets converted throughout the graphics pipeline. And I’ll give some sample code for stuff like procedural geometry, reconstructing worldspace position, or using custom data buffers.

This tutorial has some advanced topics but is still accessible to newbs. It however assumes you’ve tinkered with shader code before, and know basic concepts like how the rasterizer interpolates vertex data across triangles.

The Mesh

First off, a Mesh is a class or structure that stores various coordinate arrays in Object Space:

  • An array of vertices.
    • ex: A quad needs at least 4 verts: {-1,1,0.5},{1,1,0.5},{1,-1,0.5},{-1,-1,0.5} (clockwise notation, from top left)
  • An array of triangles that holds index values from the vertex array. Every 3 values represent a triangle.

    • in our ex with 4 verts you’ll reference each vertex once or twice: {0,1,2,2,3,0}

      • Note: If you need to split your quad mesh’s triangles apart in the shader (move them independently) for some vfx, you will have to create 6 verts instead of 4.
  • An array of normals. Each vertex has a corresponding normal (e.g. Vector3(-1,1,-1)).

    • For our quad example we will have only 1 normal per corner!

      • Note: Same as for triangles, if you need to process your mesh normals to tweak its smoothing, you will need more vertices (to double, triple etc. the normals)! Meshes you get from artists will (should) have this covered. Depending on the mesh format you can get multiple normals referencing the same vertex. (similar to how the triangles array works)
  • An array of UVs (texture coords). These run from (0,0) to (1,1); in Unity’s convention (0,0) is the bottom left and (1,1) the top right.

    • Minor Note: If you didn’t know how to get your shader to respond to the Unity Inspector’s material texture Scale and Offset values, you need to declare a float4 _MainTex_ST; next to sampler2D _MainTex. The _MainTex_ST.xy is Scale.xy, and _MainTex_ST.zw is Offset.xy.
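On the C# side, those same four values are what Material.SetTextureScale / SetTextureOffset write into. For example (from a MonoBehaviour):

// Doubles the tiling and shifts the texture half a tile sideways;
// in the shader this shows up as _MainTex_ST = (2, 2, 0.5, 0).
Material mat = GetComponent<Renderer>().material;
mat.SetTextureScale("_MainTex", new Vector2(2f, 2f));
mat.SetTextureOffset("_MainTex", new Vector2(0.5f, 0f));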



The Graphics Pipeline

Terminology here is loose with many synonyms. I’ll throw all of them in and clarify.

A mesh goes from Object Space (or Model Space) to World Space to Camera Space (or View Space or Eye Space) to Projection Space (Clip Space) to NDC Space (Normalized Device Coordinates) to Screen Space (or Window Space).

You probably heard some new terms just now. The main reasons the pipeline is more confusing than the big-picture version you usually hear are that some math operations make more sense when split up, some are easier in certain coordinate systems (for GPU matrix multiplication), and the split keeps the pipeline more flexible.

This is how virtually any usual graphics pipeline works, but I’m specifically writing examples in CG1 terminology, with the UnityCG library in particular:

Note: the matrices I’m listing are used starting with the Vertex program, and commonly some are merged in game engines (as a math shortcut); in Unity it’s the first 3 that are merged into UNITY_MATRIX_MVP.

The Vertex Shader

  • 1) Model->World is the World matrix (rotates, translates, scales vertices to their (Unity) world position). Unity and OpenGL call it the Model matrix and it’s merged in UNITY_MATRIX_MVP… But a less sadistic way to name it, for newcomers, would have been MATRIX_WVP not _MVP.

  • 2) World -> Camera is the View matrix. This just converts the vert’s coords so they are relative to the camera: the camera sits at the origin (0,0,0), everything is expressed in its coordinate frame, and the values are still in world units (no normalization happens here).

  • 3) Camera -> Clip space is the Perspective (or Orthographic (isometric)) Projection Matrix. The projection matrix doesn’t really do the projection divide here though. It builds up the vertex for the next step (frustum culling, and then perspective projection divide).

Here’s how the Projection Matrix multiplication looks in Unity (it’s the OpenGL standard):

focalLength   0                         0                                            0                                                 x
0             focalLength/aspectRatio   0                                            0                                              *  y
0             0                         -(farPlane+nearPlane)/(farPlane-nearPlane)   -(2*farPlane*nearPlane)/(farPlane-nearPlane)      z
0             0                         -1                                           0                                                 1

The Projection Matrix above incorporates FoV, and the near and far planes.
focalLength = 1 / tan(FoV/2);
aspectRatio = windowHeight / windowWidth;
Check out slide #6 here for a nice visual representation.

So the o.pos in your vertex function does not hold values between 0 and 1 or -1 and 1. It’s actually the result of the above matrix multiplication. [0,0] is at the centre of the camera (unless you have a fancy off-centre perspective matrix), but the values beyond that depend on the near/far plane and the camera size / aspect ratio.

Note:

    • Unity will convert the above matrix to the appropriate API it's using when it compiles the shaders (e.g. DirectX).
    • You can see or set the projection matrix with this editor script.
    • If you set custom projection matrices from C#, use GL.GetGPUProjectionMatrix, which converts the projection matrix you give it to the appropriate graphics API being used (e.g. DirectX or OpenGL).

Note: Z is actually converted to 1/z (the inverse, or reciprocal of z). Z is not linear but 1/z is (this allows linear interpolation, and helps precision). Here’s more on why 1/z is used, and on the depth precision.


Freakazoid
Intermission

Before we continue I must point out that up until now (Clip Space (or Projection Space)) all the coordinate spaces were in what is called Homogeneous space or Homogeneous coordinates. In “4D”: vertex.xyzw.

We need the w because GPUs work with matrix multiplications, but of the matrix transforms (rotation, scale, and translation), translation is an addition. From our matrix math we know we can get addition out of a matrix multiplication if we add a dimension:

1 0 0 translateX   x
0 1 0 translateY * y
0 0 1 translateZ   z
0 0 0 1   1

Above we have a simple Translation matrix, multiplied by our vertex.xyzw. So normally w is 1 in a vector. Also, obviously you can combine multiple matrices into one (e.g. don’t just have 1s and 0s in our matrix above, maybe also include rotation and/or scale, or, say a UNITY_MATRIX_MVP).

After the Projection, w will be used for the perspective divide. The w will become z with this kind of matrix multiplication:

1 0 0 0   x
0 1 0 0 * y
0 0 1 0   z
0 0 1 0   1


And we’re back:

After the Vertex Shader

Note: The following 3 steps are something that happens automatically, on the GPU (OpenGL and DirectX), after the Vertex program.

  • 4) Frustum Culling. This is not a matrix; it’s a test. Up above we got “from Camera Space to Clip space by using the Projection matrix”. This “Projection” matrix contained the camera aspect ratio and nearPlane and farPlane, which are for ex 16:9 and [10,1000]. So the current vertex is discarded if any of its x, y, or z coordinates don’t fit the frustum (frustum culling).

Note: if the vertex being culled is part of a triangle (or other primitive (quad)) the rest of the vertices of which are within the frustum, OpenGL reconstructs it as one or more primitives by adding vertices within the frustum.

  • 5) Clip -> NDC space is the perspective divide (by vertex.w) and normalization of the frustum-based coordinates. The frustum gets this perspective divide by w (distortion), which is not affine. At this point we are in Normalized Device Coordinates.
    • Even though we still have a w (and it’s normalized to 1), we’re not in homogeneous coords any more. If you were confused why the vertex2frag out structure’s .pos attribute is a vector4, it’s because the perspective divide happens just after the vertex program.
    • In OpenGL, NDC coordinates go between (-1,-1,-1) at bottom-left-back and (1,1,1) at top-right-forward. In Direct3D the z goes between 0 and 1 instead of -1 to 1.
    • The Z coordinate here goes into the Depth buffer, and/or encoded into the DepthNormals buffer. The depth in buffers is [0,1].
    • Since we’ve done our perspective projection, the Depth buffer is not linear. So you can’t just do linear interpolation to fetch a depth point (I’ll explain later when I get to ray examples).

Note: Most literature would have garbled these last steps up into one and called it the “Projection matrix” or the “Projection step”, “where you map 3D coords to 2D”, which would have really confused you later when you’d be programming.

  • 6) NDC space -> Screen space (Window space) (rasterization): Still after the Vertex and before the fragment, the GPU converts coords to viewport pixels. In other words, it transforms from Normalized Device Coordinates (or NDC) to Window Coordinates (raster pixels (fragments)). The pixel coordinates are relative to the lower-left corner of the screen [0,0], growing towards the upper-right [1920,1080].

The formula is:

xscreen = (xndc+1) * (screenWidth/2)+x,

yscreen = (yndc+1) * (screenHeight/2)+y.

The z is still between 0 and 1 from before.

So to clarify: the o.pos inside your vertex function is in Clip Space coordinates (e.g. {x:20, y:10, z:350}), and “the same” i.pos in your fragment function is in Screen coordinates (ie x and y in pixels and z in NDC).


Now if you want extra coords passed to the fragment in Screen space, you need to do the conversion to Screen space yourself in the vertex program (the auto conversion only applies to the SV_POSITION). Here’s an example:

	v2f vert(vI v)
	{
		v2f o;

		o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
		//this will interpolate in the fragment to the up vector in world space for the geometry you are drawing:
		o.orientation = mul ((float3x3)_Object2World, float3(0,1,0));
		//this will interpolate in the fragment to the world position of the current geometry you're drawing:
		o.position_ws = mul(_Object2World, v.vertex);
		
		//o.uv = v.texcoord;
		//Say you want a screenspace ray instead of a TEXCOORD:
		o.uv = ComputeScreenPos(o.pos);
		return o;
	}

At this point our o.pos is converted to Clip space by MVP. So ComputeScreenPos is a cross-platform UnityCG function that takes the Clip Space coordinates of your vertex and does all that automatic stuff described above that happens to o.pos (up until NDC), setting the right w param for you to perspective-divide with yourself once you get to the fragment. Then your uv will be within [0,1] (0 is bottom left).

This is the divide in the fragment: float2 screenUV = i.uv.xy / i.uv.w;.

And if in the frag for some reason you would want screenUV.xy to be the same as i.pos (in pixels), you’d also need to multiply screenUV.xy by the window width and height. But normally we do the opposite: we divide i.pos.xy by the screen width and height to get (0,1) values.


The Tessellation and the Geometry Shaders

There’s actually more things between vert and frag (and before the interpolation that the frag is run on):

  • the Hull program: is run once per vertex and it requires as input the entire triangle or quad or line or point (whatever type of weird geometry data you use). It does some data structure conversions and some magic.
  • the Tessellation program: here you write what technique the GPU should use to subdivide your geometry. The best is usually a fractal one, ideally also faded out by distance to the camera. This function does not actually subdivide anything; it calculates positive barycentric coordinates.
    • Note: you can technically subdivide triangles in the Geometry function below, but that’s a more general purpose function that can’t process/subdivide as much and as fast as here.
  • the Domain program: Domain means triangle (or quad etc). It takes the 3 vertices of your triangle, and the one barycentric coordinate for the tessellation. It is run once for each barycentric point and actually spits out new vertex data.
  • the Geometry program. This optional step is where you can use affine transformations to manipulate vertices or create more vertices within a triangle.

[UPDATE:] Luckily I don’t have to get any deeper into Tessellation because it seems since I wrote this article, Jasper Flick of CatlikeCoding has done a very nice patreon’d writeup. Check it out because there’s some magic to understand about how to link each stage so they all get what they expect.

With the geometry program you can also, for example, have a mesh whose verts are just points, and use a Geometry program to spawn 4 quad verts around each point, in place of those points/vertices, and generate billboards. Here is an example of just that.

Here’s a subset of that shader. I’ll explain the key points.

//GS_INPUT - is the data structure that the Vertex shader outputs. We are getting just one 
//point at a time in this geometry shader, but we could get more (you'd need 3 for 
//tessellation).
//TriangleStream<FS_INPUT> - this is a list of the data structure (the vertex output) that we
//want interpolated and sent to the Fragment shader. The GPU will interpolate between them 
//like they were regular triangles from a mesh.
void GS_Main(point GS_INPUT p[1], inout TriangleStream<FS_INPUT> triStream)
{
	float3 up = float3(0, 1, 0);
	float3 look = _WorldSpaceCameraPos - p[0].pos;
	look.y = 0;
	look = normalize(look);
	float3 right = cross(up, look);
	
	float halfS = 0.5f * _Size;
	
	float4 v[4];
	//The point we get from the vertex shader is in world space
	//we use that as the centre of the quad and create new worldspace vertex positions
	v[0] = float4(p[0].pos + halfS * right - halfS * up, 1.0f);
	
	//...
	
	//Matrix math! _World2Object is the inverse of the M matrix (object to world)
	//Multiplying the builtin MVP matrix by it gives us a UNITY_MATRIX_VP
	float4x4 vp = mul(UNITY_MATRIX_MVP, _World2Object);
	FS_INPUT pIn;
	pIn.pos = mul(vp, v[0]);
	pIn.tex0 = float2(1.0f, 0.0f);
	//Here's where we append a point (or, rather, 4) to the global triangle stream.
	triStream.Append(pIn);	
	
	//...
}

In DirectX11 you can actually do the points-to-quads conversion trick directly in the vertex program by manipulating custom data buffers.

Here’s how I did that for my particle sculpter:

struct Particle
{
	float3 position;
	//... and other stuff
};

// The buffer holding the particles. This Particle struct is also defined in C#,
// and is initialized from there using something like computeBuffer.SetData(arrayOfParticle);
StructuredBuffer<Particle> particleBuffer;

// The small buffer holding the 4 vertices for the billboard. Again, allocated and set from C#.
StructuredBuffer<float3> quad_verts;

// A custom DX11 vertex shader. Params come from Graphics.DrawProcedural(MeshTopology.x, n, particleCount); from C#.
// SV_VertexID: "n", the number of vertices to draw per particle, can make a point or a quad etc..
// SV_InstanceID: "particleCount", number of particles.
FragInput vert (uint id : SV_VertexID, uint inst : SV_InstanceID)
{
	FragInput fragInput = (FragInput)0;
	//When set up this way, the vertex program knows to run n times for the same particle point ("vertex") in the particle buffer.
	// Which means all we need to do is offset the current vertex according to our quad topology.
	float3 oPos = particleBuffer[inst].position + mul(unity_WorldToObject, mul(unity_CameraToWorld, quad_verts[id]));
	// Now we just do the standard conversion from Object to Clip space like it was any regular mesh vertex.
	fragInput.position = mul(UNITY_MATRIX_MVP,float4(oPos,1));
	return fragInput;
}

The geometry shader example was calculating the camera space directions and then creating the new offset vertex positions based on those. What I did above was treat my quad_vert as if it was in camera space (instead of object space), thus automatically aligning it to the camera, and then convert it with the matrix from Camera (or View) to World space, and then from World to Object space (even though the C# struct was of a standard object space quad). So don’t be afraid to play with the space matrices, kids!
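For completeness, here’s roughly what the C# side of that particle draw looks like. This is a sketch: I’m assuming the Particle struct only holds a position, and the buffer/property names match the shader above.

using UnityEngine;

public class ParticleDrawer : MonoBehaviour
{
	public Material customBillboardMaterial; //uses the DX11 vertex shader above
	public int particleCount = 100000;
	ComputeBuffer m_particleBuffer, m_quadVerts;

	void Start()
	{
		//stride must match the full Particle struct; here we pretend it's only a float3 position
		m_particleBuffer = new ComputeBuffer(particleCount, sizeof(float) * 3);
		//... fill an array of positions and m_particleBuffer.SetData(...) it here

		//6 verts = 2 triangles = 1 quad, authored as if it lived in camera space (see above)
		Vector3[] quad = {
			new Vector3(-0.5f,-0.5f,0), new Vector3(-0.5f,0.5f,0), new Vector3(0.5f,0.5f,0),
			new Vector3(0.5f,0.5f,0), new Vector3(0.5f,-0.5f,0), new Vector3(-0.5f,-0.5f,0)
		};
		m_quadVerts = new ComputeBuffer(6, sizeof(float) * 3);
		m_quadVerts.SetData(quad);

		customBillboardMaterial.SetBuffer("particleBuffer", m_particleBuffer);
		customBillboardMaterial.SetBuffer("quad_verts", m_quadVerts);
	}

	void OnRenderObject()
	{
		customBillboardMaterial.SetPass(0);
		Graphics.DrawProcedural(MeshTopology.Triangles, 6, particleCount);
	}

	void OnDestroy()
	{
		m_particleBuffer.Release();
		m_quadVerts.Release();
	}
}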


The Fragment Shader

Now we’ve finally entered the Fragment program before which the GPU has also done linear and perspective-correct interpolation on the vertices (and on texcoords, color etc.) to give us the pixel positions for the triangles.

Keep in mind that here you shouldn’t multiply any point by a matrix, unless you make it homogeneous (w=1 and also reverse the perspective divide (i.e. take it back to clip space)).

The classic fragment function is like this:
fixed4 frag(v2f_struct i) : COLOR //or SV_Target
This means this shader has 1 Render Target and it returns a rgba color.

The render target can be changed from C# with the Graphics or CommandBuffer API. (I’ll show that later)

In the deferred renderer, you can actually have (the option to output to) multiple render targets (MRT). The function structure changes, we use (multiple) out parameters:

void fragDeferred (
	VertexOutputDeferred i,
	out half4 outDiffuse : SV_Target0,// RT0: diffuse color (rgb), occlusion (a)
	out half4 outSpecSmoothness : SV_Target1,// RT1: spec color (rgb), smoothness (a)
	out half4 outNormal : SV_Target2,// RT2: normal (rgb), --unused, very low precision-- (a) 
	out half4 outEmission : SV_Target3// RT3: emission (rgb), --unused-- (a)
)

Now let’s continue from my vertex shader snippet from further above where I wanted a screenspace ray and did: o.uv = ComputeScreenPos(o.pos);.
Since it didn’t get a perspective divide (because it couldn’t have been interpolated to the fragment if it had), we need to do that now.
Fragment:

{
//a screenspace uv ray:
float2 uv = i.uv.xy / i.uv.w;
//as opposed to an object space uv ray (ie if you had done o.uv = v.texcoord; in the vertex):
//here i.uv would have sampled from 0 to 1 for each face of your cube etc.

//And now we can sample the color texture from what 
//is already rendered to the screen under the current pixel
float4 colBuff = tex2D(_CameraGBufferTexture0, uv);

float depth = tex2D(_CameraDepthTexture, uv);

//...
}

This _CameraGBufferTexture0 is from the deferred renderer, but you can set custom textures to a shader using the Graphics or CommandBuffer Blit function, even in the forward stage.
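As a sketch of that last bit (the texture name _MyScreenCopy and the chosen CameraEvent are arbitrary picks of mine): grab whatever is currently rendered into a temporary RT and expose it globally so any shader can sample it.

using UnityEngine;
using UnityEngine.Rendering;

public class ScreenCopyToGlobal : MonoBehaviour
{
	CommandBuffer m_CB;

	void OnEnable()
	{
		m_CB = new CommandBuffer { name = "Screen copy" };
		int screenCopyID = Shader.PropertyToID("_MyScreenCopy");
		m_CB.GetTemporaryRT(screenCopyID, -1, -1, 0, FilterMode.Bilinear); // -1,-1 = camera pixel size
		m_CB.Blit(BuiltinRenderTextureType.CurrentActive, screenCopyID);
		m_CB.SetGlobalTexture("_MyScreenCopy", screenCopyID); //now any shader can declare sampler2D _MyScreenCopy and read it
		GetComponent<Camera>().AddCommandBuffer(CameraEvent.AfterSkybox, m_CB);
	}

	void OnDisable()
	{
		if(m_CB != null)
			GetComponent<Camera>().RemoveCommandBuffer(CameraEvent.AfterSkybox, m_CB);
	}
}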



My next post will apply some of this knowledge in some common but unconventional uses for shaders.

After this hopefully everyone can start researching and understanding more fun stuff like clouds, atmospheric scattering, light absorption, glass caustics, distance fields, fluid simulations, grass, hair, skin etc..

PS: There’s quite a bit of in-depth pipeline stuff here. If anything’s unclear, or I happened to cock anything up, let me know.


  1. CG = “C for Graphics” programming language.