Fragment:

    i.position.xy /= _ScreenParams.xy; // _ScreenParams.xy == width and height in px
Now we'll learn how to make a ray that samples the pre-existing rendered geometry and reconstructs its position in View space. Then we'll easily convert it to World space and even back to Object space:
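    // Vertex shader
    {
        //...
        o.ray_vs = float4(mul(UNITY_MATRIX_MV, v.vertex).xyz * float3(-1,-1,1), 1);
        // float3(-1,-1,1) flips x and y. Combined with the divide by ray_vs.z in the fragment
        // (z is negative in view space), the net effect is a flip of z, which puts the ray in
        // Unity's camera space (+z forward), the space unity_CameraToWorld expects.
        //...
    }

    // Fragment shader
    {
        //...
        // ray in view space, stretched so that its z lands on the far plane:
        i.ray_vs = i.ray_vs * (_ProjectionParams.z / i.ray_vs.z);
        // _ProjectionParams.z is the camera's far clip plane, e.g. 3000.
        // i.ray_vs.z is this vertex's (negative) view-space z, so after this line the ray ends at z = far.

        // our reconstructed View position for whatever is already on the screen
        // under the fragment we're about to draw:
        float4 vpos = float4(i.ray_vs.xyz * depth, 1);
        // depth must be LINEAR and between 0 and 1, e.g.
        // Linear01Depth(SAMPLE_DEPTH_TEXTURE(_CameraDepthTexture, uv)), not the raw depth buffer value.
        // The result is in world units, relative to the camera at (0,0,0).

        // and now we have our unity world coordinates for this fragment!
        float3 wpos = mul(unity_CameraToWorld, vpos).xyz;
        // you can even get the object space if you want.
        float3 opos = mul(unity_WorldToObject, float4(wpos, 1)).xyz;
        // unity_CameraToWorld is (roughly) the inverse of the View matrix,
        // unity_WorldToObject the inverse of the Model matrix.
        //...
    }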
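To see why stretching the ray to the far plane and then multiplying by a 0..1 depth lands on the right point, here is the same arithmetic as a small C++ sketch (C++ only because it reads close to HLSL). It assumes the depth has already been linearized (what Linear01Depth returns) and that the input comes from UNITY_MATRIX_MV, i.e. a view space looking down -z; the function and type names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for HLSL's float3.
struct Vec3 { double x, y, z; };

Vec3 scale(Vec3 v, double s) { return {v.x * s, v.y * s, v.z * s}; }

// viewPos:       vertex position from mul(UNITY_MATRIX_MV, v.vertex) (camera looks down -z).
// farPlane:      _ProjectionParams.z.
// linear01Depth: scene depth under the pixel as a 0..1 fraction of the far plane
//                (what Linear01Depth returns), NOT the raw depth-buffer value.
// Returns the position in camera space (+z forward), ready for unity_CameraToWorld.
Vec3 reconstructCameraPos(Vec3 viewPos, double farPlane, double linear01Depth) {
    // Vertex shader: flip x and y.
    Vec3 ray{-viewPos.x, -viewPos.y, viewPos.z};
    assert(ray.z < 0.0 && "the vertex must be in front of the camera");
    // Fragment shader: stretch the ray so its z lands on the far plane. ray.z is negative,
    // so this also flips every sign: x and y end up as they were, z becomes positive.
    ray = scale(ray, farPlane / ray.z);
    // Walk back along the ray by the linear depth fraction.
    return scale(ray, linear01Depth);
}
```

For example, a vertex at (1, 2, -4) and a surface 10 units deep on the same line of sight (linear depth 0.1 with a far plane of 100) reconstructs to (2.5, 5, 10) in camera space.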
Object space is useful, for example, if you want to see whether a pre-existing fragment would be masked by a cubemap that originates at your current geometry's position. In the fragment:

    float mask = texCUBE(_CubeTex, opos).a;

With this trick we pretend we have geometry centred on our current mesh, but with the underlying geometry's surface, in object space.
Since we also have our current geometry's Screen space position, we can convert that to View, World and Object space as well, and then compare or blend it with the equivalent value from the underlying pre-existing geometry we sampled.
Solid Texturing
With the worldspace position (of either the pre-existing geometry's fragment or the current one) you can do what is called Triplanar Texture Projection, or Solid Texturing. This is what you most often find on procedural meshes generated from voxels, which don't have meaningful UV coordinates, so you just say "everything on top is grass, everything on the sides is ground".
    // Fragment
    {
        // Get your world-space normal, either from the mesh or from the GBuffer of the pre-existing stuff.
        float3 wnormal = ...;
        // Get your world position. Again, either of your geometry or of the pre-existing stuff in the buffer.
        float3 wpos = ...;

        // Our UVs are the world position. _LocationBump is just a float or float3 offset; it helps move the noise seed.
        // (float, not half: world positions quickly run out of half precision.)
        float3 uv = wpos.xyz + _LocationBump;
        // To make the wavy canyon stratification texture, I used Curled Simplex Noise to distort the
        // texture lookup (the uvs). get_Curl is your own curl-noise function, not a Unity builtin.
        float3 uv_curled = uv + uv * normalize(get_Curl(uv / _Freq, _Seed)) / _Amplitude;

        // Each plane has its own texture. Texture sampling repeats (unless set to Clamp), so we're fine
        // sampling a world position which is potentially way larger than 1.
        float2 coord1 = uv_curled.xy * _MainTexZ_ST.xy + _MainTexZ_ST.zw; // projected along Z
        float2 coord2 = uv.zx * _MainTexY_ST.xy + _MainTexY_ST.zw;        // along Y: we don't want to curl the top
        float2 coord3 = uv_curled.zy * _MainTexX_ST.xy + _MainTexX_ST.zw; // projected along X

        float3 blend_weights = abs(wnormal);
        // This is how you tweak the width of the transition zone; clamp so no weight goes negative.
        blend_weights = max(blend_weights - _BlendZone, 0);
        // Make the weights sum up to a total of 1
        blend_weights /= (blend_weights.x + blend_weights.y + blend_weights.z).xxx;

        // Sample our 3 textures. Tex 1 and 3 can be the same texture (in my canyon case).
        fixed4 col1 = tex2D(_MainTexZ, coord1.xy);
        fixed4 col2 = tex2D(_MainTexY, coord2.xy);
        fixed4 col3 = tex2D(_MainTexX, coord3.xy);
        // Now blend the colours: each projection is weighted by the normal component along its own axis.
        half3 blended_color = col1.xyz * blend_weights.zzz
                            + col2.xyz * blend_weights.yyy
                            + col3.xyz * blend_weights.xxx;
        //...
    }
Now profit! You can similarly sample from a bumpmap and warp the normals for the lighting step.
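The blend-weight part of the triplanar shader, isolated as a C++ sketch so the numbers are easy to check. It follows the steps above (abs of the world normal, subtract _BlendZone, clamp at zero, normalize to a sum of 1); the clamp is my assumption of the intent, since negative weights would make the blend meaningless. Names are hypothetical, and a single double stands in for a color channel.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Triplanar blend weights from a world-space normal.
// blendZone plays the role of the shader's _BlendZone: components below it are clamped
// to zero, so that projection contributes nothing and the transition gets narrower.
Vec3 triplanarWeights(Vec3 worldNormal, double blendZone) {
    Vec3 w{std::fabs(worldNormal.x), std::fabs(worldNormal.y), std::fabs(worldNormal.z)};
    w.x = std::max(w.x - blendZone, 0.0);
    w.y = std::max(w.y - blendZone, 0.0);
    w.z = std::max(w.z - blendZone, 0.0);
    double sum = w.x + w.y + w.z;
    assert(sum > 0.0 && "blendZone too large: every weight was clamped away");
    return {w.x / sum, w.y / sum, w.z / sum};
}

// Blend three samples: colX comes from the texture projected along X (zy coords),
// colY from the one along Y (zx), colZ from the one along Z (xy).
double blend(double colX, double colY, double colZ, Vec3 w) {
    return colX * w.x + colY * w.y + colZ * w.z;
}
```

A normal pointing straight up gets weights (0, 1, 0), i.e. only the top texture; a normal at 45 degrees between X and Y gets (0.5, 0.5, 0).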
Before we look at more advanced stuff like deferred lighting and post FX, let’s briefly look at the CommandBuffer. Unity made plenty of documentation and examples for it, so I’ll just explain what the key points mean.
Note: a CommandBuffer attached to a camera (Camera.AddCommandBuffer) is a recorded list of rendering commands that Unity executes every time that camera renders. So unless you need to change the commands or update their data from the C# side, you shouldn't clear it and re-add the Draw or Blit commands every frame from C#.
The Command Buffer allows you to insert materials and geometry to render at any stage of Unity's rendering pipeline. You pick the stage with a CameraEvent, e.g. CameraEvent.BeforeForwardAlpha, CameraEvent.AfterLighting etc. To see what's going on, go to Window > Frame Debugger, where you can visualise your changes as well as see the default pipeline at work.
Note: Unity will hijack some (parts) of these render targets at certain points in the pipeline, and you won't have access. For example, if you want any of your custom deferred shaders to write to the built-in emission buffer, and do your own lights, you must use the CameraEvent.AfterLighting stage.
Draw a mesh with the CB:
    // This is the Model to World space matrix that will be applied to our mesh in the vertex shader
    // (transform.localToWorldMatrix gives the equivalent matrix and also accounts for parents):
    Matrix4x4 transMatrix = Matrix4x4.TRS(m_GO.transform.position, m_GO.transform.rotation, m_GO.transform.localScale);
    // Draw this mesh with our material (submesh 0, shader pass 0)
    m_CB_arr[(int)CBs.BeforeAlpha].DrawMesh(m_CubeMesh, transMatrix, m_SomeMaterial, 0, 0);
This is also one of the ways to implement your own lighting: for each light, draw a cube or a sphere with your deferred lighting material. This is what Unity does internally; for point lights it draws spheres.
It ain’t the most efficient way to do deferred lighting, but it’s easy because the mesh defines the volume and provides the matrix.
Blit a texture with the CB:
The Blit works by sending a fullscreen quad to the GPU, so in the fragment program you get one fragment for every Screen space pixel, which is what you need for post processing or deferred processing.
    int fullscreenRT = Shader.PropertyToID("_FullscreenRT");
    m_CB_arr[(int)CBs.BeforeAlpha].GetTemporaryRT(fullscreenRT, mainCam.pixelWidth, mainCam.pixelHeight, 0, FilterMode.Trilinear, RenderTextureFormat.ARGB32);
    // Fill the temporary RT first, e.g. with a copy of what has been rendered so far (otherwise it's empty).
    m_CB_arr[(int)CBs.BeforeAlpha].Blit(BuiltinRenderTextureType.CurrentActive, fullscreenRT);
    // This is where you run your material on that texture.
    m_CB_arr[(int)CBs.BeforeAlpha].Blit(fullscreenRT, BuiltinRenderTextureType.CameraTarget, fullscreenMaterial);
    m_CB_arr[(int)CBs.BeforeAlpha].ReleaseTemporaryRT(fullscreenRT);
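That _FullscreenRT is a texture declared in the shader of your fullscreenMaterial (GetTemporaryRT sets the temporary texture up as a global shader property with that name; Blit also binds its source texture to _MainTex):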
    sampler2D _FullscreenRT;
    float4 _FullscreenRT_ST;
This would also be the pro way of doing deferred lighting: you'd have a list of lights, traverse these pixels once, and light them. A common AAA method is tiled shading (a "Tiled Frustum"), where you divide this texture into a grid and have each tile reference only the lights that affect it. Clustered shading also subdivides that grid along depth, so each cell only references the lights that overlap its depth range too, which culls many more lights per pixel.
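A minimal sketch of the tile-assignment step of tiled shading, in C++. It assumes each light's influence has already been projected to a screen-space pixel rectangle (the hypothetical LightRect; computing it from the light's bounding sphere is out of scope), and it only shows the binning: the lighting pass would then loop over the lights in tiles[tileIndex] for each pixel instead of over every light.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical: a light's screen-space footprint in pixels, inclusive bounds.
struct LightRect { int minX, minY, maxX, maxY; };

// Returns, for every tile (row-major), the indices of the lights touching it.
std::vector<std::vector<int>> assignLightsToTiles(const std::vector<LightRect>& lights,
                                                  int screenW, int screenH, int tileSize) {
    assert(tileSize > 0);
    const int tilesX = (screenW + tileSize - 1) / tileSize;
    const int tilesY = (screenH + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> tiles(tilesX * tilesY);
    for (int i = 0; i < static_cast<int>(lights.size()); ++i) {
        const LightRect& r = lights[i];
        // Clamp the rectangle to the screen; skip lights that are fully off-screen.
        int x0 = std::max(r.minX, 0), y0 = std::max(r.minY, 0);
        int x1 = std::min(r.maxX, screenW - 1), y1 = std::min(r.maxY, screenH - 1);
        if (x0 > x1 || y0 > y1) continue;
        // Add the light to every tile its rectangle overlaps.
        for (int ty = y0 / tileSize; ty <= y1 / tileSize; ++ty)
            for (int tx = x0 / tileSize; tx <= x1 / tileSize; ++tx)
                tiles[ty * tilesX + tx].push_back(i);
    }
    return tiles;
}
```

On a 64x32 screen with 16-pixel tiles, a light covering pixels (0,0) to (31,15) lands in tiles 0 and 1 only, so every other tile skips it entirely.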
DrawProcedural with the CB:
This method draws from your own data buffers instead of a Mesh, with the vertex-and-triangles paradigm (or lines, quads, or just points, via MeshTopology).
Remember back when I said you can have custom DirectX11 vertex shaders that run on data buffers? (FragInput vert (uint id : SV_VertexID, uint inst : SV_InstanceID))
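This is how you run them in Unity with C#: with a call like CommandBuffer.DrawProcedural(matrix, material, shaderPass, MeshTopology.Triangles, 6, particleCount) (or Graphics.DrawProcedural). In this case I defined a quad with 6 independent vertices (two triangles), so the vertex shader will run 6 times for each of the particleCount particles.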
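What the vertexCount/instanceCount pair means, emulated on the CPU as a C++ sketch: the vertex program is invoked vertexCount times for each of instanceCount instances, receiving SV_VertexID and SV_InstanceID. The "vertex program" body here only offsets the particle position by a quad corner and skips the camera alignment; the names are hypothetical, and on the GPU the invocations run in parallel rather than in this loop order.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { double x, y, z; };

// CPU emulation of DrawProcedural(..., MeshTopology.Triangles, vertexCount, instanceCount):
// the vertex program runs vertexCount * instanceCount times.
std::vector<Vec3> runProcedural(const std::vector<Vec3>& particles,
                                const std::vector<Vec3>& quadVerts) {
    assert(!quadVerts.empty());
    std::vector<Vec3> out;
    for (unsigned inst = 0; inst < particles.size(); ++inst)      // SV_InstanceID
        for (unsigned id = 0; id < quadVerts.size(); ++id) {      // SV_VertexID
            // The "vertex program": offset this particle's position by corner id.
            const Vec3& p = particles[inst];
            const Vec3& q = quadVerts[id];
            out.push_back({p.x + q.x, p.y + q.y, p.z + q.z});
        }
    return out;
}
```

With a 6-vertex quad and 3 particles you get 18 vertex invocations; invocation 7 is corner 1 of particle 1.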
Alright, that’s enough for now. I’ll make a separate post explaining more advanced stuff like implementing lighting, and running Compute Shaders and managing memory and data structures.
In my graphics programming internets travels I realized a lot of people find it hard either to understand or to clearly explain the graphics pipeline and some of the tricks you can do.
The general high level theory is simple, but the API naming or hidden math makes it tough to get in practice. It's confusing or incomplete even in academic material or NVIDIA's GPU Gems etc.
It's gonna take a minute, but I'm going to explain what the ominous "they" don't tell you, and what they're confusing you with.
In here and the next post, I’ll walk you through for-realsies how a mesh (or a data buffer) gets converted throughout the graphics pipeline. And I’ll give some sample code for stuff like procedural geometry, reconstructing worldspace position, or using custom data buffers.
This tutorial covers some advanced topics but is still accessible to newbs. However, it assumes you've tinkered with shader code before and know basic concepts, like how renderers interpolate values across triangles.
The Mesh
First off, a Mesh is a class or structure that stores various coordinate arrays in Object Space:
An array of vertices.
e.g. A quad needs at least 4 verts: {-1,1,0.5},{1,1,0.5},{1,-1,0.5},{-1,-1,0.5} (clockwise notation, from top left)
An array of triangles that holds index values from the vertex array. Every 3 values represent a triangle.
In our example with 4 verts you'll reference each vertex once or twice: {0,1,2,2,3,0}
Note: If you need to split your quad mesh’s triangles apart in the shader (move them independently) for some vfx, you will have to create 6 verts instead of 4.
An array of normals. Each vertex has a corresponding normal (e.g. Vector3(-1,1,-1)).
For our quad example we will have only 1 normal per corner!
Note: Same as for triangles, if you need to process your mesh normals to tweak its smoothing, you will need more vertices (to double, triple etc. the normals)! Meshes you get from artists will (should) have this covered. Depending on the mesh format you can get multiple normals referencing the same vertex. (similar to how the triangles array works)
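An array of UVs (texture coords), one per vertex. In Unity (OpenGL convention) they start at (0,0) in the bottom left of the texture and end at (1,1) in the top right; Direct3D's native convention puts (0,0) at the top left instead.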
Minor Note: If you didn’t know how to get your shader to respond to the Unity Inspector’s material texture Scale and Offset values, you need to declare a float4 _MainTex_ST; next to sampler2D _MainTex. The _MainTex_ST.xy is Scale.xy, and _MainTex_ST.zw is Offset.xy.
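The arrays above as a bare-bones C++ sketch (not Unity's Mesh class): the 4-vertex quad, the "unweld" step from the first note that turns it into 6 independent vertices, and what _MainTex_ST does to a UV (the same math as UnityCG's TRANSFORM_TEX macro). The -z facing normal is an assumption for this example, and all names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Vec2 = std::array<float, 2>;
using Vec3 = std::array<float, 3>;

// Object-space mesh data, mirroring the arrays described above.
struct Mesh {
    std::vector<Vec3> vertices;
    std::vector<int> triangles;   // every 3 indices form one triangle
    std::vector<Vec3> normals;    // one per vertex
    std::vector<Vec2> uvs;        // one per vertex
};

// The 4-vertex quad from the text (clockwise from top left).
Mesh makeQuad() {
    Mesh m;
    m.vertices = {{-1, 1, 0.5f}, {1, 1, 0.5f}, {1, -1, 0.5f}, {-1, -1, 0.5f}};
    m.triangles = {0, 1, 2, 2, 3, 0};
    m.normals.assign(4, Vec3{0, 0, -1});        // assumption: the quad faces -z
    m.uvs = {{0, 1}, {1, 1}, {1, 0}, {0, 0}};   // (0,0) bottom left, Unity convention
    return m;
}

// "Unweld": give every triangle its own copies of its vertices, so triangles can be
// moved (or their normals tweaked) independently. The quad goes from 4 to 6 vertices.
Mesh unweld(const Mesh& in) {
    assert(in.triangles.size() % 3 == 0);
    Mesh out;
    for (int idx : in.triangles) {
        out.triangles.push_back(static_cast<int>(out.vertices.size()));
        out.vertices.push_back(in.vertices[idx]);
        out.normals.push_back(in.normals[idx]);
        out.uvs.push_back(in.uvs[idx]);
    }
    return out;
}

// _MainTex_ST applied to a UV: uv * Scale (st.xy) + Offset (st.zw).
Vec2 transformTex(Vec2 uv, std::array<float, 4> st) {
    return {uv[0] * st[0] + st[2], uv[1] * st[1] + st[3]};
}
```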
The Graphics Pipeline
Terminology here is loose with many synonyms. I’ll throw all of them in and clarify.
A mesh goes from Object Space (or Model Space) to World Space to Camera Space (or View Space or Eye Space) to Projection Space (Clip Space) to NDC Space (Normalized Device Coordinates) to Screen Space (or Window Space).
You probably heard some new terms just now. The main reasons the pipeline is more confusing than the big-picture explanations suggest are that some math operations make more sense when split up, some are easier in certain coordinate systems (for GPU matrix multiplication), and splitting them makes the pipeline more flexible.
This is how virtually any usual graphics pipeline works, but I’m specifically writing examples in CG^{1} terminology, with the UnityCG library in particular:
Note: the matrices I’m listing are used starting with the Vertex program, and commonly some are merged in game engines (as a math shortcut); in Unity it’s the first 3 that are merged into UNITY_MATRIX_MVP.
The Vertex Shader
1) Model->World is the World matrix (rotates, translates, scales vertices to their (Unity) world position). Unity and OpenGL call it the Model matrix and it’s merged in UNITY_MATRIX_MVP… But a less sadistic way to name it, for newcomers, would have been MATRIX_WVP not _MVP.
2) World -> Camera is the View matrix. This converts the vert's coords so they are relative to the camera: the camera sits at (0,0,0), and the values are still in world units, not normalized to -1..1. In Unity's shader-side view space (OpenGL convention) x points right, y points up, and the camera looks down the negative z axis.
3) Camera -> Clip space is the Perspective (or Orthographic (isometric)) Projection Matrix. The projection matrix doesn’t really do the projection divide here though. It builds up the vertex for the next step (frustum culling, and then perspective projection divide).
Here’s how the Projection Matrix multiplication looks in Unity (it’s the OpenGL standard):
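For reference, here is the standard OpenGL-style perspective matrix written with the focalLength and aspectRatio defined below. This is a reconstruction of the usual form, assuming a symmetric frustum and a view space that looks down -z:

$$
\begin{pmatrix} x_{clip}\\ y_{clip}\\ z_{clip}\\ w_{clip} \end{pmatrix}
=
\begin{pmatrix}
\mathit{focalLength}\cdot\mathit{aspectRatio} & 0 & 0 & 0\\
0 & \mathit{focalLength} & 0 & 0\\
0 & 0 & -\frac{\mathit{far}+\mathit{near}}{\mathit{far}-\mathit{near}} & -\frac{2\cdot\mathit{far}\cdot\mathit{near}}{\mathit{far}-\mathit{near}}\\
0 & 0 & -1 & 0
\end{pmatrix}
\begin{pmatrix} x_{view}\\ y_{view}\\ z_{view}\\ 1 \end{pmatrix}
$$

Its last row makes w_clip = -z_view, the distance in front of the camera, which is what the perspective divide uses later.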
The Projection Matrix above incorporates FoV, and the near and far planes.
focalLength = 1 / tan(FoV/2); aspectRatio = windowHeight / windowWidth;
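A C++ sketch of that projection and the divide that follows it, so you can plug in numbers. It assumes the OpenGL conventions above (symmetric frustum, view space looking down -z, fovY as the vertical field of view in radians); the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 { double x, y, z, w; };

// OpenGL-style perspective projection matrix (row-major, column vectors).
struct Perspective {
    double m[4][4] = {};
    Perspective(double fovY, double windowWidth, double windowHeight, double nearP, double farP) {
        double focalLength = 1.0 / std::tan(fovY / 2.0);
        double aspectRatio = windowHeight / windowWidth;
        m[0][0] = focalLength * aspectRatio;
        m[1][1] = focalLength;
        m[2][2] = -(farP + nearP) / (farP - nearP);
        m[2][3] = -2.0 * farP * nearP / (farP - nearP);
        m[3][2] = -1.0;   // w_clip = -z_view: the distance in front of the camera
    }
    // View space -> clip space (what mul(UNITY_MATRIX_P, v) does).
    Vec4 operator*(const Vec4& v) const {
        const double in[4] = {v.x, v.y, v.z, v.w};
        double out[4] = {0, 0, 0, 0};
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 4; ++c) out[r] += m[r][c] * in[c];
        return {out[0], out[1], out[2], out[3]};
    }
};

// The perspective divide the GPU performs after the vertex program: clip -> NDC.
Vec4 toNdc(const Vec4& clip) {
    assert(clip.w != 0.0);
    return {clip.x / clip.w, clip.y / clip.w, clip.z / clip.w, 1.0};
}
```

With a 90 degree FoV at 1920x1080, the view-space point (1, 1, -2) has clip w = 2 (so o.pos is clearly not in -1..1), and after the divide it lands at NDC y = 0.5; the same point at z = -4 lands at 0.25, which is the perspective shrink.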
Check out slide #6 here for a nice visual representation.
So the o.pos in your vertex function does not hold values between 0 and 1 or -1 and 1. It's the raw result of the above matrix multiplication: [0,0] is at the centre of the camera (unless you have a fancy off-centre perspective matrix), and for visible points x, y and z lie between -w and w, where w grows with the distance from the camera. The actual values depend on the near/far planes and the camera size / aspect ratio.
Note:
Unity will convert the above matrix to the appropriate API it's using when it compiles the shaders (e.g. DirectX).
If you set custom projection matrices from C#, use GL.GetGPUProjectionMatrix, which converts the projection matrix you give it to the convention of the Graphics API being used (e.g. DirectX or OpenGL).
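Note: what ends up in the depth buffer is not z but a function of 1/z (the reciprocal of z): after the perspective divide, depth = A + B/z, with A and B built from the near and far planes. z itself doesn't vary linearly across a triangle on screen, but 1/z does (this allows linear interpolation in screen space), and the mapping concentrates the precision close to the camera. Here's more on why 1/z is used, and on the depth precision.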
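To see how uneven that mapping is, here is a C++ sketch computing the 0..1 depth-buffer value for a point at a given distance. It assumes the standard OpenGL-style projection (not the reversed-Z that some APIs and Unity versions use); the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Depth-buffer value (0..1) written for a point `dist` units in front of the camera,
// for an OpenGL-style projection with non-reversed depth.
double depthBufferValue(double dist, double nearP, double farP) {
    assert(dist > 0.0 && nearP > 0.0 && farP > nearP);
    // z_ndc = A + B / dist: linear in 1/z, not in z.
    double A = (farP + nearP) / (farP - nearP);
    double B = -2.0 * farP * nearP / (farP - nearP);
    double zNdc = A + B / dist;
    return zNdc * 0.5 + 0.5;   // [-1,1] -> [0,1]
}
```

With near = 0.1 and far = 1000, a point just 1 unit away already has a depth of about 0.9, so roughly 90% of the depth range covers the first unit in front of the camera.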
Before we continue I must point out that up until now (Clip Space, or Projection Space, included) all the coordinate spaces were in what is called Homogeneous space, or Homogeneous coordinates. In "4D": vertex.xyzw.
We need the w because GPUs work with matrix multiplications, and our transforms are rotation, scale, and translation. Rotation and scale are multiplications, but translation is an addition, which a 3x3 matrix can't express. From matrix math we know we can get that addition out of a matrix multiplication if we add a dimension:
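$$
\begin{pmatrix}
1 & 0 & 0 & \mathit{translateX}\\
0 & 1 & 0 & \mathit{translateY}\\
0 & 0 & 1 & \mathit{translateZ}\\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} x\\ y\\ z\\ 1 \end{pmatrix}
=
\begin{pmatrix} x+\mathit{translateX}\\ y+\mathit{translateY}\\ z+\mathit{translateZ}\\ 1 \end{pmatrix}
$$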
Above we have a simple Translation matrix, multiplied by our vertex.xyzw. So normally w is 1 for a position. (For a direction, like a normal, w is 0, so the translation column has no effect.) Also, you can obviously combine multiple matrices into one (e.g. don't just have 1s and 0s in our matrix above, but also include rotation and/or scale, or, say, a UNITY_MATRIX_MVP).
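The same idea as a C++ sketch: a 4x4 matrix times a column vector (the mul(M, v) order used in the shaders), showing that w = 1 picks up the translation while w = 0 ignores it, and that two matrices can be merged into one. Type and function names are hypothetical.

```cpp
#include <cassert>

struct Vec4 { double x, y, z, w; };
struct Mat4 { double m[4][4]; };

Mat4 translation(double tx, double ty, double tz) {
    return {{{1, 0, 0, tx},
             {0, 1, 0, ty},
             {0, 0, 1, tz},
             {0, 0, 0, 1}}};
}

// Matrix * column vector, like mul(M, v) in HLSL/Cg.
Vec4 mul(const Mat4& M, const Vec4& v) {
    const double in[4] = {v.x, v.y, v.z, v.w};
    double out[4] = {0, 0, 0, 0};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) out[r] += M.m[r][c] * in[c];
    return {out[0], out[1], out[2], out[3]};
}

// Merging two transforms into one matrix: mul(A, B) applies B first, then A.
Mat4 mul(const Mat4& A, const Mat4& B) {
    Mat4 C{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            for (int k = 0; k < 4; ++k) C.m[r][c] += A.m[r][k] * B.m[k][c];
    return C;
}
```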
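After the Projection, w will be used for the perspective divide. In its simplest form, w becomes z with this kind of matrix multiplication (in the real OpenGL-style matrix the last row is (0, 0, -1, 0), so w = -z, because the camera looks down -z):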
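$$
\begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 1 & 0
\end{pmatrix}
\begin{pmatrix} x\\ y\\ z\\ 1 \end{pmatrix}
=
\begin{pmatrix} x\\ y\\ z\\ z \end{pmatrix}
$$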
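Dividing everything by w (= z) then gives (x/z, y/z, 1, 1): points further away shrink towards the centre, which is the perspective effect, and we're back to w = 1.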
After the Vertex Shader
Note: The following 3 steps are something that happens automatically, on the GPU (OpenGL and DirectX), after the Vertex program.
4) Clipping (often loosely called frustum culling). This is not a matrix, it's a test. Up above we got "from Camera Space to Clip space by using the Projection matrix". This "Projection" matrix contained the camera aspect ratio and the near and far planes, which are for example 16:9 and [10,1000]. In clip space a vertex is inside the frustum when -w ≤ x, y, z ≤ w (in Direct3D, 0 ≤ z ≤ w), and primitives that lie entirely outside are discarded.
Note: if some vertices of a triangle (or other primitive) are outside the frustum and the rest are inside, the GPU clips it: it reconstructs it as one or more primitives by adding new vertices on the frustum's boundary.
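The inside test from step 4 as a C++ sketch, run on clip-space values before the divide. It assumes OpenGL's -w ≤ z ≤ w range (Direct3D would use 0 ≤ z ≤ w), and the "trivially rejected" check only covers the easy case; actually clipping the straddling triangles is left to the GPU. Names are hypothetical.

```cpp
#include <cassert>

struct Vec4 { double x, y, z, w; };

// Signed "inside" distance to each of the 6 clip planes; negative means outside.
double planeDistance(const Vec4& c, int plane) {
    switch (plane) {
        case 0: return c.w + c.x;   // left:   x >= -w
        case 1: return c.w - c.x;   // right:  x <=  w
        case 2: return c.w + c.y;   // bottom: y >= -w
        case 3: return c.w - c.y;   // top:    y <=  w
        case 4: return c.w + c.z;   // near:   z >= -w
        default: return c.w - c.z;  // far:    z <=  w
    }
}

bool insideFrustum(const Vec4& c) {
    for (int p = 0; p < 6; ++p)
        if (planeDistance(c, p) < 0) return false;
    return true;
}

// A triangle can be thrown away outright only if all 3 vertices are outside the
// same plane; otherwise it must be clipped.
bool triviallyRejected(const Vec4 (&tri)[3]) {
    for (int p = 0; p < 6; ++p)
        if (planeDistance(tri[0], p) < 0 && planeDistance(tri[1], p) < 0 &&
            planeDistance(tri[2], p) < 0)
            return true;
    return false;
}
```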
5) Clip -> NDC space is the perspective divide (by vertex.w), which normalizes the frustum-based coordinates. This divide by w is what distorts the frustum into a box, and it is not an affine transform. The result is called Normalized Device Coordinates (NDC).
Even though we still have a w (it's now 1), we're not in homogeneous coords any more. If you were confused why the vertex-to-fragment struct's .pos attribute is a float4, it's because the perspective divide happens just after the vertex program.
In OpenGL, NDC coordinates go from (-1,-1,-1) at bottom-left-near to (1,1,1) at top-right-far. In Direct3D z goes from 0 to 1 instead of -1 to 1.
The Z coordinate here goes into the Depth buffer, and/or gets encoded into the DepthNormals buffer. The depth stored in buffers is in [0,1] (OpenGL remaps its -1..1 NDC z to 0..1 when writing it, and recent Unity versions reverse it on some platforms, storing 1 at the near plane).
Since we've done our perspective projection, the Depth buffer is not linear. So you can't treat a depth value as a distance, or interpolate linearly between depth values; you have to linearize it first (I'll explain later when I get to ray examples).
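Linearizing goes the other way, as in this C++ sketch: it inverts the OpenGL-style mapping to turn a 0..1 depth value back into a distance. It assumes non-reversed depth; in a Unity shader you would use LinearEyeDepth / Linear01Depth instead, which handle the platform differences through _ZBufferParams. The function names here are hypothetical (they just mirror Unity's).

```cpp
#include <cassert>
#include <cmath>

// 0..1 depth-buffer value -> distance in front of the camera (OpenGL-style, non-reversed).
double linearEyeDepth(double depth01, double nearP, double farP) {
    assert(nearP > 0.0 && farP > nearP);
    double zNdc = depth01 * 2.0 - 1.0;   // [0,1] -> [-1,1]
    return 2.0 * nearP * farP / (farP + nearP - zNdc * (farP - nearP));
}

// The same distance as a 0..1 fraction of the far plane.
double linear01Depth(double depth01, double nearP, double farP) {
    return linearEyeDepth(depth01, nearP, farP) / farP;
}
```

With near = 0.1 and far = 1000, a depth value of 0.5 is only about 0.2 units from the camera.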
Note: Most literature would have garbled these last steps up into one and called it the “Projection matrix” or the “Projection step”, “where you map 3D coords to 2D”, which would have really confused you later when you’d be programming.
6) NDC space -> Screen space (Window space) (rasterization): still after the Vertex and before the fragment, the GPU converts coords to viewport pixels. In other words, it transforms from Normalized Device Coordinates (or NDC) to Window Coordinates (raster pixels (fragments)). The pixel coordinates are relative to the lower-left corner of the screen [0,0], growing towards the upper-right [1920,1080] (that's the OpenGL convention; Direct3D puts [0,0] at the top-left).
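The formula is:
$$x_{screen} = (x_{ndc}+1)\cdot\frac{screenWidth}{2} + x_{viewport}$$
$$y_{screen} = (y_{ndc}+1)\cdot\frac{screenHeight}{2} + y_{viewport}$$
where (x_{viewport}, y_{viewport}) is the lower-left corner of the viewport, i.e. (0,0) for a fullscreen camera.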
The z is still between 0 and 1 from before.
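The viewport formula as a C++ sketch, plus the reverse step used in fragment shaders (dividing the pixel position by the screen size, as in i.position.xy /= _ScreenParams.xy). The viewport offset defaults to (0,0), i.e. a fullscreen camera; names are hypothetical.

```cpp
#include <cassert>

struct Vec2 { double x, y; };

// NDC -> window (pixel) coordinates.
Vec2 ndcToScreen(Vec2 ndc, double screenWidth, double screenHeight,
                 double viewportX = 0.0, double viewportY = 0.0) {
    return {(ndc.x + 1.0) * (screenWidth / 2.0) + viewportX,
            (ndc.y + 1.0) * (screenHeight / 2.0) + viewportY};
}

// Pixels -> 0..1 screen coordinates, for a fullscreen viewport.
Vec2 screenToUv(Vec2 px, double screenWidth, double screenHeight) {
    assert(screenWidth > 0.0 && screenHeight > 0.0);
    return {px.x / screenWidth, px.y / screenHeight};
}
```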
So to clarify: the o.pos inside your vertex function is in Clip Space coordinates (e.g. {x:20, y:10, z:350}), and "the same" i.pos in your fragment function is in Screen coordinates (i.e. x and y in pixels, and z the 0..1 depth).
Now if you want extra coords passed to the fragment in Screen space, you need to do the conversion to Screen space yourself in the vertex program (the auto conversion only applies to the SV_POSITION). Here’s an example:
    v2f vert(vI v)
    {
        v2f o;
        o.pos = mul(UNITY_MATRIX_MVP, v.vertex);
        // this will interpolate in the fragment to the up vector in world space for the geometry you are drawing
        // (_Object2World is called unity_ObjectToWorld in newer Unity versions):
        o.orientation = mul((float3x3)_Object2World, float3(0,1,0));
        // this will interpolate in the fragment to the world position of the current geometry you're drawing:
        o.position_ws = mul(_Object2World, v.vertex);
        //o.uv = v.texcoord;
        // Say you want a screenspace ray instead of a TEXCOORD (o.uv must be a float4 for this):
        o.uv = ComputeScreenPos(o.pos);
        return o;
    }
At this point our o.pos has been converted to Clip space by MVP. ComputeScreenPos is a cross-platform UnityCG function that takes the Clip Space coordinates of your vertex and does the remapping part of the automatic stuff described above, but not the divide: it keeps the right w for you to perspective-divide with yourself once you get to the fragment. After that divide your uv will be within [0,1] (0 is bottom left), i.e. NDC remapped from [-1,1] to [0,1].
This is the divide in the fragment: float2 screenUV = i.uv.xy / i.uv.w;.
And if in the frag you wanted screenUV to match i.pos (in pixels), you'd also need to multiply screenUV.xy by the window width and height (_ScreenParams.xy). But normally we do the opposite: we divide i.pos.xy by the screen width and height to get (0,1) values.
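A C++ sketch of the ComputeScreenPos-then-divide pattern. computeScreenPos mirrors what UnityCG's ComputeScreenPos does as far as I know, but leaves out the platform-dependent y flip it applies through _ProjectionParams.x, so treat it as an illustration rather than a copy of Unity's code; all names are hypothetical.

```cpp
#include <cassert>

struct Vec2 { double x, y; };
struct Vec4 { double x, y, z, w; };

// Vertex side: remap x and y from [-w, w] to [0, w] and keep w for later.
Vec4 computeScreenPos(const Vec4& clip) {
    return {0.5 * (clip.x + clip.w), 0.5 * (clip.y + clip.w), clip.z, clip.w};
}

// Fragment side: float2 screenUV = i.uv.xy / i.uv.w;
Vec2 screenUV(const Vec4& s) {
    assert(s.w != 0.0);
    return {s.x / s.w, s.y / s.w};
}

// Screen UV -> pixels (multiply by _ScreenParams.xy), to match i.pos.xy.
Vec2 uvToPixels(Vec2 uv, double width, double height) {
    return {uv.x * width, uv.y * height};
}
```

A clip-space position at the centre of the view ends up at uv (0.5, 0.5), i.e. pixel (960, 540) on a 1920x1080 screen, whatever its w.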
The Tessellation and the Geometry Shaders
There’s actually more things between vert and frag (and before the interpolation that the frag is run on):
the Hull program: runs once per control point (usually once per vertex) and requires as input the entire patch: triangle, quad, line or point (whatever type of weird geometry data you use). Alongside it, a patch-constant function outputs the tessellation factors. It does some data structure conversions and some magic.
the Tessellation stage: this one is fixed-function; via attributes on the hull program you tell the GPU which partitioning technique to use to subdivide your geometry (integer, fractional_odd, fractional_even, pow2). A fractional one is usually best, with the tessellation factors also faded by distance to the camera. This stage does not actually output vertices; it calculates barycentric coordinates.
Note: you can technically subdivide triangles in the Geometry function below, but that’s a more general purpose function that can’t process/subdivide as much and as fast as here.
the Domain program: "Domain" refers to the patch type (triangle, quad or isoline). It takes the 3 vertices of your triangle and one barycentric coordinate from the tessellator. It runs once for each barycentric point and actually spits out the new vertex data.
the Geometry program. This optional step is where you can use affine transformations to manipulate vertices or create more vertices within a triangle.
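A simplified C++ sketch of the tessellator/Domain split for triangles: with a uniform integer factor n, the tessellator's output can be pictured as a regular grid of barycentric points, and the Domain program weights the patch's 3 control points with each of them. The real tessellator's point order, per-edge factors and fractional modes are not modeled; names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
struct Bary { double u, v, w; };   // u + v + w == 1, all >= 0

// Stand-in for the fixed-function tessellator: uniform integer factor n
// gives (n + 1) * (n + 2) / 2 barycentric points on the triangle.
std::vector<Bary> tessellateTriangle(int n) {
    assert(n >= 1);
    std::vector<Bary> pts;
    for (int i = 0; i <= n; ++i)
        for (int j = 0; j <= n - i; ++j)
            pts.push_back({double(i) / n, double(j) / n, double(n - i - j) / n});
    return pts;
}

// The Domain program's core job: turn one barycentric point into a new vertex.
Vec3 domain(const Vec3 (&p)[3], Bary b) {
    return {b.u * p[0].x + b.v * p[1].x + b.w * p[2].x,
            b.u * p[0].y + b.v * p[1].y + b.w * p[2].y,
            b.u * p[0].z + b.v * p[1].z + b.w * p[2].z};
}
```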
[UPDATE:] Luckily I don’t have to get any deeper into Tessellation because it seems since I wrote this article, Jasper Flick of CatlikeCoding has done a very nice patreon’d writeup. Check it out because there’s some magic to understand about how to link each stage so they all get what they expect.
With the geometry program you can also, for example, have a mesh whose verts are just points, and use a Geometry program to spawn 4 quad verts around (and instead of) each point, generating billboards. Here is an example of just that.
Here’s a subset of that shader. I’ll explain the key points.
    // GS_INPUT - the data structure that the Vertex shader outputs. We are getting just one
    // point at a time in this geometry shader, but we could get more (e.g. 3 per triangle with `triangle GS_INPUT p[3]`).
    // TriangleStream<FS_INPUT> - a list of the data structure (the vertex output) that we
    // want interpolated and sent to the Fragment shader. The GPU will interpolate between them
    // like they were regular triangles from a mesh.
    [maxvertexcount(4)]
    void GS_Main(point GS_INPUT p[1], inout TriangleStream<FS_INPUT> triStream)
    {
        float3 up = float3(0, 1, 0);
        float3 look = _WorldSpaceCameraPos - p[0].pos.xyz;
        look.y = 0;
        look = normalize(look);
        float3 right = cross(up, look);

        float halfS = 0.5f * _Size;

        float4 v[4];
        // The point we get from the vertex shader is in world space;
        // we use that as the centre of the quad and create new worldspace vertex positions
        v[0] = float4(p[0].pos.xyz + halfS * right - halfS * up, 1.0f);
        //...

        // Matrix math! _World2Object is the inverse of the M matrix (object to world).
        // Multiplying the builtin MVP matrix by it gives us a UNITY_MATRIX_VP
        float4x4 vp = mul(UNITY_MATRIX_MVP, _World2Object);
        FS_INPUT pIn;
        pIn.pos = mul(vp, v[0]);
        pIn.tex0 = float2(1.0f, 0.0f);
        // Here's where we append a vertex (or, rather, 4) to the output triangle stream.
        triStream.Append(pIn);
        //...
    }
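The billboard math from GS_Main as a C++ sketch on plain world-space vectors. The shader elides v[1] to v[3]; the remaining corners here are an assumption, ordered so the 4 vertices form a triangle strip. Names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) {
    double len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    assert(len > 0.0 && "camera directly above or below the point");
    return (1.0 / len) * a;
}

// Y-axis-locked billboard around `center`, facing the camera.
// corners[0] matches v[0] in the shader (+right, -up).
void billboardCorners(Vec3 center, Vec3 cameraPos, double size, Vec3 (&corners)[4]) {
    const Vec3 up{0, 1, 0};
    Vec3 look = cameraPos - center;
    look.y = 0;                          // only rotate around Y
    look = normalize(look);
    Vec3 right = cross(up, look);
    double halfS = 0.5 * size;
    corners[0] = center + halfS * right - halfS * up;
    corners[1] = center + halfS * right + halfS * up;
    corners[2] = center - halfS * right - halfS * up;
    corners[3] = center - halfS * right + halfS * up;
}
```

A point at the origin seen from a camera at (0, 5, 10) gets right = (1, 0, 0) (the y component of the look direction is dropped), so with size 2 the first corner is (1, -1, 0).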
In DirectX11 you can actually do the points-to-quads conversion trick directly in the vertex program by manipulating custom data buffers.
    struct Particle
    {
        float3 position;
        //... and other stuff
    };

    // The buffer holding the particles. This Particle struct is also defined in C#,
    // and is initialized from there using something like computeBuffer.SetData(arrayOfParticle);
    StructuredBuffer<Particle> particleBuffer;
    // The small buffer holding the billboard's corner vertices (6 for two triangles with
    // MeshTopology.Triangles). Again, allocated and set from C#.
    StructuredBuffer<float3> quad_verts;

    // A custom DX11 vertex shader. Params come from Graphics.DrawProcedural(MeshTopology.x, n, particleCount); in C#.
    // SV_VertexID: index of the current vertex within the particle, 0..n-1 (n vertices make a point, a quad etc.).
    // SV_InstanceID: index of the current particle, 0..particleCount-1.
    FragInput vert(uint id : SV_VertexID, uint inst : SV_InstanceID)
    {
        FragInput fragInput = (FragInput)0;
        // When set up this way, the vertex program runs n times for the same particle point ("vertex") in the
        // particle buffer, which means all we need to do is offset the current vertex according to our quad topology.
        // The corner is an offset (a direction), so w = 0: it gets rotated but not translated.
        float3 oPos = particleBuffer[inst].position
                    + mul(unity_WorldToObject, mul(unity_CameraToWorld, float4(quad_verts[id], 0))).xyz;
        // Now we just do the standard conversion from Object to Clip space like it was any regular mesh vertex.
        fragInput.position = mul(UNITY_MATRIX_MVP, float4(oPos, 1));
        return fragInput;
    }
The geometry shader example calculated the camera-facing directions and then created the new, offset vertex positions based on those.
What I did above was treat my quad_vert as if it were in camera space (instead of object space), which automatically aligns it to the camera, and then convert it with the matrix from Camera (or View) space to World space, and then from World to Object space (even though the C# data was a standard object space quad). So don't be afraid to play with the space matrices, kids!
The Fragment Shader
Now we've finally entered the Fragment program. Before running it, the GPU has rasterized the triangles into fragments (pixel positions) and interpolated the vertex outputs (texcoords, color etc.) for each of them, with perspective correction.
Keep in mind that here you shouldn't multiply a screen-space point (like i.pos) by a matrix unless you make it homogeneous again, i.e. reverse the perspective divide (take it back to clip space) first.
The classic fragment function is like this: fixed4 frag(v2f_struct i) : COLOR //or SV_Target
This means this shader has 1 Render Target and returns an RGBA color.
The render target can be changed from C# with the Graphics or CommandBuffer API. (I’ll show that later)
In the deferred renderer, you can actually have (the option to output to) multiple render targets (MRT). The function structure changes, we use (multiple) out parameters:
    void fragDeferred(
        VertexOutputDeferred i,
        out half4 outDiffuse : SV_Target0,        // RT0: diffuse color (rgb), occlusion (a)
        out half4 outSpecSmoothness : SV_Target1, // RT1: spec color (rgb), smoothness (a)
        out half4 outNormal : SV_Target2,         // RT2: normal (rgb), --unused, very low precision-- (a)
        out half4 outEmission : SV_Target3        // RT3: emission (rgb), --unused-- (a)
    )
Now let’s continue from my vertex shader snippet from further above where I wanted a screenspace ray and did: o.uv = ComputeScreenPos(o.pos);.
Since it didn't get a perspective divide (it couldn't have been interpolated correctly to the fragment if it had), we need to do that now.
Fragment:
    {
        // a screenspace uv ray:
        float2 uv = i.uv.xy / i.uv.w;
        // as opposed to an object space uv ray (i.e. if you had done o.uv = v.texcoord; in the vertex):
        // there i.uv would have gone from 0 to 1 across each face of your cube etc.

        // And now we can sample the color texture from what
        // is already rendered to the screen under the current pixel
        float4 colBuff = tex2D(_CameraGBufferTexture0, uv);
        // raw, non-linear depth; use Linear01Depth(depth) or LinearEyeDepth(depth) to linearize it
        float depth = SAMPLE_DEPTH_TEXTURE(_CameraDepthTexture, uv);
        //...
    }
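Why the divide has to wait for the fragment, as a C++ sketch: the rasterizer interpolates every vertex output perspective-correctly, which gives the right answer for the un-divided ComputeScreenPos values but the wrong one for values already divided by w in the vertex program. The example uses two vertices and a screen-space fraction t between them; names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Perspective-correct interpolation of an attribute between two vertices with
// clip-space w0 and w1, at screen-space fraction t (0 = first vertex, 1 = second).
double perspectiveInterp(double a0, double a1, double w0, double w1, double t) {
    assert(w0 != 0.0 && w1 != 0.0);
    double num = (1.0 - t) * a0 / w0 + t * a1 / w1;
    double den = (1.0 - t) / w0 + t / w1;
    return num / den;
}

// What ComputeScreenPos + fragment divide does: interpolate the un-divided value s
// and w, then divide per fragment.
double divideInFragment(double s0, double s1, double w0, double w1, double t) {
    return perspectiveInterp(s0, s1, w0, w1, t) / perspectiveInterp(w0, w1, w0, w1, t);
}

// The mistake: divide in the vertex program and let the rasterizer interpolate the result.
double divideInVertex(double s0, double s1, double w0, double w1, double t) {
    return perspectiveInterp(s0 / w0, s1 / w1, w0, w1, t);
}
```

With w0 = 1, w1 = 3 and screen UVs 0.2 and 0.8 at the two vertices, halfway across the screen dividing in the fragment gives the correct 0.5, while dividing in the vertex program gives 0.35.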
This _CameraGBufferTexture0 is from the deferred renderer, but you can set custom textures to a shader using the Graphics or CommandBuffer Blit function, even in the forward stage.
My next post will apply some of this knowledge in some common but unconventional uses for shaders.
After this hopefully everyone can start researching and understanding more fun stuff like clouds, atmospheric scattering, light absorption, glass caustics, distance fields, fluid simulations, grass, hair, skin etc.
PS: There’s quite a bit of in-depth pipeline stuff here. If anything’s unclear, or I happened to cock anything up, let me know.