OpenGL Control Flow

This page is designed to give a high-level overview of what OpenGL is doing. It is not designed to teach you OpenGL or to tell you how to call or compile it; rather, the goal is to demystify what it is doing so that you can better understand how to make it do what you want.

Note that the word “pixel” is overloaded in graphics. OpenGL uses the word fragment to mean a pixel generated by one OpenGL primitive, pixel to mean a pixel in the display buffer generated by combining all the candidate fragments, and texel to mean a pixel of a texture image.

The (main) Pipeline

The general flow of a standard OpenGL application runs as follows:

  1. User code makes one or more OpenGL calls.
  2. These calls may be buffered in the user-space OpenGL library, but are always sent to the graphics card in order.
  3. The graphics card performs per-vertex operations (matrices, lighting, fog) on the vertices. This step is called the vertex shader and may be overridden using a shader language.
  4. Vertices are divided by their w component and scaled up so that (x,y) \in [-1, 1] maps to the pixel range of the current viewport.
  5. Vertices are combined into points, lines, and/or triangles and turned into fragments. For lines and triangles, extra information (depth, color, texture coordinates, etc.) is interpolated to each fragment using a perspective-correct version of linear interpolation known as “hyperbolic interpolation.”
  6. The graphics card performs per-fragment operations (usually just texture lookup) on each fragment. This step is called the fragment shader and may be overridden using a shader language.
  7. Fragments are combined with their corresponding pixel on a first-come, first-served basis, according to depth testing and alpha blending.
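To make the hyperbolic interpolation of step 5 concrete, here is a minimal C sketch for a single edge. It assumes two vertices with attribute values a0, a1 and clip-space w values w0, w1, plus a parameter t measured in screen space; the function names are made up for illustration, and real hardware applies the same idea with barycentric coordinates across a whole triangle.

```c
// Perspective-correct ("hyperbolic") interpolation along one edge.
// a0, a1: attribute values at the two vertices (e.g. a texture coordinate)
// w0, w1: the vertices' clip-space w (eye-space distance for a standard projection)
// t:      fraction of the way from vertex 0 to vertex 1 in *screen* space
double lerp_screen(double a0, double a1, double t) {
    return a0 + t * (a1 - a0);          // naive: linear in screen space
}

double lerp_hyperbolic(double a0, double w0, double a1, double w1, double t) {
    // a/w and 1/w are linear in screen space, so interpolate those
    // and divide to recover a.
    double a_over_w   = (1.0 - t) * (a0 / w0) + t * (a1 / w1);
    double one_over_w = (1.0 - t) / w0 + t / w1;
    return a_over_w / one_over_w;
}
```

With the far vertex at w = 3 and the near one at w = 1, the screen-space midpoint gets only a quarter of the way from a0 to a1, whereas plain screen-space interpolation would give one half. When both w values are equal, the two functions agree.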

OpenGL as a State Machine

In most OpenGL programs, a principal performance bottleneck is CPU-to-GPU communication. To reduce this bottleneck, OpenGL is modeled as a state machine, and most calls only change its state. Thus, calling glColor does not draw anything; it simply sets the color that will be attached to all subsequent vertices until glColor is called again.

Often one piece of state will modify another; for example, the light position is transformed by the modelview matrix. In all cases, that transformation happens when the state is specified, and only then; thus, if you run (with GL_MODELVIEW as the current matrix mode)

static const GLfloat pos[4] = {0.0f, 0.0f, 1.0f, 1.0f}; // local (0,0,1), w = 1
glLoadIdentity();
glLightfv(GL_LIGHT0, GL_POSITION, pos);  // stored as I * pos
glRotatef(180.0f, 0.0f, 1.0f, 0.0f);
glLightfv(GL_LIGHT1, GL_POSITION, pos);  // stored as R * pos

then LIGHT0 is at (0,0,1) and LIGHT1 is at (0,0,-1) in eye coordinates. The glRotatef call will not affect LIGHT0 unless we re-specify its position after the glRotatef call.

One OpenGL state is special: the drawing state entered by a glBegin call and ended by a glEnd call. To speed up drawing, OpenGL freezes most other state while in the drawing state, and most OpenGL calls made between glBegin and glEnd fail with GL_INVALID_OPERATION. The calls that do work there are essentially the per-vertex ones: glVertex, glColor, glNormal, glTexCoord, glMaterial, and a few others such as glEdgeFlag, glArrayElement, and glCallList. When you call glBegin, OpenGL typically does some up-front computations (mostly 4-by-4 matrix multiplications and inversions) that let it run the per-vertex processing much faster than it could otherwise. If your code has only a few glVertex calls in each begin/end block, it will likely run slowly, as much of this optimization is wasted.

The OpenGL calls that are not purely state-modifying are glVertex, higher-level drawing methods like glCallLists, and those that return information to the user such as glGet and glGenTextures.

Viewports and Matrices

As noted in step 4 of the pipeline, OpenGL will always map [-1,1] in x and y (after the vertex operations, including the perspective divide) onto the viewport. There is nothing you can do to change this. It will likewise always map [-1,1] in z onto the depth range, which by default is [0,1] (see glDepthRange). Everything is clipped to the [-1,1] range in x, y, and z.

Because of this, calls to glViewport do not change what is displayed, only where it is displayed. You can change the viewport as often as you want (except within a begin/end block, of course), and the generated fragments will be placed on the screen accordingly. Be warned that glClear clears the entire framebuffer, not just the active viewport; to clear a smaller area, enable GL_SCISSOR_TEST and set the area with glScissor.
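The mapping from normalized device coordinates to the window can be written out directly. This C sketch follows the viewport and depth-range transformation as the OpenGL specification describes it; ndc_to_window and Vec3 are hypothetical names, and the parameters stand in for the arguments of glViewport and glDepthRange.

```c
// Window coordinates computed from normalized device coordinates (NDC)
// for glViewport(x0, y0, width, height) and glDepthRange(near_val, far_val).
typedef struct { double x, y, z; } Vec3;

Vec3 ndc_to_window(Vec3 ndc, int x0, int y0, int width, int height,
                   double near_val, double far_val) {
    Vec3 win;
    win.x = x0 + (ndc.x + 1.0) * width / 2.0;    // [-1,1] -> [x0, x0 + width]
    win.y = y0 + (ndc.y + 1.0) * height / 2.0;   // [-1,1] -> [y0, y0 + height]
    win.z = (ndc.z * (far_val - near_val) + (far_val + near_val)) / 2.0;  // [-1,1] -> [near_val, far_val]
    return win;
}
```

The NDC origin lands at (320, 240) for a full 640-by-480 viewport and at (200, 100) for a 200-by-100 viewport whose lower-left corner is at (100, 50): only where the point appears changes, not what it is.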

To change what shows up within the viewport, we need to change the per-vertex operations in the vertex shader. The default fixed-function vertex processing works roughly as follows:

  • Given the modelview (M), projection (P), and texture (T) matrices.
  • Given the per-vertex position v, normal n, texture coordinate t, and color c.
  • Let v' = M v, n' = M^{-T} n, and t' = T t.
  • Compute lighting (which replaces c when lighting is enabled) and fog from v' and n'.
  • Let v'' = P v'.
  • Forward v'', the (lit) color, t', and the fog value on to the next stage of the pipeline.
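The reason normals use M^{-T} rather than M shows up with a non-uniform scale. Below is a minimal C sketch, assuming the upper-left 3x3 of the modelview matrix is passed in as three rows and is invertible; the helper names are hypothetical. It computes the inverse transpose as the cofactor matrix divided by the determinant.

```c
typedef struct { double x, y, z; } Vec3;

static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

static Vec3 cross(Vec3 a, Vec3 b) {
    Vec3 r = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return r;
}

// Multiply a 3x3 matrix, given by its rows, by a column vector.
static Vec3 mul_rows(const Vec3 rows[3], Vec3 v) {
    Vec3 r = { dot(rows[0], v), dot(rows[1], v), dot(rows[2], v) };
    return r;
}

// Rows of M^{-T}: the cofactor matrix of M divided by det(M).
// Assumes M is invertible (det != 0).
static void inverse_transpose(const Vec3 m[3], Vec3 out[3]) {
    double det = dot(m[0], cross(m[1], m[2]));
    Vec3 c[3] = { cross(m[1], m[2]), cross(m[2], m[0]), cross(m[0], m[1]) };
    for (int i = 0; i < 3; ++i) {
        Vec3 r = { c[i].x / det, c[i].y / det, c[i].z / det };
        out[i] = r;
    }
}
```

For glScalef(2, 1, 1), the surface tangent (1,-1,0) becomes (2,-1,0). Transforming the normal (1,1,0) by M gives (2,1,0), which is no longer perpendicular to it (dot product 3), while M^{-T} gives (0.5,1,0), which is (dot product 0). The normal's length still changes, which is why fixed-function code that scales objects usually enables GL_NORMALIZE.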

Note in particular that the modelview and projection matrices both modify the vertices, but that lighting and fog happen between the two. The main purpose of the projection matrix is to let perspective projection happen after lighting, since otherwise the perspective distortion would change how things are lit. Unless you have a really good reason to do otherwise, the projection matrix should contain nothing except a single matrix generated by one of glFrustum, gluPerspective, glOrtho, or gluOrtho2D. These calls should almost never be applied to the modelview matrix.

Given the restricted purpose of the projection matrix, the main way to adjust what is displayed on the screen is via the modelview matrix. There are many ways to think about what this matrix does; one of many correct interpretations is given next.

Making Sense of the Modelview Matrix

There are two coordinate systems to worry about: the camera system and the active modeling system. Imagine a break somewhere in the product of matrices that makes up the modelview matrix: any matrices to the left of the break affect the camera; any to the right affect the modeling system. Where the break goes is arbitrary, but it is useful for understanding. Suppose, for instance, that you have an orientation and location for the camera; if we model this as

(1)
R_o \; T_l \; \langle break \rangle \; I

then we read off the camera from the break leftwards and thus translate before we rotate; changing the orientation does not change the translation, which is specified in world coordinates, so the camera pivots in place. Conversely, if we have

(2)
T_l \; R_o \; \langle break \rangle \; I

then we translate the camera after we rotate it, so the location is in local camera coordinates; changing the orientation causes the camera to orbit around the origin.

We interpret the modeling side in the same way, but in the opposite direction (reading from the break rightwards). Using both the matrix-style format and what it would look like in code, consider the following example:

(3)
R_o \; T_l \; \langle break \rangle \; T_{z:2} \; R_{y:90} \; T_{z:2} \; S_{xyz:2} \; T_{x:1\;z:-1} \; S_{xyz:0.5} \; R_{-y:90}
// assume origin at (0,0,0); x=right, y=up, z=back to start
glTranslatef(0, 0, 2);      // z=back: the system has moved behind us to (0,0,2)
glRotatef(90, 0, 1, 0);     // the system is still behind us, but now x=forward and z=right
glTranslatef(0, 0, 2);      // z=right: the system is now 2 units behind us and 2 to our right
glScalef(2, 2, 2);          // stays put, but its units are now twice as large as ours
glTranslatef(1, 0, -1);     // x=forward, z=right; 2 of our units each way because of the scale; the modeling origin is back at (0,0,0)
glScalef(.5f, .5f, .5f);    // scaled back down, so modeling units match ours
glRotatef(90, 0, -1, 0);    // rotated back again; we are now where we started

At any point you can call glPushMatrix to save a copy of the current matrix for later use, and restore it from the stack using glPopMatrix. Note that the stacks differ in depth: the projection and texture stacks are only guaranteed to be 2 matrices deep, while the modelview stack is guaranteed to be at least 32 deep (query GL_MAX_PROJECTION_STACK_DEPTH, GL_MAX_TEXTURE_STACK_DEPTH, and GL_MAX_MODELVIEW_STACK_DEPTH for the actual limits).

Odds and Ends

There are many other things that could be noted. Here are just a few.

Depth and Alpha

In perspective mode, if any point between the near and far planes has a zero or negative w, your depth buffer won't work. If you are using glFrustum or gluPerspective, make sure both the near and the far values are strictly positive. In addition, the closer near and far are to one another, the more effective the depth buffer will be. The minimal distinguishable depth difference of a d-bit depth buffer will be \frac{far-near}{2^d} in orthographic mode; in perspective things are more complicated because most of the precision is near the near plane, but the error at the far plane, relative to its distance, grows roughly linearly with \frac{far}{near}.
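The depth-precision claims can be made concrete with a little calculus. This C sketch assumes the default depth range [0,1] and a d-bit fixed-point depth buffer, and divides one depth-buffer step by the slope of window depth with respect to eye distance; the function names are hypothetical.

```c
#include <math.h>

// Smallest eye-space distance a d-bit depth buffer can resolve at distance z
// in front of the camera, for a perspective projection with planes near_d/far_d
// and depth range [0,1]. Window depth there is far*(z - near) / (z*(far - near)),
// whose slope is far*near / (z^2 * (far - near)).
double perspective_depth_step(double near_d, double far_d, int bits, double z) {
    return (far_d - near_d) * z * z / (far_d * near_d * ldexp(1.0, bits));
}

// Orthographic projection: depth is linear in z, so the step is constant.
double ortho_depth_step(double near_d, double far_d, int bits) {
    return (far_d - near_d) / ldexp(1.0, bits);
}
```

With near = 1, far = 1000, and 24 bits, the resolvable step is about 6e-8 units at the near plane but about 0.06 units at the far plane, a ratio of (far/near)^2. Relative to the distance, the far-plane error is (far - near)/(near * 2^d), which grows linearly with far/near; moving near out from 1 to 10 improves it about tenfold.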

Both depth and alpha are composited in the order objects are drawn to the screen. For depth, drawing the nearest objects first can give a small speed boost if your graphics card checks the depth buffer before doing the per-fragment computations, particularly if you have written your own fragment shader. For alpha, though, you always want to draw the farthest objects first, and OpenGL does not provide any means for doing this automatically. A common workaround, which works if there are not too many partially transparent objects in one area, is to render all the opaque objects, then disable depth writing (glDepthMask(GL_FALSE)) while leaving depth testing on, and then render all the translucent objects. This technique gives exactly correct results if the transparent objects are drawn farthest to nearest, if the blend function is additive (simulating glow rather than translucency), or if no two transparent objects ever overlap in the view.
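Why order matters for translucency but not for glow can be seen by blending two half-transparent colors by hand. A minimal C sketch, assuming colors in [0,1] and the two glBlendFunc settings named in the comments; Rgb and the blend_* functions are hypothetical.

```c
typedef struct { double r, g, b; } Rgb;

// glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA): ordinary translucency.
Rgb blend_translucent(Rgb dst, Rgb src, double src_alpha) {
    Rgb out = { src_alpha * src.r + (1.0 - src_alpha) * dst.r,
                src_alpha * src.g + (1.0 - src_alpha) * dst.g,
                src_alpha * src.b + (1.0 - src_alpha) * dst.b };
    return out;
}

// Fixed-point framebuffers saturate at 1; inputs here are non-negative.
static double saturate(double v) { return v > 1.0 ? 1.0 : v; }

// glBlendFunc(GL_SRC_ALPHA, GL_ONE): additive "glow"; the sums commute.
Rgb blend_additive(Rgb dst, Rgb src, double src_alpha) {
    Rgb out = { saturate(dst.r + src_alpha * src.r),
                saturate(dst.g + src_alpha * src.g),
                saturate(dst.b + src_alpha * src.b) };
    return out;
}
```

Over a black background, drawing the far blue object first and the near red one second gives (0.5, 0, 0.25), which reads as red in front; the reverse order gives (0.25, 0, 0.5), wrongly putting blue in front. The additive results are identical either way.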

Textures

Textures can be quite complicated in OpenGL. The core idea is that a texture coordinate in the range [0,1] is interpolated to each fragment, scaled up to the size of the texture image, and converted into one or more texels, and the colors stored in those texels are combined with the interpolated color in some way. Each of these pieces has several options attached to it.

Texture coordinates can be specified using glTexCoord calls in 1, 2, 3, or 4 dimensions; the fourth dimension is a homogeneous coordinate, just as it is for vertex calls. They can also be generated automatically by the OpenGL hardware based on pre-modelview object position, post-modelview object position, reflection vectors, or (using the ARB_texture_cube_map extension) normals, using glTexGen calls. The texture coordinates need not have the same dimension as the texture image; extra coordinate dimensions will be ignored, and missing dimensions will default to the appropriate element in the default texture coordinate vector (0,0,0,1).

Texture images are specified using either glTexImage or gluBuild*Mipmaps commands, or are copied from the frame buffer using glCopyTexImage or glCopyTexSubImage. Generally speaking, you'll want to use 2D textures since the surface of a 3D object is 2D, though 3D textures can be used for animation.

Texture coordinates in the [0,1] range are mapped to the size of the texture image. Values outside that range can either wrap around (glTexParameter with GL_TEXTURE_WRAP_S/T/R set to GL_REPEAT), be clamped to the range (GL_CLAMP), or be clamped to the centers of the edge texels (GL_CLAMP_TO_EDGE). Generally, you want GL_REPEAT for tiling textures like walls and ground, and GL_CLAMP_TO_EDGE (or GL_CLAMP) for non-tiling textures like character decals; with linear filtering, GL_CLAMP can blend the border color into the edge texels, which GL_CLAMP_TO_EDGE avoids.

There is a texture matrix you can use to change texture coordinates. This is rarely done, but behaves like any other matrix would.

Once a texture coordinate is mapped onto the size of the texture, there are still several decisions to make. First, OpenGL decides whether the texture is being magnified (more than one fragment per texel) or minified (more than one texel per fragment). If it is being magnified, it uses the glTexParameter setting GL_TEXTURE_MAG_FILTER, which can be either nearest-texel lookup (GL_NEAREST) or linear interpolation (GL_LINEAR). For minification, GL_TEXTURE_MIN_FILTER can also be GL_NEAREST or GL_LINEAR, or it can use the mipmap.

A mipmap is a set of copies of the same image at different levels of detail. If a 32-by-32 block of the texture is mapped to a single fragment, even linear interpolation would sample only 4 neighboring texels and miss most of the block. If, however, we sampled from a lower-resolution version of the same image, we could get the correct result. We can tell OpenGL to find the image in the mipmap with the most appropriate level of detail (MIPMAP_NEAREST) and sample it using the nearest or linear filter (GL_NEAREST_MIPMAP_NEAREST or GL_LINEAR_MIPMAP_NEAREST), or to sample both an image in the mipmap with fewer texels than fragments and one with more texels than fragments and interpolate the two results linearly (GL_NEAREST_MIPMAP_LINEAR or GL_LINEAR_MIPMAP_LINEAR).

Once a (blend of) texel(s) is identified and a color retrieved from the texture, the color is combined with the (lit) color of the object in a way specified by glTexEnv with GL_TEXTURE_ENV_MODE. The four classic modes are (later versions add GL_ADD and GL_COMBINE):

GL_REPLACE
This is the simplest; the object color is ignored and the texture color used instead.
GL_MODULATE
A simple element-wise multiplication of the two colors. Allows the texture to be lit.
GL_DECAL
Where the texture's alpha channel is transparent, the object color comes through; where it is opaque, the texture color is used instead.
GL_BLEND
This one is somewhat strange; first, you specify a color using glTexEnvfv(GL_TEXTURE_ENV, GL_TEXTURE_ENV_COLOR, color), where color is a GLfloat[4]; then you treat the RGB value of the texture color as three separate transparency channels: where the texture's red is 1, the texture environment red shows through; where it is 0, the object red is used instead, and similarly for green and blue. I am not aware of any application for this, though presumably there is one.
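Written out for an RGBA texture, the four modes are short formulas. This C sketch follows the texture-function table in the OpenGL 1.x specification as I understand it; Rgba and the env_* names are hypothetical, cf is the incoming (lit) fragment color, ct the texture color, and cc the GL_TEXTURE_ENV_COLOR.

```c
// Texture environment functions for an RGBA texture.
// cf: incoming (lit) fragment color, ct: texture color, cc: GL_TEXTURE_ENV_COLOR.
typedef struct { double r, g, b, a; } Rgba;

Rgba env_replace(Rgba cf, Rgba ct) {
    (void)cf;                       // object color is ignored
    return ct;
}

Rgba env_modulate(Rgba cf, Rgba ct) {
    Rgba out = { cf.r * ct.r, cf.g * ct.g, cf.b * ct.b, cf.a * ct.a };
    return out;
}

Rgba env_decal(Rgba cf, Rgba ct) {
    // texture alpha chooses between object color (0) and texture color (1)
    Rgba out = { cf.r * (1.0 - ct.a) + ct.r * ct.a,
                 cf.g * (1.0 - ct.a) + ct.g * ct.a,
                 cf.b * (1.0 - ct.a) + ct.b * ct.a,
                 cf.a };
    return out;
}

Rgba env_blend(Rgba cf, Rgba ct, Rgba cc) {
    // each texture channel chooses between object color (0) and env color (1)
    Rgba out = { cf.r * (1.0 - ct.r) + cc.r * ct.r,
                 cf.g * (1.0 - ct.g) + cc.g * ct.g,
                 cf.b * (1.0 - ct.b) + cc.b * ct.b,
                 cf.a * ct.a };
    return out;
}
```

Note the alpha handling: GL_DECAL keeps the object's alpha, while GL_MODULATE and GL_BLEND multiply the two alphas. Textures without an alpha channel use slightly different rows of the same table.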

TODO: glCopyTexImage, convolution filters and similar processing, glTexGen and environment mapping, multitexturing, etc.

Cached Calls

OpenGL can save a lot of work and CPU-GPU communication by storing information in graphics memory. The two main things you put there are display lists and textures.

A display list is an (almost) arbitrary set of OpenGL calls which are stored and attached to a single number. You can then get OpenGL to re-run an optimized version of the same calls later simply by calling glCallList. This always saves on CPU-GPU communication; additional runtime savings depend on the complexity of the calls within the display list, since complex lists with lots of matrix changes and begin/end blocks can be optimized more than simple lists can.

As a rule of thumb, create a display list for every object in the scene (or, in object-oriented thinking, for every class; if you have 15 lamps, you only need one lamp display list). If the geometry of several objects is the same but the texturing is different, simply leave the glBindTexture call outside of the display list and bind different textures before each glCallList call.

Textures can likewise be stored under integer names (from glGenTextures) and reused simply by calling glBindTexture. This prevents a huge amount of CPU-GPU communication, and if you use gluBuild*DMipmaps instead of glTexImage*D, it saves quite a bit of processing time as well.

Animation

It is often desirable to have individual objects animated within OpenGL. There are four basic techniques for doing this:

  • Model a different mesh for each frame, compile them into display lists, and cycle through them when drawing. This allows high-quality animations at the expense of memory. As memory continues to become cheaper, this is taking over the game market.
  • Model an object as several meshes compiled as separate display lists and move them individually with the modelview matrix. Works well for mechanical objects, but usually leaves obvious seams if used for organic objects.
  • Use the ARB_vertex_blend extension to have multiple modelview matrices smoothly blended, similar to traditional boning techniques. This allows a single display list to specify an articulated organic mesh. The main caveat is the difficulty of getting the weights right; this is the same problem as boning and skinning for 3D animation.
  • Move the vertices manually, one at a time. Obviously the most versatile method, this is also the slowest. Using glDrawElements, glEnableClientState, and the various gl*Pointer methods can make it quite a bit faster, both because it is more efficient than sending all the points individually and because you won't need to resend the texture or color arrays, only the vertex and normal arrays.

Lighting and Materials

Some games use what is called baked lighting, where the textures have the lighting built in and OpenGL's lights are all disabled. Some use a mixture of baked lighting for static objects and OpenGL lighting for moving characters, though getting them to look right together can be tricky. Some use cube-mapped textures with the normal map for environmental lighting and the reflection map for specular highlights, which can result in some very nice effects but is infeasible if the lights are near moving objects.

If you do use standard OpenGL lighting, know that light interacts with materials, and that materials are not the same as colors. When you enable GL_LIGHTING, calls to glColor are, by default, ignored; use glMaterial instead. You can use glColor to specify the diffuse and ambient color of the material by enabling GL_COLOR_MATERIAL; you can also change what part of the material is controlled by glColor using the glColorMaterial method.

OpenGL lighting is a per-vertex operation and results in a single color; that color is then interpolated across the primitive to each fragment and combined with the texture to give the final fragment color. If you wish, you can cause the specular highlights to be applied after the texture lookup by calling glLightModeli(GL_LIGHT_MODEL_COLOR_CONTROL, GL_SEPARATE_SPECULAR_COLOR). Phong shading (aka per-pixel lighting) is not built into OpenGL at present, but can be implemented as a vertex and fragment shader if desired.

Lighting can be confusing. Let d_i(l) and s_i(l) be the Lambert diffuse and Blinn-Phong specular lighting intensities, respectively, from light l. Let g_a be the global ambient light from glLightModelfv(GL_LIGHT_MODEL_AMBIENT, …); let the material's ambient, diffuse, specular, and emission colors be m_a, m_d, m_s, m_e; and let L be a set of lights, where each light has ambient, diffuse, and specular colors l_a, l_d, l_s. Then, ignoring distance attenuation and spotlight effects, the lit color is

(4)
m_e + m_a \otimes \left(g_a + \sum_{l \in L} l_a\right) + m_d \otimes \left(\sum_{l \in L} d_i(l) l_d\right) + m_s \otimes \left(\sum_{l \in L} s_i(l) l_s\right)

\otimes indicates element-wise multiplication.


By default, m_a = g_a = (.2, .2, .2, 1) and m_d = (.8, .8, .8, 1); all other colors default to (0,0,0,1) except the diffuse and specular colors of LIGHT0, which are both (1,1,1,1).

Writing Shaders

TODO: write enough to get people going… include difference between fixed pipeline, standard shaders, and GPGPU

page_revision: 8, last_edited: 1213814496
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-Share Alike 2.5 License.