Shambles Software
i am a master spy

That’s not really a question but yes, yes you are.

How to build an atlas

As we all do at some point in our lives, I needed to build texture atlases computationally. Filling up a large image with lots of smaller ones in an optimal way is NP-complete, which doesn’t mean unsolvable, just that nobody knows how to solve it efficiently. It’s the 2D equivalent of the knapsack packing problem. So I packed my approximation bags and boarded the heuristic express to guess-ville. It turns out you can get great results with even a very simple algorithm.

I start with a set of images and one large image, or ‘atlas’, to place them all into. I don’t try to find the best atlas size here; I just have a predefined atlas size and want to fit as much onto it as I can. The commonly suggested method is to use BSP trees to partition free atlas space. That’s similar to the approach I took, but I simplified the tree structure away. Here’s what I did.

The first step in this sort of situation is always going to be sorting the target images in terms of awkwardness. We want to place the most difficult images first so that we don’t waste space. What makes an image ‘hard to place’?

  • Larger images are more difficult because they use more empty space.
  • Squarer images are more difficult because they don’t fit well in to spare space in the atlas.

The largest, squarest images are hardest. Large images have a large area (= w * h), and squarer images have a smaller perimeter (= 2w + 2h) than equally sized counterparts. A good ranking value is therefore area divided by perimeter: ((w * h) / (2w + 2h)). Multiplying through by 2 cancels the constants in the denominator, leaving a very simple and very effective estimate of an image’s placement difficulty: ((w * h) / (w + h)). I sorted my images by this ranking, with the first image scoring the highest and the last image scoring the lowest.

At this stage, there is another simplification that could be performed: transform all images to landscape (or portrait) form. Just loop through each and if you find that w < h, rotate the image by ninety degrees and remember to correct the uv coordinates later. That way every image will satisfy the precondition w >= h, which might make placement easier. I never bothered doing this because I’m a little lazy.

To place the images, I made a list of rectangles of free space in the atlas. Initially I placed the full area of the atlas into this list. For each image I needed to place, I searched the list of free rectangles for the best fit. How did I calculate the best fit? I picked the rectangle with the snuggest fit, which wasn’t a particularly clever heuristic but it did a good enough job. The fit is just the minimum over both axes of the rectangle size minus the image size; if it’s less than zero, the image will not fit in that rectangle.

I copied the image into the best fitting rectangle and stored its uv coordinates. I removed the rectangle from the list and then added up to two rectangles of unused space back into the list of free rectangles. If the fit was snug, it may not leave two rectangles of free space; it could leave one or zero. To divide up the rectangle, I picked the cut that leaves the largest remaining rectangle.

I kept doing this until all images were placed or until space ran out, which is also known as “the bad ending”. The results were very good. A few tweaks here and there might improve it, but even this naive implementation is getting >90% coverage, as you can see in this slightly faked-out real world example:

And that was good enough for me.

Abnormal Distributions

When tinkering with my particles, I thought it would be nice to distribute them in various different ways. I wanted to add a ‘normal’ distribution alongside my existing linear one. C++11 has a bunch of fancy classes for generating random values in this way, but they seemed a little heavyweight for me. I just wanted a crappy function that does the job for my little particles.

The first thing that I did was scour the internet for clues and answers. The traditional way of generating normally distributed random values is the ‘Box-Muller’ transformation (what follows is actually Marsaglia’s polar variant of it, which avoids the trig calls). I believe it has nothing to do with yoghurts. It looks like this:

  float r, x, y;
  do
  {
    x = randf(-1.0f, 1.0f); // linear random between -1 and 1
    y = randf(-1.0f, 1.0f);
    r = x*x + y*y;
  } while (r == 0.0f || r >= 1.0f);
  
  return sqrt(-2.0f * log(r) / r) * x; // or y

This does work… so, uh, you can probably stop reading. I’ve made a little graph of what it looks like right here. It’s nice.

image

HOWEVER, I wanted something simpler than that and I didn’t even care if it was quite crap.

You can convert from a linear distribution to any distribution you have a function for provided you can do two simple steps. First, integrate your distribution’s function. That gives you the cumulative distribution function. Then, invert that function. Now you have a generator. Use that generator on a linear distribution and hey presto, you have your desired distribution of random values.

Okay, so integrate and invert aren’t the easiest things to do. In fact, they’re often not possible. The normal distribution function has the form e^(-x^2). This doesn’t have a simple integral. That’s why there isn’t a simple generator for the normal distribution.

However the graph of the integral of the normal distribution function looks like the graph of hyperbolic tan. They’re not the same but they’re pretty much the same. Close enough. Now, the inverse of tanh is arctanh, which has a really simple expression. It’s just this:

  return 0.5f * log(1.0f / randf(0.0f, 1.0f) - 1.0f);

Yes! That is reasonably similar to the normal distribution and much, much simpler to spit out.

image

Compare it to the one above; you’ll be pleasantly surprised, I am sure. It is a little leaner down the middle but the overall feel is the same. That satiated my need for normally distributed data.

How to write a JSON-like parser with templates

I’m writing this on the off-chance that you despise yourself and want to become riddled with templates, but also parse some JSON-like data at the same time.

wat why??

Well, there are a number of reasons you might want to do this. Here’s a breakdown of my motivations:

  • I needed to import data structures of various kinds in to my latest project and I am not a fan of writing different parsers or layers on top of loosely typed parsers. I wanted something strong and powerful like a muscleman.
  • A parser that exists entirely inside a type system is pleasurably extendible. Any additional features get blasted back to all existing parsers.
  • This kind of stuff tickles my fancy.

That’s all the motivation I needed to get my parse on. I also updated my compiler/build to use C++11 because this sort of thing is deeply unpleasant without variadic templates.

Let’s tokenize!

The first step to parsing is to tokenize. There are plenty of existing tokenizers out there, but that’s irrelevant because there are also plenty of existing DSLs out there and all this has been done before better. So don’t think about that and write a tokenizer. It’s incredibly easy! Here’s what I did, roughly.

class Tokenizer
{
  // private member variables

public:
  Tokenizer(const std::string& text);
  ~Tokenizer();

  void skipWhitespace(); // moves to next token

  bool accept(const std::string& token);
  void expect(const std::string& token); // exception if token not found

  template<typename T>
  T extractToken()
  {
    // interpret the next token as type T or exception
  }
};

So far, so good. The tokenizer has a read point over the text, which is given in the constructor. ‘skipWhitespace’ moves the read point to the next interesting thing after all the spaces, tabs, returns, comments, etc… ‘accept’ checks if the next token matches a string and if it does, it moves the read point past it and returns true. ‘expect’ calls ‘accept’ then potentially throws a hissy fit.

'extractToken' interprets the tokens as types. I simply made it pull out the token as a string (by calling extractToken<std::string>) and then used std::istringstream to pull a type out. That's a reasonably safe way of handling rubbish input, and it's always possible to make it strong and aggressive later.

For tokens, I simply used string matching, but it’s entirely plausible to stuff enums in here for different token types and add some tools or supporting classes for matching tokens.

Wrapping the tokenizer up

It’s great we have a tokenizer (woo) but don’t start honking your own horn yet. We’ll need to create a type that uses the tokenizer. Why? Because this is exactly the sort of thing you do with templates.

template<typename TYPE>
struct Element
{
  typedef TYPE target_type;

  static void parse(TYPE& target, Tokenizer& t)
  {
    target = t.extractToken<TYPE>();
  }
};

Ooookay, that seems kind of pointless, but it’s not. We can now declare types which are parsers which can parse most built-in types from text. YES. So Element<int> is a parser which sets an int to the contents of some text which has an int in it, uh… Look, it’ll all come together when this stuff is combined in to bigger parts. Which is next.

Data structures

Parsing a data structure is a little less straightforward. You have a struct type, it has members each of which have their own types. That means they have their own parsers, whether they are Element<T> type, parsers for other structures or something else. I decided to use two types for this: Structure, to describe a struct in terms of its members, and Member to encapsulate one member of a struct. Let’s take a look at Member first.

template<typename CLASS,
         typename TYPE,
         TYPE CLASS::*MEMBER,
         typename PARSER = Element<TYPE>>
struct Member
{
  static void parseMember(CLASS& target, Tokenizer& t)
  {
    TYPE& type = target.*MEMBER;
    PARSER::parse(type, t);
  }
};

You can use a pointer to member variable as a template parameter because at its heart it is just an offset. It’s typed in two ways, though. To the class and to the type of the member variable. That means the Member template parameters are quite bulky, but don’t worry, we can macro the pants off it later.

I have not included a typedef for the CLASS parameter, and the parse function is called ‘parseMember’. This doesn’t fit the same pattern as Element. That’s because this template can’t be at the top level of a parser; it has to belong to a Structure, so I’ve deliberately phrased it a little differently.

Note that by default it assumes the parser is going to be an Element<T> type.

And let’s see what a structure looks like…

template<typename TARGET,
         typename... MEMBERS>
class Structure
{
public:
  typedef TARGET target_type;

  static void parse(TARGET& target, Tokenizer& t)
  {
    // ... implementation ...
  }
};

Okay, I left out most of the implementation because I’m busting out the variadic templates now and it might be a good time to recap how this stuff will be used. Here’s an example of creating a parser for a struct:

struct Example
{
  int i;
  double d;
};

typedef Structure<Example,
                  Member<Example, int, &Example::i>,
                  Member<Example, double, &Example::d>
                 > ExampleParser;

That typedefs a little parser type that will be able to interpret something like “{2, 5.3}” and plop it right in to the struct. And of course, ExampleParser can be used as a fourth parameter to Member so structs containing Example are eligible for parsers too. The power! It’s growing!

Let’s get our hands a bit dirtier. How might the Structure template work internally? Thanks to variadic templates, it’s actually quite a pleasing experience.

template<typename TARGET,
         typename... MEMBERS>
class Structure
{
  template<typename MEMBER>
  static void parseMember(TARGET& target, Tokenizer& t)
  {
    MEMBER::parseMember(target, t);
  }

  template<typename M1, typename M2, typename... OTHER_MEMBERS>
  static void parseMember(TARGET& target, Tokenizer& t)
  {
    parseMember<M1>(target, t);
    t.expect(",");
    parseMember<M2, OTHER_MEMBERS...>(target, t);
  }

public:
  typedef TARGET target_type;

  static void parse(TARGET& target, Tokenizer& t)
  {
    t.expect("{");
    parseMember<MEMBERS...>(target, t);
    t.expect("}");
  }
};

That’s deceptively simple. The key part is the pair of overloads: when OTHER_MEMBERS expands to an empty list, parseMember<M2> matches the single-member version and the recursion magically bottoms out. The Structure expects ‘{’ and ‘}’ around the members, and expects a comma between any two members. Fantastic.

A little bit more

The next step would be arrays. They take a form like “[6, 0, 3, 1, 7, 6, 9]”, with an unknown number of elements of matching type. They’ll need to go in to a std::vector or a std::deque or something, so the Array parser will have to be container agnostic. Each container is a different type, but so are our parsers so that kind of works okay. Usage of ‘accept’ and ‘expect’ from the tokenizer makes this task fairly sane.

template<typename TARGET,
         typename PARSER = Element<typename TARGET::value_type>>
struct Array
{
  typedef TARGET target_type;

  static void parse(TARGET& target, Tokenizer& t)
  {
    t.expect("[");
    if (t.accept("]"))
      return; // empty array

    do
    {
      typename TARGET::value_type value;
      PARSER::parse(value, t);
      target.push_back(value);
    } while (t.accept(","));

    t.expect("]");
  }
};

No variadic magic this time; since we don’t know how many elements there will be at compile time, we have to tokenize and find out. The container has to support the standard push_back function and provide the value_type typedef, which all the standard sequence containers do.

That’s enough of this. I made a few more different parser components for myself, but I’m not going to go over them. I made a Nullable component that would accept “null” for a pointer and make it zero instead of parsing a target. I also made an associative container type which uses JSON-style ‘key:value’ syntax. It’s very similar to the Array type.

I’ll leave those as exercises for the reader (meaning: this is already long enough).

This stuff is unusable

Yes, the shame of it. As anyone could have guessed, building a parser out of nested types makes a massively unwieldy mess of types. How can this be cleaned up? And how do we even use this in a sensible way? I’ll answer the second question first.

void resourceToText(std::string& text, const std::string& resource);

template<typename PARSER>
bool parse(typename PARSER::target_type& target, const std::string& resource)
{
  std::string text;
  resourceToText(text, resource);

  Tokenizer t(text);

  try
  {
    PARSER::parse(target, t);
  }

  catch ( /* my exception */ )
  {
    // Output the exception details showing where the parsing failed
    return false;
  }

  return true;
}

That’s quite simple. Basically just pull out the target_type from the parser and combine it with whatever resource loading you do. The exceptions the tokenizer can throw can be dealt with sensibly. Now to parse a structure, just call straight to parse<ParseType>(myType, "data/myType.blah"); and you’re done.

The only remaining problem is how to typedef the parser types without becoming hopelessly lost in the depths of template hell. Well, I saved the worst for last: Jam everything full of macros and pretend nobody saw.

Okay, so there’s actually a pretty clever trick you can do with pretend functions and C++11’s decltype operator to decompose a pointer to member in to its respective class and type. Let’s have a quick peek at that.

template<typename CLASS, typename TYPE>
CLASS extractClass(TYPE CLASS::*);

template<typename CLASS, typename TYPE>
TYPE extractType(TYPE CLASS::*);

#define MEMBER(x) Member<decltype(extractClass(x)), decltype(extractType(x)), x>
#define MEMBER2(x, y) Member<decltype(extractClass(x)), decltype(extractType(x)), x, y>
#define ARRAY(x) Member<decltype(extractClass(x)), decltype(extractType(x)), x, Array<decltype(extractType(x))>>
#define ARRAY2(x, y) Member<decltype(extractClass(x)), decltype(extractType(x)), x, Array<decltype(extractType(x)), y>>

Yikes. But this means instead of doing Member<Example, int, &Example::i> we can just do MEMBER(&Example::i). And instead of Member<Example, ExampleChild, &Example::child, ChildParser>, a simple MEMBER2(&Example::child, ChildParser) will suffice. The Array versions work similarly. I’m sure that was perfectly clear.

That makes our typedefs a lot cleaner. Here’s a real (fake) example of one I used.

struct ParticleFrame
{
  int frame;
  int delay;
};

typedef Structure<ParticleFrame,
                  MEMBER(&ParticleFrame::frame),
                  MEMBER(&ParticleFrame::delay)
                 > ParticleFrameParser;

struct Particle
{
  int delayMultiplier;
  std::vector<ParticleFrame> frames;
};

typedef Structure<Particle,
                  MEMBER(&Particle::delayMultiplier),
                  ARRAY2(&Particle::frames, ParticleFrameParser)
                 > ParticleParser;

// later on..
  Particle particle;
  if (parse<ParticleParser>(particle, assetFilename))
    d->particles.push_back(particle);

I think that’s pretty sweet, actually.

How to get annoyed at Chess

When people talk about game design, chess is often cited as an example or counter-example. This irks me right in the ribs. I don’t really have anything against chess, it is quite an interesting game. If you have even a passing familiarity with it, you should watch this because it is both fascinating and historical.

What irks me is that people make a lot of assumptions about chess. While what they are saying might be perfectly valid, I kind of use it as a warning flag for badly thought out opinions. Chess has a reputation for being a very intellectual game, which it is and it isn’t. There are a lot of things about it that aren’t special. So I’ll make a list.

  1. Chess isn’t designed at all, so using it as an example of good design is fallacious. It’s a game that emerged or evolved over time. There’s weird vestigial stuff in it, like the reason a rook is called a rook. En passant is a balance hack that doesn’t feel in any way like it was designed in advance. Mostly because it wasn’t.
  2. Chess is a perfect information game. That doesn’t mean that it’s a super good awesome brainiac puzzler for wise long-bearded old men. It just means that there’s no randomness and no hidden variables. Tic-tac-toe is a perfect information game too, but nobody ever cites that because it sucks.
  3. Chess (probably) isn’t balanced. It’s quite likely that white has an advantage. That’s not a bad thing, but again, people treat chess as some sort of flawless ideally designed game without apparently knowing obvious stuff like this.
  4. Whatever it is they are using chess as an example for is probably better served by using Go as an example, and if they knew anything about classic board games they would know that.

Go is an older board game than chess and it is somehow both simpler and more complicated at the same time (yes). It is also feature rich:

  • Has amazingly simple rules.
  • Has a working points system that avoids draws and attempts to balance the game for both white and black.
  • Has a proper handicap system that makes a lot of sense.
  • Has reduced board sizes for quicker games and beginners and so on.
  • Has a secret minigame to pass short amounts of time.

Oh but get this: Go also wasn’t designed. Aside from the historical context, it seems well thought out and logical. If you don’t know it and you have any interest in board games, you should definitely learn it.

I don’t mean to disrespect chess or claim that Go is better, I’m mostly jiving at people who toss out chess in conversations about games when they don’t know which way round the board goes (hint: white square at bottom right). I guess I just like people to actually read up on their opinions and know what they’re talking about.

I’ll end with a quote because there’s a quote button here and there’s a quote which sums up all this stuff better than I did. In fact I probably should have just used this quote and saved some typing.

While the Baroque rules of Chess could only have been created by humans, the rules of Go are so elegant, organic, and rigorously logical that if intelligent life forms exist elsewhere in the universe, they almost certainly play Go.

-Edward Lasker (chess player)

How to get bored of voxels

I admit it, my heart wasn’t in it. I stopped working on my voxel engine a while ago and then I was busy on holiday and so on. I’m making something 2D now. In a while I will post a rant about chess, look forward to it.

Writing a box engine part 8: Geometry construction

I haven’t been updating my block engine stuff for a while because I have been writing other parts of my game! Shocking, I know. Let’s continue…

In previous installments, I have glossed over geometry generation and alluded that ‘whatever works for now’ is good enough. I think it’s time to work through it in detail.

By storing the chunk’s geometry in a separate class, we can separate the geometry creation from the chunk entirely. This is good because it’s entirely likely that geometry creation would be a candidate for threading later. For geometry creation itself, I’ve split it in to two parts. One part is figuring out what to make, as in what faces in what directions using what textures. The other part is actually producing the vertices based on that information. I’ve encapsulated them inside GeometryConstructor and FaceConstructor. Here’s what I have.

The geometry constructor takes a target chunk and a face cardinal direction (‘face type’) and generates a list of coordinates and texture numbers. The face constructor takes a coordinate, texture number and face type and spits out a set of polygons. The geometry constructor gathers all the polygons together and plops them in to the chunk’s geometry object.

This all sounds fairly complicated, but it makes each part quite simple. Here is the diagram again with the data flow indicated.

The geometry constructor needs to query a lot of block data. It needs to check every block in its target chunk, which for me is 32x32x32 = 32768 blocks. Each of these blocks needs to be compared to a neighbouring block to figure out what face it generates. Assuming there’s no clever optimization, that means the geometry constructor will need to perform 2x32768 = 65536 block lookups. Of these, only 32x32 = 1024 of the compared blocks will lie outside the target chunk.

There’s a bit of a muddle here. Looking up each block through the field every time is going to do a whole lot of unnecessary field queries. We can figure out the target chunk and compare the blocks in there, but then you can easily get caught out with the 1024 outlying blocks. There’s a high risk of spaghettification. For example:

BlockCoord coordOffset(const BlockCoord& source, FaceType face)
{
  switch (face)
  {
    case FaceLeft: return BlockCoord(source.x - 1, source.y, source.z);
    case FaceRight: return BlockCoord(source.x + 1, source.y, source.z);
    case FaceDown: return BlockCoord(source.x, source.y - 1, source.z);
    case FaceUp: return BlockCoord(source.x, source.y + 1, source.z);
    case FaceFront: return BlockCoord(source.x, source.y, source.z - 1);
    case FaceBack: return BlockCoord(source.x, source.y, source.z + 1);
    default: return source; // do proper error handling...
  }
}

  // Later, inside geometry constructor...

  for (int z = 0; z < BlockChunk::Size; ++z)
    for (int y = 0; y < BlockChunk::Size; ++y)
      for (int x = 0; x < BlockChunk::Size; ++x)
      {
        BlockCoord target(x, y, z);
        BlockCoord comparison = coordOffset(target, face);

        // 'comparison' may or may not be a coordinate that is outside of the chunk
        // It depends on which face type is being processed...

      }

It would be nice if the search order could be organised based on the face type so that the 1024 outside-of-chunk blocks always appeared at a predictable point in the algorithm. Well, that’s entirely possible by making a simple helper.

class BlockCoordGenerator
{
  int a, b;

public:
  void reset()
  {
    a = 0;
    b = 0;
  }

  void next()
  {
    if (++a == BlockChunk::Size)
    {
      a = 0;
      ++b;
    }
  }

  bool isValid() const
  {
    return b < BlockChunk::Size;
  }

  BlockCoord coord(FaceType face, int depth)
  {
    switch (face)
    {
      case FaceLeft: return BlockCoord(BlockChunk::Size - depth - 1, a, b);
      case FaceRight: return BlockCoord(depth, a, b);
      case FaceDown: return BlockCoord(a, BlockChunk::Size - depth - 1, b);
      case FaceUp: return BlockCoord(a, depth, b);
      case FaceFront: return BlockCoord(a, b, BlockChunk::Size - depth - 1);
      case FaceBack: return BlockCoord(a, b, depth);
      default: return BlockCoord(0, 0, 0); // do proper error handling...
    }
  }
};

With a helper like this, you can iterate across an entire layer (or ‘depth’, the parameter to the coord function) in one pass. As long as 0 <= depth < BlockChunk::Size, then the generated coordinate will be inside the chunk. That means depths 0..30 can be checked relying on the target chunk as they compare against depths 1..31. Depth 31 compares against depth 32 which is outside the chunk, so it can be handled differently. Example!

  BlockCoordGenerator generator;
  for (int depth = 0; depth < BlockChunk::Size - 1; ++depth)
  {
    for (generator.reset(); generator.isValid(); generator.next())
    {
      BlockCoord target = generator.coord(face, depth);
      BlockCoord comparison = generator.coord(face, depth + 1);

      // Both target and comparison are inside chunk
    }
  }

  // Final layer
  for (generator.reset(); generator.isValid(); generator.next())
  {
    BlockCoord target = generator.coord(face, BlockChunk::Size - 1);
    BlockCoord comparison = generator.coord(face, BlockChunk::Size);

    // target inside, comparison outside
  }

This helps to cut a lot of field access out and lets the geometry constructor work directly in the chunk for the most part. In my actual code, I merged the BlockCoordGenerator and the GeometryConstructor and templated the whole thing based on face type.

Next time, let’s look at how the geometry constructor decides what to send to the face constructor. It’s not very complicated!

Writing a box engine part 7: Textures

Textures are quite easy to do. The first thing one might think to do is to use a texture atlas, which is where you put all the different textures in one big grid. Here’s an example of a 2x2 texture atlas with stone, grass, sand and dirt:

The uv coordinates for stone would be (0, 0) to (0.5, 0.5) in this case. There are a couple of issues with texture atlases that are worth noting.

  • If you use linear blending, the edges of the textures will bleed through each other. That means you have to use nearest.
  • Automatic mipmap calculation might well be wrong.
  • There’s no easy way to repeat the textures over a surface, it’s basically like you’re stuck on clamp-to-edge.

None of these problems are insurmountable, but they are kind of annoying. The absence of repeating textures completely closes out the opportunity to optimize meshes by merging adjacent same-textured faces into one large repeated quad. The forced use of nearest blending is also unfortunate.

A better solution is to use texture arrays. This is where individual 2D textures are stored in an array. It’s sort of like using a 3D texture but with no blending on the third dimension.

That’s the same textures as shown before but now split up and put in to a texture array. Fortunately this is a hands-down winner over using texture atlases. It fixes all of the above issues and there’s only one downside which is extremely minor: all the textures have to be the same size. That’s probably going to be the case anyway.

Texture arrays have been part of core OpenGL since version 3.0 and are used just like 3D textures. The EXT_texture_array extension is also commonly available on 2.1-era hardware.

Writing a box engine part 6: Blocks and flyweight

Here’s a class diagram summarizing the last five parts:

The controller is my top level container. That could be a game engine, a level editor, or whatever. It contains a Field and a FieldRender. The latter contains all the information which is useful in the field’s rendering function. The field contains some Chunks, and the chunks have a map coordinate, a whole load of Blocks and a store of their geometry.

When rendering, the field takes a FieldRender, walks through each applicable chunk and tests it against the Frustum. Then it creates a final matrix from the projection matrix and the chunk’s position, and sets that to the uniform. Then it tells the chunk to render the geometry for the cardinal directions visible from the camera position.

The chunk’s geometry is generated by examining its block data. Earlier on, I stated that I’m using a 16 bit integer to represent my blocks. Here lies a problem: This isn’t really helpful from an OOP point of view. Why not just use classes?

In C++, each instance of a class with virtual functions carries a pointer-sized space (4 bytes on 32-bit, 8 on 64-bit) for the vtable pointer1, plus the bytes for the data of that class. Also, each allocation of a class will add a few extra bytes of info for the memory allocator. Add a few bytes of penalty if your architecture requires memory alignment. There’s also memory fragmentation and the slowness of allocation to worry about.

Making each block a class instance will add a lot of baggage to them. In most circumstances it wouldn’t be a problem, but in a box engine you really want to allocate a million billion little blocks really quickly without honking up the machine. It is imperative to use as little cruft as possible. On the other hand, a class hierarchy with a standard API would still be useful. Fortunately there is a way to get the best of both.

The block data itself can be laid out in numerous ways. Its bits could represent:

  • A block type
  • Some standard flags
  • Some built in values (light level, water level)
  • Custom block state

I am using 8 bits for block type and 8 bits for block state. I am not recording any standard flags or values.

To access this through a class hierarchy, we must use a flyweight pattern. The flyweight pattern is where you set up your class hierarchy but make your classes stateless. Stateless means they are missing their internal data. Each of the standard functions in your class API should take its internal data as a parameter. Then you instantiate one copy of your class hierarchy and only use those instances.

When I want to query a block, first I check its block type and look up the correct block class pointer from an array. This block class pointer is an inherited version of a base block class. The functions of this class require the block data is passed in to them. For example:

BlockData blockData = field->at(myCoord);
Block* block = Block::blockFromType(blockData);
int texture = block->texture(blockData, FaceFront);

In this example, ‘texture’ is a const function so there is no need to write the blockData back to the field. Other functions might modify the blockData value. For safety, a class can be written that wraps the block class entirely and performs the correct read and writes. Here is an example class layout with the example texture function.

class Block
{
public:
  static Block* blockFromType(BlockData data);

  Block();
  virtual ~Block();
  
  virtual int texture(BlockData data, Face face) const;
};

class BlockAccessor
{
  Field* m_field;
  BlockCoord m_coord;
  BlockData m_data;
  
public:
  BlockAccessor(Field* field, const BlockCoord& coord) :
    m_field(field),
    m_coord(coord),
    m_data(m_field->at(m_coord))
  {
  }
  
  ~BlockAccessor()
  {
    if (m_field->at(m_coord) != m_data)
      m_field->set(m_coord, m_data);
  }
  
  int texture(Face face) const
  {
    return Block::blockFromType(m_data)->texture(m_data, face);
  }  
};

This setup allows me to subclass the Block class, create new block types with whatever internal logic I need, and then access the field through the BlockAccessor as if it really stored instances of those classes.
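As a sketch of what one of those subclasses might look like, here is a hypothetical GrassBlock that picks a different texture for its top face. The texture ids, the Face enum values, and the class name are all made up for illustration:

```cpp
// Hypothetical subclass sketch; texture ids and Face values are illustrative.
enum Face { FaceFront, FaceBack, FaceLeft, FaceRight, FaceTop, FaceBottom };
typedef unsigned short BlockData;

class Block
{
public:
  virtual ~Block() {}
  virtual int texture(BlockData data, Face face) const { return 0; }
};

class GrassBlock : public Block
{
public:
  // Grass shows a different texture on top than on its sides.
  int texture(BlockData data, Face face) const override
  {
    return face == FaceTop ? 1 : 2;
  }
};
```

A single static GrassBlock instance would then be the pointer returned by Block::blockFromType for the grass type id, shared by every grass block in the field.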

Next time: Faces and textures, I think.

1: classes without virtual functions do not have a vtable.

Writing a box engine part 5: Frustum culling

I brushed over frustum culling last time because it’s worthy of its own post. Which is this.

The total of everything which the camera can see is going to be contained in some area. This area is called the ‘frustum’. It’s going to be bounded by the near and far clip distances and also by the field of view of the camera. You can see more further away, so it won’t be a cube or cuboid, but rather a truncated pyramid.

Let’s return to the 2D example because 3D is hard to draw in paint. This image shows what the camera area might look like for a 2D world:

To satisfy the camera, we only need to draw things whose boundaries intersect or overlap that frustum. This is quite easy to calculate, if a little mathematical.

First we need to represent the frustum. This is usually done as a series of plane equations. So you can think of the frustum as a combination of infinite planes:

One equation for each side of the frustum. In 2D, it has four sides but in 3D it will have six sides. The area behind each plane is outside the frustum, and the area which is in front of all planes is inside the frustum. We’re going to look at the outside area behind the planes, which is highlighted in blue:

Or, separated for each plane:

If an object is completely contained in any of these light blue areas, it will be outside the frustum. The red objects here are outside the frustum and fit this criteria:

So a simple algorithm emerges:

  • Find frustum planes.
  • For each plane test if the object is behind the plane and if so, don’t render it.

Note that the converse is not true, so this isn’t perfect. An object can pass the test for every plane and still lie outside the frustum. We don’t need to worry about this edge case because it would just mean we render something which we needn’t have bothered with. The following red object shows this edge case:

The plane equations are easy to find, but a little confusing to derive. There are other blogs and articles which describe the maths behind them, so I’ll skip it. You need to pull rows out of your projection-camera matrix. There are four rows, which I’ll call respectively x, y, z and w. The four planes are given as:

  • w - x
  • w + x
  • w - y
  • w + y
  • w - z
  • w + z

It will also be useful to normalize these once they have been found. Once normalized, you get a vector for each: (Px, Py, Pz, Pw). The first three components (Px, Py, Pz) form the normal of the plane, and Pw is the offset from zero. Using the dot product you can project a vector on to the plane’s normal, and then compare it against Pw to find if that vector protrudes through the plane.
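The extraction and normalization together might be sketched like this. I’m assuming a row-major float m[4][4] matrix here; with a column-major layout you would combine columns instead of rows:

```cpp
#include <cmath>

// Build the six frustum planes (Px, Py, Pz, Pw) from a combined
// projection-camera matrix, in the order w-x, w+x, w-y, w+y, w-z, w+z.
// Assumes a row-major float m[4][4]; adapt for your matrix layout.
void extractPlanes(const float m[4][4], float planes[6][4])
{
  for (int i = 0; i < 3; ++i)     // i selects the x, y or z row
  {
    for (int s = 0; s < 2; ++s)   // s = 0 gives w - row, s = 1 gives w + row
    {
      float* p = planes[i * 2 + s];
      for (int c = 0; c < 4; ++c)
        p[c] = s ? m[3][c] + m[i][c] : m[3][c] - m[i][c];

      // Normalize so (Px, Py, Pz) is a unit normal and Pw is a true distance.
      float len = std::sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]);
      for (int c = 0; c < 4; ++c)
        p[c] /= len;
    }
  }
}
```

Normalizing up front is what lets the later point and sphere tests compare the dot product directly against a radius in world units.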

To test if a point is inside or outside of the frustum:

bool Frustum::isPointInside(const Vector3& point) const
{
  for (int t = 0; t < 6; ++t)
  {
    if (point[0] * m_plane[t][0] +
        point[1] * m_plane[t][1] +
        point[2] * m_plane[t][2] +
        m_plane[t][3]  < 0.0)
      return false;
  }

  return true;
}

If the calculation gets a value of less than zero, it means the point is behind that plane and thus outside the frustum. To test a sphere, we just add the sphere’s radius in to the test:

bool Frustum::isSphereInside(const Vector3& centre, float radius) const
{
  for (int t = 0; t < 6; ++t)
  {
    if (centre[0] * m_plane[t][0] +
        centre[1] * m_plane[t][1] +
        centre[2] * m_plane[t][2] +
        m_plane[t][3] + radius < 0.0)
      return false;
  }

  return true;
}

Testing a sphere is good enough, even when working with cubes. Just create a sphere, centred on the cube, which bounds it tightly. It is possible, with some precalculation and an additional multiply, to get a more precise test for a cube. In profiling I found the sphere test only gives a false positive compared to the cube test in around 1.5% of cases, so I haven’t gone into it here.
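For a cube of side length s, the tight bounding sphere has radius equal to half the cube’s space diagonal, s * sqrt(3) / 2. A quick sketch (the function name is mine):

```cpp
#include <cmath>

// Radius of the smallest sphere containing a cube of side length `side`:
// half of the cube's space diagonal, side * sqrt(3) / 2.
inline float boundingSphereRadius(float side)
{
  return side * std::sqrt(3.0f) * 0.5f;
}
```

Since the radius is the same for every cube of a given size, it can be computed once and reused for all the culling tests in a chunk.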

That’s a very quick, haphazard description of frustum culling. For further info on the maths behind it, check out the dot product projection and plane equations.