How I Stop Worrying and Love Data-Oriented Programming!

Created on: 08 Mar 25 14:00 +0700 by Son Nguyen Hoang in English

A lengthy discussion of the advantages of Data-Oriented Programming over OOP in game development

Game development should be full of cheerfulness; making a game should not be a hassle. But every developer in this field knows that game architecture should never be underestimated. Coding new features and building one mechanic on top of older mechanics means dipping one foot into the River Styx!

“Code should be open for extension but closed for modification,” someone once said (this is the open/closed principle, one of the pillars of SOLID), but it’s easier said than done. Code scaling is a nightmare, like an old Lovecraftian novel: at first there are just ugly creatures that appear shadowy, blurry and obscured, yet everyone who dares to go one step further into the abyss immediately recognizes the figure as an old god: messy, chaotic, armed with tentacles and leaving the impression of a chthonic being!

Is there a better way to build software (and games in particular)? Should I follow good old Martin’s advice (wink wink, Clean Code)? Should we spend time studying game design patterns in depth instead? Should we go back to the basics of Computer Science, looking for the Holy Grail of programming paradigms in a Renaissance spirit, like the classic Enlightenment-era philosophers? What about Unity’s fancy new ECS and the DOTS stack around it? It seems lovely, although the API seems overcomplicated. However, its underlying methodology is surprisingly good, especially for video games! But first, let’s dive deep into how games are commonly made.

Traditional approach: Object-Oriented Design

A video game is commonly abstracted into different objects, each with a number of properties (e.g. Health, Point). This makes Object-Oriented Programming seem like an intuitive fit for the problem. It is not the only way, though, and in reality various programming paradigms can coexist in the same project. A mix of functional programming and OOP is common, and there have even been discussions of “Pure Functional Programming” for game dev, but that is off-topic for this article [0].

However, what is the catch with Object-Oriented Programming? In theory OOP is solely about objects (or instances, entities, … depending on the language), but in reality it’s never only about objects and classes, but rather about the relationships between them. How to manage these relationships is the hardest problem (the same is true in real life). This, I would argue, is the reason why so many design patterns were born; in fact, I strongly believe that the rise of OOP led to the advent of, and the need for, design patterns in general.

In game development, the number of objects scales relatively fast. More objects mean more relationships and communication between objects, sometimes at the same level of abstraction and (very commonly) between different layers of abstraction. Soon, you realize that putting critical data (properties like Health, Point) in each gameObject (or entity) makes communication between systems much heavier and more troublesome, and makes the code harder to debug. We will illustrate this point with the example below.

Example game: A Game of Blocks

Imagine we have a small 2D platforming game with a simple goal: traverse to the destination by building a pathway out of blocks, controlling and merging blocks into bigger ones to patch the broken road.

Given this small game idea, the old me, fresh, inexperienced, naïve and full of wishful thinking, would sketch an architecture consisting of:

A Block gameObject (this article focuses on implementation in Unity, by the way). This gameObject (and its components) contains all data related to the block itself (e.g. Position).
Commonly, this implies the need for a BlockManager.cs class that holds references to all blocks in the game and ideally contains a GameState. From here the developer commands the BlockManager to control its Blocks. Examples of such control would be updating the UI, toggling effects, and showing and moving Blocks.

This implies we first have a relationship between different levels of abstraction: the link between BlockManager and Block. Quite self-explanatory, isn’t it? We refer to this as Relationship 1.

But that’s never enough. The game allows blocks to be “merged” into a bigger block; the bigger one still has all of the above state, and its blocks move together. After the merge, moving one block must also modify the position of its partner blocks. This again implies the need for a second relationship: block vs. block, or a communication method between blocks. We call this Relationship 2.

Communication in OOP and Game Development

Let’s push the game design a little. Say there is a special block that, once triggered, explodes nearby block instances within a radius of 1 game unit. That would potentially mean:

First, the triggered block must have a reference to the BlockManager (Relationship 1).
The BlockManager takes the reference to the triggered Block and gets the corresponding position.
Then the BlockManager loops through all of its blocks (yes, in Relationship 1, the manager must have references to ALL Blocks in the game), checks for potential candidates by position, and triggers the explosion on the valid ones.
Hang on, the exploded candidates might belong to another group of blocks (aka a bigger block). So after the explosion is triggered, the exploded block must go back to its partners and tell them: “Hey, I am about to explode, delete me from your list as well!” If we skip that, we risk null reference exceptions and unwanted behavior.

Just a simple mechanic, but it involves at least 3 communication signals:

Communication from the first triggered block to the Manager (Relationship 1)
Communication from the Manager to its child blocks (Relationship 1)
Communication from those child blocks to their partners (Relationship 2)

And that ignores the block’s own components, each of which holds the state and data required by its specific implementation and may or may not communicate with the others.
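To make the three signals concrete, here is a rough sketch of how the explosion could look in the object-oriented version. The class layout, field names and the ExplodeAround method are my own illustration (plain C# with integer grid positions instead of Unity types), not code from the actual project.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical OOP-style classes; names and fields are illustrative only.
public class Block
{
    public int X, Y;
    public BlockManager Manager;                      // Relationship 1: block -> manager
    public List<Block> Partners = new List<Block>();  // Relationship 2: block -> block

    // Signal 1: the triggered block asks its manager to handle the blast.
    public void Trigger() => Manager.ExplodeAround(this, 1f);

    // Signal 3: an exploded block tells its partners to forget it.
    public void Explode()
    {
        foreach (var partner in Partners) partner.Partners.Remove(this);
        Partners.Clear();
    }
}

public class BlockManager
{
    public List<Block> Blocks = new List<Block>();    // Relationship 1: manager -> all blocks

    // Signal 2: the manager scans every block and explodes the ones in range.
    public void ExplodeAround(Block source, float radius)
    {
        var hit = Blocks.FindAll(b => b != source && Distance(b, source) <= radius);
        foreach (var block in hit)
        {
            block.Explode();
            Blocks.Remove(block);
        }
    }

    static float Distance(Block a, Block b)
    {
        float dx = a.X - b.X, dy = a.Y - b.Y;
        return (float)Math.Sqrt(dx * dx + dy * dy);
    }
}
```

Even in this tiny version, every class has to know about the other one, and forgetting the clean-up in Explode leaves stale references in some partner list.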

Thus, arguably, messaging and communication are an important aspect of Object-Oriented Programming. At this point, any web-dev veteran will see why web backend programming feels much more straightforward and why functional programming suits backend development so well. You call a method (or an API) and you expect a result. The function rarely modifies the values it is handed, and communication goes straight to the point. Modification of the actual data happens in the database.

Secondly, we can all agree that a Manager must have access to data to work. Game mechanics need data to be triggered correctly. The example above also shows that a big part of the communication is just data querying. Because the data is hard to retrieve, the Manager has to go back and forth between different entities to look for candidates.

The catch

Back to the original game, a simple visualization of the system looks like this: [Image: visualization of the system]

Accessing data in this architecture is super annoying and troublesome. Game features, game mechanics and higher-level managers are useless without data. If data cannot be retrieved easily, the code gets inflated with helper methods, which literally makes your manager class longer. A bigger codebase means the code smells worse as more features are built on top. Below are some examples to illustrate how inconvenient it can be in this system just to get the data a mechanic needs to work.

Example: Block Moving

Back to our scenario. A Block must hold a list of references to its partners. When the block moves, it has to go through all the blocks in its group and update the position of each of them.

That’s pretty ordinary, but when you move a block, you must also make sure that the new block position does not collide with other blocks. That means moving any block requires the following:

Update the positions of all blocks in the group.
From the new position, ask the BlockManager: “Hey, here is my new position, has any block other than us taken it already?”
The BlockManager again loops through all its children, gets each reference, gets the position data of each, manually does the comparison and returns yes or no.
Based on this, each Block applies a change (e.g. renders a red outline to let everyone know the new positions are not valid).

Of course, you can reduce the number of data queries by caching the results somewhere, but then you have to manually update the cached data whenever it changes, which adds another layer of complexity to the code. You can design an elaborate communication system, like a classic pub-sub, to improve readability, but it’s still unavoidable that your code grows quickly.
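The move-and-check steps listed above could be sketched roughly like this in the object-oriented design. GroupedBlock, GroupedBlockManager and IsTakenByOther are hypothetical names I made up for the illustration, and positions are plain integers rather than Unity vectors.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical classes for illustration; not the project's real code.
public class GroupedBlock
{
    public int X, Y;
    public bool IsValid = true;   // e.g. drives a red outline when false
    public GroupedBlockManager Manager;
    public List<GroupedBlock> Group = new List<GroupedBlock>(); // includes this block

    // Move the whole group, then ask the manager whether any new cell is taken.
    public void Move(int dx, int dy)
    {
        foreach (var member in Group) { member.X += dx; member.Y += dy; }
        bool collides = Group.Any(member => Manager.IsTakenByOther(member));
        foreach (var member in Group) member.IsValid = !collides;
    }
}

public class GroupedBlockManager
{
    public List<GroupedBlock> Blocks = new List<GroupedBlock>();

    // Loops over every block and compares positions by hand.
    public bool IsTakenByOther(GroupedBlock mover)
    {
        foreach (var other in Blocks)
        {
            if (mover.Group.Contains(other)) continue; // same group: not a collision
            if (other.X == mover.X && other.Y == mover.Y) return true;
        }
        return false;
    }
}
```

Note how a single move makes every member of the group walk through the manager's full list again, which is exactly the back-and-forth querying described above.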

Worse, in the actual game there would be multiple levels of managers. For example, BlockManager would be injected into a GameManager, or sometimes both would be accessed by a TutorialManager. This makes data access deeply nested. Something like

State state = BlockManager.Instance.FindBlock().ComponentA.State;

… would happen.

Of course, to improve readability, you can add an intermediate function in the layer between the two classes. The catch is that this inflates the Manager itself with helper methods, such as:

BlockManager.GetAllInvalidBlock()
BlockManager.GetBlockOfPosition()
BlockManager.GetBlockUnderAnotherBlock()

The examples above show that code scaling in the first architecture gets ugly fast. So, what is the alternative?

A moment of analysis

Back to our web-dev analogy: it’s well known that front-end developers run into the same issue we have discussed so far (they refer to it as the state management problem). So how do they solve it? They implement a centralized state manager for the app. In the React world there is a library named Redux for this. In Flutter there is Riverpod. Different names and designs, but they all follow the same principle: create a single source of data for everything. That’s the heart of our new Data-Oriented Design (DOD) system.

A New Architecture

In object-oriented design, the object is the first-class citizen. The same is not true for a DOD system: in DOD, data is everything. The new system would look like this:

[Image: diagram of the new architecture]

The first and most important aspect of this architecture is that all important data (the state inside block components) lives in a centralized hub, in which the data is laid out much like a SQL database. Each block’s data now has a unique ID and all required state. Accessing component data becomes so much easier, because now everyone can read important data directly from the hub. Also, because we are using Unity with C#, this brings us one of the most powerful features of the .NET ecosystem: LINQ. With LINQ, there is virtually no need for helper methods. Complicated data filtering becomes much more scalable and adaptable, and requires much less code.

// Requires: using System; (Array.Empty) and using System.Linq; (Where, ToArray)
// BlockData stands for the record type stored in the hub (assumed name).
BlockData[] GetBlockInSameGroup(byte blockId)
{
    var hub = DataHub.Instance;
    var (dataBlock, success) = hub.Data.Find(a => a.Id == blockId);
    if (!success)
    {
        return Array.Empty<BlockData>(); // Invalid id, record does not exist
    }
    return hub.Data
        .Where(a => a.BlockGroupId == dataBlock.BlockGroupId)
        .ToArray();
}

// More complicated queries can also be written with LINQ

BlockData[] GetBlockDemo(byte blockId)
{
    var hub = DataHub.Instance;
    var (dataBlock, success) = hub.Data.Find(a => a.Id == blockId);
    if (!success)
    {
        return Array.Empty<BlockData>(); // Invalid id, record does not exist
    }
    // Valid blocks of the same group that sit at a different position
    return hub.Data
        .Where(a => a.BlockGroupId == dataBlock.BlockGroupId)
        .Where(a => a.State == BlockState.Valid)
        .Where(a => a.Position != dataBlock.Position)
        .ToArray();
}
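The two queries above lean on a hub API that is not shown: a Data collection that works with LINQ and a Find that returns the record together with a success flag. One possible minimal shape for it is sketched below. BlockTable, the field layout of BlockData and the tuple used for Position are assumptions of mine; a real Unity project would more likely store a Vector2Int or Vector3.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public enum BlockState { Valid, Invalid }

// Assumed record layout: one row per block, like a row in a SQL table.
public struct BlockData
{
    public byte Id;
    public byte BlockGroupId;
    public BlockState State;
    public (int X, int Y) Position;
}

// A tiny "table": enumerable so LINQ works on it, plus a Find that reports success.
public class BlockTable : IEnumerable<BlockData>
{
    readonly List<BlockData> _rows = new List<BlockData>();

    public void Add(BlockData row) => _rows.Add(row);

    // Returns the first matching row and true, or a default row and false.
    public (BlockData, bool) Find(Func<BlockData, bool> predicate)
    {
        foreach (var row in _rows)
            if (predicate(row)) return (row, true);
        return (default(BlockData), false);
    }

    public IEnumerator<BlockData> GetEnumerator() => _rows.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Singleton hub that owns the table.
public sealed class DataHub
{
    public static DataHub Instance { get; } = new DataHub();
    public BlockTable Data { get; } = new BlockTable();
    DataHub() { }
}
```

Because BlockTable implements IEnumerable<BlockData>, the Where and ToArray calls come for free from System.Linq; only Find had to be written by hand.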

Next, don’t forget that in the new system, Blocks no longer depend on each other. In the DOD system, to express the relationship between blocks, we simply add a data field, e.g. BlockGroupId (a number), next to each Block’s unique ID. Whenever two or more blocks have the same value for BlockGroupId, they are treated as “partners”. In this scenario, when a block moves, the data flow could be:

The Block updates data: it updates its own position on the data hub and the positions of all other blocks in its group. This should be very fast because the positions of all blocks also live in the hub. No complex data access is required. Filtering blocks in this case is simple with LINQ (see the example above).
The Block calls the Manager to update all children (by id) and, for each of them, trigger the update on the actual Transform and other components.
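The first step of this flow, the hub-side update, could look roughly like the sketch below. BlockRow, GroupMover and MoveGroup are hypothetical names, the table is a plain Dictionary keyed by block id, and I assume every block carries a group id, so this shows the idea rather than the project's real implementation.

```csharp
using System.Collections.Generic;
using System.Linq;

// Assumed minimal record for this sketch.
public struct BlockRow
{
    public byte Id;
    public byte BlockGroupId;
    public int X, Y;
}

public static class GroupMover
{
    // Shifts every row of the group and returns the ids that changed.
    // Returns an empty list, and changes nothing, if a target cell belongs to another group.
    public static List<byte> MoveGroup(Dictionary<byte, BlockRow> rows, byte groupId, int dx, int dy)
    {
        var members = rows.Values.Where(r => r.BlockGroupId == groupId).ToList();

        bool blocked = members.Any(m => rows.Values.Any(o =>
            o.BlockGroupId != groupId && o.X == m.X + dx && o.Y == m.Y + dy));
        if (blocked) return new List<byte>();

        foreach (var member in members)
        {
            var moved = member;      // structs are copied, so edit the copy...
            moved.X += dx;
            moved.Y += dy;
            rows[moved.Id] = moved;  // ...and write it back to the table
        }
        return members.Select(m => m.Id).ToList();
    }
}
```

The collision check and the group lookup are both plain queries over the same table, so no block needs a reference to any other block or to a manager.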

The data flow of this method is a dramatic improvement over the first approach: [Image: data flow in the new architecture]

But How to Build Such a System?

I would cautiously say that this approach works exactly the way web developers go about creating a database for the first time. Some prerequisites are:

Study the game and analyze what should be “data”. In the current example, the “data” should be Position and State.
Identify the number of “tables” the game requires. Coding a game with this approach is hardly different from building a small SQL database inside the game structure. So, how many “tables” does this database need?

Once the data analysis is done, we can design the data as a struct, which is simple, lightweight and, being a value type, usually cheaper to allocate than a class (in C# at least). A small catch is that this struct must have a unique ID, which is critical for the game. Every Block in the game has one such ID, and only one.
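One consequence of choosing a struct deserves a small demonstration: reading a struct out of a collection gives you a copy, so an edit does nothing to the stored data until the copy is written back. The BlockRecord layout and the StructCopyDemo helper below are made up for the illustration.

```csharp
using System.Collections.Generic;

// Assumed record layout for this sketch.
public struct BlockRecord
{
    public byte Id;      // unique key, exactly one per block
    public int X, Y;
    public bool IsValid;
}

public static class StructCopyDemo
{
    // Returns the stored X after editing a copy, and again after writing the copy back.
    public static (int afterCopyEdit, int afterWriteBack) Run()
    {
        var table = new Dictionary<byte, BlockRecord>
        {
            [1] = new BlockRecord { Id = 1, X = 0, Y = 0, IsValid = true }
        };

        var copy = table[1];             // a struct read is a copy, not a reference
        copy.X = 5;
        int afterCopyEdit = table[1].X;  // the stored record is unchanged

        table[1] = copy;                 // the edit only lands once the copy is written back
        int afterWriteBack = table[1].X;

        return (afterCopyEdit, afterWriteBack);
    }
}
```

Calling StructCopyDemo.Run() returns (0, 5): the first read still sees the old value, the second sees the new one. This is why every edit has to go through an explicit write to the hub.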

In the Manager, instead of having a List<Block>, we would have a Dictionary<ID, Block>, in which each Block has a unique key for fast lookup. In our Moving Block example, once the new positions are updated, the DataHub returns the list of modified ids. Using these ids, the Manager updates only the Blocks whose ids are in the list.
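On the Manager side, that id-driven update could be sketched as follows. BlockView is a stand-in for the scene object (in Unity it would wrap a Transform), and BlockViewManager, Register and Sync are names I invented for this example.

```csharp
using System.Collections.Generic;

// Stand-in for the scene object; in Unity this would wrap a Transform.
public class BlockView
{
    public int X, Y;
    public void MoveTo(int x, int y) { X = x; Y = y; }
}

public class BlockViewManager
{
    readonly Dictionary<byte, BlockView> _views = new Dictionary<byte, BlockView>();

    public void Register(byte id, BlockView view) => _views[id] = view;

    // Applies hub positions to exactly the views whose ids were reported as modified.
    // Returns how many views were actually updated.
    public int Sync(IEnumerable<byte> modifiedIds,
                    IReadOnlyDictionary<byte, (int X, int Y)> positions)
    {
        int updated = 0;
        foreach (var id in modifiedIds)
        {
            if (_views.TryGetValue(id, out var view) && positions.TryGetValue(id, out var pos))
            {
                view.MoveTo(pos.X, pos.Y);
                updated++;
            }
        }
        return updated;
    }
}
```

The Manager never searches or compares anything here; it only looks up views by key and pushes the already-computed data into them.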

If you design the database with structs only, the problem is updating the data correctly. You can use Expression and Dictionary to craft a generic update function:

// Lives inside the generic hub class, where T1 is the key type and T is the record struct.
// Requires: using System; using System.Collections.Generic; using System.Linq.Expressions;
public bool Edit(T1 id, Dictionary<Expression<Func<T, object>>, object> updates)
{
    if (!_dataDictionary.TryGetValue(id, out var oldData))
    {
        Debug.LogError("Key not found: " + id);
        return false;
    }
    // Box the struct once so that every SetValue call writes to the same copy.
    object newData = oldData;

    foreach (var update in updates)
    {
        var propertyExpression = update.Key;
        var newValue = update.Value;
        string name = string.Empty;
        if (propertyExpression.Body is MemberExpression memberExpression)
        {
            name = memberExpression.Member.Name;
        }
        else if (propertyExpression.Body is UnaryExpression unaryExpression && unaryExpression.Operand is MemberExpression unaryMemberExpression)
        {
            // Value-typed fields are wrapped in a Convert node when boxed to object.
            name = unaryMemberExpression.Member.Name;
        }

        // GetField(string) only finds public fields, so non-public ones come back as null.
        var fieldInfo = typeof(T).GetField(name);
        if (fieldInfo == null)
        {
            Debug.LogError("Expression does not point to a public field");
            return false;
        }
        if (fieldInfo.IsInitOnly)
        {
            Debug.LogError("Field is read-only: " + name);
            return false;
        }
        try
        {
            fieldInfo.SetValue(newData, newValue);
        }
        catch (Exception ex)
        {
            Debug.LogError("Failed to set field: " + ex.Message);
            return false;
        }
    }

    // Unbox the edited copy and store it back under the same key.
    _dataDictionary[id] = (T)newData;
    RebuildArray();
    OnEditedSuccess(id, oldData, _dataDictionary[id]);
    return true;
}

In actual usage, you would call the function like this:

var update = new Dictionary<Expression<Func<YourDataType, object>>, object>()
{
    { data => data.Field1, Field1_NewValue  },
    { data => data.Field2, Field2_NewValue  }
};
hub.Edit(id, update);

Expression on its own is a super interesting topic. The idea behind expressions is to turn the code itself into data! That deserves its own analysis and a separate article. But for now, that’s all for the implementation part.
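As a small taste of that idea, the sketch below reads the field name out of a lambda without ever running it, which is the same trick the Edit function relies on. Stats and ExpressionDemo are hypothetical names used only for this example.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical record type for the demo.
public struct Stats
{
    public string Name;
    public int Health;
}

public static class ExpressionDemo
{
    // Reads the member name out of a lambda such as "s => s.Health" without executing it.
    // Returns null if the lambda is not a simple member access.
    public static string MemberName<T>(Expression<Func<T, object>> selector)
    {
        // A value-typed member is boxed to object, which wraps the access in a unary Convert node.
        var body = selector.Body is UnaryExpression unary ? unary.Operand : selector.Body;
        return body is MemberExpression member ? member.Member.Name : null;
    }
}
```

For example, ExpressionDemo.MemberName<Stats>(s => s.Health) yields "Health". The int field goes through the unary branch, which is why the Edit function needs both of its branches.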

The overwhelming advantages

The first advantage is that data is now easily accessible throughout the whole system. In our game, DataHub is a Singleton, so every entity and manager in the game gains access to it. Writing and editing data is restricted to a small mediator layer of your choice.

If you are thinking that this only shifts the pressure onto the DataHub, I would kindly agree. However, one strikingly good outcome of this method is that the hub can be scaled horizontally. Imagine the game gets bigger, with enemies, obstacles, projectiles and so on: you can ALWAYS MAKE MORE DataHubs. This is quite similar to one principle of building large, heavy cloud systems: scale horizontally (buy more machines) instead of vertically (upgrade machines).

Needless to say, the Manager benefits the most from this architecture. It is unlikely to get inflated, and even if it does, we can create sub-managers one level below without fear of code scaling, because the data flow is now straightforward. In reality, the project I am working on has Mediator classes that keep references to each group of entities. So the Manager basically handles just the game phases and minimal core game logic!

The next huge plus of this architecture is that debugging has never been so simple. Imagine having 20 entities (Blocks) in a scene and one of them behaves weirdly. How do you check that Block’s specific state? With a data-oriented system we can easily craft a data viewer to inspect the data directly. Here is a snippet of a custom data viewer we made for our project. [Image: custom data viewer]
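Our actual viewer is not reproduced here, but the core of such a tool can be very small. The sketch below simply turns every record into a line of text using reflection; DataDump and DumpRow are hypothetical names, and in Unity the resulting string could be sent to Debug.Log or drawn in an editor window.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical record type for the demo.
public struct DumpRow
{
    public byte Id;
    public int X;
}

public static class DataDump
{
    // Builds one line per record, listing every public field as "Name=value".
    public static string Dump<T>(IEnumerable<T> rows) where T : struct
    {
        var fields = typeof(T).GetFields();   // public fields of the record type
        var text = new StringBuilder();
        foreach (var row in rows)
        {
            object boxed = row;               // box once so reflection can read the values
            text.AppendLine(string.Join(", ",
                fields.Select(f => f.Name + "=" + f.GetValue(boxed))));
        }
        return text.ToString();
    }
}
```

Because all state sits in flat records, this one generic function works for any table in the hub, with no per-class debug code.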

Closing thoughts

Migrating the system to a data-oriented architecture gave me a lot of joy. Never before had I realized how much of the code base was duplicated, coupled or convoluted. A data-oriented system resolves many of these issues and helps the code scale better than ever!

I now wonder how Unity’s official Data-Oriented Technology Stack (aka DOTS, with its ECS) works. Although I have known about DOTS for a long time, I have never used it in a real game. So that will be another project on my to-do list!

References and Further Reading

Brian Will has a very good video on OOP; I will leave it here: https://www.youtube.com/watch?v=QM1iUe6IofM

Richard Fabian wrote a great introduction to DOD. Much of the book uses game-dev analogies, so you will find it very accessible: Data-Oriented Design: Software Engineering for Limited Resources and Short Schedules (ISBN: 9781916478701)

[0] https://prog21.dadgum.com/23.html