How I cheat in Civilization VI with the SIEM I built in Microsoft Sentinel
This blog outlines the steps I took to ingest in-game Civilization VI logs into Microsoft / Azure Sentinel and treat them as though they were real-world data in a real-world SOC, essentially allowing me to cheat against the AI players. By following along you’ll see how I:
- Optimize and standardize log ingestion from multiple, horrifically configured data sources and types using a Log Ingestion API.
- Develop Analytic Rules with KQL to alert on in-game metrics and events.
- Turn those alerts into SOAR actions by using Playbooks / Logic Apps to respond automatically.
- Make use of Sentinel Notebooks with Python to produce better visuals than you would normally get with KQL.
This is an extension of my previous project, where I wrote an in-console ‘SIEM’ that alerted me on new enemy unit spawns and on units approaching my capital city by matching regex patterns in the in-game logs. This project achieves the same outcome, but with the more formal use of Azure and its suite of services.
If you don’t mind some creative justification, this weekend project has been a great microcosm of cyber security in a lot of ways. For example, our capital city in the game has a ‘firewall’ defined by its hexes on the map, and you’ll read how we can configure the logs so that we’re alerted when enemies cross that border. Other civilizations’ (read: threat actors’) units can have their X, Y co-ordinates logged in game, and in a real-world scenario, if we can do that we should. I role-play all of that here by building a small-scale SOC with the relevant end-to-end cloud infrastructure in Azure.
Or put simply, I discovered a very convoluted way to cheat in Civilization 6.
Step 1: Log Ingestion
First thing to do is to ingest the in-game logs from my local PC into a Log Analytics Workspace (LAW). After fumbling around with the Azure Monitor Agent (AMA) and the now-legacy Microsoft Monitoring Agent (MMA) for a couple of hours, I decided to go with a custom Log Ingestion API, simply because I could send logs to my LAW in near real time as opposed to every 5 minutes. It meant a lot of extra work upfront, but once it was running, I knew it would give me the best results.
Log Transformation —
The logs are saved locally as either ‘.csv’ or ‘.log’ files, not the JSON format Azure wants and needs for the API. This meant I needed to convert the logs to JSON before sending them off, which was a little trickier than expected because these logs aren’t at all designed for this kind of use-case. I won’t talk too much about the API code I wrote, but if you’re interested to see how I converted each file or log line into a JSON format suitable for the API, check out my Python code here.
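As a rough illustration of what that conversion and upload looks like, here’s a minimal Python sketch using the azure-monitor-ingestion SDK. The endpoint, DCR immutable ID and stream name below are hypothetical placeholders, and my real code handles each file’s quirks individually:

import csv

from azure.identity import DefaultAzureCredential
from azure.monitor.ingestion import LogsIngestionClient

# Hypothetical values - swap in your own Data Collection Endpoint, DCR immutable ID and stream name
DCE_ENDPOINT = "https://my-dce.australiaeast-1.ingest.monitor.azure.com"
DCR_IMMUTABLE_ID = "dcr-00000000000000000000000000000000"
STREAM_NAME = "Custom-Barbarian_Units_CL"

client = LogsIngestionClient(endpoint=DCE_ENDPOINT, credential=DefaultAzureCredential())

def csv_rows_to_json(path):
    # Each CSV row becomes a dict keyed by the header row, which serializes cleanly to JSON
    with open(path, newline="") as f:
        return [dict(row) for row in csv.DictReader(f)]

# The DCR applies its KQL transform on ingest before the rows land in the LAW table
client.upload(rule_id=DCR_IMMUTABLE_ID, stream_name=STREAM_NAME, logs=csv_rows_to_json("Barbarians.csv"))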
Once the JSON got to my Data Collection Rule (DCR) in a somewhat acceptable format, I took the opportunity to parse, standardize and optimize the data as best I could. Unfortunately for me, Microsoft haven’t released an ASIM-compliant schema for Civilization 6, but I think I improvised well enough.
Some files were great. ‘AStar_GC.log’ (which tracks the movements of enemy units), for example, is nicely delimited with commas, but ‘Lua.log’, which logs map generation, is delimited by new lines and sometimes colons.
Some of the .csv files even have 2 different sets of processes logging to the same file, meaning that 2 different types of data appear under the same headers. To give you an example of some of the horrors, take ‘Barbarians.csv’:
The way I understood this is that:
- The AI Barbarians in the game ‘desire’ 18 camps total (highlighted yellow)
- After each turn the number of camps they actually have is listed under the ‘Num Camps’. (Green)
- In these same columns though, the game also logs whenever new Barbarian units are spawned and their X, Y co-ordinates. (Blue)
- Then the row immediately below that lets us know how many total units have been added this turn (Pinky/Red)
Clearly there are two different processes or systems logging to the same file in the back-end, and I wondered if there was a way to parse the one source into 2 different tables. Not a typical challenge, but I figured that if I created 2 DCRs with 2 distinct API calls that parsed the same data slightly differently, then this should work nicely. It’s a lot more upfront work, but having 2 nicely formatted tables void of blank cells would be worth it for anyone else working with the data in the future (namely me).
All that being said, I made 2 tables: one called ‘Barbarian_Units_CL’ and the other ‘Barbarian_Camps_CL’. I understood that when a new log came into ‘Barbarians.csv’, I’d then need to make an API call to each DCR to get the results I wanted.
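In practice that fan-out is just two uploads of the same converted rows, one per DCR. A minimal sketch, re-using the client and helper from the earlier snippet (the DCR IDs and stream names are again placeholders):

rows = csv_rows_to_json("Barbarians.csv")

# Hypothetical DCR immutable IDs, one per destination table
dcr_targets = [
    ("dcr-11111111111111111111111111111111", "Custom-Barbarian_Units_CL"),
    ("dcr-22222222222222222222222222222222", "Custom-Barbarian_Camps_CL"),
]

# Each DCR's KQL transform keeps only the columns its table cares about
for rule_id, stream_name in dcr_targets:
    client.upload(rule_id=rule_id, stream_name=stream_name, logs=rows)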
To walk you through the transform of ‘Barbarian_Units_CL’, from the CSV seen above to something usable in my SIEM —
I check if the column called ‘Added_This_Turn’ contains a hyphen (-). If it does, it’s definitely an X, Y co-ordinate and belongs in its own column, which I’ve extended and called ‘XY_Coords’, and it’s removed from the ‘Added_This_Turn’ column. Otherwise, the data is kept as is. Here are the results so far as seen in Azure:
Already so much better, but I’ll take it one step further and pull out the specific X and Y co-ordinates by using regex to match the data into their own respective columns:
source
| extend TimeGenerated = now()
| extend XY_Coords = iif(Added_This_Turn contains "-", Added_This_Turn, "")
| extend Added_This_Turn = iif(Added_This_Turn contains "-", "", Added_This_Turn)
| extend X_Coords = toint(extract(@"(\d+)-", 1, XY_Coords))
| extend Y_Coords = toint(extract(@"-(\d+)", 1, XY_Coords))
And the results of splitting the two X, Y values:
Next, when I started ‘projecting’ only the columns I wanted I found that the ‘Turn’ column was still keeping rows in the table, seen below with the blank rows:
Not wanting to discard this column entirely, I check if ‘Unit_Name’ contains “UNIT”. If not, then Turn = 0. I then asked the KQL query to return only results where Turn doesn’t equal 0.
By the time I was done, the entire query looked like this:
source
| extend TimeGenerated = now()
| extend XY_Coords = iif(Added_This_Turn contains "-", Added_This_Turn, "")
| extend X_Coord = toint(extract(@"(\d+)-", 1, XY_Coords))
| extend Y_Coord = toint(extract(@"-(\d+)", 1, XY_Coords))
| extend Unit_Name = iif(Desired_Camps contains "_", Desired_Camps, "")
| extend Turn = iif(Unit_Name !contains "UNIT", "0", tostring(Turn))
| where Turn != "0"
| project TimeGenerated, Turn, Unit_Name, X_Coord, Y_Coord
And the resulting table in Azure looks like:
Huge difference from the original CSV, and much more useful to me now. I followed a very similar process for the other set of columns in the data source to make ‘Barbarian_Camps_CL’, but applied the transforming KQL to the other DCR’s API call. Here are the final results:
source
| extend TimeGenerated = now()
| extend Desired_Camps = iif(Desired_Camps contains "_", "", Desired_Camps)
| extend Num_Camps = iif(Num_Camps contains "_", "", Num_Camps)
| extend Turn = iif(Num_Camps == "", "0", tostring(Turn))
| where Turn != "0"
| project TimeGenerated, Turn, Num_Camps, Desired_Camps
And the final result:
If I ever really needed this data back in the one table as it was originally, I could always stitch it together again with a join command:
Barbarian_Units_CL
| join kind=leftouter (
Barbarian_Camps_CL
) on Turn
| project-away Turn1, Type, Type1, TenantId, TenantId1, TimeGenerated1
In hindsight, rather than splitting this data up via 2 different DCRs, I think it may have been easier to use the 1 DCR and then use KQL functions once ingested to split the data into its respective, easier-to-read tables. Lesson learnt!
I actually had the inverse of this problem with ‘player_stats.csv’ and ‘player_stats1.csv’, where similar data I’d prefer to have together was getting logged to 2 different files. These 2 files track essentially all metrics revolving around each city: Gold balance, Faith generation and Science yields, to name just a few.
To combine these 2 sources into 1 table, I ingested them as their own individual tables but made a KQL function to join them. That way, when querying this data, I’d never specify or call either table, but the function that joins them together.
It’s a simple process of creating a KQL query joining the 2 tables on 2 columns (‘Player’ and ‘Turn’) and saving it as a function called ‘Player_Stats’.
Player_Stats_1_CL
| join kind=inner Player_Stats_2_CL on Player, Turn
| project-away TimeGenerated1, Player1, Turn1, TenantId1, Type, Type1, Population1
Now I can call ‘Player_Stats’ as if it were a regular table and get a much more holistic view of all the data in the one table.
Those 3 data sources were by far the most interesting (read: difficult) so I won’t go into as much detail with the following data sources I’ve decided to ingest. However, to quickly go over them —
- AI_CityBuild.csv (Civ_Production_Queue_CL) — Lets me know what each city is building and whether it’s a building, unit or district. The only value I see in these logs is when particular units are submitted to the build queue or when those units are complete and have spawned in.
- AStar_GC.log (Unit_Movement_CL) — My favourite of them all. It logs the starting position of each unit on the map at the start of the turn, and their finishing co-ordinates at the end of the turn. Very useful for tracking each unit on the map and whether they’re approaching my cities.
- GameCore.log (Player_Generation_CL) — This is a log file that outlines which player is assigned which city. It’s not useful for alerting, but I’m predicting that having a source that identifies the cities in play might be useful for reporting and dashboards.
- Lua.log (Map_Generation_CL) — A very similar file to the above, but it identifies very specific data about the map. It’ll tell me things like what the map consists of, where volcanoes are and even where the other major and minor cities have been placed. Again, not useful for alerting, but I’m expecting it to be incredibly useful in dashboards and reports.
Log Ingestion Summary
This process became a significantly larger part of this project than I initially anticipated. However, the effort will be well worth it in the long term. Optimizing log ingestion and standardizing the format is going to allow me to work with the data in a much quicker and more efficient manner.
If this were a real-world SOC, this upfront work would enhance overall security by enabling more effective monitoring and quicker detection of anomalies for CIR teams. Standardizing the data also improves its scalability, usually allowing direct plugins or data connectors to work out of the box and ultimately allowing systems to handle increasing volumes of log data seamlessly.
Step 2: Analytic Rules with KQL
With the logs ingesting nicely, it was time to start generating incidents and alerts on them. This is where I begin to feel the benefits of all my efforts to ingest logs as cleanly and as succinctly as possible. However, it does make for some pretty boring reading, but that was almost the point —
For all the work it took to parse ‘Barbarians.csv’ appropriately, the analytic rule to alert me on new Barbarian units is now as simple as this:
// New Barbarian Unit spawned
Barbarian_Units_CL
| where isnotempty(Unit_Name)
The other 2 analytic rules are similar:
// When Civilizations submit new units to their production queue:
Production_Queue_CL
| where Construct == "SUBMITTED"
| where Order_Source startswith "UNIT_"
// When a Civilization spawns a new unit:
Production_Queue_CL
| where Construct == "COMPLETED"
| where Order_Source startswith "UNIT_"
There’s really not much else to talk about here. The analytic rules are NRT-based for the most immediate responses and only generate alerts as opposed to incidents. I do customize the ‘Alert Details’ so that in the next step, where I use Logic Apps to act on these rules, I can easily pass custom, descriptive alerts to an email address.
Here is a view of my ‘SecurityAlert’ table showing a few alerts from my ‘New Barbarian Unit’ and ‘New unit submitted to production queue’ NRT rules. These alerts appear as I’m playing a live game:
Analytic Rule Summary:
I am generating alerts on:
- New Barbarian units spawning
- New Civilization units being added to their city’s queue
- Those Civilization units actually spawning
The work that went into cleaning up the data before ingestion has allowed me to create very simple, easy-to-understand Analytic Rules.
The use of custom parameters in the rules also allows me to enrich the Logic Apps you’re about to read about with specific, descriptive data.
Step 3: SOAR with Logic Apps / Playbooks
With all this alerting, it’s time to automate. Since all of my logs are custom logs from my local PC, there’s essentially zero commonality with the usual SOAR actions you’d take, like automatically freezing the account of a user suspected to be compromised. But I can send beloved emails.
Triggering an email when an analytic rule fires involves an automation rule that connects the rule to the Logic App. In this use case the Logic App is incredibly simple, because all I need it to do is trigger on the alert and translate some of the alert’s details into an email to me. Here I have the alert’s display name and description from the Analytic Rule passed in as the subject and part of the body:
This Logic App is incredibly flexible, because any of my 3 Analytic Rules can be configured to use it effectively: I make use of each rule’s custom parameters (Alert Display Name / Description) to enrich the email with rule-specific data.
There is something else I’d like to alert on, and that’s when enemy units are getting close to my Capital City. Since Analytic Rules aren’t stateful, I need to create a Logic App that first finds my Capital City’s X, Y co-ordinates and then compares them to the X, Y co-ordinates of the enemy’s units. If they’re getting close, I’d like to know about it. In a clever way, this is an Analytic Rule masquerading as a Logic App.
Here’s what that looks like:
It first grabs my Capital City’s co-ordinates by reading ‘Map_Generation_CL’ and importantly, defines my ‘perimeter’ or ‘border’. Here’s the KQL that does that:
Map_Generation_CL
| where TimeGenerated between (ago(2m) .. now())
| where Message startswith_cs "Major Start X:"
| extend XCoordinate = extract(@"Major Start X:(\s*)(\d+)", 2, Message),
YCoordinate = extract(@"Major Start Y:(\s*)(\d+)", 2, Message)
| extend X_right_border = toint(XCoordinate) + 10
| extend X_left_border = toint(XCoordinate) - 10
| extend Y_top_border = toint(YCoordinate) + 10
| extend Y_bottom_border = toint(YCoordinate) - 10
It defines my Capital City’s border as a 10-hex radius surrounding my city. Following that, it runs a query to determine each enemy unit’s co-ordinates, and then does a condition check to determine whether those co-ordinates are within my borders with the series of AND statements seen in the above image.
My KQL to grab all the enemy units’ co-ordinates is as simple as this:
Unit_Movement_CL
| where TimeGenerated between (ago(2m) .. now())
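To make the comparison concrete, here’s a rough Python equivalent of the condition check the Logic App performs with those AND statements (the variable names are illustrative, not the actual Logic App expressions):

# Border values come from the Map_Generation_CL query; unit positions come from Unit_Movement_CL
def unit_inside_border(unit_x, unit_y, x_left, x_right, y_bottom, y_top):
    # A unit is inside the border when both its X and Y fall within the 10-hex box
    return x_left <= unit_x <= x_right and y_bottom <= unit_y <= y_top

# Example: a capital at (30, 20) gives borders of 20-40 and 10-30, so a unit at (33, 12) triggers the email
if unit_inside_border(33, 12, 20, 40, 10, 30):
    print("Enemy unit inside the capital's border - send the alert email")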
This Logic App runs every 2 minutes, although I’d imagine suppressing these alerts in times of war would be necessary to avoid spam. This is more useful as an early warning system than anything else.
Analytic Rule and SOAR Summary:
Through the use of Analytic Rules and Logic Apps, I am getting email alerts when new units spawn and when those units enter a defined proximity of my capital city’s location.
Producing an alert on incoming enemies involved a little bit of creativity with Logic Apps to imitate a stateful Analytic Rule. If my Capital City’s co-ordinates were static, then an Analytic Rule would be fine, but to keep functionality between games where my co-ordinates change, this custom Logic App is much more effective and flexible.
Bonus Step 4: Visualizations with Notebooks + Python
There are a lot of advantages to using Notebooks with Python on your data as opposed to KQL.
One clear advantage Python has over KQL is its visualization libraries. The way Azure handles Notebooks is to use Python to run a KQL query against your workspace and return the output as a data frame, making it super easy to do any of the regular data analysis you might ordinarily do.
Since I’m logging map generation details, I took the chance to plot my own and the other civilizations’ starting co-ordinates by reading the Map_Generation_CL table:
I built this POC assuming only one other civilization is playing, on a duel map, and that where our settlers spawn on the map is where we ‘settle’ our first city, which is usually best practice. The code for this can be found here.
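For a rough idea of the pattern, here’s a minimal notebook-style sketch that queries the workspace with the azure-monitor-query SDK and scatter-plots the starting positions. The workspace ID is a placeholder and the column extraction is an assumption based on the parsing shown earlier, so treat it as a sketch rather than the exact notebook code:

from datetime import timedelta

import pandas as pd
import matplotlib.pyplot as plt
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "00000000-0000-0000-0000-000000000000"  # hypothetical LAW workspace ID

# Pull each civilization's starting X/Y from the map generation logs
query = r"""
Map_Generation_CL
| where Message startswith_cs "Major Start X:"
| extend X = toint(extract(@"Major Start X:(\s*)(\d+)", 2, Message)),
         Y = toint(extract(@"Major Start Y:(\s*)(\d+)", 2, Message))
| project TimeGenerated, X, Y
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(days=1))

# Convert the query result into a pandas data frame and plot the start positions
table = response.tables[0]
df = pd.DataFrame(data=table.rows, columns=table.columns)

plt.scatter(df["X"], df["Y"])
plt.xlabel("Map X")
plt.ylabel("Map Y")
plt.title("Civilization starting positions")
plt.show()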
To sum up -
It made for an interesting project to ingest this data into a SIEM and parse it into something useful. The messy, unorganized logs provided me with some unique logging challenges I wouldn’t normally encounter in my day-to-day.
I’ve really enjoyed becoming so familiar with a particular game / application — learning its quirks and discovering how it really works under the hood.
I’ve kept the scope relatively small here, but I can see a ton more logs in the game files worth exploring. There is enough data logged in the game files to play around with for months. I’d like to accrue enough data to start experimenting with Machine Learning to predict the winner of games early based on certain criteria. I’ve also recently heard of the new ‘Codeless Data Connectors’ in Sentinel, and from my early impressions I feel as though they would be the go-to method of ingestion if I were to do this again.
But maybe for another time - thank you for reading!
Jacob Lummus — jclummus.w@gmail.com