An interesting but rarely useful method of over-optimizing Azure log ingestion

Jacob Lummus
Jul 28, 2024


Inspired by the effort that went into the network compression seen in this video by ThePrimeagen, I wondered what crazy lengths I could go to in order to compress security logs destined for a SIEM.

In this theoretical scenario, I imagined a security log with a column that contained relatively consistent data: for example, a log with an error message that was only ever “ABC” or “XYZ”. These logs would be ingested via a data collection rule (DCR), and in that process I would transform the data into a smaller form to limit the cost of its storage.

I would transform, for example, “ABC” to equal 1 and “XYZ” to equal 2, theoretically reducing the size of this data to a third of its original form. Then, to ‘unpack’ the data to its original form, I could reference a KQL function to query the data rather than the storage table itself. Of course, the function would do the exact opposite operation: if the error message column == 1, then let it equal “ABC”. That way I could maintain the data’s integrity while simultaneously reducing its storage cost. Only fractionally, but I thought it was a cool concept, and it could be useful if applied to a large dataset.
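As a minimal sketch, the ingestion-time transformation in the DCR might look like the following KQL, where `source` is the incoming stream (all table and column names here are hypothetical):

```
// Hypothetical transformKql for the DCR: shrink the predictable string to a code.
source
| extend ErrorCode = case(ErrorMessage == "ABC", 1,
                          ErrorMessage == "XYZ", 2,
                          0)                          // 0 = anything unexpected
| project-away ErrorMessage
```

And a matching saved function to ‘unpack’ the data at query time could simply invert the mapping:

```
// Hypothetical saved function (e.g. AppLogs_Unpacked) reversing the transform.
AppLogs_CL
| extend ErrorMessage = case(ErrorCode == 1, "ABC",
                             ErrorCode == 2, "XYZ",
                             "Unknown")
| project-away ErrorCode
```

Analysts would then query AppLogs_Unpacked rather than AppLogs_CL directly, and never see the codes at all.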

I also thought this had a nice, incidental effect of obfuscating the data if it were ever leaked or stolen. Anyone reading the data wouldn’t understand what the 1s and 2s meant without also reading how the DCR transforms the logs for ingestion. And it was at this moment that I realized this sounded a bit (a lot) like encryption… oh yeah, Azure must encrypt our data at rest, right?

Of course they do: AES-256. Which is why this thought experiment is largely useless.

Advanced Encryption Standard (AES) is a symmetric encryption algorithm that uses a fixed block size of 128 bits (16 bytes). A fixed block size means the algorithm processes data in uniform chunks of that size, regardless of the actual size of the input. So if our input data (“ABC” or 1) is smaller than 128 bits, or its length is not a multiple of 128 bits (and neither of ours is), it must be padded to fill a 128-bit block.
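To make the padding overhead concrete, here is the arithmetic as a quick KQL scratchpad. The 16-byte block size is AES’s; the ceiling formula is a simplification that ignores padding-scheme details (e.g. PKCS#7 adds a full extra block to exact multiples):

```
// "ABC" (3 bytes) and the code "1" (1 byte) both pad out to one full 16-byte block.
print abc_padded  = toint(ceiling(3.0 / 16.0) * 16),  // 16 bytes
      code_padded = toint(ceiling(1.0 / 16.0) * 16)   // 16 bytes
```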

Ultimately, then, both values would be stored as a single 128-bit block, where most of the resulting ciphertext is just the padding needed to make the data fill that block. And all of the fancy DCR and function transformations are for nothing.

Unless, however, your expected data is more than 128 bits in length. Then you could theoretically shrink it to something under 128 bits in the DCR, unpack it via a KQL function, and actually save on some storage cost. This is because Azure’s AES-256 would store your smaller, transformed data in a single block of encryption rather than several.
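As a hypothetical illustration, using the same simplified arithmetic: a predictable 50-byte error message occupies four 16-byte blocks, while a 1-byte code fits in one, saving 48 bytes per row. Across a (made-up) 100 million rows, that adds up:

```
// Savings per row, and across a hypothetical 100 million rows.
let block = 16.0;
print saved_per_row_bytes = toint((ceiling(50.0 / block) - ceiling(1.0 / block)) * block),          // 48 bytes
      saved_total_GB      = (ceiling(50.0 / block) - ceiling(1.0 / block)) * block * 1e8 / exp10(9)  // ~4.8 GB
```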

What logs exist out there that are exacting, predictable, and over 128 bits, so that this method would work? I definitely don’t know, but if you have them, please feel free to use this method and let me know!
