Tuesday, November 14, 2023

Detection Engineering stages of maturity: A Story

Detection Engineering stages of maturity: getting the most out of your SIEM, a story over time.

The SIEM is installed and all of the vendor's out-of-the-box detections are enabled. Terrible: low fidelity, too many alerts. You quickly realize it's unmanageable.
Without much in-depth thought, the out-of-the-box rules are quickly and wildly tuned just for the sake of making the queue quieter. This is tons of work, and you start to realize you don't even understand why most of these rules exist or what they're doing.
You slowly start deleting the out-of-the-box rules, one at a time, and replacing them with your own. For each of these new detections, you build out some sort of documentation explaining why it exists, for the SOC to read and understand.
It takes quite a bit of time, but eventually you've disabled all of the out-of-the-box SIEM rules. They've all been replaced with custom detections you built that are very similar but now have proper documentation.
You now have a strong grasp of every detection in your SIEM because you wrote and tuned each one, and the SIEM alerts are relatively quiet compared to what they used to be. But you realize these rules are very specific to certain TTPs, and that you have gaps in coverage.
You have mapped all of your rules to the MITRE ATT&CK matrix. With very little thought or planning, you start grabbing Atomic Red Team tests and using them to build detections for the tactics and techniques you're lacking.
You have tons of detections now, and your coverage map is looking better because you have detections for most tactics and techniques. But you're starting to slide back into the too-many-alerts problem, so you have to be more strategic about this.
The SOC is busy, but you have free time thanks to a strong coverage map, so you start doing more proactive threat hunting. You start realizing your detections are fragile, easy to bypass, and still miss things (e.g. if the name of a process changes, or the order of parameters changes).
Your threat hunts keep leading you back to really basic fundamentals, like inventory. You start building massive data sets of all the assets and software in your environment (e.g. hostnames, IPs, publishers, process names, etc.).
You're slowly building a better inventory than anybody in IT ever had. You dump and group this data into your SIEM as lists and occasionally use it for tuning out false positives. This definitely improves your alert fidelity, and the SIEM alerts are getting quieter again.
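To make that idea concrete, here's a minimal Python sketch of using an inventory export to down-rank alerts that match known-good software. The file name, column names, and alert fields are all hypothetical; in a real SIEM this would typically be a lookup table or reference list rather than a script.

```python
import csv

# Hypothetical inventory export: one row per known-good install,
# with "hostname" and "process_name" columns pulled from your asset
# and software inventory.
def load_known_processes(path="software_inventory.csv"):
    known = set()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            known.add((row["hostname"].lower(), row["process_name"].lower()))
    return known

def is_probable_false_positive(alert, known_processes):
    """True if the alerted process is already in the inventory for that host."""
    key = (alert["hostname"].lower(), alert["process_name"].lower())
    return key in known_processes

known = load_known_processes()
alert = {"hostname": "APP01", "process_name": "backupagent.exe"}
if is_probable_false_positive(alert, known):
    print("Known-good per inventory; lower the priority.")
else:
    print("Not in inventory; keep for review.")
```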
You now regularly threat hunt and continue to see that your rules are missing things. You need something more, so you start building baselines from your threat hunts: key/value pairs, such as the locations a user logs in from, the accounts that log in to a server, the IPs it connects to, etc.
You build your first-ever experimental "baseline detection". All it does is fire if it sees something outside the baseline, i.e. something that isn't in your list. (Example: this server is known to connect to these IPs, but it just connected to a new IP.)
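In code terms, a baseline detection is little more than a set-membership check per entity. Here's a minimal Python sketch of that idea; the baseline structure, field names, and values are made up for illustration, and a real implementation would live in your SIEM's rule and lookup machinery.

```python
# Baseline: entity -> set of values considered "normal".
BASELINE = {
    ("server", "DB01", "dest_ip"): {"10.0.0.5", "10.0.0.9"},
    ("user", "jsmith", "login_country"): {"US", "CA"},
}

def check_event(entity_type, entity, field, value):
    """Fire only when the observed value falls outside the baseline."""
    normal = BASELINE.get((entity_type, entity, field), set())
    if value not in normal:
        return f"ALERT: {entity_type} {entity} has a new {field}: {value}"
    return None

# DB01 talking to a never-before-seen IP trips the detection...
print(check_event("server", "DB01", "dest_ip", "203.0.113.44"))
# ...while a known destination stays quiet (returns None).
print(check_event("server", "DB01", "dest_ip", "10.0.0.5"))
```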
It works. You start expanding this into other "baseline detections", such as alerting when an admin account runs a new process, a VIP user logs in remotely from a country never seen before, etc.
You are loving this idea! How can a threat actor bypass this? They probably can't. In any incident, there's always going to be a single new IP, new executable, new country, new domain, or something. But this is going to be low fidelity and noisy. How can we scale this?
Baby steps. You gather a list of crown jewels (DAs, GAs, DCs, critical apps, public-facing systems, OT gear, etc.). One at a time, you build baselines around them (what processes they run, which domains they connect to, which countries they log in from, etc.). Infinite possibilities.
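One way to organize this is a small set of baseline dimensions per crown jewel, with a single check that reports anything outside them. A rough Python sketch; the asset names, dimensions, and values below are purely illustrative.

```python
# Per-asset baselines for crown jewels; every name here is an example.
CROWN_JEWEL_BASELINES = {
    "DC01": {
        "processes": {"lsass.exe", "dns.exe", "ntfrs.exe"},
        "outbound_domains": {"windowsupdate.com", "corp.example.com"},
        "admin_logon_countries": {"US"},
    },
    "PAYROLL-APP": {
        "processes": {"payrollsvc.exe", "w3wp.exe"},
        "outbound_domains": {"payroll-vendor.example.com"},
        "admin_logon_countries": {"US", "CA"},
    },
}

def deviations(asset, observed):
    """Return every observed value that falls outside the asset's baseline."""
    baseline = CROWN_JEWEL_BASELINES.get(asset, {})
    out = {}
    for dimension, values in observed.items():
        new = set(values) - baseline.get(dimension, set())
        if new:
            out[dimension] = new
    return out

print(deviations("DC01", {"processes": ["lsass.exe", "mimikatz.exe"]}))
# -> {'processes': {'mimikatz.exe'}}
```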
It's paying dividends, but it's time consuming and exhausting, since you are the primary curator of the baselines. You need a feedback loop that can get data into the baselines, so you engage your SOC in a workflow that lets them suggest baseline additions to you as they find them.
The process matures: trusted senior SOC staff can add things directly to the baselines, and senior staff create a loop to gather baseline additions from their junior staff. A machine now exists that feeds itself, a dynamic, constantly expanding baseline that increases alert fidelity.
You are now able to add more and more "baseline detections" to cover the non-critical but still important systems. As you build out these baselines, you make sure to run historical hunts to gather "what is normal" and pre-fill the baselines before the SOC ever sees them.
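Pre-filling is mostly an aggregation job: run a historical search (say, the last 90 days), group the results per entity, and load that as the starting baseline. A rough Python sketch, with an assumed event shape of host plus destination IP:

```python
from collections import defaultdict

def prefill_baseline(historical_events):
    """Group historical observations per host so the baseline starts 'warm'."""
    baseline = defaultdict(set)
    for event in historical_events:
        baseline[event["host"]].add(event["dest_ip"])
    return baseline

# Pretend this came from a 90-day historical search in your SIEM.
history = [
    {"host": "DB01", "dest_ip": "10.0.0.5"},
    {"host": "DB01", "dest_ip": "10.0.0.9"},
    {"host": "WEB01", "dest_ip": "10.0.0.5"},
]
print(dict(prefill_baseline(history)))
# e.g. {'DB01': {'10.0.0.5', '10.0.0.9'}, 'WEB01': {'10.0.0.5'}}
```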
You now have a massive data lake, dare I say a TIP (threat intelligence platform)? But it's not full of malicious indicators; it's full of "what's normal" in your environment. It gives the SOC droves of information and correlation points to quickly identify benign false positives.
The detections that fire are now, in most cases, "new things" never seen before in your environment, and thus worth a review. It keeps the SOC staff engaged and interested because it's not the same old mundane detections over and over again.
Consensus is building that those older rules, like the Atomic Red Team-based ones, have their place but aren't as valuable as these new rules. If push comes to shove (perhaps due to SIEM licensing or performance), you have some flexibility and comfort in disabling those original rules.
Time progresses, the number of baseline detections grows, the SOC continues to feed the baselines, and you can dive even deeper, setting more trip lines for threat actors. You can start creating baseline detections for parent processes, east-west traffic, listening ports, etc.
The journey isn't over; it continues. Mistakes have been made and time may have been wasted, but you just keep moving forward. New chapters need to be written. We'll see where this goes. Thanks for reading, and I hope it helped somebody. #detectionengineering #blueteam #siem