If you are a leader in cybersecurity, you share with thousands of other leaders the pain of achieving continuous improvement in security operations. If you are looking for a perfect formula that will work forever, let me save you some time: there is no secret or perfect formula. There is always something that needs adjusting, something that no longer makes sense, and something that requires tuning to keep your operations improving. This post won’t give you a definitive answer, but I will share a few of my experiences on this journey.

Metrics Criteria Should Not Last Forever.

Nothing in cybersecurity lasts forever, and metrics are no exception. Whenever you define the criteria to measure your security operations, expect them to need frequent reviews to adjust to your current needs. This effort takes time and alignment with the operations leaders to define what is important, what you learned since the last revision, and which targets will push for improvement. This exercise should happen at least once a year, but in some cases adjustments must be evaluated every one or two quarters.

Do not just measure and report; also set clear expectations and targets. This is probably the trickiest part of the job, especially finding the right balance between the targets and the resources available to achieve them. These resources are a combination of people, technology, and processes, and they must work strategically toward the targets defined by the leadership team.

Targets are not SLAs — At least not necessarily.

In my experience, some people who come from areas outside cybersecurity try to impose SLAs on security operations and incident response. Don’t get me wrong, I am a fan of SLAs too, especially when I am not accountable for them, but SLAs are a double-edged (and very sharp) sword that also requires careful evaluation from a risk management perspective.

Monthly and quarterly targets combined with frequent checkpoints are a useful technique to push leadership toward continuous improvement. These checkpoints should focus on the current status of each metric to help the team identify cases that deviate from the targets, so controls can be implemented to keep the metrics on target. But especially when you are new to this, it is extremely easy to confuse a target with an SLA, and this is where we must be very careful to articulate the expectations when defining the requirements. The term “SLA” is immensely powerful and can drive people to make poor, uninformed decisions in security operations. An example explains this last point better.

Consider that the Incident Response leadership team established an SLA of 10 minutes to investigate security alerts, which are not necessarily confirmed security incidents. During an investigation, one of the security analysts has already spent 9 minutes without fully understanding whether the event is a security incident or not. At the 10-minute mark, he decides to either close the alert as a false positive or treat it as a true positive and shut down the suspected system, because he must meet the SLA.

I will not get into the consequences of either overreacting to a false positive or failing to react to a true positive. Instead, consider that the analyst could have kept working on the alert until he fully understood what he was dealing with, and then made an informed decision on how to handle it, even though he would miss the 10-minute target.

Now, during your metrics review session, that case will be highlighted because it took longer than the target. The leadership team will then review it to determine what can be done differently (better telemetry, more documentation, guidance, etc.) so that similar alerts can be analyzed more efficiently and meet the target. Tracking the average against the target works better than enforcing SLAs on individual cases, especially because the incident response process is very dynamic and no two cases are the same.
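To make the difference concrete, here is a minimal Python sketch that evaluates one review period against an average target instead of a per-case SLA. The alert IDs, durations, and the 10-minute target are hypothetical values chosen for illustration, not real data. Cases above the target are returned as review items for the checkpoint, not counted as failures.

```python
from statistics import mean

# Hypothetical investigation times (minutes) for one review period.
investigation_minutes = {
    "ALERT-101": 6,
    "ALERT-102": 8,
    "ALERT-103": 25,
    "ALERT-104": 7,
    "ALERT-105": 5,
    "ALERT-106": 9,
}
TARGET_AVG_MINUTES = 10  # hypothetical target, tracked as an average


def review_period(durations, target):
    """Return (average, target_met, cases_to_review) for one period."""
    average = mean(durations.values())
    # Individual cases above the target are flagged for review, not failed.
    to_review = sorted(alert for alert, minutes in durations.items() if minutes > target)
    return average, average <= target, to_review


if __name__ == "__main__":
    avg, met, to_review = review_period(investigation_minutes, TARGET_AVG_MINUTES)
    print(f"average={avg} min, target met={met}, review={to_review}")
```

With these sample values the average is exactly 10 minutes, so the target is met, while ALERT-103 is flagged for the kind of review described above. Under a strict per-case SLA, the same period would instead count one breach.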

Remember that the goal of the incident response process is to protect the business, but if it is not properly executed, it can also be a risk to the business itself. So be careful.

Avoid overcommitment on targets and metrics.

Metrics and targets must be realistic, but not too easy to achieve; otherwise they won’t push for improvement. They also need to be checked against the people and technology available, to confirm they are achievable.

Some metrics are under your control, and others are not. It is very easy to commit to a metric or target that you are not empowered to achieve, especially when your CISO is the one asking for it. Cybersecurity operations are bigger than incident response alone, but IR metrics offer insight into other areas of cybersecurity and IT that need improvements and controls. Here is a simple example.

One of the most common security incidents in any organization is an account compromised through a successful phishing attack. In incident response, a leader could commit to reducing the response time after an incident is confirmed, but not to reducing the number of successful phishing attacks. That requires a more holistic approach involving multiple parties inside and outside IR: the cybersecurity awareness team, the people managers of the most targeted users (something IR should be able to identify from past incident reports), and other functions in the organization that implement technology to catch attacks before they reach end users’ inboxes. So, while you can commit to this target as a group, don’t shoot yourself in the foot by attempting it alone.
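A rough sketch of how IR could surface the most targeted users and their managers from past incident reports, assuming the reports can be exported as simple records. The record layout, user names, and manager labels below are hypothetical.

```python
from collections import Counter

# Hypothetical records from past phishing incident reports:
# (incident id, compromised user, that user's people manager)
phishing_incidents = [
    ("INC-201", "alice", "mgr-finance"),
    ("INC-202", "bob", "mgr-sales"),
    ("INC-203", "alice", "mgr-finance"),
    ("INC-204", "carol", "mgr-finance"),
    ("INC-205", "alice", "mgr-finance"),
]


def most_targeted(records, top_n=3):
    """Count incidents per user and per manager; return the top_n of each."""
    users = Counter(user for _, user, _ in records)
    managers = Counter(manager for _, _, manager in records)
    return users.most_common(top_n), managers.most_common(top_n)


if __name__ == "__main__":
    top_users, top_managers = most_targeted(phishing_incidents)
    print("Most targeted users:", top_users)
    print("Managers to involve:", top_managers)
```

The output only tells IR whom to bring into the conversation with the awareness team; reducing phishing success still depends on those other parties.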

Final Advice

When I started my own journey in cybersecurity, I made a few mistakes while defining my metrics criteria. One of the most common is trying to solve everything in a single shot. With experience and maturity, I concluded that you need to identify your critical pain points, the ones that put you at higher risk. Staying focused is a special skill in this job, because there are many moving targets and interesting things competing for your attention, which makes it very difficult to keep your mind on the most important item on the list, especially when there are easy wins available.

So, my advice is to select your top three pain points and push to fix them during the next review period. For example, if you noticed that the time to implement containment actions on a sample of confirmed incidents was too high (what counts as too high is your own judgment call), you might start measuring the average time to contain confirmed security incidents every week. Why? As any “Incident Response 101” course will tell you, the longer it takes to contain an incident, the higher the risk of lateral movement, further exploitation, and damage to the business. Also, working to reduce this metric week after week will help you identify controls that have a side effect on other metrics, like business impact after a security incident (another difficult metric).
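The weekly metric could be computed along these lines. This is a sketch assuming each confirmed incident record carries a confirmation timestamp and a containment timestamp; the incident IDs and times are invented sample data, and bucketing by the ISO week in which the incident was confirmed is a design choice, not a standard.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical confirmed incidents: (incident id, confirmed at, contained at)
incidents = [
    ("INC-1", datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 10, 30)),
    ("INC-2", datetime(2024, 3, 6, 14, 0), datetime(2024, 3, 6, 14, 50)),
    ("INC-3", datetime(2024, 3, 12, 8, 0), datetime(2024, 3, 12, 9, 0)),
]


def weekly_mean_time_to_contain(records):
    """Average minutes from confirmation to containment, per ISO week."""
    by_week = defaultdict(list)
    for _incident_id, confirmed, contained in records:
        # Bucket by the ISO (year, week) in which the incident was confirmed.
        iso_year, iso_week, _ = confirmed.isocalendar()
        minutes = (contained - confirmed).total_seconds() / 60
        by_week[(iso_year, iso_week)].append(minutes)
    return {week: mean(values) for week, values in sorted(by_week.items())}


if __name__ == "__main__":
    for (year, week), avg in weekly_mean_time_to_contain(incidents).items():
        print(f"{year}-W{week:02d}: {avg:.0f} min")
```

Bucketing by confirmation week keeps each incident in exactly one week even when containment spills over into the next one.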

After the review period, you will know whether the target was met, and you can adjust it. For instance, if your initial target was a 60-minute average and you achieved 50 minutes, it can probably be reduced to 45 minutes for the next quarter, provided that is realistic with the resources you have.
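The adjustment step can also be written down as an explicit rule so it is applied consistently. The sketch below is purely hypothetical: tighten the next target to 10% below what was achieved, round down to a 5-minute step, and never go below a floor that reflects current resources. The percentage, step, and floor are assumptions to tune, not recommendations.

```python
def next_target(current_target, achieved, floor, tighten_pct=10, step=5):
    """Propose the next period's average target in whole minutes (hypothetical rule)."""
    if achieved > current_target:
        # Target missed: keep it and review the outliers before tightening.
        return current_target
    # Tighten to tighten_pct below what was actually achieved...
    proposed = achieved * (100 - tighten_pct) // 100
    # ...round down to a whole step...
    proposed = proposed // step * step
    # ...but never below what current resources can realistically support.
    return max(proposed, floor)


if __name__ == "__main__":
    print(next_target(current_target=60, achieved=50, floor=30))
```

With a 60-minute target, a 50-minute result, and a 30-minute floor, the rule proposes 45 minutes, matching the example above; if the result had been worse than the target, it would keep the target unchanged.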
