MITRE ATT&CK test

11 June 2020


What is the truth? Do we have a winner?

About a month ago, the results of the long-awaited MITRE ATT&CK APT29 emulation test were published. Many of the participating vendors have since published statements claiming superiority, and each of them will probably leave you with the impression that they are the winner. So, what is the truth? Can everyone be a winner? Is there a winner? Does it matter? In this post, we try to bring some clarity to the vendor buzz.

First off, what is the MITRE ATT&CK Framework?

MITRE ATT&CK® is a globally accessible knowledge base of adversary tactics and techniques based on real-world observations. The ATT&CK knowledge base is used as a foundation for the development of specific threat models and methodologies in the private sector, the government, and in the cybersecurity product and service community.

Source and more info here: https://attack.mitre.org/

Different matrices cover different domains, such as Enterprise, Mobile and ICS.

How are the MITRE ATT&CK Evaluations performed?

This is how MITRE explains it:

MITRE evaluates cybersecurity products using an open methodology based on the ATT&CK® knowledge base. Our goals are to improve organizations against known adversary behaviors by:

Empowering end-users with objective insights into how to use specific commercial security products to address known adversary behaviors
Providing transparency around the true capabilities of security products to address known adversary behaviors
Driving the security vendor community to enhance its capability to address known adversary behaviors

These evaluations are not a competitive analysis. We show the detections we observed without crowning a “winner.” There are no scores, rankings, or ratings. Instead, we show how each vendor approaches threat defense within the context of ATT&CK.

About the MITRE ATT&CK APT29 Emulation test

Two scenarios emulate publicly reported APT29/Cozy Bear/The Dukes/YTTRIUM tradecraft and operational flows. The first scenario begins with the execution of a payload delivered by a widespread “spray and pray” spearphishing campaign, followed by a rapid “smash and grab” collection and exfiltration of specific file types. After completing the initial data theft, the value of the target is realized, and the adversary drops a secondary, stealthier toolkit used to further explore and compromise the target network.

The second scenario focuses on a very targeted and methodical breach, beginning with the execution of a specially crafted payload designed to scrutinize the target environment before executing. The scenario continues through a low and slow takeover of the initial target and eventually the entire domain. Both scenarios include executing previously established persistence mechanisms after a simulated time lapse to further the scope of the breach.

Source and more info here: https://attackevals.mitre.org/APT29/

Is there a winner or leader?

The good thing about MITRE’s testing is its transparency: each test is described, and the outcome is well documented, especially compared with traditional testing houses. Most testing houses provide very limited information about what is tested, what information was shared with the vendors before the test, and so on. As a customer, it is therefore hard to understand in detail how a tested vendor performs and why the testing house considers it, for instance, a leader. This is especially visible in the EPP (Endpoint Protection Platform) space, where many of the leading vendors have received a 100% prevention rate score, which we know is not true. In this sense, MITRE is fulfilling its goal of providing objective, transparent and unbiased results.

The drawback of MITRE’s testing is that it is very “atomic”: each sub-step is scored as either a pass or a fail and annotated with detection Main Types and Modifiers. The Main Types can be ordered by the quality of the detection; for instance, a detected Technique or Tactic is better than raw Telemetry. However, it would be too hasty to use this as the sole quality measure, as telemetry detections can be valuable data points for connecting the dots (detections) into a timeline, with the end goal of helping an analyst respond quickly to an alert or investigation.

Among the modifiers, “alert” and “correlated” are better than, for example, a “configuration change” to detections that the vendor had to make to pass a specific sub-step. It is hard to tell whether such a configuration change only produced a pass in this test or whether it would also improve the solution in a real environment.
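To make the Main Types and Modifiers more concrete, here is a minimal Python sketch of how one might summarize the detections recorded for a single sub-step. The category names follow the APT29 round as we understand it, but the ranking in MAIN_TYPE_RANK is our own assumption (MITRE deliberately does not rank or score detections), and the Detection class and summarize_substep function are hypothetical helpers, not part of any MITRE tooling or data format.

```python
from dataclasses import dataclass, field

# Illustrative ordering of main detection types, least to most specific.
# This ranking is our assumption; MITRE itself does not score or rank detections.
MAIN_TYPE_RANK = {
    "None": 0,
    "Telemetry": 1,
    "MSSP": 2,
    "General": 3,
    "Tactic": 4,
    "Technique": 5,
}


@dataclass
class Detection:
    """One detection recorded for a sub-step (hypothetical structure)."""
    main_type: str
    modifiers: set[str] = field(default_factory=set)


def summarize_substep(detections: list[Detection]) -> dict:
    """Summarize a sub-step: best main type, whether any detection alerted,
    and whether the best detection depended on a detection configuration change."""
    if not detections:
        return {"best": "None", "alerted": False, "best_needs_config_change": False}
    # max() keeps the first detection among equally ranked ones.
    best = max(detections, key=lambda d: MAIN_TYPE_RANK[d.main_type])
    return {
        "best": best.main_type,
        "alerted": any("Alert" in d.modifiers for d in detections),
        "best_needs_config_change": "Configuration Change (Detections)" in best.modifiers,
    }


# Hypothetical sub-step: raw telemetry plus a Technique-level alert that only
# appeared after the vendor changed its detection configuration.
substep = [
    Detection("Telemetry"),
    Detection("Technique", {"Alert", "Configuration Change (Detections)"}),
]
print(summarize_substep(substep))
# {'best': 'Technique', 'alerted': True, 'best_needs_config_change': True}
```

In this made-up case the sub-step looks like a Technique-level alert, but only thanks to a configuration change, which is exactly the kind of pass that is hard to judge from the published results alone.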

Evaluating a vendor on the total number of detections would also, in many cases, lead to the wrong conclusion, since a high count could just as well indicate that you will be flooded with events in a real environment. Some sub-steps can also yield more than one detection, which, depending on the solution, either enriches a timeline or floods it with more alerts. Similarly, technique coverage tells you how well the solution maps its discoveries to MITRE Techniques, but nothing about usability; high coverage could still arrive as a flood of disconnected, uncorrelated alerts.
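The difference between counting detections and counting covered sub-steps can be illustrated with a small Python sketch. The two vendors, the sub-step IDs and the total of 10 sub-steps are all invented for illustration and are not real APT29 results; the coverage function is a hypothetical helper.

```python
# Hypothetical results: each entry is the sub-step a detection was recorded
# against. IDs and counts are invented for illustration, not real APT29 data.
TOTAL_SUBSTEPS = 10
vendor_a = ["1.A.1", "1.A.1", "1.A.1", "1.A.2", "1.A.2", "1.B.1", "1.B.1", "1.B.1"]
vendor_b = ["1.A.1", "1.A.2", "1.B.1", "1.B.2", "1.C.1"]


def coverage(detections: list[str], total_substeps: int) -> tuple[int, int, float]:
    """Return (total detections, distinct sub-steps covered, share of sub-steps covered)."""
    covered = len(set(detections))
    return len(detections), covered, covered / total_substeps


for name, detections in [("Vendor A", vendor_a), ("Vendor B", vendor_b)]:
    total, covered, share = coverage(detections, TOTAL_SUBSTEPS)
    print(f"{name}: {total} detections, {covered}/{TOTAL_SUBSTEPS} sub-steps ({share:.0%})")
# Vendor A: 8 detections, 3/10 sub-steps (30%)
# Vendor B: 5 detections, 5/10 sub-steps (50%)
```

Vendor A reports more detections but covers fewer sub-steps, and whether its extra detections enrich the timeline or just add noise cannot be read from either number.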

In short, a vendor could achieve good coverage in the test by adding many detection capabilities focused specifically on the 58 MITRE Techniques that were evaluated in the MITRE ATT&CK APT29 emulation test.

Conclusion

Despite these caveats, MITRE’s goal of providing the community with transparent evaluation data is a good thing. However, the way some vendors later use the data is far from perfect, as you can prove almost anything by looking at it from the right angle. The test says very little about real-life scenarios, does not cover the risk of alert fatigue, and does not include a tool’s ability to aggregate alerts, its response capabilities, etc. Because MITRE’s emulation testing is unique in its transparency among public tests, we have contacted the evaluation team to discuss how to further improve it and possibly address some of the current testing issues.

There is no quick and easy way to find a winner or leader in the MITRE test data, and, most importantly, that is not the intention of the test either. So, one of the main conclusions is that MITRE should not be the only thing you use or rely on when you evaluate security solutions. In fact, it is only one piece of the puzzle. To choose the right EDR (Endpoint Detection and Response) tool, you need to know your control gaps, and it is not only about the technology but also about processes and people (e.g. SOC analysts). MITRE emulation test results can be used as one source of decision data, but should certainly not be the only one.

If you are currently evaluating endpoint security vendors and would like to dive into the MITRE test data or the crowded endpoint security space together with us, we are more than happy to help.

Source: https://attackevals.mitre.org/APT29/