SOAR: conclusions for 2020
Since 2016, the acronyms SIRP (Security Incident Response Platform) and then SOAR (Security Orchestration, Automation and Response) have been making headlines. Since then, integration projects have been launched and the market has consolidated through a number of significant acquisitions:
Figure 1 – Key SOAR solution acquisitions – Source: Orange Cyberdefense
While the usefulness of such a tool no longer needs to be demonstrated, it is still worth taking stock of the situation at the beginning of 2020. In this article, we share some feedback on SOAR solutions, drawn both from our own internal use and from what we have observed with the first customers to start an integration project.
Clearly, the customers most mature on cybersecurity issues have already started thinking about a SOAR solution, or have even already deployed one. Large organizations and MSSPs (Managed Security Service Providers) are the most concerned by this need to automate and industrialize incident handling. An organization with a smaller security footprint will naturally tend to focus its strategy on other priorities, such as detection.
In this context, can and should an organization equip itself with a SOAR as soon as its SOC (Security Operations Center) is created? Some companies have indeed chosen to deploy the complete package: SIEM (Security Information and Event Management) plus SOAR. These projects are ambitious and the pace is high, but with good advice, support and strong team involvement they are quite feasible, although a SOAR is far from being a guarantee of the SOC's success.
In terms of deployment, Gartner published a study in February 2018 entitled Preparing Your Security Operations for Orchestration and Automation Tools[1], mentioning integration times ranging from 6 to 9 months. These figures are fairly close to what we have observed, bearing in mind that, like most SIEM detection scenarios and rules, workflows (or playbooks) keep being improved after they are first deployed.
In terms of use cases, automating the processing of malicious emails collected from an abuse mailbox seems to be the most common, with a large number of customer deployments and a good ratio of time-consuming tasks automated. Automatic parsing and extraction of artifacts, followed by their enrichment (Whois, DNS records, HTTPS certificate information, etc.), comes first. This is followed by a more or less thorough analysis of the email itself, including the comparison of indicators against Threat Intelligence databases and the detonation of files and URLs in a sandbox. Some organizations have taken a clear lead by applying several additional techniques that provide deeper analysis: Levenshtein distance, pattern/text matching, image analysis (OCR, logo detection, etc.), and even machine learning algorithms, which supplement knowledge bases that are often insufficient to detect more advanced or targeted attacks.
Figure 2 – Automated phishing playbook (extract) – Source: Demisto/Palo Alto XSOAR
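To make these enrichment steps more concrete, here is a minimal Python sketch of two of the checks mentioned above: extraction of the sender and URL domains from a raw email, followed by a Levenshtein-distance comparison against a watchlist of protected domains to flag look-alikes. The watchlist, the regular expression and the distance threshold are illustrative assumptions, not values taken from any particular product.

```python
import re
from email import message_from_string

# Hypothetical watchlist of domains the organization wants to protect.
PROTECTED_DOMAINS = ["example-corp.com", "example-bank.com"]

URL_RE = re.compile(r"https?://([\w.-]+)", re.IGNORECASE)


def extract_domains(raw_email: str) -> set:
    """Pull the sender domain and URL domains out of a raw (non-multipart) message."""
    msg = message_from_string(raw_email)
    domains = set()
    sender = msg.get("From", "")
    if "@" in sender:
        domains.add(sender.split("@")[-1].strip(">").lower())
    payload = msg.get_payload()
    body = payload if isinstance(payload, str) else ""
    domains.update(d.lower() for d in URL_RE.findall(body))
    return domains


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lookalike_hits(domains: set, max_distance: int = 2) -> list:
    """Flag observed domains that are close to, but not equal to, a protected domain."""
    hits = []
    for seen in domains:
        for legit in PROTECTED_DOMAINS:
            distance = levenshtein(seen, legit)
            if 0 < distance <= max_distance:
                hits.append((seen, legit, distance))
    return hits
```

In a real playbook, any hit would simply raise the incident's score before the indicators are checked against Threat Intelligence sources and the attachments are detonated in a sandbox.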
Then come the more generic playbooks for incident sorting and qualification. The most common sources here are a SIEM or, more simply, an EDR (Endpoint Detection and Response) or network probes, which raise alerts that are then collected, “mapped” and enriched by various integrations (Active Directory, Threat Intelligence, CMDB or Configuration Management Database, etc.). Assignment, notification and SLA (Service Level Agreement) management are often part of this first generic playbook.
Figure 3 – Industrialization of incident sorting and qualification – Source: Orange Cyberdefense CyberSOC
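As an illustration of this generic triage flow, the sketch below maps an incoming alert into a common incident model, enriches it through Threat Intelligence, Active Directory and CMDB lookups, then derives a severity and an SLA deadline. All field names, the lookup callables and the SLA policy are assumptions made for the example, not a vendor API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical SLA policy: hours allowed per severity level.
SLA_HOURS = {"low": 24, "medium": 8, "high": 2}


@dataclass
class Incident:
    source: str                      # e.g. "SIEM", "EDR", "NTA"
    raw: dict                        # the original alert payload
    indicators: list = field(default_factory=list)
    context: dict = field(default_factory=dict)
    severity: str = "low"
    sla_deadline: Optional[datetime] = None


def map_alert(source: str, raw_alert: dict) -> Incident:
    """Map a source-specific alert into the common incident model."""
    return Incident(source=source, raw=raw_alert,
                    indicators=raw_alert.get("indicators", []))


def enrich(incident: Incident, ti_lookup, ad_lookup, cmdb_lookup) -> Incident:
    """Enrich the incident via the integrations mentioned above.
    The three lookup callables are assumed to return plain dicts."""
    incident.context["ti"] = [ti_lookup(i) for i in incident.indicators]
    if incident.raw.get("user"):
        incident.context["ad"] = ad_lookup(incident.raw["user"])
    if incident.raw.get("host"):
        incident.context["cmdb"] = cmdb_lookup(incident.raw["host"])
    return incident


def qualify(incident: Incident) -> Incident:
    """Simple qualification rule: a TI match and/or a critical asset raises severity."""
    ti_hit = any(hit.get("malicious") for hit in incident.context.get("ti", []))
    critical_asset = incident.context.get("cmdb", {}).get("criticality") == "high"
    if ti_hit and critical_asset:
        incident.severity = "high"
    elif ti_hit or critical_asset:
        incident.severity = "medium"
    else:
        incident.severity = "low"
    incident.sla_deadline = datetime.utcnow() + timedelta(hours=SLA_HOURS[incident.severity])
    return incident
```

Assignment and notification would then follow, either inside the SOAR or through a ticketing integration.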
In large organizations, which generally have a CERT and/or a CSIRT providing targeted indicators of compromise (IOCs), we also found a fairly common use case: searching for indicators across the environment. Whether the indicators come from a MISP (Malware Information Sharing Platform) instance, an email, a CSV file or a single submitted value, the SOAR also provides the ability to quickly query SIEM, log management, EDR or NTA (Network Traffic Analysis) tools in order to determine whether an indicator has been observed on the network or on a system.
The value is real, because today the majority of compromise indicators are deployed within SIEMs for real-time detection, but they are not searched for over past data.
We will therefore detect a future compromise, but not a past one. Why does this matter? A good number of attacks are short-lived or one-off: phishing campaigns or infections by malware such as credential stealers can exfiltrate credentials and then disappear. Without a retrospective search, such a compromise would never be identified. Past ransomware campaigns have no doubt also pushed many companies to develop their ability to hunt for signs of compromise.
Figure 4 – Retro-hunting operation – Source: Orange Cyberdefense
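A minimal sketch of such a retro-hunting run is shown below, assuming the indicators arrive as a CSV export (for instance generated from a MISP instance) and that the SOAR exposes some search function over the SIEM or log management backend. The `search_logs` callable, its signature and the CSV columns are purely illustrative assumptions.

```python
import csv
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List


def load_iocs(csv_path: str) -> List[dict]:
    """Read indicators from a CSV export with at least 'type' and 'value' columns."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        return [row for row in csv.DictReader(fh) if row.get("value")]


def retro_hunt(iocs: Iterable[dict],
               search_logs: Callable[[str, datetime, datetime], List[dict]],
               lookback_days: int = 90) -> Dict[str, List[dict]]:
    """Search each indicator over the past `lookback_days` and keep any hits."""
    end = datetime.utcnow()
    start = end - timedelta(days=lookback_days)
    hits: Dict[str, List[dict]] = {}
    for ioc in iocs:
        matches = search_logs(ioc["value"], start, end)
        if matches:
            hits[ioc["value"]] = matches
    return hits
```

Any positive result can then be promoted into an incident by the SOAR, exactly as a real-time SIEM detection would be.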
In addition to these generic playbooks, more specific incident response playbooks are of course also deployed. Priority is given to handling malware incidents, which can be broken down by type (ransomware, worm, Trojan, etc.) or even by family (Emotet, Nanocore, Azorult, etc.). Since sorting, enrichment and qualification are partly automated upstream, the work here shifts toward the analyst and high value-added manual tasks.
Malicious email incident response playbooks are also widely deployed, with variants per family: phishing, spear phishing, CEO fraud and many others.
Figure 5 – Malware investigation playbook (extract) – Source: IBM Resilient
The automation of remediation processes is still only marginally deployed. Although it is technically possible (deploying a blacklist on a proxy or firewall, isolating a workstation or server through an EDR, deleting emails from mailboxes via Office 365, etc.), organizations still prefer to keep these actions manual.
Either the teams in charge have direct access to the solution and a task is assigned to them, or the SOAR automatically creates a ticket in a third-party tool such as an ITSM (IT Service Management) platform. Good knowledge of the risks and an up-to-date repository identifying critical servers, VIPs and administrators are necessary to avoid, or at least limit, collateral damage. A decision tree can then be built from the information collected.
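The sketch below illustrates such a decision gate: isolation is only triggered automatically when the asset repository reports neither a critical server nor a VIP workstation; otherwise a ticket is created for manual handling. The `isolate_host` and `create_ticket` callables stand in for EDR and ITSM integrations and are assumptions made for the example.

```python
def decide_remediation(host: str, asset_info: dict, isolate_host, create_ticket) -> str:
    """Return the action taken, so it can be logged in the incident timeline."""
    if asset_info.get("criticality") == "high" or asset_info.get("vip_owner"):
        # Potentially damaging action: hand over to the team owning the asset.
        create_ticket(
            summary=f"Manual isolation requested for {host}",
            description="Critical asset or VIP workstation: automatic isolation was skipped.",
        )
        return "ticket_created"
    # Low-risk asset: isolation can be performed automatically by the EDR integration.
    isolate_host(host)
    return "isolated_automatically"
```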
Some use cases are of course now widely deployed, but needs differ greatly from one business to another. An MSSP, which does not always have administrative access to its customers' environments, will focus its use on investigation playbooks and place great importance on collaboration, capitalization and the knowledge base. An organization with its own SOC will have broader access and will therefore be able to go further on contextualization and remediation, as well as on the development of IT and business playbooks.
Paradoxically, the first difficulty observed is the management of human resources. Since solutions are rarely plug-and-play, and given the shortage of IT security skills, deployments are much more complex and time-consuming than expected. A clearly quantifiable return on investment is therefore rarely achieved within the first few months of such a project.
Automation and integration capabilities often go hand in hand with development, which of course implies specific skills, an operating model, or even a dedicated team when playbooks or complex connectors are developed in-house. In addition, integration with third-party tools and solutions brings a multitude of heterogeneous teams into the project, some of which may be very skeptical about letting a tool perform potentially damaging actions automatically. Features do exist to constrain such actions or require validation, but this is not always enough to convince them.
Adapting to change is also a challenge for the teams. Interfaces are sometimes complex to grasp at first and are only mastered after a few days or even weeks of use. The most significant change is surely for level 1 teams, who are affected as much by the arrival of a new tool as by the automation of a large part of the sorting and qualification work. Given team turnover, training on these new tools is a real challenge because of the central role they play.
Behind the promising marketing discourse, then, lies real complexity, and some organizations have not yet reached the maturity needed to deploy a SOAR. Security incident handling processes are often only partially mastered and documented, and without this essential contextual information, no automation is possible.
As mentioned in the introduction, these projects are long and involve many teams, so governance and project management must not be neglected. Validating the technical prerequisites, which involves opening many network flows and creating service accounts or API (Application Programming Interface) access, as well as keeping the teams mobilized, can be significant sources of risk. Let's not forget that the organizations most likely to equip themselves with a SOAR are far from the start-up model: complex processes and siloed operations are real obstacles, especially since teams sometimes have very different, even contradictory, objectives.
Defining a target operating model is a question that must be addressed: who is in charge of creating and improving playbooks? A new dedicated team, or the SOC teams? The same question arises for the definition of processes, the development and maintenance of integrations, and the platform itself. Project management methods from the software development world certainly have their place here and should be considered.
Incident response processes are still poorly documented today. It is essential that they be known and mastered, but above all they must be reliable and contextualized.
Foundations do exist: SANS, NIST and some CERTs, such as CERT SG [2], share their incident response sheets and methodologies. Some solutions also include templates for playbooks and workflows. These documents and templates are very good starting points, but contextualization must not be neglected, and in some cases support in defining and improving incident response processes is worthwhile. The BPMN standard (Business Process Model and Notation [3]) seems to be broadly accepted, as a large number of vendors rely on it for playbook modeling.
Documenting incident response processes upstream also makes it possible to optimize their representation by identifying similar or identical phases shared across several use cases. For example, handling a compromised account, whether it results from a malware infection or from phishing, will follow much the same process in most organizations. On solutions that allow the creation of “sub-playbooks”, reusing these shared building blocks makes the integration and evolution of playbooks much easier, as sketched below.
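As a simple illustration of this reuse, the sketch below factors the compromised-account handling into a single function called by both a phishing playbook and a malware playbook. The individual task stubs (`reset_password`, `revoke_sessions`, `notify_user`) are hypothetical.

```python
def handle_compromised_account(user: str, reset_password, revoke_sessions, notify_user):
    """Shared sub-playbook: the containment steps are the same whatever the initial vector."""
    reset_password(user)
    revoke_sessions(user)
    notify_user(user, reason="Your account showed signs of compromise.")


def phishing_playbook(incident: dict, tasks: dict):
    """Phishing-specific logic, delegating account containment to the sub-playbook."""
    if incident.get("credentials_submitted"):
        handle_compromised_account(incident["user"], **tasks)


def malware_playbook(incident: dict, tasks: dict):
    """Malware-specific logic (e.g. a credential stealer), reusing the same sub-playbook."""
    if incident.get("credential_stealer_detected"):
        handle_compromised_account(incident["user"], **tasks)
```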
Operation in degraded mode must also be considered: how should teams work if the SOAR solution is unavailable? Having response processes that are prepared, documented and quickly accessible keeps the SOC operating at an acceptable level despite the loss of service.
Companies often make the mistake of trying to automate too much of the incident response process from the outset. A better approach is to deploy exclusively manual playbooks at first and then, working iteratively, to automate the most repetitive and time-consuming tasks first.
This method makes it possible to have an operational solution quickly, and also to calculate a return on investment very easily. Analyst buy-in for this type of tool then comes more naturally.
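One simple way to make that return on investment measurable is to estimate the analyst time saved per playbook run, as in the short sketch below; the volumes and durations are illustrative assumptions only.

```python
def hours_saved(runs: int, manual_minutes: float, automated_minutes: float) -> float:
    """Analyst time saved by automation over `runs` playbook executions, in hours."""
    return runs * (manual_minutes - automated_minutes) / 60


# Example: 300 phishing alerts per month, 20 minutes each when handled manually,
# 5 minutes with the automated playbook -> 75 analyst-hours saved per month.
monthly_gain = hours_saved(runs=300, manual_minutes=20, automated_minutes=5)
```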
It is also better to focus on preparation and quality than on the number of integrated playbooks. We recommend starting by identifying a limited number of use cases, mapping the equipment and solutions that will be used, and determining the information relevant to your organization and operational teams, which will allow you to create, populate and “map” fields specific to your organization. The definition of the metrics and KPIs to be integrated comes next. A test procedure and acceptance testing can then be used to validate the complete deployment chain.
As an MSSP and a SOAR user, we have a real duty to advise and support our customers, who are today questioning the usefulness and complexity of such a tool. Vendors also have a card to play by offering truly turnkey products with simple models such as SaaS (Software as a Service), which would ease adoption for less mature organizations while offering them a worthwhile return on investment. This is the approach taken by solutions such as Microsoft's Azure Sentinel and Rapid7's InsightIDR/InsightConnect, which provide SOAR capabilities directly integrated with their SaaS SIEM. Although less advanced than the pure players, these solutions nevertheless provide sufficient functionality for the majority of SOCs.
Beyond vendors and service providers, user communities also seem to play an important role, and a number of vendors use this as a marketing argument. Free, though limited, sharing of integration and playbook code, with content made available for example in a GitHub repository, facilitates sharing and community participation. Palo Alto XSOAR (formerly Demisto), for example, has clearly understood this and provides a GitHub repository and a Slack channel for its community.
We are still in the early days of SOAR: Gartner estimates that by 2022, 30% of organizations with a team of at least five people will be equipped with a SOAR, compared with less than 5% in 2019. The coming years will therefore be decisive for these solutions, and the three players mentioned above (MSSPs, vendors and communities) will each have a crucial role to play.
[1] Gartner, Preparing Your Security Operations for Orchestration and Automation Tools: https://www.gartner.com/en/documents/3860563
[2] CERT Societe Generale, IRM (Incident Response Methodologies): https://github.com/certsocietegenerale/IRM
NIST, Computer Security Incident Handling Guide: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-61r2.pdf
SANS, Incident Handler's Handbook: https://www.sans.org/reading-room/whitepapers/incident/incident-handlers-handbook-33901
[3] Wikipedia, BPMN: https://fr.wikipedia.org/wiki/Business_process_model_and_notation