Author: Wicus Ross
In December 2021 we published a blog post on the Golden SAML (Security Assertion Markup Language) attack against Microsoft Active Directory Federation Services (ADFS)[1]. That blog post was inspired by the larger event that impacted many companies in late 2020 and centered mostly on the SolarWinds cyberattack. Our blog post looked at how attackers targeted ADFS to steal trusted signing certificates. These certificates enabled the attackers to forge digital signatures and so gain authenticated access to services that had a trust relationship with the compromised ADFS, such as cloud services. With a legitimate username the attacker can forge a valid authentication response that any third-party service (also known as a resource partner organization) in an existing trust relationship with ADFS will accept [2]. A password or MFA is not required, and the attackers bypassed the authentication mechanism.
More recently, an announcement by Microsoft that an attacker, tracked as STORM-0558, managed to gain unauthorized access to Exchange Online data hosted in Azure by using Outlook Web Access (OWA) has caused quite a stir [3]. Microsoft indicated that the attackers targeted a subset of accounts belonging to specific organizations. Microsoft also indicated that the attack path was closed when they revoked several certificates associated with what is called Microsoft Account (MSA) consumer signing keys and fixed a validation flaw on their end. At the time of writing Microsoft admitted that they could not explain how these attackers obtained a copy of a private key of an MSA certificate used in the attack and were still investigating the matter. Microsoft indicated that this inactive MSA key enabled attackers to fool the Relying Party (RP) process that checks authentication token signatures, as the forged authentication token was signed by the trusted certificate.
A company called Wiz.io then published a blog post in which they shared their views of some of the technical aspects of the attack, including the types of accounts that could be impacted by this type of attack [4]. At the end of the blog post Wiz indicated that a Microsoft team reviewed their blog to ensure it is “technically correct”. It is not clear if “technically correct” means that Wiz’s hypothesis about the attack vector is accurate, or only that the technical details about elements of the MSA account match the technical documentation. Irrespective, it seems clear that a team from Microsoft reviewed the blog post and did not object to the implication of the content. However, this is still not official Microsoft communication regarding technical aspects of the respective compromise, and the Wiz analysis is still only plausible speculation.
The MSA certificates revoked by Microsoft were associated with public services on Azure and enabled user access to applications hosted on Azure. The MSA certificates are used to ensure the integrity of Azure Active Directory (AAD) issued tokens. These applications need to follow prescribed practices detailed by Microsoft to ensure correct functioning behavior, including various validation steps such as validating the token authenticity using a certificate. As stated earlier, there was a flaw that allowed an unintentional side effect, namely granting access to data of unrelated parties. The flaw fixed involved absent validation of the Issuer specified in the certificate information (Issuer claim). The specified Issuer was not compared with the logically associated Issuer for the respective Azure tenant. In other words, each Azure tenant has their own unique certificates issued by Azure and there is an identification value that must be checked as part of the authentication process to ensure that the certificate is used in the correct context and is associated with the tenant listed in the certificate’s issuer claim.
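The issuer comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Microsoft's implementation: the function name, the `{tenantid}` placeholder convention, the claim names `iss` and `tid`, and the `login.example` URLs are assumptions chosen for the example, and the token's signature is assumed to have been verified already with the key in question.

```python
def issuer_is_consistent(claims: dict, key_issuer: str) -> bool:
    """Check that a token's issuer matches the issuer bound to its signing key.

    `claims` is the already signature-verified token payload.
    `key_issuer` is the issuer value published alongside the signing key.
    It is either a fixed issuer URL or a template containing "{tenantid}"
    (a hypothetical convention used for this sketch).
    """
    token_issuer = claims.get("iss")
    tenant_id = claims.get("tid")
    if not token_issuer or not tenant_id:
        # A token without issuer or tenant information cannot be scoped.
        return False
    # Substitute the token's tenant into the template (a no-op for fixed issuers).
    expected_issuer = key_issuer.replace("{tenantid}", tenant_id)
    return token_issuer == expected_issuer


# Hypothetical values for illustration only.
TENANT_KEY_ISSUER = "https://login.example/{tenantid}/v2.0"
CONSUMER_KEY_ISSUER = "https://login.example/consumer-tenant/v2.0"

enterprise_claims = {
    "iss": "https://login.example/tenant-a/v2.0",
    "tid": "tenant-a",
}

# Signed by a key scoped to tenants: the issuer matches.
print(issuer_is_consistent(enterprise_claims, TENANT_KEY_ISSUER))
# Signed by a key scoped to the consumer issuer: the issuer does not match.
print(issuer_is_consistent(enterprise_claims, CONSUMER_KEY_ISSUER))
```

Skipping this step means that any key the application trusts for signatures can vouch for any tenant, which is the kind of unintended side effect described above.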
The MSA certificates were trusted in certain contexts and, combined with the validation flaw, enabled the attackers to gain implicit access due to the absence of additional verification steps. Another limiting factor for the attackers, according to the Wiz blog, was the attackers had to tailor their attack to focus on multi-tenant or mixed audience applications. Single tenant applications could not be abused in the same way. This limitation was hardly relevant, however, as the accounts the attackers were targeting fitted this profile.
Another cyber-attack that made headlines around the same time was the breach of JumpCloud, a cloud-based identity provider [5]. JumpCloud detected the compromise and opted to force an admin API key rotation for all clients [6]. The admin API keys could potentially allow an attacker to further compromise JumpCloud clients’ environments by abusing the privileged API access. Mandiant published a blog post describing how the attackers compromised one of the JumpCloud victims by using the privileged API key access [7]. The attackers then used the JumpCloud agents deployed on the endpoints to push malicious scripts to further their actions on objectives.
All three incidents described above involved stolen authentication key material that enabled attackers to gain access to respective infrastructure or data [8]. In the ADFS Golden SAML attack the attackers had to extract key material per compromised organization, thus multiple victims had to be probed for the specific services and then the attackers could pivot from there. In the JumpCloud instance the attacker had to compromise JumpCloud itself that held all the privileged admin API keys and then had to pivot to the respective organizations based on their API keys. In the MSA STORM-0558 incident, the attacker had to acquire one or more of the eight MSA private keys and then target user accounts that might be associated with multi-tenant or mixed user applications. From the disclosed information at least Exchange Online data was accessed without authorization.
In the case of the MSA incident, it is unclear if the attackers simply got lucky regarding the flaw relating to the missing check on the certificate issuer claim value. The attackers might not have known about the validation flaw, probed the Azure applications using the stolen certificate, and achieved access by accident. Alternatively, the attackers might have known about the absence of the checks and known that if they got hold of an MSA private key they could get in. According to Wiz, Azure multi-tenant applications must be configured in a way that allows the MSA certificates to be usable. This constitutes a misconfiguration according to Wiz, but Microsoft’s documentation is not clear on this even though it explicitly mentions issuer validation [9]. As with many vulnerabilities, several stars had to align for this exploit to be feasible.
Both JumpCloud and Microsoft shared high level details about the respective cyberattacks. In the JumpCloud incident we know the starting point was a spear phishing attack, but beyond that and the Mandiant blog not much else is publicly known. In the case of the MSA incident, Microsoft says it is still investigating how an MSA private key got leaked. Besides the Wiz.io blog, we do not know if any other techniques were used by the attackers [10].
Using a single certificate as part of an authentication process can be a costly mistake, as the ADFS Golden SAML and MSA attacks show. Grouping the Golden SAML and the MSA attacks under the same umbrella may seem like a stretch, but these two attacks are more alike in nature than the JumpCloud incident. The Golden SAML attack can be used to forge authentication tokens for multiple users associated with that authentication domain. The impact of the stolen MSA private keys is even larger as it is not limited to just one organization, while the Golden SAML attack is limited to a specific Active Directory domain.
All three incidents resulted in the potential to pivot into other environments to further the goals of the attackers. The extent to which attackers could move around seems to have varied, with the MSA incident scoped to specific Azure based applications. In the Golden SAML attack and JumpCloud incidents attackers could get free rein over a variety of services and even devices (as in the JumpCloud incident).
The Fast Identity Online (FIDO) Alliance is an open organization that seeks to reduce password dependency and has been publicly operating since 2013. The FIDO standard has given us a phishing resistant authentication process by building on the advances made in web browser security as well as the adoption of cryptographic network protocols such as Transport Layer Security (TLS).
The most recent FIDO standard, FIDO2, is based on public key cryptography, which is what ADFS and AAD MSA authentication rely on as well. This makes one wonder whether FIDO2 may also be susceptible to this kind of private key compromise. In the broader sense, the short answer is ‘no’.
Comparing FIDO2 with ADFS and AAD MSA is not an apples-to-apples comparison either. ADFS’ intended purpose is to provide a federation of trust for parties that wish to authenticate several applications using Active Directory credentials. ADFS cannot vouch for the actual credentials as it only indicates by means of trusted signature if the provided credentials, which another party verifies, are valid. The AAD with MSA authentication is a better comparison but falls short on the non-repudiation principle because a single public/private key pair is used across all accounts.
FIDO2 describes 16 Security Goals (SG) and 29 Security Measures (SM), each with their own unique features that contribute to the strength of the standard [11] [12]. Each SG has a SM mapping that helps us further understand the design strengths of FIDO2 [13]. To address the question of whether FIDO2 is susceptible to private key theft and token forgery we can look at SG-5 - Verifier Leak Resilience and SG-6 - Authenticator Leak Resilience in the SG – SM mapping table.
FIDO2 makes private key theft less feasible and more difficult since each user has their own private key for each identity provider login [14] [15]. FIDO2 requires a type of secure enclave, which is a hardware component that complies with several requirements for hardware attestation [16]. Strong cryptographic capabilities are also mandated that increase the difficulty level for attackers to clone authenticators or exploit potentially weak cryptographic algorithms [17]. A possible, but more expensive option is for attackers to physically gain access to the FIDO2 hardware device by physical theft, for example.
According to documentation provided by Microsoft and Yubico, a FIDO2 compliant hardware authenticator vendor, AAD must be used directly as ADFS is incapable of handling FIDO based authentication [18] [19] [20]. Microsoft has also indicated that their Azure Multi-Factor Authentication Server (MFA Server) will be deprecated at the end of September 2024 [21] and that any new deployments of MFA Server will not be possible. Microsoft Entra ID (the new name for Azure Active Directory) will be the new solution to handle MFA or passwordless authentication in the future [22].
Windows Hello for Business with FIDO2 support is another approach that can accommodate cloud and hybrid deployments where passwordless and FIDO-based authentication are used [23]. ADFS can still be present in environments with Windows Hello and FIDO, but the interactions will require additional authentication considerations for Single Sign On (SSO).
The World Wide Web (WWW) has changed significantly since the mid-1990s. A combination of several key concepts such as the Domain Name System (DNS), public key cryptography, Hypertext Transfer Protocol Secure (HTTPS) with Transport Layer Security (TLS), and universally accepted web standards is today manifest in the form of a web browser. This has made it possible to navigate the web with a degree of certainty that would not have been possible before. Without this assurance the Internet as we know it today would be a minefield or toxic wasteland.
On the Internet it is important to authenticate not just the website but also the user’s identity. But how does the website know which user is trying to access it? It is possible to use public key cryptography, like how websites are validated, by issuing a certificate for each registered user. However, this is difficult to execute in practice using traditional approaches, due to the additional enrollment requirements imposed on end users. A much simpler approach has thus been widely adopted, namely the ubiquitous combination of username and password. Supplying a username and password during a login process enables the website to authenticate the user. The authenticated user is issued a session token, also referred to as a session cookie, which is handled by the web browser seamlessly. We know that the username and password are very portable and can be stolen or leaked.
Multi-Factor Authentication (MFA) was introduced to make it more difficult for attackers to abuse the stolen credentials, as out-of-channel authentication components such as push notifications to mobile authenticators are much more difficult to compromise at scale. Attackers thus adapted their techniques to performing Person-In-The-Middle (PITM) attacks that focused on stealing the session cookie directly. With this session cookie the attacker can now act as that authenticated user, even though they do not have the username, password, or additional authentication factors.
Mutual authentication using public key cryptography, when combined with cryptographic concepts to protect against replay attacks, makes it more difficult for attackers to perform a PITM style attack. Passwords and MFA with session cookies do not provide those types of protections.
The FIDO2 standard was created to guard against all known attack scenarios that could result in credential theft. It even has protection against PITM style attacks, as the base URL of the host requesting authentication is included in the verification process. In the case of a classic PITM based attack, the authentication process cannot be concluded because the spoofing party that sits between the victim and the real website would need a domain name and certificate identical to what was used during the initial FIDO registration process. Does FIDO2 eliminate the associated problem with portable session cookies? The answer is ‘no’, as a session must still be maintained to identify the user with the website for each interaction. Attackers must now find ways to inject themselves into the browser through other means, such as malicious browser extensions or vulnerabilities, where the latter is much more difficult to achieve than the former. Another approach would be to gain physical access to a host and then try to dump the browser cache containing the session cookies. It is unclear what risk the use of classical web proxies with valid TLS certificates may pose with approaches such as FIDO, but such proxies have caused an outcry in the past due to the privacy risk and inherent security concerns introduced by their use [24].
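To make the base URL check more concrete, the following Python sketch shows how a website might inspect the client data that the browser produces during a FIDO2 (WebAuthn) login. The field names `type`, `challenge` and `origin` come from the WebAuthn client data format; the function names and example domains are assumptions for illustration. Verifying the authenticator's signature over this data is a separate, essential step that is deliberately left out here.

```python
import base64
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as WebAuthn does for the challenge."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def client_data_is_acceptable(
    client_data_json: bytes, expected_origin: str, expected_challenge: bytes
) -> bool:
    """Check the browser-reported origin and challenge for a login ceremony.

    Signature verification over the client data is NOT done here; this sketch
    covers only the origin binding and replay protection checks.
    """
    data = json.loads(client_data_json)
    if data.get("type") != "webauthn.get":
        return False
    # The browser, not the page, fills in the origin it actually talked to.
    if data.get("origin") != expected_origin:
        return False
    # The challenge must be the fresh value this server issued for the login.
    received = str(data.get("challenge", "")).encode("utf-8")
    expected = b64url(expected_challenge).encode("ascii")
    return hmac.compare_digest(received, expected)


def make_client_data(origin: str, challenge: bytes) -> bytes:
    """Build example client data as a browser would (illustration only)."""
    return json.dumps(
        {"type": "webauthn.get", "challenge": b64url(challenge), "origin": origin}
    ).encode("utf-8")


challenge = b"server-issued-random-bytes"
genuine = make_client_data("https://login.example.com", challenge)
phished = make_client_data("https://login.example.com.evil.test", challenge)

print(client_data_is_acceptable(genuine, "https://login.example.com", challenge))
print(client_data_is_acceptable(phished, "https://login.example.com", challenge))
```

Because the browser records the origin it is actually connected to, a look-alike domain in the middle produces client data that the real website rejects, even if the user is fooled.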
Think of the traditional session cookie as the titan Atlas from Greek mythology, who carries the earth or sky, depending on which version of the story you prefer, on his shoulders [25]. Like the titan, the session cookie is responsible for a user’s identity and carries the weight of that responsibility on its shoulders. One day someone will come to take away that burden, like Heracles who built the Pillars of Hercules to do so.
A PITM attack against FIDO2 is conceptually feasible but appears hard to execute in practical terms. There is an edge case where an attacker, in control of the DNS lookup mechanism, poisons the victim’s DNS cache so that the legitimate domain resolves to a spoofed web site. The attacker would also need a legitimate certificate for their spoofed web site, which is non-trivial. Having achieved this, the base URL will match what the browser attests to during the FIDO validation process. This scenario is possible if the attacker has control of a router and DNS service, but modern browsers have already implemented DNS over HTTPS (DoH), which makes DNS resolution exceedingly difficult to interfere with. DoH is not always enforced, especially if Wi-Fi captive portals are in play. A now defunct web standard called HTTP Public Key Pinning would also have been a valid mitigating strategy against spoofed digital certificates [26].
If the attacker can overcome the challenges of poisoning the DNS and obtain a digital certificate that will please the browser, then it might be possible to steal the mythical Atlas session cookie. Windows Hello for Business offers viable mitigation by storing the session cookie in the Trusted Platform Module (TPM) of the device running Windows. The session cookie is also renewed frequently and seamlessly, making this attack path more costly and less viable.
There is no doubt that in many cases the adoption of ‘cloud’ based systems for traditional use cases will improve the technical security posture of the business. There is a saying that “Nobody is qualified to configure and manage Microsoft Exchange except Microsoft themselves.” This is true in many similar cases, and many businesses will benefit from outsourcing technology systems and platforms to specialist operations in the cloud.
However, this reduction in technical risk comes at the cost of an increase of less obvious, non-technical risks, as follows:
1. The threat will adapt
Since cybercrime and other threats are driven by powerful systemic factors, history shows, and we can predict, that attack vectors will adapt to changes in the technology landscape. Crime will go where the ‘money’ is and the hacking ecosystem will evolve its capabilities to be effective in an emerging ‘cloud centric’ world. We may not know how this will happen yet, but the attacks described in this post expose some probable future trajectories. Another example we have seen is the deployment of malicious insiders.
2. Homogeneity & Contagion
Over the last two decades security and resilience have really suffered because of the ubiquity of the homogenous Microsoft desktop and server platforms, and the opportunities to specialize and scale that this presents to attackers, i.e., an attack that works against one Microsoft system will work against all Microsoft systems. This dynamic persists and even accelerates as homogenous SaaS (Software as a Service) and PaaS (Platform as a Service) are adopted.
Furthermore, homogeneity also exacerbates contagion, i.e., the impact of vulnerability, attack or compromise can spread rapidly across interdependent environments, as we saw with WannaCry, notPetya and SolarWinds. The more we adopt homogenous cloud systems, the more we expose ourselves to this kind of contagion risk.
The more standardized and centralized platforms are, the less opportunity there is for alternative approaches to survive. As a result, we collectively lose our access to alternative technologies and approaches, which further reduces our resilience, for example by removing the option of ‘falling back’ in the case of a compromise or other failure. For example, when M365 mail goes ‘down’, will we be able to find an on-prem mail server to replace it, and someone who knows how to use it?
3. Attack Surface Management
Attackers have always understood that most compromises happen because they are able to find a system that is vulnerable, rather than because they find a vulnerability in a system. As cloud adoption grows, businesses will face the growing challenges of understanding and managing their evolving attack surface. Rather than track and reduce internet-exposed IP addresses and ports, they now must learn to manage ephemeral systems, complex user and role permissions, diverse storage locations, API keys, compliance, geopolitical risks and the like. This is already leading to frequent non-technical compromises where data or capabilities are simply exposed onto the Internet for anyone to access.
4. Two pillars
As SaaS, PaaS and other cloud-based systems become standardized, we are seeing an inevitable migration toward ‘web applications’ in which the code and data reside on a 3rd party cloud system and the rendering and UX are performed in a browser. That means that increasingly all the responsibility for security now rests on these two pillars – the cloud service provider and the browser vendor. These players have proven to be very capable in the past – and there is a lot of technical benefit to this approach – but from a system perspective we should be aware that the security and resiliency of cyberspace increasingly rests on just these two pillars.
5. Geopolitical Threats
Although we think of the cloud as something ‘ephemeral’, it is in fact comprised of actual computers located in actual datacenters and managed by actual people. These factors are all linked to specific geopolitical realities and therefore under the influence of the political forces and powers that govern those places. This concept was illustrated when Russia took control of the Internet in occupied Ukraine simply by redirecting the traffic from those physical locations into systems located in Russia, to facilitate censorship and surveillance. As hacking and cybercrime become increasingly influenced by politics and power, the geopolitical context of a cloud-based system becomes increasingly important. This is especially true when one considers that political realities are increasingly volatile and can easily change within the lifetime of a technology platform. In other words, what’s politically acceptable today may not be politically acceptable tomorrow. In adopting cloud-based systems and platforms, businesses must recognize that they are making themselves vulnerable to threats that may emerge when the political realities in the physical and political ‘homes’ of these platforms change.
6. Switching costs
Subscription-based business models (as we predominantly observe in the cloud) are highly incentivized to make ‘switching’ difficult for the customer. This is not always apparent but is deeply baked into the business prerogatives of these offerings. Once a business has chosen to adopt cloud platforms and systems it may be very difficult to switch to alternative options. This represents an obvious commercial risk but is also a risk to resilience.
7. Responsibility, accountability, and transparency
Businesses should recognize that, while cloud providers may assume responsibility for certain elements of cybersecurity, the accountability for security failures will almost always still rest wholly with the businesses. In cloud offerings, where so many of the technologies, people and processes are obfuscated from the end user, this can make it very difficult for the client to understand and manage what their real risk is.
This can be illustrated by an article posted by Tenable CEO Amit Yoran in which Amit alluded to inherent risks of running on the cloud where the cloud vendor has critical vulnerabilities that could be exploited to gain access to tenants' data. Amit’s post mentioned that a researcher at Tenable found a vulnerability in Microsoft Azure that allowed the researcher to gain access to a financial institution’s cloud infrastructure. Tenable raised the issue with Microsoft in March 2023, but Microsoft has pushed the complete fix out to the end of September 2023 [27]. From a risk point of view what does this mean? Who is responsible for any breach related to this? All we know is that Microsoft is aware of the weakness and hopefully they are keeping an eye on it.
What we know about the Golden SAML attack against ADFS, the stolen JumpCloud admin API keys, and the stolen MSA private keys is that clever and well-resourced attackers have demonstrated that compromising authentication and signing keys is a viable avenue of attack.
In the cloud age, attackers continue to try to bypass, weaken, or abuse Identity Providers to gain access by pickpocketing the keys and walking in through any door. Properly handling sensitive key material will become more important than ever as Identity and Access Management is fundamental in the cloud computing era.
Account activity and account creation monitoring are therefore important means of detecting anomalous and malicious activity. Microsoft did provide free access to their Azure Purview Premium log auditing service for Azure clients to help identify suspicious account activity, but only after the incident caused public outcry [27]. It is not clear how long this courtesy will last.