# Cloudbleed: The rank system perspective
Cloudflare’s buffer overrun was dubbed ‘Cloudbleed’ as a historical reference to ‘Heartbleed’.
## Why am I talking about this?
Some interesting events have occurred surrounding Cloudflare, one of the largest global CDNs, and I’ll take the opportunity to put some opinions out there about what has happened.
## What a CDN is
A Content Delivery Network is a system of strategically positioned servers that maintain and accelerate the delivery of content. The main goals of a CDN are speed, scalability and high availability. A request from a consumer will generally be routed to the nearest geographic point-of-presence, because the consumer’s physical distance to these servers has a direct impact on loading time. A closer, high-performing point-of-presence significantly improves user experience through reduced loading time, lower latency and minimised packet loss. A Content Delivery Network also cuts operational costs by allowing businesses to effectively outsource the logistics and maintenance of these servers. Companies thereby benefit from global load balancing and leverage the cost savings that accrue from economies of scale, because CDN provisioning is structured in the economic domain as an oligopoly.
## Sounds nice, so what’s the problem?
There isn’t a problem in principle. In practice, however, sometimes really bad things happen. In an oligopoly, the effect of someone accidentally writing an “==” equality check where they meant a “>=” greater-than-or-equal check can be dramatic in terms of the sheer number of people affected by the consequences. Which, incidentally, is exactly how ‘Cloudbleed’ happened.
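The effect of that one-character mistake can be sketched with a toy parser. This is a deliberately simplified model – not Cloudflare’s actual Ragel-generated C – in which ‘memory’ is one flat buffer, the parser is only supposed to read up to `end`, and a malformed tag makes the cursor jump several bytes at once:

```python
# Toy illustration of a Cloudbleed-style bug: a parser whose bounds
# check uses "==" instead of ">=". If the cursor ever jumps *past*
# the end of its buffer (e.g. while consuming a malformed tag), the
# equality check never fires, and the parser keeps reading whatever
# happens to sit in adjacent memory.

def parse(memory: bytes, start: int, end: int, buggy: bool = True) -> bytes:
    out = []
    p = start
    while p < len(memory):
        # The fix: ">=" catches a cursor that overshot `end`; "==" does not.
        if (p == end) if buggy else (p >= end):
            break
        if memory[p:p + 2] == b"<a":  # malformed tag: consume 3 bytes at once
            p += 3                    # the cursor can now overshoot `end`
            continue
        out.append(memory[p])
        p += 1
    return bytes(out)

# One flat "memory": our request's 7-byte buffer, then someone else's secret.
memory = b"hello<a" + b"SECRET-COOKIE"
leaked = parse(memory, start=0, end=7, buggy=True)   # bleeds past the boundary
safe   = parse(memory, start=0, end=7, buggy=False)  # stops at the boundary
```

With the buggy check, the adjacent ‘secret’ bytes end up interspersed into the output; with the correct check, parsing stops cleanly at the buffer boundary.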
It’s all part of the advantages and disadvantages of the present infrastructure. The advantages outweigh the disadvantages, but this is the way that the internet has developed, and people basically have to be prepared for this kind of incident.
The unwanted behaviour at Cloudflare stemmed from an HTML parser chain that is used to modify webpages as they pass through the service’s edge servers. The parser carries out a range of functions, such as inserting Google Analytics tags, rewriting HTTP links to HTTPS, finding and obfuscating strings that look like email addresses, and preventing malicious web bots from accessing some parts of a page.
When the HTML parser was used in combination with three Cloudflare features – email obfuscation, server-side excludes and automatic HTTPS rewrites – and an HTML page being served to a consumer through a Cloudflare proxy contained a specific combination of unbalanced tags, pseudo-random pages of memory from outside the boundary of what was supposed to be served were interspersed into the response.
This means that encryption keys, cookies, passwords, sections of POST data, chat messages from some online chat services, online password manager data, and HTTPS requests from other Cloudflare-hosted websites were being leaked pseudo-randomly.
Because the proxies are shared between all Cloudflare customers, all customers were affected: leaked pages of memory from responses being served on behalf of any given customer could be interspersed among the expected responses for any other customer.
Cloudflare optimises the performance of more than 5 million websites, and as this story unfolded it became clear just how significant that number is. The duration of the ‘bleed’ is also significant: it may have been occurring since 22 September 2016, and the period of greatest impact was between 13 February 2017 and 18 February 2017.
Furthermore, web crawlers and archivers, search engine cache services, corporate squid proxy-cache networks, and browser caches on consumers’ workstations globally were all downloading and holding the pseudo-random data that was ‘bleeding’ for the entire duration of this period. It was just that most people didn’t understand what they were seeing or where it was coming from, or otherwise didn’t notice it.
At this stage, it is not known whether anyone realised it was happening before 19 February 2017, or whether it was exploited in any way.
## What’s the appropriate response?
In situations like this, you have to decide how good you think your luck is, how important you think you and your organisation are, and how thorough you are willing to have your response be. What you or your organisation chooses to do may differ from what you might recommend more widely to others. In principle, given the scale of the ‘bleed’ and the possibility that passwords may have been exposed, many security professionals are advocating that all consumers change their passwords for basically everything on the internet as soon as possible.
Another way of looking at it, however, is that the internet – much like the feudal structure of pre-modern Japan, or Korea, or India – has a kind of informal rank system. Messaging has to be different for different groups, because not everyone performs the same function, or has the same time available to devote to a particular task, and some people and groups tend to be more in scope of hostile state and non-state actors than others.
Changing all passwords everywhere, while technically the correct response for the ‘Brahmins and Kshatriyas’ of the internet, may seem like a completely overblown response to a scenario where 0.00003% of HTTP requests were affected, when narrated from the perspective of the ‘Vaishyas and Shudras’ of the internet.
In other words, sounding the alarm as loudly as possible could induce a kind of security fatigue among ‘normal people’, and may even incentivise bad behaviour from them: when mass-changing their passwords, they may be more likely to reuse similar passwords across the services they use, and may – in their haste – be inclined to reduce the complexity of their newly-crafted passwords.
Put differently, sounding the alarm in the loudest and most severe way possible will induce the correct and thorough response from the custodians of key infrastructure – who were going to respond correctly anyway, regardless of the words in the media – while also having the unintended effect of inducing a wrong or inadequate response from ‘normal people’.
It also creates a ‘morning after bounty’: for people engaged in signals collection and tailored access operations, this would be a luxurious time, since the percentage of transmissions concerned with the changing of passwords would spike over the following week or two if every individual in the entire world were asked to change all their passwords. Such adversarial actors would be incentivised to mount subversive campaigns during this window, because the cost-benefit ratio of carrying out such a project just tilted a bit further toward the ‘benefit’ side of the equation.
Thus, paradoxically, the panicked response to the already-fixed problem could be what in fact creates the environment in which a technically unrelated but socially ‘subsequent’ actual array of attacks could occur which otherwise may not have occurred.
## Similar to a problem that has been discussed in relation to CT
If all of this sounds similar to the problem of managing a population’s response to terrorist threats while maintaining a strong counter-terrorism posture, you’d be correct. It is basically the same problem.
It also comes with the same danger faced in erring too far toward ‘downplaying’ while trying to avoid inducing ‘panic’. Downplaying an incident so as to avoid triggering inadequate or inappropriate responses from the ‘normal people’ deprives them of information and can make them suspicious of the intentions of the system. It can make professionals look ‘incompetent’, or even as though they ‘have something to hide’.
In such a case, a panicked response in the general public – born of the feeling that they are being lied to by authorities, or that authorities do not appreciate the scope and scale of a threat – may inadvertently lead to the very damaging outcomes that the authorities were attempting to avoid in the first place, with the added downside that distrust of those in authority and the proliferation of conspiracy theories are layered on top.
This is why it’s vital to find ways to assess the mood of the general public and to model its responses to almost any issue in society. The messaging for different geographic, occupational, and socio-economic groups has to differ without being outright contradictory. If the people in authority in a given situation cannot leverage the social domain adeptly enough to do that, they may lose control of the narrative, which can have unpredictable or even disastrous consequences.
Mastering the social domain and producing outcomes that mesh with and evolve with operational necessities, is something that is vital to continuing effective governance, be it governance of a multinational company which controls one of the Content Delivery Networks, right the way up to, say, governance of a country or of a regional supra-state.
## Additional thoughts on Cloudflare
I do of course have criticisms of Cloudflare, but they are not criticisms of the CDN concept itself; rather, they are specific to Cloudflare as a company.
I’ll cover two issues.
I’ll start with the less concrete and more speculative one. For dissident groups that are not tacitly supported, or at least tolerated, by the states of the North Atlantic, Cloudflare might present a risk, because Cloudflare is within the jurisdiction of the United States and could conceivably respond to legal requests made there. Another factor to consider is that Cloudflare has taken dark funding and may actually have been ‘on side’ with FVEY-related collections since at least 2012. Admittedly, this claim is difficult to substantiate, but it’s worth considering.
The more concrete criticism which I can definitely substantiate is Tor-related. Matthew Prince, the CEO of Cloudflare, took to his blog on 30 March 2016 to make what appeared to be a rather nuanced argument in favour of anonymity but against Tor in its present form due to the issue of malicious abuse of the network.
Much of what he wrote was eminently reasonable.
For instance, Prince suggests that Cloudflare could become friendly toward Tor if onion addresses were to adopt a stronger scheme than the present 80-bit truncated SHA-1 hash. In that circumstance, the current stipulation – that onion addresses only be issued EV certificates, which require extended validation procedures, cannot be issued automatically, and undermine the very anonymity that Tor was intended to promote – could be relaxed, since the CA/B Forum would likely be open to discussing the automatic issuance of certificates. Cloudflare could then allow its customers to create onion sites in some automated way, with certificate issuance for those onion sites automated as well. Tor traffic directed toward those onion sites could then be whitelisted, while blacklisting could continue for Tor traffic directed toward non-onion sites.
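For context on the ‘80-bit SHA-1’ point: a version-2 .onion address is derived by hashing the service’s DER-encoded RSA public key with SHA-1, keeping only the first 80 bits, and base32-encoding the result. A minimal sketch of that derivation (the key bytes below are a placeholder for illustration, not a real key):

```python
import base64
import hashlib

def v2_onion_address(der_public_key: bytes) -> str:
    """Derive a version-2 .onion address: base32 of the first 80 bits
    (10 bytes) of the SHA-1 digest of the DER-encoded RSA public key."""
    digest = hashlib.sha1(der_public_key).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower() + ".onion"

# Placeholder key material, for illustration only.
addr = v2_onion_address(b"\x30\x82\x01\x0a" + b"\x00" * 266)
# `addr` is 16 base32 characters followed by ".onion"
```

Because only 80 of SHA-1’s 160 output bits survive the truncation, the identifier is far weaker than the hash function itself – which is the core of Prince’s objection.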
The world described in Prince’s suggestion would certainly be an interesting world to live in. However, we don’t actually live in that world.
Instead, we live in a world where Cloudflare alleges that 94% of the traffic directed toward its customers across the Tor network is ‘malicious’, based on data from the Cloudflare IP reputation system. That may or may not be true, but given that there are a lot of people using Tor and a limited number of Tor exit nodes, it means that Cloudflare is CAPTCHA-challenging or blocking 80% of Tor IP addresses, and that number is steadily growing. This discourages people with legitimate intentions from using Tor to access sites protected by Cloudflare.
Prince’s explanation is that Cloudflare is forced to behave this way in order to protect its customers from abuse, and that it can only rely on IP reputation: browser fingerprinting cannot differentiate between Tor users, because the Tor browser is specifically designed to minimise unique fingerprints. In such a circumstance Cloudflare can only evaluate the communication on the basis of the reputation of the IP and the content of the request. That is also true, and it is a reasonable explanation, but at the same time it is what it is.
While Cloudflare’s default behaviour is to CAPTCHA-challenge Tor, it is possible to add the country ‘T1’ to the Cloudflare firewall whitelist, which exempts Tor users from having to complete CAPTCHA challenges. This has been possible since late 2016, so ‘dissident’ sites that continue to present challenges to Tor users are responsible for that choice.
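For site operators, the same whitelist rule can also be created programmatically through Cloudflare’s IP Access Rules API. The sketch below only constructs the request payload; the zone ID and credentials are placeholders, and the endpoint shape is an assumption to be verified against Cloudflare’s current API documentation:

```python
import json

# Sketch of whitelisting Tor (pseudo-country code "T1") via Cloudflare's
# IP Access Rules API. ZONE_ID and the auth headers are placeholders; the
# v4 endpoint path shown here is an assumption to check against the docs.

ZONE_ID = "YOUR_ZONE_ID"  # placeholder
url = f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/firewall/access_rules/rules"

payload = {
    "mode": "whitelist",  # skip the challenge entirely for matching traffic
    "configuration": {"target": "country", "value": "T1"},
    "notes": "Allow Tor exit traffic without CAPTCHA challenges",
}

body = json.dumps(payload)
# POST `body` to `url` with your Cloudflare auth headers to create the rule.
```

The same rule can of course be created by hand in the Cloudflare dashboard’s firewall settings, which is how most operators would do it.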
In a funny kind of irony, Prince also notes that 18% of all global spam begins with an automated bot harvesting publicly available email addresses through the Tor network. Given that a significant subset of this spam is phishing-related, it is an unintentionally hilarious statement, because 40% of all phishing sites in 2015 were using certificates issued by Cloudflare’s ‘Universal SSL’ service.
Furthermore, Cloudflare’s ridiculous ‘Flexible SSL’ – billed by them as ‘the easiest secure sockets layer ever’ – provides encryption only between the client and Cloudflare’s proxy, without any of the actual security that would be required between the proxy and the origin server, and has the damaging effect of giving consumers a false sense of security. The so-called ‘Flexible SSL’ is so ‘flexible’ that, on the origin side, it is essentially non-existent. Consumers have been trained to look for the padlock in the address bar before submitting sensitive information to any website, and ‘Flexible SSL’ grants phishing sites and other malicious actors the ability to present that padlock with minimal effort. ‘Easiest SSL ever’, indeed.
I tend to prefer that actual, real, end-to-end SSL be the only possible implementation. But hey, that’s just me, right?
But now I’m just bullying them, so I’ll dial it back a bit and bring this article to a close. It’s possible that the people at Cloudflare didn’t anticipate that their services would be abused in these ways, and they did get unlucky with the Cloudbleed buffer overrun incident, but in any case, those inside glass houses should be careful not to throw stones. Matthew Prince should reflect on the recent incident and refrain from throwing any stones at anyone for at least a couple of months.
## Was Majorityrights.com affected by Cloudbleed?
This should go without saying, but I will say it anyway.
We don’t use Cloudflare here. As such, Majorityrights.com was not affected by any of the events described in this article.
Kumiko Oumae works in the defence and security sector in the UK. Her opinions here are entirely her own.