# Cloudbleed: The rank system perspective
Cloudflare’s buffer overrun was dubbed ‘Cloudbleed’ as a historical reference to ‘Heartbleed’.
## Why am I talking about this?
Some interesting events have occurred surrounding Cloudflare, one of the largest global CDNs, and I’ll take the opportunity to put some opinions out there about what has happened.
## What a CDN is
A Content Delivery Network is a system of strategically positioned servers that maintain and accelerate the delivery of content. The main goals of a CDN are speed, scalability and high availability. A request from a consumer will generally be routed to the nearest geographic point-of-presence, because the consumer’s physical distance to these servers has a direct impact on loading time. A closer, high-performing point-of-presence significantly improves user experience through reduced loading time, lower latency and minimised packet loss. A Content Delivery Network also cuts operational costs by allowing businesses to effectively outsource the logistics and maintenance of these servers. Companies thereby benefit from global load balancing and leverage the cost savings that accrue from economies of scale, because CDN provisioning is structured in the economic domain as an oligopoly.
## Sounds nice, so what’s the problem?
There isn’t a problem in principle. In practice, however, sometimes really bad things happen. In an oligopoly, the effect of someone accidentally writing an “==” equality check where they meant a “>=” greater-than-or-equal check can be dramatic in terms of the sheer number of people affected by the consequences. Which, incidentally, is exactly how ‘Cloudbleed’ happened.
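The effect of that one-character mistake can be sketched with a toy parser. This is a deliberately simplified model – not Cloudflare’s actual Ragel-generated C – in which ‘memory’ is one flat buffer, the parser is only supposed to read up to `end`, and a malformed tag makes the cursor jump several bytes at once:

```python
# Toy illustration of a Cloudbleed-style bug: a parser whose bounds
# check uses "==" instead of ">=". If the cursor ever jumps *past*
# the end of its buffer (e.g. while consuming a malformed tag), the
# equality check never fires, and the parser keeps reading whatever
# happens to sit in adjacent memory.

def parse(memory: bytes, start: int, end: int, buggy: bool = True) -> bytes:
    out = []
    p = start
    while p < len(memory):
        # The fix: ">=" catches a cursor that overshot `end`; "==" does not.
        if (p == end) if buggy else (p >= end):
            break
        if memory[p:p + 2] == b"<a":  # malformed tag: consume 3 bytes at once
            p += 3                    # the cursor can now overshoot `end`
            continue
        out.append(memory[p])
        p += 1
    return bytes(out)

# One flat "memory": our request's 7-byte buffer, then someone else's secret.
memory = b"hello<a" + b"SECRET-COOKIE"
leaked = parse(memory, start=0, end=7, buggy=True)   # bleeds past the boundary
safe   = parse(memory, start=0, end=7, buggy=False)  # stops at the boundary
```

With the buggy check, the adjacent ‘secret’ bytes end up interspersed into the output; with the correct check, parsing stops cleanly at the buffer boundary.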
It’s all part of the advantages and disadvantages of the present infrastructure. The advantages outweigh the disadvantages, but this is the way that the internet has developed, and people basically have to be prepared for this kind of incident.
The unwanted behaviour at Cloudflare stemmed from an HTML parser chain that is used to modify webpages as they pass through the service’s edge servers. The parser carries out a range of functions, such as inserting Google Analytics tags, rewriting HTTP links to HTTPS, finding and obfuscating strings that look like email addresses, and preventing malicious web bots from accessing some parts of a page.
When the HTML parser was used in combination with three Cloudflare features – email obfuscation, server-side excludes and automatic HTTPS rewrites – and an HTML page being served to a consumer through a Cloudflare proxy contained a specific combination of unbalanced tags, pseudo-random pages of memory from outside the boundary of what was supposed to be served were interspersed into the response.
This means that encryption keys, cookies, passwords, sections of POST data, chat messages from some online chat services, online password manager data, and HTTPS requests from other Cloudflare-hosted websites were being leaked pseudo-randomly.
Because the proxies are shared between all Cloudflare customers, all customers were affected: leaked pages of memory from responses being served on behalf of any given customer could be interspersed among the expected responses for any other customer.
Cloudflare optimises the performance of more than 5 million websites, and as this story unfolded it became clear just how significant that number is. The duration of the ‘bleed’ is also significant: it may have been occurring since 22 September 2016, and the period of greatest impact was between 13 February 2017 and 18 February 2017.
Furthermore, web crawlers and archivers, search engine cache services, corporate squid proxy-cache networks, and browser caches on consumers’ workstations globally were all downloading and holding the pseudo-random data that was ‘bleeding’ for the entire duration of this period. It was just that most people didn’t understand what they were seeing or where it was coming from, or otherwise didn’t notice it.
At this stage, it is not known whether anyone realised it was happening before 19 February 2017, or whether it was exploited in any way.
## What’s the appropriate response?
In situations like this, you have to decide how good you think your luck is, how important you think you and your organisation are, and how thorough you are willing to have your response be. What you or your organisation chooses to do may differ from what you might recommend more widely to others. In principle, given the scale of the ‘bleed’ and the possibility that passwords may have been exposed, many security professionals are advocating that all consumers change their passwords for basically everything on the internet as soon as possible.
Another way of looking at it, however, is that the internet – much like the feudal structure of pre-modern Japan, or Korea, or India – has a kind of informal rank system. Messaging has to be different for different groups, because not everyone performs the same function, or has the same time available to devote to a particular task, and some people and groups tend to be more in scope of hostile state and non-state actors than others.
Changing all passwords everywhere, while technically the correct response for the ‘Brahmins and Kshatriyas’ of the internet, may seem like a completely overblown response to a scenario where 0.00003% of HTTP requests were affected, when narrated from the perspective of the ‘Vaishyas and Shudras’ of the internet.
In other words, sounding the alarm as loudly as possible could induce a kind of security fatigue among ‘normal people’, and may even incentivise bad behaviour from them: when mass-changing their passwords, they may be more likely to reuse similar passwords across the services they use, and may – in their haste – be inclined to reduce the complexity of their newly-crafted passwords.
Put differently, sounding the alarm in the loudest and most severe way possible will induce the correct and thorough response from the custodians of key infrastructure – who were going to respond correctly anyway, regardless of the words in the media – while also having the unintended effect of inducing a wrong or inadequate response from ‘normal people’.
It also creates a ‘morning after bounty’: for people engaged in signals collection and tailored access operations, this would be a luxurious time, since the percentage of transmissions concerned with the changing of passwords would spike over the following week or two if every individual in the entire world were asked to change all their passwords. Such adversarial actors would be incentivised to mount subversive campaigns during this window, because the cost-benefit ratio of carrying out such a project just tilted a bit further toward the ‘benefit’ side of the equation.
Thus, paradoxically, the panicked response to the already-fixed problem could be what in fact creates the environment in which a technically unrelated but socially ‘subsequent’ actual array of attacks could occur which otherwise may not have occurred.
## Similar to a problem that has been discussed in relation to CT
If all of this sounds similar to the problem of managing a population’s response to terrorist threats while maintaining a strong counter-terrorism posture, you’d be correct. It is basically the same problem.
It also comes with the same danger faced in erring too far toward ‘downplaying’ while trying to avoid inducing ‘panic’. Downplaying an incident so as to avoid triggering inadequate or inappropriate responses from the ‘normal people’ deprives them of information and can make them suspicious of the intentions of the system. It can make professionals look ‘incompetent’, or even as though they ‘have something to hide’.
In such a case, a panicked response in the general public – born of the feeling that they are being lied to by authorities, or that authorities do not appreciate the scope and scale of a threat – may inadvertently lead to the very damaging outcomes that the authorities were attempting to avoid in the first place, with the added downside that distrust of those in authority and the proliferation of conspiracy theories are layered on top.
This is why it’s vital to find ways to assess the mood of the general public and to model its responses to almost any issue in society. The messaging for different geographic, occupational, and socio-economic groups has to differ without being outright contradictory. If the people in authority in a given situation cannot leverage the social domain adeptly enough to do that, they may lose control of the narrative, which can have unpredictable or even disastrous consequences.
Mastering the social domain and producing outcomes that mesh with and evolve with operational necessities, is something that is vital to continuing effective governance, be it governance of a multinational company which controls one of the Content Delivery Networks, right the way up to, say, governance of a country or of a regional supra-state.
## Additional thoughts on Cloudflare
I do of course have criticisms of Cloudflare, but they are not criticisms of the CDN concept itself; rather, they are specific to Cloudflare as a company.
I’ll cover two issues.
I’ll start with the less concrete and more speculative one. For dissident groups that are not tacitly supported, or at least tolerated, by the states of the North Atlantic, Cloudflare might present a risk, because Cloudflare is within the jurisdiction of the United States and could conceivably respond to legal requests made there. Another factor to consider is that Cloudflare has taken dark funding and may actually have been ‘on side’ with FVEY-related collections since at least 2012. Admittedly, this claim is difficult to substantiate, but it’s worth considering.
The more concrete criticism which I can definitely substantiate is Tor-related. Matthew Prince, the CEO of Cloudflare, took to his blog on 30 March 2016 to make what appeared to be a rather nuanced argument in favour of anonymity but against Tor in its present form due to the issue of malicious abuse of the network.
Much of what he wrote was eminently reasonable.
For instance, Prince suggests that Cloudflare could become friendly toward Tor if onion addresses were to adopt a stronger scheme than the present 80-bit truncated SHA-1 hash. In that circumstance, the current stipulation – that onion addresses only be issued EV certificates, which require extended validation procedures, cannot be issued automatically, and undermine the very anonymity that Tor was intended to promote – could be relaxed, since the CA/B Forum would likely be open to discussing the automatic issuance of certificates. Cloudflare could then allow its customers to create onion sites in some automated way, with certificate issuance for those onion sites automated as well. Tor traffic directed toward those onion sites could then be whitelisted, while blacklisting could continue for Tor traffic directed toward non-onion sites.
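For context on the ‘80-bit SHA-1’ point: a version-2 .onion address is derived by hashing the service’s DER-encoded RSA public key with SHA-1, keeping only the first 80 bits, and base32-encoding the result. A minimal sketch of that derivation (the key bytes below are a placeholder for illustration, not a real key):

```python
import base64
import hashlib

def v2_onion_address(der_public_key: bytes) -> str:
    """Derive a version-2 .onion address: base32 of the first 80 bits
    (10 bytes) of the SHA-1 digest of the DER-encoded RSA public key."""
    digest = hashlib.sha1(der_public_key).digest()
    return base64.b32encode(digest[:10]).decode("ascii").lower() + ".onion"

# Placeholder key material, for illustration only.
addr = v2_onion_address(b"\x30\x82\x01\x0a" + b"\x00" * 266)
# `addr` is 16 base32 characters followed by ".onion"
```

Because only 80 of SHA-1’s 160 output bits survive the truncation, the identifier is far weaker than the hash function itself – which is the core of Prince’s objection.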
The world described in Prince’s suggestion would certainly be an interesting world to live in. However, we don’t actually live in that world.
Instead, we live in a world where Cloudflare alleges that 94% of the traffic directed toward its customers across the Tor network is ‘malicious’, based on data from the Cloudflare IP reputation system. That may or may not be true, but given that there are a lot of people using Tor and a limited number of Tor exit nodes, it means that Cloudflare is CAPTCHA-challenging or blocking 80% of Tor IP addresses, and that number is steadily growing. This discourages people with legitimate intentions from using Tor to access sites protected by Cloudflare.
Prince’s explanation is that Cloudflare is forced to behave this way in order to protect its customers from abuse, and that it can only rely on IP reputation: browser fingerprinting cannot differentiate between Tor users, because the Tor browser is specifically designed to minimise unique fingerprints. In such a circumstance Cloudflare can only evaluate the communication on the basis of the reputation of the IP and the content of the request. That is also true, and it is a reasonable explanation, but at the same time it is what it is.
While Cloudflare’s default behaviour is to CAPTCHA-challenge Tor, it is possible to add the country ‘T1’ to the Cloudflare firewall whitelist, which exempts Tor users from having to complete CAPTCHA challenges. This has been possible since late 2016, so ‘dissident’ sites that continue to present challenges to Tor users are responsible for that choice.
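For site operators, the same whitelist rule can also be created programmatically through Cloudflare’s IP Access Rules API. The sketch below only constructs the request payload; the zone ID and credentials are placeholders, and the endpoint shape is an assumption to be verified against Cloudflare’s current API documentation:

```python
import json

# Sketch of whitelisting Tor (pseudo-country code "T1") via Cloudflare's
# IP Access Rules API. ZONE_ID and the auth headers are placeholders; the
# v4 endpoint path shown here is an assumption to check against the docs.

ZONE_ID = "YOUR_ZONE_ID"  # placeholder
url = f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/firewall/access_rules/rules"

payload = {
    "mode": "whitelist",  # skip the challenge entirely for matching traffic
    "configuration": {"target": "country", "value": "T1"},
    "notes": "Allow Tor exit traffic without CAPTCHA challenges",
}

body = json.dumps(payload)
# POST `body` to `url` with your Cloudflare auth headers to create the rule.
```

The same rule can of course be created by hand in the Cloudflare dashboard’s firewall settings, which is how most operators would do it.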
In a funny kind of irony, Prince also notes that 18% of all global spam begins with an automated bot harvesting publicly available email addresses through the Tor network. Given that a significant subset of this spam is phishing-related, it is an unintentionally hilarious statement, because 40% of all phishing sites in 2015 were using certificates issued by Cloudflare’s ‘Universal SSL’ service.
Furthermore, Cloudflare’s ridiculous ‘Flexible SSL’ – billed by them as ‘the easiest secure sockets layer ever’ – provides encryption only between the client and Cloudflare’s proxy, without any of the actual security that would be required between the proxy and the origin server, and has the damaging effect of giving consumers a false sense of security. The so-called ‘Flexible SSL’ is so ‘flexible’ that, on the origin side, it is essentially non-existent. Consumers have been trained to look for the padlock in the address bar before submitting sensitive information to any website, and ‘Flexible SSL’ grants phishing sites and other malicious actors the ability to present that padlock with minimal effort. ‘Easiest SSL ever’, indeed.
I tend to prefer that actual, real, end-to-end SSL be the only possible implementation. But hey, that’s just me, right?
But now I’m just bullying them, so I’ll dial it back a bit and bring this article to a close. It’s possible that the people at Cloudflare didn’t anticipate that their services would be abused in these ways, and they did get unlucky with the Cloudbleed buffer overrun incident, but in any case, those inside glass houses should be careful not to throw stones. Matthew Prince should reflect on the recent incident and refrain from throwing any stones at anyone for at least a couple of months.
## Was Majorityrights.com affected by Cloudbleed?
This should go without saying, but I will say it anyway.
We don’t use Cloudflare here. As such, Majorityrights.com was not affected by any of the events described in this article.
Kumiko Oumae works in the defence and security sector in the UK. Her opinions here are entirely her own.