14 October 2018
DMITRY KOHMANYUK: Hello everyone. Attention folks, ladies and gentlemen, this is our next Plenary session, and this is Dmitry from RIPE PC, Peter Hessler is my co‑chair. We have three talks today, plus one additional remembrance of Jon Postel by Daniel Karrenberg, that will be just before lunch, so, hopefully we'll do the usual time, maybe we'll need just a little more time but I don't think there will be a big problem for you guys. So, I'd like to introduce Lorenzo Cogotti. Let's assume about half an hour, a little bit less, so maybe 20, plus five or ten for questions.
LORENZO COGOTTI: Hello everyone. I am Lorenzo Cogotti, I'm the co‑founder of the Alpha Cogs and I work at the National Research Council of Italy.
Now, it's my first talk, does it show? So, today I would like to introduce some tools that we developed inside the Isolario project to ease the BGP analysis and data retrieval.
Now, most of you are familiar with the BGP route collectors. A BGP route collector is a machine that collects BGP data from other ASes, typically from routers, or feeders, that collaborate with the project and choose to share their routing table either partially or entirely.
This data is then collected using the MRT format, which is defined in an RFC. The data is dumped periodically, and which information is collected depends on the format. There are snapshots, which capture the entire data at a certain moment, and updates, which collect only the data from a certain period of time.
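To make the snapshot/update distinction concrete, here is a minimal Python sketch that decodes the 12‑byte common header which starts every MRT record (timestamp, type, subtype, length, as laid out in RFC 6396). The function name `parse_mrt_header` and the `kind` labels are illustrative assumptions, not part of the Isolario tools, and the sketch does not decode the record body.

```python
import struct
from datetime import datetime, timezone

# MRT type codes from RFC 6396 (only the ones relevant to this talk).
MRT_TYPES = {12: "TABLE_DUMP", 13: "TABLE_DUMP_V2", 16: "BGP4MP", 17: "BGP4MP_ET"}
SNAPSHOT_TYPES = {12, 13}   # RIB snapshots
UPDATE_TYPES = {16, 17}     # BGP messages and state changes seen over time


def parse_mrt_header(buf: bytes) -> dict:
    """Decode the 12-byte MRT common header at the start of buf (hypothetical helper)."""
    if len(buf) < 12:
        raise ValueError("MRT common header needs 12 bytes")
    # Network byte order: 32-bit timestamp, 16-bit type, 16-bit subtype, 32-bit length.
    timestamp, mrt_type, subtype, length = struct.unpack("!IHHI", buf[:12])
    if mrt_type in SNAPSHOT_TYPES:
        kind = "snapshot"
    elif mrt_type in UPDATE_TYPES:
        kind = "update"
    else:
        kind = "other"
    return {
        "time": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "type": MRT_TYPES.get(mrt_type, "UNKNOWN(%d)" % mrt_type),
        "subtype": subtype,
        "length": length,  # number of body bytes following the header
        "kind": kind,
    }
```

A reader of a real dump would call this on each record, then skip `length` bytes to reach the next header.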
There are various route collecting projects all over the world and the Isolario is probably the youngest. What sets it apart from most of the other projects is that it tries to provide information on the routes in realtime since day one.
Now, the problem is that the BGP data is increasing over time and some BGP capabilities made it worse, for example the ADD PATH capability inflates the data significantly, as you can see in that graph over there.
In 2018, Isolario started to support the ADD PATH capability and as you can see, the data increased massively in that period of time.
Now, most tools don't provide support for these capabilities and others that are significant for the Isolario project. As you can see, we examined the various tools and apparently only BGP dump can handle all the data, all the capabilities that we are interested in.
Others are either unmaintained or don't support all the stuff that we are interested in.
So, our solution was to dive right into C, which is not pleasant but necessary, and we tried to create a solution that was efficient enough for our needs. And that was hopefully up to date.
We chose C because it is easy to wrap it in more abstract languages and it is close enough to the metal for our needs, to keep us efficient and get the most out of everything available.
We didn't restrict ourselves to the regular subset, but we went with the C99 language, which allows us to optimise our allocations and hopefully gives just a little bit more code readability, so that we could keep the code less error prone.
Now, taking a step further, we also employed the new features in the C11 standard, such as multithreading and atomic variables, for optimisation.
Now, you don't have to worry. We don't want you to use plain C, it is not a pleasant experience. So, really, only expert people might choose to use the C language. In fact, our project is divided into two distinct parts: the library itself, which is in C, and you can of course use it if you have a special need such as ourselves; and the actual tool, which is a friendly UNIX utility with a grep friendly output, which most of you will find more pleasant and easy to work with. It supports all the most widely available formats for MRT data. And it also allows for filtering, for packet filtering, which is the most interesting part, actually. You can use it to filter all the packets using expressions, arbitrarily complex expressions, and in fact you can also filter AS paths using a language very similar to regular expressions, which I believe most of the network administrators will be familiar with given their job.
And you can see an example of that filtering by sub‑net in the AS paths. I believe this format might ‑‑ this feature might prove extremely useful for BGP analysis. In fact, it can do a lot more, but I will obviously keep it shorter for time reasons.
And to see if our efforts were successful, we tried to perform some benchmarks with the tools that you saw before in the slides. We took the month of July from various collectors to try and produce various datasets that could exercise our code in various ways. We took the first snapshot of the month of each route collector, we then decompressed it to eliminate the cost of decompression, in fact we were interested only in the performance of our tool, and we averaged the results to see if we made a decent tool or if it was completely useless.
Now, you can see on the left side the timing benchmark, in logarithmic scale, of each tool. I don't know if you can read it easily, but basically, we analysed the various ‑‑ all the tools that we found. And on the right side, you can see the memory consumption of our tool compared to the other tools. And on top of the slide you can see the amount of data that we analysed.
As you can see, the BGP scanner utility behaves much better in terms of performance, which was our original goal. On the memory consumption side, you can see that BGP dump performs better on average, but given the fact that we are in the order of megabytes it doesn't really matter much for us compared to the improvement in performance.
Now, keep in mind that you can use the library and have similar results, if not better, if you have special needs, obviously. No need to use C.
Now, we eliminated some of the tools that weren't able to process the following data or that were plainly too slow for our needs, so you won't see that many tools taken into account in the following slides.
For the second benchmark, compared to the first one in which we only analysed IPv6 data, in this benchmark we will analyse IPv4 and IPv6 data, so mixed, to see if it makes a difference in performance, and as you can see, the difference is not there. We are approximately 7 times faster than most other tools. And the memory usage is there, it's the same thing as before. So the data doesn't really affect the performance much.
Now, this is what happens when ADD PATH is taken into the picture. As you can see, the data is massive compared to the other slides, and by the way, we are grateful to the NLNOG RING for providing us with 68 full IPv4 tables and 69 full IPv6 tables, we are really grateful for that, thank you.
And the performance difference here is massive. Keep in mind that we are on a logarithmic scale. We are approximately 12 to 14 times faster compared to BGP dump. This means that an analysis that will take one hour with our tool might take 12 hours with other tools, and the situation is only going to get worse with the increase of data that will probably be there in the following years.
Now, just for show, we made some benchmarks with filtering. In the other benchmarks, the results were not filtered, so it was just a regular dump into a readable format. Now with filtering enabled, the performance is better, because obviously if we are going to discard a packet we are not going to format it or process it any further, so the more you filter your results, the better it will get. And this is desirable for performance reasons. And maybe we can get better results out of the regular mode of operation too if we are motivated enough; for example, by employing multithreading, but we are quite satisfied with the results thus far, so probably it's not in the near future.
So, this is all for my first presentation. I apologise for my clumsiness, and in the end, I would like to let you know that all the tools and the library are Open Source and you can get them by visiting our GitLab web page, or you can also get some binaries and a good old configure‑and‑make version of the project, as opposed to the version that we have on GitLab. Well, come visit us any time on our GitLab and feel free to contribute to the project or to ask me any questions that you might have in your mind.
Thank you very much for your attention.
PETER HESSLER: Thank you very much. If you have any questions, please come to the microphone, please remember to state your name and affiliation and keep your statements short, preferably extremely short, and questions specific.
AUDIENCE SPEAKER: Pascal Merindol from the University of Strasbourg. Thank you for your presentation. I noticed on slide 13 that you show that the memory consumption of your technique outperformed the one of BGP dump with ADD PATH, while it was the opposite in the next one. On this one BGP scanner outperformed BGP dump in terms of memory consumption. Why was it the opposite when the ADD PATH feature was not enabled? Do you have any explanation for that?
LORENZO COGOTTI: Thank you very much for your question. Yes, I should have made that more clear, probably.
The ADD PATH capability really inflates the data, but the data is still available in one single packet. We went to a great deal of pain to try not to over‑allocate memory in that case. So basically, I have not examined the BGP dump code, but I assume that they perform more allocations to try to accommodate this capability, while we can process the packet in the same amount of memory. So probably that's a pathological case in BGP dump, but I'm not sure actually, that's just speculation, keep in mind.
AUDIENCE SPEAKER: Fabian Raab from the Technical University of Munich. Thanks for the talk. I would be interested in which format the tool will output the data. I mean, does it give you nesting and aggregation in something like a dictionary? Something like the plain BGP format can be very, very unhandy to analyse. So, does the tool pre‑process the data and all the prefixes, or in which form does it output the data?
LORENZO COGOTTI: In which form does it output? So, the tool can handle regular MRT dumps. It doesn't handle the vast variety of obsolete MRT data formats, so basically it handles the MRT TABLE_DUMP_V2 and RIB snapshots in input. It produces the output in a format which is ‑‑ which is this one, so a text format which is typically grep friendly. This is because we have a lot of tools that rely ‑‑ a lot of tools in the Isolario project that rely on this format, so we couldn't really switch to any other popular format such as BGP dump, but a BGP dump output mode is in the works, we intend to provide it very soon, and even a JSON dump so that it can be easily integrated with a lot of web‑based tools.
AUDIENCE SPEAKER: It would be nice. Does it mean that it's also limited to fixed hard coded attributes?
LORENZO COGOTTI: Of BGP attributes? Well right now, yes, but I ‑‑ if the demand arises we could make it programmable. I don't think it will be a huge problem because the whole filtering engine is basically similar to a VM, so it can be configured at runtime. So it could be done, probably.
AUDIENCE SPEAKER: It would be cool. Thank you.
PETER HESSLER: Okay. Thank you very much.
Next up is Florian Streibelt.
FLORIAN STREIBELT: Today I'm presenting a joint work with a couple of people you see on the slide. And if one of my colleagues would give this talk, you all know it would probably look like this.
So, what we did is, we looked at the usage of BGP communities on the Internet and we noticed some strange effects or some strange things that could happen, and for me as a network researcher with some operator background, I feel it's my responsibility to raise awareness here or to at least tell people what we found in this data.
Let's get started at the beginning.
During the last couple of years we see a huge increase in usage of BGP communities. They increased almost threefold, one in threefold during the last eight years, and this means for me, as a researcher, it makes sense to look into what's happening there, what they are used for, what is the impact of BGP communities.
But what are BGP communities? They are just an integer, a 32‑bit field, defined ages ago. And the problem is, we have just this convention that it's written as an AS number and a value, and the AS number here can be the recipient or the sender of a community, so you can use it to signal something or to tag something. And the problem with this way of defining communities is, it's up to the peers to agree upon the meaning of this value. It's not even fixed that the ASN here is an AS number. It could be anything.
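The ASN:value convention described here can be sketched in a few lines of Python: the 32‑bit field is simply split into two 16‑bit halves. The function names are illustrative assumptions; nothing in the protocol enforces that the upper half really is an AS number.

```python
from typing import Tuple


def encode_community(asn: int, value: int) -> int:
    """Pack the conventional ASN:value pair into the 32-bit community field."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | value


def decode_community(raw: int) -> Tuple[int, int]:
    """Split a 32-bit community into (upper 16 bits, lower 16 bits)."""
    if not 0 <= raw <= 0xFFFFFFFF:
        raise ValueError("a community is a 32-bit value")
    return raw >> 16, raw & 0xFFFF


def format_community(raw: int) -> str:
    """Render a community in the usual 'ASN:value' notation."""
    return "%d:%d" % decode_community(raw)
```

For example, `format_community(encode_community(65535, 666))` gives the textual form of the well‑known BLACKHOLE community from RFC 7999. Note that a 4‑byte AS number cannot fit in the upper half, which is the limitation that motivated large communities.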
We now also have large communities, they were needed to enable ASes with 4‑byte AS numbers to use communities again. And they introduced something that's called the global administrator. So now we have the fixed field of the AS number in front and we know who is responsible for defining the meaning of the community value.
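As a rough illustration of the large community layout (RFC 8092), the sketch below packs three 32‑bit fields into 12 bytes: the global administrator followed by two local data parts. The helper names are assumptions for this example only.

```python
import struct
from typing import Tuple


def pack_large_community(global_admin: int, local1: int, local2: int) -> bytes:
    """Encode a large community as three 32-bit fields in network byte order."""
    # struct.pack raises struct.error if a field does not fit in 32 bits.
    return struct.pack("!III", global_admin, local1, local2)


def unpack_large_community(raw: bytes) -> Tuple[int, int, int]:
    """Decode 12 bytes into (global administrator, local data 1, local data 2)."""
    if len(raw) != 12:
        raise ValueError("a large community is exactly 12 bytes")
    return struct.unpack("!III", raw)


def format_large_community(raw: bytes) -> str:
    """Render in the usual 'admin:data1:data2' notation."""
    return "%d:%d:%d" % unpack_large_community(raw)
```

Because the first field is a full 32 bits, a 4‑byte AS number fits, and a reader of the attribute at least knows which AS is supposed to define the meaning of the remaining two fields.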
But unfortunately they are not really used right now. We hope this will change, but for this talk I will have to focus on the traditional short communities that are widely used, in contrast.
So how are they being used? We have two groups of communities: We have the informational communities, they have sort of passive semantics, used for location tagging or RTT tagging, things like that; and we have action communities, which have action semantics, they trigger blackholing and things like that. The problem here again is, the peers can define the meaning by themselves. By looking at the community value you cannot tell what kind of community it is, what it's doing, if it's doing anything at all or if it's just random information added to a path.
Now, given the increase in popularity of BGP communities that we have just seen on the first slide, and the ability of triggering actions for me as a researcher, it comes just another question into my mind, and this is, what could possibly go wrong?
And the answer isn't so easy to get. Because another thing that we noticed is that communities do propagate through the whole Internet and some people seem to not expect this. If you look at the data, you'll see that 14% of transit providers are propagating communities they receive, and while this ratio seems to be small, because the network is so densely connected today, this means that the communities do propagate a lot. It shouldn't be surprising because we have two RFCs that define the propagation behaviour of communities. One says they are an optional transitive attribute, so they should be propagated, and the other one says you should scrub just your own that you use for signalling internally but forward all of the others. But still, many people seem to not have noticed that communities do propagate that much.
Now, why is this a problem? I see a potential for misuse there because these propagated values or communities can trigger actions multiple hops away in other ASes, and because we have no way of knowing what they mean, because it is just up to the peers to agree on what they do, there is no way to find out if it's intended behaviour or a random artifact of a configuration issue. Even if you know the meaning, it's hard to spot who set the community. The question is, are there also unintended consequences in the usage of communities? And our assessment in the end is that there is a high potential for attacks using communities.
But let's go back to our observations first.
We use the BGP dumps from all of the publicly available BGP collectors and we find that more than 75% of the announcement messages do carry at least one community, so they are really widely in use. And if you look at the transit distance of communities, what you would see is, here we have an ECDF. On the X axis you have the travel distance of communities on the AS path and you see that 10% of communities have a hop count of more than 6, so they travel more than 6 AS hops in the routing infrastructure. And 50% have a hop count of at least 4 ASes and if you know that the mean distance of ASes or the median length of the AS paths on the Internet is around this, this means that they travel quite a lot.
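For readers unfamiliar with the plot type, an empirical cumulative distribution function over hop counts can be computed as in the following Python sketch. The sample list is made up for illustration and only chosen to echo the proportions quoted in the talk; it is not the measurement data.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf(samples: Sequence[int]) -> Callable[[float], float]:
    """Return F where F(x) is the fraction of samples that are <= x."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)

    def F(x: float) -> float:
        # bisect_right counts how many sorted samples are <= x.
        return bisect_right(ordered, x) / n

    return F


# Hypothetical hop counts, one per observed community (illustrative only).
hop_counts = [1, 1, 2, 3, 3, 4, 4, 5, 6, 11]
F = ecdf(hop_counts)
share_more_than_6 = 1 - F(6)   # fraction travelling more than 6 AS hops
share_at_least_4 = 1 - F(3)    # fraction travelling at least 4 AS hops
```

Reading a statement such as "10% of communities travel more than 6 hops" off the curve corresponds to evaluating `1 - F(6)`.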
We also observed one community travelling 11 hops. So we have quite long AS paths still there.
Taking a step back now, looking at infrastructure or at its topology, to explain what I mean with propagation or how communities do propagate. Here we have 4 ASes. One is announcing some prefix and now tagging it with a community, 1, 2, 3, so it wants to signal something to the AS3, which is on the path on the upper right. But AS4 is also receiving its community because AS1 is just adding this to the regular announcement and not doing any selective announcement or things like that, because it just talks to AS2. So, while this is intended to signal something to AS3, AS4 is also receiving this announcement with the community added. And in our work, what we do is we look at the route collectors, and if you see AS3 and 4, they both receive the same announcement with the same communities, and in AS3 we call this community on path because the AS number 3 is also present in the AS path of the announcement. So this community would be on path when we investigate it in AS3, while in AS4 we call this community off path in our work, to distinguish this from a community that's in the AS path. So there you'll see AS3 is not in the AS path 4 2 1 but still present in the BGP table.
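The on path / off path distinction can be expressed as a simple membership test, sketched below in Python. The community `3:123` and the function name are assumptions made for the example; the AS paths mirror the four‑AS topology described above.

```python
from typing import Sequence, Tuple


def classify_community(community: Tuple[int, int], as_path: Sequence[int]) -> str:
    """Label a community by whether its ASN half appears in the observed AS path."""
    asn, _value = community
    return "on-path" if asn in as_path else "off-path"


# Hypothetical community addressed to AS3, attached by AS1 to its announcement.
community = (3, 123)

seen_at_as3 = [3, 2, 1]   # AS path as observed in AS3
seen_at_as4 = [4, 2, 1]   # AS path as observed in AS4

label_as3 = classify_community(community, seen_at_as3)
label_as4 = classify_community(community, seen_at_as4)
```

This only works when the upper half of the community is a public AS number; communities that use private AS numbers cannot be attributed this way.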
If we now cut off the AS number and sort these by off and on path, what we see is different distributions of the values being used. On the left side you see the off path communities and there you see the blackholing communities are used quite a lot. So 666 is one of the defined blackholing communities, and this is present off path more than on path. And on path, you see these values seem to be chosen by operators because they look nice, they are easy to memorise. And we think this is because the blackholing community is being announced to other peers to trigger blackholing, and is leaking off path to other ASes that do not implement remote triggered blackholing, so they do not understand the value or the meaning of this value and are not filtering it because they do not implement the RFC for this, so they do not add no‑advertise or no‑export. It's propagated to other peers and could potentially trigger actions there.
And to assess this we did some experiments, first in the lab and then also on the Internet. And I will show two scenarios from our paper here in this talk. Of course there are others in the paper and the paper is also linked in the end of the presentation.
So the two scenarios I will talk about are, first, remote‑triggered blackholing and the other is a traffic redirection attack which we were both able to conduct on the Internet.
Let's first talk about how remote triggered blackholing is supposed to work. A lot of you will probably know it. Others maybe not. So we have AS1 again announcing a prefix P, just regular BGP. It will receive traffic for prefix P and we all know sometimes you get more traffic than you want. You call this a denial‑of‑service attack, and one way to mitigate this is AS1 would announce the remote triggered blackholing community together with P, or a more specific, to AS2. AS2 supports remote triggered blackholing and what will happen here is AS2 will continue to announce P to the other peers, but it should announce it without the remote triggered blackholing community, and also drop the traffic towards prefix P, so the link between AS1 and AS2 is relieved of the denial‑of‑service attack traffic to prefix P. In order for this to work, you should install some safeguards. First of all, the blackholing provider AS2 should ensure that only customers can drop traffic to their own prefixes, and you should have different policies for customers and peers. And on receiving the remote triggered blackholing community, the RFC says you should add no‑advertise or no‑export as well, so this does not propagate to other members. And we have seen in our experiments that sometimes one upstream is even translating the remote triggered blackholing community and communicating it to another upstream, which also did blackholing, which was not intended by us or anybody else. So in the end we would try and do a selective blackholing but end up with a complete blackholing of the prefix.
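The safeguards listed here can be summarised as a small policy function. The following Python sketch is an assumption‑laden illustration, not any vendor's or operator's actual configuration: the role names, the `handle_route` function and the customer prefix list are hypothetical, while the BLACKHOLE (65535:666, RFC 7999) and NO_EXPORT (65535:65281, RFC 1997) values are the well‑known ones.

```python
import ipaddress
from typing import Iterable, Set, Tuple

Community = Tuple[int, int]

BLACKHOLE: Community = (65535, 666)     # well-known BLACKHOLE community (RFC 7999)
NO_EXPORT: Community = (65535, 65281)   # well-known NO_EXPORT community (RFC 1997)


def handle_route(prefix: str,
                 communities: Iterable[Community],
                 neighbor_role: str,
                 customer_prefixes: Iterable[str]) -> Tuple[str, Set[Community]]:
    """Decide what a blackholing provider does with one received announcement.

    neighbor_role is a hypothetical label: "customer" or "peer".
    customer_prefixes are the prefixes this neighbour is authorised to announce.
    """
    net = ipaddress.ip_network(prefix)
    tags = set(communities)
    if BLACKHOLE not in tags:
        return "accept", tags
    # Safeguard 1: only the customer that owns the address space may blackhole it.
    owns = any(
        net.version == parent.version and net.subnet_of(parent)
        for parent in map(ipaddress.ip_network, customer_prefixes)
    )
    if neighbor_role != "customer" or not owns:
        return "reject", set()
    # Safeguard 2: keep the blackhole route local by adding NO_EXPORT.
    return "blackhole", tags | {NO_EXPORT}
```

The attacks discussed next succeed precisely when one of these two checks is missing or is evaluated after the blackhole community has already been acted upon.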
And how it should not work. AS1 is again announcing the prefix P to 3 ASes, receiving regular traffic. We have a backup path, or a not so nice, longer path over AS2, and then AS2 gets the role of an attacker and starts to add the blackholing community. And it should not be possible that an AS is now reacting to this and drops packets, but it actually does. So, we have an attacker that is not on the preferred path, that is able to completely drop packets to a prefix P, so this is a more dangerous attack than if the traffic just traversed AS2; then it would be able to drop only parts of the traffic, and not the complete traffic to P, which is possible with this attack. A variant of this attack is, if AS2 is not on the path, it could use hijacking and do the same. And we have noticed that this can be a problem because some people are not implementing proper filtering on the prefixes they receive, so they have prefix filters but only for regular BGP announcements.
And we confirmed both of these attacks, they work. And we have seen some other reasons as well: the blackholing prefix sometimes of course is a more specific, like a /32, and then there are exceptions in the configuration to accept these more specifics for blackholing, and they sometimes circumvent all the regular prefix filters. Some people apply these blackholing community filters even before any prefix filters or to I go indicator setting anything, and the problem here is there is no validation for the origin of a community, you cannot say who added the community on the path, if it was the original AS that is allowed to drop packets or blackhole the prefix.
The second type of attack we did was a traffic redirection attack. Now we have again two paths. One is a preferred shorter path, one is a longer path. In AS6 you see that there are two AS paths available, 5, 4, 2, 1 and 3, 2, 1, and of course the shortest path is the one that is preferred. We assigned roles again. AS2 is now an attacker and starts to announce the prefix tagged with a community to trigger path prepending on AS3, so the path via AS3 is getting longer than the path over AS4 and 5, and traffic is then being redirected via the link from AS4 to AS5. So the traffic is rerouted, and why is this a problem? We all know that you don't want to have other people defining where your traffic flow goes, but there are special things that can happen here: there could be a malicious traffic tap there, and AS2 could be forced to collaborate and do this attack using communities, and the attack is only between AS4 and 5, and it's hard to spot that there actually is an attack between other ASes that are not the one AS that is announcing this community attack.
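The effect of the prepending community on path selection at AS6 can be illustrated with a deliberately simplified Python sketch. It assumes AS6 compares only AS path length (real BGP decision processes consider local preference and other attributes first), and the prepend count of two is a hypothetical value.

```python
from typing import List, Sequence


def prepend(path: Sequence[int], asn: int, times: int) -> List[int]:
    """Return the AS path with asn prepended 'times' extra times."""
    return [asn] * times + list(path)


def best_path(candidates: Sequence[Sequence[int]]) -> List[int]:
    """Pick the shortest AS path (simplification: length is the only criterion)."""
    return list(min(candidates, key=len))


via_as3 = [3, 2, 1]       # preferred, shorter path seen at AS6
via_as5 = [5, 4, 2, 1]    # longer backup path seen at AS6

before = best_path([via_as3, via_as5])

# AS2 tags the announcement with a community that makes AS3 prepend itself
# (two extra times here, a made-up value), so the path via AS3 becomes longer.
after = best_path([prepend(via_as3, 3, 2), via_as5])
```

After the prepending, the path through AS5 and AS4 wins, which moves the traffic onto the AS4 to AS5 link without AS6 seeing anything other than an ordinary, valid‑looking announcement.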
Or, it could just be a link that's already congested and somebody wants to completely block the traffic on this link and overload it. I'm sure you come up with a couple of other things that could happen there as well.
So what now?
The problem with the communities is that they have certain shortcomings. The notation of ASN and value is just convention. There are no defined semantics. They are used both for triggering of actions and for signalling. So you can't just say, well, I just filter all of the action communities, because you don't know which community is an action community. There is no cryptographic protection; even with RPKI, this is not securing the community values, and, as I have said, there is no attribution possible because there is no integrity for who is setting a community on a path or on an announcement. The large communities introduced the global administrator but they still have the same problems. The global administrator is the only person or only entity that defines the meaning of the value of the communities, so we still don't have a distinction between active and passive communities here.
So, this makes it a little bit of a problem to solve this, these issues.
But BGP communities, as they are being used, are not necessarily broken. This makes the problem so hard to communicate. The secure usage of this system requires good operational knowledge and diligence and while I think that you all know in this room what you are doing with BGP, relying on it on a global scale might not be the best idea. I think we learned this the hard way. So the question is, do we need less protocols and mechanisms to block misconfiguration from happening?
Some recommendations. If you read the RFC for the large communities, you will see them there as well. You should filter incoming informational communities for your ASN. You should publish community documentation. So please enable other people to find out what the communities you add actually do and what they can do in your network if they forward communities from their customers to their upstreams. A good idea would be to monitor and log received communities so in the aftermath you can look at what has happened. And probably you should talk to your downstreams and peers, that they should remove communities that other people set when they forward prefixes to you, because it might not be desired that customers can trigger actions in your AS or your upstream. And if you provide a Looking Glass it might be a good idea to also display BGP communities there and have the real BGP interface and not just some website that gives some static dumps.
Another topic is the authenticity, as I mentioned several times now. So communities can be modified by every AS that is on the path. There is no attribution possible, no cryptographic protection. But we see operators still relying on a lot of community values, they are implementing routing policies based on the community values and they use it for various stuff. Large communities partially improve the situation, but as I have explained, in principle, the problem stays the same. So the question is, how can we achieve authenticity there or at least attribution to know who to call or to blame when something goes wrong?
The other thing that makes this a problem is the transitivity. And the question is, well, communities can help in the debugging. They add a lot of value, you learn where a prefix has been learned, at which IXP this was announced. And they are an easy and low overhead communication channel. The question is, in today's Internet, if the benefits that we see here are still worth the risk, because they are often only used between neighbours or two or three hops away and should not need to be propagated through the whole Internet.
And monitoring, this usually is the solution. You must monitor what happens and then you know who the culprit is. This is really hard here, because you know in BGP there is no global state, and at the route collectors we only have the data which is at the end of the path, because this already traversed 5, 6, or 11 ASes and all of these could change something. So inferring modifications between ASes is almost impossible if you don't manually look into Looking Glasses or script this to look into the state of every AS on the path in realtime. And again the meaning of a community still is unknown. There is no way of changes in the community.
So monitoring this in a global state really is hard.
Standardisation could help. Currently, there is only a limited set of standardised communities. Many of the ASes still don't implement these as well. I heard people saying standardising things could help attackers, but I think security by obscurity never works. We should standardise at least some sets of communities or try to get ranges of communities to say, well, these are active communities, these are informational communities, so you have a better possibility to filter action communities, for example.
And finally, the lack of documentation really makes it hard to research this topic. Some ASes document it in the WHOIS, so you have an interface you can query, but then there is natural language where you have just text explaining, well, if you add 5 XX we will prepend X times to this AS. Some ASes document it on their website, which also is very hard to parse. Some ASes provide documentation only to customers. And some ASes do not provide any documentation at all. We see that they react to communities but we don't know what they are doing and they do not provide any information. So it really is hard to understand what's going on here.
So, to summarise our findings. We see communities are widely used. They propagate through the Internet. They are the foundation for many routing policies, but their use relies heavily on mutual trust in your peers and up‑ and downstreams. There is no authenticity and security in place. Attribution is impossible. Attacks are hard to detect. And one fun fact from our research, while our prefix hijacks were noticed and even publicly tweeted about, our community attacks resulting in traffic redirection were not noticed by anybody. So the question is, if there are other attacks that were not noticed as well besides ours.
If you want to get the full paper, it will be published in a couple of days. There is a preprint linked here. And because the PC likes cats, here is a cat. And I'm happy to take questions.
DMITRY KOHMANYUK: Thanks. So we have some time now.
AUDIENCE SPEAKER: Wolfgang Tremmel, Internet educator at DE‑CIX. You brushed a bit over the things that AS numbers are 32‑bit nowadays, and I wonder if you made any checks how often private AS numbers are used in the first part because that's a thing that's a real issue. Did you do any checks on private AS numbers?
FLORIAN STREIBELT: I did a check, they are used a lot, but I cannot attribute how many ASes are using this. On the AS path I see the private AS number. I cannot say how many people are using this but they are used a lot and for this research we had to filter them out. So we didn't look at the private AS numbers in detail, or we were not able to because we needed to find out where a community was inserted.
AUDIENCE SPEAKER: Rough percentage?
AUDIENCE SPEAKER: Nikolay. I also wanted to ask about the private AS numbers, because many networks seem to use them to do different stuff with the same numbers, so one uses 65,000 and XX to prepend to an ASN and another uses it to blackhole and another one uses it to not announce to it.
FLORIAN STREIBELT: So the problem with this is we worked with the route collector data. So you get the AS path and the communities. And if I have a private community there or a community with a private AS number, I have no way of finding out who on the path added this and is using this. If I don't manually inspect it on Looking Glasses.
AUDIENCE SPEAKER: Have you tried to figure out how many collisions there are which cause opposite things with the same community?
FLORIAN STREIBELT: This comes to the other problem of lack of documentation. We tried to parse WHOIS and websites and finally gave up, because in WHOIS you have natural language that is really hard to process. If you filter on things like path prepending or blackholing, you end up with garbage output because people tend to format things differently and use different language to describe the same things. So, finding collisions in usage would mean I need a set of well‑documented uses where I know what they are doing. And if I don't know if the community value A is used for blackholing here and for path prepending there, because there is no documentation on the usage, I cannot find these collisions.
AUDIENCE SPEAKER: Rudiger Volk. Deutsche Telekom. Well, okay. I have ‑‑ I would have lots on this. First of all... my recent thinking on BGP peering relations and responsibility is, well, okay, essentially you are responsible for the bilateral relation, and the use of your code points for communities in the first place is a bilateral agreement on that single hop. And kind of, yes, as an extension, if you know and if you communicate what you are doing beyond that, like passing on some other communities, well, okay, that's fine, but the correct thinking, the responsible way of dealing with it is, you really should consider this as a bilateral thing, and in that situation, it doesn't matter what collisions you have on any of the code points. In particular, on the code points with private ASes that have been used and are being used in cases where people did not have sufficient code space for coding the required and wanted functionality. With the large communities we fortunately get into a situation where we can deal with this and where, for example, your request, I would like to have a defined code space for informational ‑‑ for informational communities and I would like to have something for requesting something, you can set this up. And, by the way, I have been working on a documentation method that actually gives you symbolic access to the code space, which makes things much easier to work ‑‑
DMITRY KOHMANYUK: We need a time limit for questions.
RUDIGER VOLK: I didn't even ask something...
DMITRY KOHMANYUK: Oh, so there was a prelude. We are kind of timing out on this session.
FLORIAN STREIBELT: The problem ‑‑ so peering is special, in peering you have a bilateral agreement, you have contracts, you can come up with regulations. But if you have a customer provider relation and you get ‑‑
RUDIGER VOLK: You have a bilateral contract there as well.
FLORIAN STREIBELT: Yes, but if you have customers of customers of customers announcing a prefix, they could trigger actions in the upstream of the upstream and this ‑‑
RUDIGER VOLK: My customer is responsible for what he is sending me and if he is sending stuff from his customers that he doesn't know about, well, okay, he is irresponsible.
FLORIAN STREIBELT: Agreed. Also we have people doing misconfiguration but still this is a problem.
DMITRY KOHMANYUK: I know this is a community discussion, but any remote questions from chat or...?
Thanks a lot, Florian.
I'd like to call to the stage Maria Apostolaki.
MARIA APOSTOLAKI: Hello everyone. I am here to present our work on routing attacks in Bitcoin. This is a joint work with my adviser. The names are here.
Routing attacks quite often make the news. You have probably heard about the Russian ISP that hijacked large chunks of Internet traffic belonging to Mastercard and other financial services, or the Canadian ISP that managed to steal 83,000 dollars from a mining pool. The price gets higher: there was another BGP hijack that managed to steal thousands from MyEtherWallet. That's only the tip of the iceberg in routing manipulations. We found that hijacks are in fact much more prevalent in the Internet than we tend to believe. On the Y axis of this graph you can see the number of hijacks that we detected in each of the months shown on the X axis. As you can see, thousands of hijacks happen every month.
So the first question that we're trying to answer is, could routing attacks impact Bitcoin or other block chain applications?
Well, Bitcoin should be robust against routing attacks, right? After all, it's highly decentralised around the globe. Well, we found that Bitcoin is in fact highly centralised from both the mining and the routing viewpoint. To give you an idea of our findings, in this graph you can see the cumulative percentage of mining power as a function of the number of networks hosting it. Clearly, mining power is very centralised. For example, you can see that within only 10 networks we can find almost 70% of the mining power. And the same holds for the actual Bitcoin clients and for the Bitcoin traffic.
So, all this centralisation allows for two kinds of attack: the partition attack, in which the attacker splits the network in half, and the delay attack. In this talk, we will only focus on the partition attack. It is a visible attack, as the attacker has to hijack traffic first, but it's also extremely effective, even against the network as a whole, and it also generalises to all applications that are using block chain and the Internet.
So, that's the outline of my talk. I'll give you some background information for Bitcoin and BGP. Then I'll talk about the attack in detail. And I'll finish the talk with counter‑measures, including our proposal.
So Bitcoin is a distributed network of nodes that establish random connections. Each node keeps a ledger of all transactions ever done in Bitcoin. This is called the block chain. The block chain is a chain of blocks that is extended by miners. Miners have to use a lot of computational power for this and they are compensated for their efforts with block rewards. Because this is a risky process, they tend to collaborate, forming mining pools, and the mining pools use gateways, which are just regular Bitcoin clients, to interconnect with the actual Bitcoin network.
For example, this grey pool has two gateways, C and E. Now, of course, Bitcoin connections travel the Internet and, because of that, they also use BGP. And what's important to note is that Bitcoin traffic travels unencrypted and without any integrity guarantees. That means that any AS in the AS path can eavesdrop, delay or drop the messages. What's even more important to remember is that, even if Bitcoin traffic was encrypted, an AS in the path would still be able to drop the traffic.
So, I'll now talk about the partition attack in detail.
The goal of the partition attack is to split the network into two disjoint components so that no information can be exchanged between them. This is very serious for Bitcoin because it's a consensus protocol, and if information cannot be exchanged between the two components they cannot reach consensus. In more detail, a partition is an effective denial of service attack, as transactions can no longer propagate from one component to the other. It can also cause revenue loss, as blocks that are mined within one of the two components will eventually be discarded, and that's thousands of dollars. And it can also enable double spending attacks.
So let's see how it works. Let's assume we have this attacker, the AS in the middle, and she wants to create the partition denoted by the red line. Intuitively, what our attacker will do is attract all traffic destined to nodes on the right, and she will then drop the connections crossing the partition. To understand how this is done we will focus on node F, which is the green node at the right end of the slide, and we will explain how this will happen. I do understand that the next seconds will be super boring for most of you, but it's important to also keep on board people that are not super familiar with BGP hijacking.
So this node F has an IP, this IP belongs to its provider, which is AS6 and AS6 is responsible for creating a BGP advertisement that covers the IP of the green node. Then this advertisement will be propagated AS by AS until everyone knows how to reach node F.
For example, AS1 will reach it via AS7. Now, the weird thing is that BGP doesn't check the validity of advertisements. That means that any AS in the Internet can actually advertise any prefix. So, let's assume that the attacker advertises a longer prefix that covers the IP of node F. Because routers in the Internet will choose the more specific match, all ASes will choose to route their traffic via the attacker. So, suddenly the attacker gains access to the traffic destined to node F. This is what we call BGP hijacking, and this is exactly what the attacker will do to split the Bitcoin network. She'll hijack all prefixes of nodes on the right, she will then drop the connections and, boom, the partition is created. Of course, it's not as trivial as I just described. In fact, there are partitions that cannot be created, and that's because there are connections that the attacker will not be able to cut.
Such connections are, for example, connections within an AS, because there we don't use BGP; connections within a mining pool; or private connections between pools, because the attacker has no idea about those.
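The "more specific match" rule described above can be illustrated with a minimal Python sketch. The prefixes, the address of node F and the origin labels are made up for illustration; the sketch models only longest-prefix matching on a single routing table, not BGP propagation.

```python
import ipaddress

def best_route(routes, destination):
    """Return the (prefix, origin) entry with the longest prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, origin) for net, origin in routes if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the most specific covering prefix wins.
    return max(matches, key=lambda entry: entry[0].prefixlen)

# Hypothetical table: the legitimate provider (called AS6 in the talk)
# announces a /16 that covers node F. All values are assumptions.
routes = [(ipaddress.ip_network("10.6.0.0/16"), "AS6")]
node_f = "10.6.1.5"
before = best_route(routes, node_f)

# The attacker announces a longer (more specific) prefix covering node F.
routes.append((ipaddress.ip_network("10.6.1.0/24"), "attacker"))
after = best_route(routes, node_f)

print(before[1], "->", after[1])
```

The legitimate /16 is still in the table after the hijack; it simply loses to the more specific /24 for any address inside that /24.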
But even in those cases the attacker can detect that the partition she created is not effective, and she can instead create a smaller feasible one. To give you an idea of how this is done, let's assume that the same attacker tries to create this partition. Let's also assume that there is a mining pool, the grey mining pool there. So, again, the attacker will try to hijack all traffic destined to the orange nodes there. She will attract the traffic and drop the connections crossing the partition, and the partition is created, but it is not effective because the attacker failed to cut this connection, which is a stealth connection. In fact, the attacker would not be able to cut this connection because she didn't know about it. In this case, what the attacker can do is monitor the connections she has already hijacked and realise that there is leakage of information between the two components. That would be, for example, that a block that is mined by the black nodes is advertised by the orange ones. In fact, the attacker can even detect the first node which propagated it, which is the node that has the stealth connection. In this case this node will be node D. So the attacker can change the partition she wanted to create into this one, which is actually feasible. In the paper, we explain the algorithm the attacker would use to do this, and we prove that using this algorithm the attacker will always manage to isolate the maximum feasible subset of nodes she could isolate.
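The idea of shrinking an ineffective partition can be sketched as follows. This is not the algorithm from the paper; it is a simplified illustration that assumes the attacker has already learned (by observing leaks) which connections she cannot cut, and the node names are hypothetical.

```python
def feasible_partition(desired, uncuttable_edges):
    """Shrink `desired` until no uncuttable connection crosses its boundary.

    desired: nodes the attacker would like to isolate.
    uncuttable_edges: pairs of nodes joined by a connection the attacker
    cannot cut (for example a stealth connection inside a mining pool).
    """
    isolated = set(desired)
    changed = True
    while changed:
        changed = False
        for a, b in uncuttable_edges:
            # Exactly one endpoint inside means information leaks across.
            if (a in isolated) != (b in isolated):
                isolated.discard(a)
                isolated.discard(b)
                changed = True
    return isolated

# Hypothetical example: node D keeps a stealth connection to node C outside.
print(feasible_partition({"D", "E", "F"}, [("C", "D")]))
```

Every removed node has an uncuttable connection to something outside the isolated set, so it could not belong to any leak-free subset; under the stated assumption, what remains is the largest leak-free subset of the desired partition.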
So we evaluated our attack in terms of practicality and effectiveness.
For practicality, we had to infer the Bitcoin topology, which we augmented with routing information. Using this, we found that splitting Bitcoin in half, which is the worst case scenario for Bitcoin, is possible by doing only 100 prefix hijacks, and that's negligible compared to the hijacks that happen in the Internet every day. To give you an idea, in this graph you can see the maximum number of prefixes that were hijacked at once in each of the months shown on the X axis.
We also evaluated our attack in terms of time efficiency. To do that we had to implement our attack in the wild, so we hosted a few Bitcoin clients which were actually connected to the Bitcoin network and whose prefix we advertised via Amsterdam. All the traffic to our nodes was routed via Amsterdam. Then we attacked our own nodes by hijacking their prefixes via Cornell University. We measured the time that it takes for all connections to be routed via the attacker's AS. We summarise our results in this graph.
Here you can see on the Y axis the percentage of connections that were intercepted by the attacker as a function of the time this took. It takes less than two minutes for the attacker to do this, so it takes less than two minutes for the partition to be created. And we have to keep in mind that mitigating the partition would be much slower; it would be on the order of hours, taking into consideration that this is a human‑driven process. For example, it took Google hours to resolve the Pakistan hijack.
So luckily there are counter‑measures for our attack, both short‑term and long‑term. The easiest to do, the short‑term one, would be to host all our clients in /24 prefixes. That would reduce the attack surface, because the attacker would now need to compete with the original advertisement. A much better, long‑term counter‑measure would of course be to deploy secure routing protocols, because that would not allow BGP hijacking in the first place.
And now you look a little bit disappointed, and I totally agree with that. These are not practical, I buy that. That's fine. Yet the attack is practical and, in fact, it does not only affect Bitcoin, it affects any block chain application that is using the Internet. The counter‑measures are not practical because we cannot host all Bitcoin clients in /24s, as that would increase our routing tables, and we would also need collaboration from all ISPs. And it would also be great if we could deploy secure routing protocols, but we are not there. So we need to do something.
What we advocate for is a network that would act as a secure channel via which all Bitcoin clients would be able to communicate even if the peer‑to‑peer network is partitioned by a hijacker. So we built this network, which we call SABRE. It is just a set of special Bitcoin clients. To give you an illustration, if we had this Bitcoin topology and we wanted to implement SABRE on top of it, we would just host a few Bitcoin clients that are connected to each other and to all other Bitcoin nodes, in such a way that all Bitcoin clients would be able to talk to each other via the SABRE network.
So now let me tell you what is special about this relay network we are trying to build. First, the relays are strategically located in ISPs such that an AS adversary is unable to partition the relay network itself, and is also very unlikely to be able to cut the connections from the relay network to the Bitcoin clients. The second reason why our relay node is special is the way that it's implemented.
So let me first explain how we host our relay nodes such that we secure them against hijacks. The first thing we are doing is that we host those few relay nodes in /24 prefixes. That's super important because with this, the attacker would be able to attract only half of the connections she would attract if she were able to do a longer‑prefix advertisement.
If the attacker were able to do a longer‑prefix advertisement, she would, in this case, be able to attract all the ASes that are now red‑ish.
But if the attacker had to compete with the original advertisement, because the relay node was hosted in a /24, then she would only attract some of the traffic destined to the origin.
So we already gained a lot by hosting our relay nodes in /24s. Now the next question is, what defines which advertisement the ASes would choose? Well, that is based on economic criteria and also on the proximity to the attacker and to the origin. So for example, a customer of the attacker would not be willing to pay extra for using the malicious path.
So she would not be vulnerable to this attack.
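The choice between two advertisements for the same /24 can be sketched with a simplified preference model. The ordering below (routes learned from customers preferred over peers, peers over providers, then shorter AS path) is a commonly described policy and an assumption here; real routers apply more tie-breakers, and the offers are hypothetical.

```python
# Lower rank is preferred: customer routes earn money, provider routes cost money.
PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}

def choose(offers):
    """Pick the origin an AS would route to.

    offers: list of (relationship, as_path_length, origin) tuples, where
    relationship is how this AS relates to the neighbour it learned the
    route from. Both routes are assumed to cover the same /24.
    """
    best = min(offers, key=lambda o: (PREFERENCE[o[0]], o[1]))
    return best[2]

# Hypothetical AS that peers directly with the relay's AS and hears the
# attacker's announcement only through a provider.
print(choose([("peer", 1, "origin"), ("provider", 1, "attacker")]))
```

In this model an equally specific malicious route only wins if it arrives over a more preferred relationship, or over the same relationship with a shorter path.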
These properties are what we use in the second thing that we take into consideration in SABRE. We choose our relay nodes and we locate them in ASes that peer directly with each other and have no customers. And if you think about that, what this buys us is that there is no attacker that can actually advertise a better route.
So, as an illustration, if we wanted to create a SABRE network composed of two relays, it would look like this: two relays in ASes that peer directly with each other and have no customers. But even in this case you might think that, well, this link is just a link, it can be cut, the peering agreement might be revoked or something might happen. And I totally agree. This is why we have the third thing that we take into consideration, which is that the relay network itself should be k‑connected. That means that, in order for it to be split, k edges need to be cut, which is much more unlikely.
So, to go back to the example: in this case, we would need to deploy an additional relay node, located in an AS that peers directly with the ASes of the other relays, in which case, even if one of the links fails or the peering agreement is somehow lost, we'll still have a relay network that is connected.
The final thing that we take into consideration is to optimise for the connections from the relay nodes to the Bitcoin clients. We try to locate our relay nodes in ASes that are topologically close to the Bitcoin clients, that is, on paths that are preferred by the Bitcoin clients.
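The connectivity requirement in the two-relay and three-relay examples can be checked with a small brute-force sketch. The relay names are hypothetical, and trying every combination of removed links is only practical for tiny graphs like these.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """Depth-first search over an undirected graph."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes

def survives_cuts(nodes, edges, cuts):
    """True if the graph stays connected after removing any `cuts` edges."""
    return all(
        is_connected(nodes, [e for e in edges if e not in removed])
        for removed in combinations(edges, cuts)
    )

two_relays = (["R1", "R2"], [("R1", "R2")])
three_relays = (["R1", "R2", "R3"], [("R1", "R2"), ("R2", "R3"), ("R1", "R3")])
print(survives_cuts(*two_relays, cuts=1))    # single link: one cut splits it
print(survives_cuts(*three_relays, cuts=1))  # triangle: any one link may fail
```

With three mutually peering relays, two links have to be cut before the relay network splits, which matches the example of adding a third relay.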
So, having explained how we position our relay nodes, I'll now tell you what it is about our design that we think is special.
So, we built our relay nodes in a hybrid way: we have a software and a hardware component. The software component is only responsible for validating the new blocks and for updating the hardware component. The hardware component is implemented in programmable network hardware; you may already be familiar with that because of the tutorial that happened on the first day of the RIPE meeting by Aaron. So we implement this in P4, and this hardware component is able to serve the Bitcoin requests. So it's able to answer with a block to multiple Bitcoin clients.
So now let me tell you why this is possible now and wasn't before. First, it's the fact that we now have this programmable hardware and a language that is flexible enough to describe something that is similar to the Bitcoin protocol.
Second, the state that we have to cache in the hardware is updated very rarely. In fact, we have only one block every ten minutes, so it's a very efficient way of caching.
And finally, Bitcoin as a protocol is communication heavy as opposed to computational heavy. We do know there are things that are missing, but still it's able to describe this part.
And now let me tell you why we think this is needed, why we want the hybrid approach and not just the software thing.
Well, first, we expect the relay network to be in high demand because all the Bitcoin clients will want to connect to it so it will have to serve multiple clients, so it's much better if it can serve it from the hardware itself.
And second, P4 allows us to have a lot of dynamic network defences like white lists and black lists and defend against spoofing, and so on.
With this, I finish my talk. If you were to remember one thing from it, it should be that Bitcoin and all block chain applications that are using the Internet are vulnerable to routing attacks, and that the potential impact is at least worrying. Deploying secure routing protocols is the best we can do, and SABRE is a practical alternative. Thank you.
PETER HESSLER: Thank you very much. Please remember to state your name and affiliation and keep your questions short and your statements saved for the lunch break.
AUDIENCE SPEAKER: Maksym Tulyev from NetAssist. The question is, did you ever think about to make an isolated Layer2 network for Bitcoin nodes that will be not connected with Internet? Will it help?
MARIA APOSTOLAKI: So a Bitcoin should stay open, that means that you can have nodes everywhere in the world. So, I don't see how Layer2 could be used for that.
AUDIENCE SPEAKER: If this network is able to connect to some nodes, so they are asked to connect to this closed network. They are not only connected by this closed network; it's a kind of backup network.
MARIA APOSTOLAKI: Still, it will have to connect the whole ‑‑
AUDIENCE SPEAKER: Yes, it connects to Internet but it also connects to this Layer2 network. Will it help?
MARIA APOSTOLAKI: What I miss from that is how would it be possible to connect any node anywhere in the world ‑‑
AUDIENCE SPEAKER: Not any, but the majority of networks.
MARIA APOSTOLAKI: So the majority is distributed, right, so that's hard to be done. Like, I do agree with ‑‑ if we don't use BGP, we don't have this problem. But we need it.
AUDIENCE SPEAKER: So, is it possible to connect a core of nodes so this ‑‑ some K nodes to an isolated network and if BGP split will be, that it will be picked up by this, in fact, VLAN between these nodes.
MARIA APOSTOLAKI: Sure, just isolate as if they were in one AS, but still they are isolated so the core cannot help if it's isolated on its own. But you are right, if we don't ‑‑ if we have like a smaller network that uses the block chain that doesn't use BGP, it's perfectly safe.
AUDIENCE SPEAKER: Will van Gulik. I was wondering what actually is the average usage from one client that needs to keep the connection to the Bitcoin network?
MARIA APOSTOLAKI: What's the usage ‑‑
AUDIENCE SPEAKER: Average usage.
MARIA APOSTOLAKI: Per client that connects to SABRE network, right, this is what you ask?
AUDIENCE SPEAKER: Yeah.
MARIA APOSTOLAKI: Okay. That's minimal, because it's a UDP connection and that's all it needs, and it will transfer at most one block, which is 1 megabyte, per ten minutes. It's nothing, it's just a connection...
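As a rough back-of-the-envelope check of that figure, the average rate for one 1 megabyte block every ten minutes can be computed directly. The sketch assumes 1 megabyte means 1,000,000 bytes and ignores protocol overhead.

```python
BLOCK_SIZE_BYTES = 1_000_000        # assumption: 1 megabyte = 10**6 bytes
BLOCK_INTERVAL_SECONDS = 10 * 60    # one block every ten minutes

avg_bits_per_second = BLOCK_SIZE_BYTES * 8 / BLOCK_INTERVAL_SECONDS
print(f"{avg_bits_per_second / 1000:.1f} kbit/s")  # prints "13.3 kbit/s"
```

That is an average per client; a block would in practice arrive as a short burst rather than a steady trickle.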
AUDIENCE SPEAKER: Thank you very much for this talk, I think it's quite informative. Have you been also looking at DDoS attacks on nodes, but you could leave your routing infrastructure in place and simply focus on DDoSing some of the key nodes and releasing them slowly to try to create a 51% attack, have you been looking at that too?
MARIA APOSTOLAKI: So, one of the reasons why we need the hardware component is to actually defend ourselves against such attacks. For example, if you have multiple nodes that try to connect to the relay, the relay is smart enough to understand that these nodes are acting weird, because they should act as regular Bitcoin clients. That's the only thing that they can do. And since it's a hardware thing, it's much harder to DDoS. It can still be DDoSed if we have a volume‑based attack; we don't protect against that. If you flood the whole network, then okay, the relay would be off. But if it's a more sophisticated attack, then the system should be able to sustain it, because it's only hardware based and it has whitelists, blacklists, spoofing detection and a lot of other things.
AUDIENCE SPEAKER: I mean, we have seen a bunch of DDoS attacks, so it cannot be too difficult to DDoS 51% of miners and just take over the chain, right?
MARIA APOSTOLAKI: So, again, we also need to take into consideration that this should not be the only thing that is out there to help Bitcoin. Bitcoin is already a peer‑to‑peer network that works well enough that we don't have such large‑scale attacks, but the relay network that we are building is supposed to help against routing attacks. So even if you DDoS it, if you don't have a routing attack at the same time, Bitcoin will not be affected.
PETER HESSLER: I am going to cut the line but we can continue with the people currently standing.
AUDIENCE SPEAKER: Andrei Robachevsky, Internet Society. So your approach with this, is it specific to Bitcoin like setup or can it be used in some other case to say mitigate routing security attacks?
MARIA APOSTOLAKI: SABRE basically has two components: a networking part and an application part, which is how the node is implemented. The networking part can be used by any application that wants to defend itself against routing attacks, against hijacking, because it makes sure that the relay network itself stays connected. Now the node, the way that we implement it using P4, might or might not be something that can be used in other cryptocurrencies or in other block chains. Something off the top of my head: if you are trying to use an application that is encrypted, P4 cannot do this yet. So, it's...
AUDIENCE SPEAKER: Benno Overeinder. Andrei, just asked the question I wanted to ask but I have a second one. A little off topic, not specific for Bitcoin or the presentation but you mentioned 100 K route hijacks, route leaks every month. What's the source ‑‑ or how do you detect that? Do you get it from BGPStream? A little background, so I present also an academic approach to this, it's interesting, but operational didn't make a lot of sense so we detected all kind of route exchange route leaks, because what's in one region may be opposite in other region. Maybe we should have this discussion in another place, the Routing Working Group but I'm very interested.
MARIA APOSTOLAKI: I need to mention two things. First, of course, we cannot say from our side whether that was malicious hijacks or just mistakes that people did. But from our perspective, that's not ‑‑ that doesn't make any difference. What we wanted to prove is that this is practical; hijacks do happen, regardless of whether they are on purpose or not.
The second thing that I want to say is that the way that we infer the number of hijacks is not accurate. This is a huge problem and it's not ‑‑ it's a problem on its own, so we used, like, a similar heuristic to do this. I can explain.
AUDIENCE SPEAKER: Thank you. I agree, I totally agree. But it's more the academic or how can we improve on the route hijacks mentioning which are real, intentional, not intentional or just three peers that happened to move traffic between each other and, well, peer, peer, peer is considered to be a routing hijack, that kind of things. Thank you very much.
PETER HESSLER: All right. Thank you very much.
And last, we are going to have Daniel Karrenberg talk to us about remembering Jon Postel.
DANIEL KARRENBERG: Hello. I'm here because Jon Postel passed away on this day 20 years ago. And so, some of you may think, why should we remember this hipster who died 20 years ago? Well, I'll try to convince you in the next five minutes that we should remember Jon because his actions do influence what we are doing in this room today.
And we should remember Jon because his style should influence how we do things today. And I'll just take three random examples.
Jon, the RFC editor; Jon, the country code top level domains; and Jon and RIPE and the RIPE NCC. And I won't take more than five minutes.
Jon felt very strongly that the quality of RFCs, and he was the RFC editor for a long time, was very important. He was a true editor at the time ‑ he read all the text ‑ and he made sure that it was understandable and readable, and these are two different things. Before the stuff got published, and I can tell you from experience when writing RFCs in those days, at least in my personal back of my head was how Jon is going to read this so it better be good, otherwise questions would happen.
The style, and I'm there on the style: Jon did this editing not by suggesting new text or saying you should write it like this, but by asking questions. And the worst answer you could give to a question like, you know, "what does this mean?" was saying, "hey, Jon, you know what that means", because when he asked the question, it was quite clear that it wasn't clear; the answer was not in the text, even if he might understand.
So, when writing documents, I think we should strive to emulate the style of the good RFCs of the time; make them readable, make them understandable, don't use pompous language. I think it would help us also with documents we do here.
Jon and the country code TLDs. In his role as the IANA, the Internet Assigned Numbers Authority, and it's now names and numbers, but it was then too, Jon did not act in a top‑down way, although the authority was in that particular title. To the contrary; he worked hard to enable true and inclusive bottom‑up structures. He would always ask, did you really talk to everyone concerned? Is there true consensus about that?
Jon's style was to hold IANA decisions until the consensus of the affected community was clear. Even when it would have been easier to just say, okay, let's do it like that, he had patience. And he would not yield to authority even when that was hard. So, in the country code TLDs, I can recall a couple of instances when governments, or parts of governments, were sending irate letters about, you know, this should be organised like this or this entity should run the ccTLD, and Jon would hold out as long as he could against this stuff just to make sure ‑‑ and made sure that an all‑inclusive bottom‑up agreement would be reached within that particular country. And I think we can learn from that.
And lastly, Jon and the RIPE and the RIPE NCC. Well, in the time when the Internet was spreading around the world, Jon was asked very often for answers and for opinions about how to develop the Internet outside the United States. And more often than not, instead of answering that question and giving his opinions or answers, he would just put the people in that particular region or in that particular country in contact with each other, because it happened quite frequently that they didn't know about each other that they were doing Internet stuff.
And as a consequence, Jon also supported the Internet number distribution in the way we do it today, regionally organised and through ISPs. And if he had not done that, we would today discuss Address Policy inside ICANN; just imagine how that would be.
But, we also did our homework, I mean as RIPE we made this decision easy for Jon by organising a well‑documented regional consensus and a bottom‑up structure before even going to IANA and saying delegate some numbers to us.
In true style, Jon supported us by asking us the hard questions: How will this RIPE of yours scale in the future? How will you raise the money to do the work? This kept us sharp at the time and allowed the RIPE NCC to serve as an example to other regions.
So, I hope I convinced you that what Jon Postel did influences our work today. We should remember him, learn about his style and emulate it. Thank you, Jon.
I have one more little thing to say. In order to sort of help us sort of remember, there are not very many people who actually interacted with Jon left, I think, and who knew him, so, in order to make it easier for us to remember I would ask the people who are here who knew Jon and who are willing to talk about it, to make themselves known by wearing one of these stickers I have here and Mirijam has another set. So if you are comfortable with that, and you want to help us to remember Jon, please see me and get one of the stickers.
DMITRY KOHMANYUK: Thanks a lot. I only talked to him by e‑mail, and it was years ago.
I have a few announcements to make and, yes, it's very important to know the people who are before us, you know, and to keep the transition.
Well, as lunch approaches, I'd like to remind people there is now a different kind of lunch, a Women in Tech Lunch. Everyone is welcome. It's going to be on the first floor. Then, if you go to the main ripe77.ripe.net page you can rate every talk that you see: you can click on "Plenary" and rate talks. You can also nominate yourself for PC elections by mailing email@example.com; the deadline is 3:00pm today. Last, but not least, we have a handy app. I'm not a big app fan, but still, if you go to the same web page, you'll see the icon for the RIPE app, and there are folks outside that can help you. You can use that app to chat with other attendees and network with them.
So, with that being said, go to lunch, please, and we have another Plenary after that at 2 p.m.
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC