Plenary session
16 October 2018
4 p.m.
CHAIR: Okay. So, welcome to your afternoon Plenary session. We will have two talks, three lightning talks, but before that we will have the candidates for the two open positions in the Programme Committee and Brian is going to give you a small introduction on how all this is going to work.
BRIAN NISBET: Thank you. I get to do this because I am one of the people on the Programme Committee who doesn't get voted on for ancient historical reasons.
We have seven people who wish to be on the Programme Committee, which is great, new people, new voices, more people than we have ever had before. They are going to get about 30 seconds or so, maybe 60, to introduce themselves. And then on the front page of ripe77.ripe.net there is a link to their biographies and the place to vote, so you can vote on that until Thursday and then we'll announce it on Friday.
All very simple and it's great to see interest and participation and, you know, the whole Plenary seems to be going very well. If you wish to get involved in it, there will be another chance next year ‑‑ or in May even, which is next year. And we'd like to thank Isobel and Leslie, who are the two members finishing their terms at this point in time, Leslie is wandering off, so she will not be standing again.
First up... first candidate in the alphabetical order [Asilreti Vaserie], who unfortunately isn't at the meeting today due to some visa issues. Next is Dmitry Burkov.
DMITRY BURKOV: It's nice to see you. I will try to be short and don't want to waste your time. My first RIPE meeting was more than 25 years ago, and I was on Programme Committee, I never was Working Group Chair, just four terms as a board member. But I still think that my experience could be useful in such a role as Programme Committee member. Thank you.
BRIAN NISBET: So, Igor Glinka.
IGOR GLINKA: Hi, I'm from Russia and I am a lawyer; I am also a technician from a Moscow institute which is working on the government information system. It is my first RIPE meeting and I hope I can supply something a little bit more interesting for participants. So that's all I want to say. Thank you.
BRIAN NISBET: And next up, Maria Isabel Gandia.
MARIA ISABEL GANDIA: I am Maria Isabel, I come from Barcelona and I have been working with networks for the last 20 years or more. I have been working in the research community and the Internet exchange. I am also part of the Programme Committee right now but I would like to apply for a second term. And that's it, most of it. So... I would like to give my feminine view from the southwest of Europe.
BRIAN NISBET: Thank you.
MATTHIJS MEKKING: Hello, Matthijs Mekking. I am mainly a software developer. I have been in RIPE meetings for a long time and I know you want to have some more contributions to the community, from academic as well as commercial backgrounds. I'd like to help this meeting keep up the good work for everyone from all kinds of regions and all kinds of different places. Thank you.
BRIAN NISBET: Thank you. And now Maximilian Wilhelm.
MAXIMILIAN WILHELM: I am from Germany. By day, I am an infrastructure architect, so I have an operations background and an architecture background, and I build non-commercial wi-fi networks for all the people who need them. I am a hacker on the Bird routing daemon, so I have a protocol background, and I'm on the Programme Committee of the German FrOSCon, so I hope I can bring some more views to the selection.
BRIAN NISBET: The eighth candidate, so, is Samaneh Tajalizadehkhoob.
SAMANEH TAJALIZADEHKHOOB: Hi, I am Samaneh. My name is not there because I just submitted my application five minutes ago, but I am a security specialist at ICANN; I started two weeks ago. I hold a Ph.D. in Internet security from a university of technology and I have worked on domain name abuse, Internet measurement and web security for the past six years. I am on the Programme Committee of some conferences and workshops, and I know RIPE very well from some years ago, from participating in the meetings and being involved in some discussions and research. I will be more than happy if I can be on the committee and be more active in the research.
BRIAN NISBET: Thank you. And last but not least, Stefan Wahl.
STEFAN WAHL: Stefan. I was here when the Internet was young, not stressing it's a little bit more than 20 years now. I am right now working with an Australian company called Megaport. And I have the position there of legal and compliance officer but still strong in the networking measurement stuff as I have a background in microelectronics and technical informatics, and that's it, keep it short.
BRIAN NISBET: Thank you all very much. The bios are up there, you can take a look. You have two days to vote, please do. Thank you for putting yourselves forward and I will now return you to your chairs.
CHAIR: Good afternoon, my name is Pavel Lunin, and my colleague Francesca and I are going to chair the next Plenary session. The first talk is machine learning with networking data.
KATHARINE JARMUL: Good afternoon, everyone. So, today we're going to talk a bit about machine learning with network data, and the joke for today is: how many machine learning engineers does it take to convince a room full of network engineers that machine learning actually works? Hopefully two, right. This is what I'm hoping for.
The reason maybe why we have to convince people that machine learning works is that a lot of times when I tell people that I work with machine learning, they either look at me in awe as if I'm some sort of magician or wizard, or they roll their eyes like, oh my gosh, this lady, what is she talking about? This is perhaps because of the hype around AI or machine learning. AI is a really fancy marketing term for machine learning, but we use it because marketing is important.
What we can see here is the Gartner hype cycle; Gartner is an agency that talks about what is hyped and what is real, and we can see that last year machine learning is there, marked in red, with a lot of hype, right, quite a lot of expectations and maybe not so much follow-through.
But I ask, is it all hype? And this is because machine learning is definitely beyond hype. Right. So, what we have here is Cisco's threat analytics, which a friend of mine [name unclear] helped work on, and I remember when I first met her and she had just been assigned to this team, she said: I am a malware analyst and I'm working with all these machine learning people, what are we going to be able to do? And less than two years later they have produced this product that is actively being sold and works on finding security threats within networks, mainly corporate networks, I believe, with Cisco.
I use this as just one example of how machine learning actually works in production in real life. It is possible, but we're going to go through today some of the different discrepancies and the things that you need to do to make sure you can do machine learning properly using network data.
So, hopefully you are a little bit interested now, or, just by the fact that you are here, you are interested. How many people in the room are actively already working on machine learning with networking data? How many have dabbled in it maybe a little bit? And then it's primarily new. Great.
What we'll first talk about is why we should do machine learning. I'm here to help reinforce that machine learning is probably a better way than some of the approaches you are currently using to solve problems.
And this is because we can think of machine learning as essentially adaptive or dynamic pattern matching. So any time that we want to identify a pattern, let's say classify it in some way, or find an anomalous behaviour, something like this, we can use machine learning to help us do so. And it is much more dynamic and adaptive than any hard coded rules that you are going to write. So what I see a lot is machine learning going in to replace, let's say, very brittle rule sets that are easily outdated over time, especially from a security perspective. And this is because we can continue to label and train on new data and therefore adapt our pattern matching, right.
The second thing is you can still use your domain knowledge. So, the most important resource that you have within your team is your network engineers and all of the domain knowledge that you have on your team. Within machine learning there is a feature extraction method or step, which Andreas will talk a little bit more about, and this is where you can really apply the knowledge that you already have on your team to extract the right information from the data so that you can use it properly within the machine learning algorithm or model that you are going to use.
Finally, I want to point out that machine learning is not a one-size-fits-all solution. There are many different types of machine learning, and depending on your data and the problem you are trying to solve, there is usually an array of different choices for you. So, if one approach doesn't work or if you don't have the data to support one approach, there are usually more approaches that you can try. And for those of you who are familiar: there is supervised, semi-supervised and unsupervised learning, and we don't have time to go into the details, but if you want to chat about the differences between these types of machine learning, Andreas and I will be available to chat with you throughout the rest of the conference.
Now that perhaps you are a little bit more interested in using machine learning, what actual networking problems can it help with? First and foremost, security, because it adapts well to this type of problem set. This is probably why most of the machine learning research that you have seen in networking is focused on security, right. And so we can do things like malicious traffic identification, malware identification and data loss prevention by looking at things like flow data. We'll go into a case study outlining how that was done.
Then we have traffic classification. So, currently you are probably using some data analysis to do this. This can also be done in an adaptive way with machine learning. This can also be used to identify quality of service changes and so forth. You can use this essentially to define and segregate traffic based on anomalies and other issues which you noticed in your service.
Finally, machine learning itself is just optimisation, so if you have an optimisation problem that you can define within a dataset, then you can utilise machine learning to help you with this.
Some people say that it can be used for predictive maintenance, this is going to very, very much depend on how standardised your equipment is.
Finally, if you have logs you have been sitting on, or other data that you want to start analysing, perhaps a nice first step is data mining and data analysis: look at what's already there in the data and start to figure out if there are patterns that you'd like to learn from or statistical patterns that you'd like to model. This is a great way to first get started.
Now, Andreas will introduce a little bit more about DCSO and the work on machine learning with data there.
ANDREAS DEWES: This project was done in cooperation with the DCSO at the beginning of this year and last year. For those of you who don't know, they are the German cyber security organisation; it's a spin-off of several German corporations. It was founded at the end of 2015 and it provides both research and security services to improve security, and it is not solely focused on profit, but also on sharing and improving best practices in the community. The project was done in the labs team, which is headed by Andreas, who is also here today; you are free to approach him.
And to quickly give you the motivation, or the history, of this project, let me briefly explain one of the core services that the DCSO is offering. It's TDH, or threat detection and hunting, which some of you might know as network security monitoring, and the idea here is that you have a customer infrastructure consisting of many different hosts that you want to protect from attacks by, for example, adversaries or malware or other threats. The idea is that you can monitor the network traffic in this infrastructure, detect anomalies in it and use that to alert the customers about problems in their network. On the left side you see an example customer infrastructure; typically this can consist of thousands or hundreds of thousands of hosts. What we want to do here is to monitor all of the traffic, and this is done with so-called visibility points or network sensors, typically many of them situated at different points in the network, which basically obtain a copy of the entire network traffic from the network backbone through a mirror port.
Of course, in order to do something with this traffic, to analyse it for anomalies or problems, you need some kind of rules or patterns that you can look for. And there are several sources we can get these from: we use both public and private threat intelligence to get both indicators, like an IP address or a domain that in the past was associated with malicious activity such as phishing or malware, and patterns, for example specific signatures or payloads occurring in network traffic that can be used to detect some kind of attack. We have, in our own infrastructure, a threat intelligence engine that aggregates all of this information, evaluates it and then sends it to the visibility points, which can then scan the network traffic for any of these indicators or patterns that have been fed into them. And if it detects something unusual, it alerts the back end, and then finally an analyst sees this alert and decides what to do with it.
This is working very well; it is in production at many large companies in Germany and has found many threats already, so you might ask yourself why we would need to add machine learning to that. The reason is that attackers are getting more and more sophisticated. For example, in phishing campaigns we can see that some of the IOCs, the domains that are being used, are active for only, say, 20 minutes or a few hours. That means the window in which you can actually detect these things using this indicator-based approach is getting smaller and smaller. And so we are seeing machine learning not as a replacement for this approach but as a complement, in the sense that it can maybe help us detect attacks for which we don't have good rules or indicators yet, just by using the behaviour of the attacker in the network.
In order to test that hypothesis, we built a POC project with one of our customers, who provided their infrastructure for that. What we did there is run a modified visibility point that collected so-called flow information from the network. And then we created another secure data processing infrastructure on our side to perform the machine learning and data engineering on this flow data. I'm going to go into the details of that in a second; here I just show you the overview.
You can see that we have a message queue that collects the flows and then transfers them to our infrastructure, and before we can do anything else with them, we actually need to secure them, because, as you might know, flows contain for example IP addresses, which are sometimes sensitive, personally related information, so we have to protect them before we can use them for data analysis. We do that by pseudonymising them. We then send them to our infrastructure, to some custom-written programmes that basically group the flows we receive from the sensor into groups organised by host or by protocol, and then these grouped flows get stored for short- and long-term analysis.
The motivation for having two storage systems here is that we want to both perform realtime machine learning ‑‑ for example, we want to look at the flows from the last five minutes, perform the machine learning on them, see the results immediately and feed that back into the intrusion detection system ‑‑ and also keep the flows for a longer time in order to train our models on them and improve the accuracy. For these two problems we have two different storage architectures: Redis on one hand, which is very good for realtime analysis, and Cassandra on the other hand, which is very well suited to storing the sequential flow data in the long run.
Now, it sounds of course very easy to apply machine learning to flow data, but there is a lot of data engineering involved, so here I want to show you a bit of the data processing pipeline that the flow information from the sensor runs through before being analysed. As you can see on the left, the information we get from the network sensor is in EVE JSON format. It contains the timestamp and information about the source and destination IP, as well as other things like the ports and flags of the traffic. What it doesn't contain is any of the actual data that was being transmitted. So in that sense it's only metadata that we are analysing here.
As I said before, it's personally related and sensitive information, so we need to protect it. We do that by pseudonymising the IP addresses and the names; what we receive is a new set of flows which is anonymised and which we can easily process in our own infrastructure. Since the sensor sees many different flows from all of the hosts in the network, in order to use them in a sensible way we need to first group them as well, so we need a way, for example, to group them by application protocol and by communication partners. This we are doing in the so-called inflow system, which groups these flows together so that we get a time series of individual flow events that we can process afterwards.
Now, this is still difficult to feed into a machine learning system because it contains a lot of information that is not easy to process, at least for the models that we have chosen here. What we need to do is generate some useful features, and in our pilot project we did that by picking two characteristics of the flow data: on the one hand the size of each flow, so how many bytes have been transmitted, and on the other hand the waiting time between two flow events.
We then created a number of buckets: you can imagine we have one bucket for small flow events, one bucket for medium-sized ones, one bucket for large flow events, and each flow event would be put into one of these buckets. That's a very nice representation for a machine learning system because it's a vector that has only zeros and a single one, in the bucket that corresponds to the flow. This is very easy to digest for the deep learning based approach that we're using.
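As a rough illustration of the bucketing just described ‑‑ the bucket boundaries and names below are invented for the example, not DCSO's actual feature definitions ‑‑ a minimal Python sketch of this kind of one-hot encoding could look like this:

```python
import numpy as np

# Illustrative bucket edges (assumptions, not the real ones): flow sizes in
# bytes and waiting times in seconds between consecutive flow events.
SIZE_EDGES = [0, 100, 1_000, 10_000, 100_000]   # 5 edges -> 6 size buckets
WAIT_EDGES = [0.01, 0.1, 1.0, 10.0]             # 4 edges -> 5 waiting-time buckets

def one_hot(value, edges):
    """Vector of zeros with a single one in the bucket the value falls into."""
    idx = np.searchsorted(edges, value, side="right")
    vec = np.zeros(len(edges) + 1)
    vec[idx] = 1.0
    return vec

def featurise(sizes, waits):
    """Turn a sequence of (size, waiting time) pairs into one-hot feature rows."""
    rows = [np.concatenate([one_hot(s, SIZE_EDGES), one_hot(w, WAIT_EDGES)])
            for s, w in zip(sizes, waits)]
    return np.stack(rows)

# Three flow events of different sizes and gaps:
X = featurise([64, 5_000, 250_000], [0.005, 0.4, 12.0])
print(X.shape)   # (3, 11): 3 events, 6 size buckets + 5 waiting-time buckets
```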
Of course there are other approaches and other methods to generate or extract features from these flows.
The models that we tested in our POC are based on deep learning, so we have several architectures that we built; there is a large zoo of different possibilities that you can use for performing machine learning on these flows, and there are two kinds of models which we have seen as very successful: so-called recurrent neural networks on the one hand and convolutional networks on the other. We tested both of these approaches, and in our project we did not only train a single model to detect an anomaly but trained a lot of different models for specific types of anomalies or classification tasks. The advantage here is that for each of these models the task at hand is very simple, so you can generate a lot of high-quality training data, and at the same time it's also possible to train other models for, for example, the specific detection of a given malware.
In the end, what comes out of these models is basically a binary answer or a probability that says: yes, these flows that you put into me are anomalous, or, I think they are normal. And again, to train the models you need labelled flow data, so you need to tell the model which flows are anomalous and which are normal, and there are various approaches for that. You can do it either by hand, which is tedious, or you can use other pre-classified flow data; in our case we used flows that we had classified using the indicator- or rule-based approach that we already had in the system.
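As a hedged sketch of what one such per-task binary classifier might look like, using Keras with placeholder layer sizes, sequence length and feature width rather than the actual DCSO architectures:

```python
import numpy as np
import tensorflow as tf

SEQ_LEN, N_FEATURES = 20, 11   # placeholders: 20 flow events, 11 one-hot features each

# A small recurrent network mapping a sequence of flow features to a single
# anomaly probability; one model like this per detection task.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training data would come from labelled flows (e.g. pre-classified by the
# existing indicator/rule-based system); random arrays here only show the shapes.
X = np.random.rand(256, SEQ_LEN, N_FEATURES).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))
model.fit(X, y, epochs=2, batch_size=32, verbose=0)

print(model.predict(X[:3]))   # per-sequence probability of being anomalous
```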
We have only very preliminary results for now; we are hoping to publish a detailed analysis in a paper in 2019. What we saw is that we are able to classify short flow sequences: for example, this is an example showing protocol classification, where we can achieve an accuracy of 80% on a very short flow sequence, which sounds bad, but you can imagine that if you average that over time the precision can be good. So that means that basically, if these models work, they can be used to detect anomalies, and we can train them in a very flexible way to detect any kind of activity that we have labelled data for. So that's very nice.
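To make the averaging remark concrete, here is a small simulated example (the 80% figure is from the talk; everything else is made up): a per-sequence classifier that is right 80% of the time becomes much more reliable once you take a majority vote over a window of consecutive sequences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 990 verdicts on traffic that is truly anomalous, each correct with
# probability 0.8 (the per-sequence accuracy mentioned in the talk).
per_sequence = rng.random(990) < 0.8

# Majority vote over non-overlapping windows of 15 consecutive verdicts.
windows = per_sequence.reshape(-1, 15)
per_window = windows.mean(axis=1) > 0.5

print("per-sequence accuracy:", per_sequence.mean())
print("per-window accuracy:  ", per_window.mean())   # typically well above 0.95
```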
What we also saw is that we need very large data sets to produce very good models, and that's a bit of a problem because we saw that all of the publicly available data sets are very old or not very useful. If you look at the reason for this, a big part of it is of course the privacy concerns associated with collecting and sharing flow data because, as I said before, IP addresses, for example, are considered personally identifiable information under the GDPR, so it's very difficult to share and process that data.
So what we did in the project to address these problems was two things. We developed a new method, based on a cryptographic approach, to pseudonymise IP addresses, which allows us to perform a prefix-preserving and format-preserving encryption of each IP address in a given flow, which makes it difficult to identify the original address. And we also employed a scheme to randomly shift the timestamps of each flow, which again makes it more difficult for an adversary to see where an original flow came from.
We can do this not only with a single parameter set but with many of them, and since we are storing all of this data in our backend database, an adversary would need to download a large amount of it to find the flow information they are looking for. In that sense, by chunking up the data we can keep it secure and make it difficult for someone to steal it.
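A simplified sketch of the two ideas just described ‑‑ prefix-preserving, format-preserving pseudonymisation of IPv4 addresses (in the spirit of CryptoPAn-style schemes, not the actual DCSO implementation) plus random timestamp shifting ‑‑ might look like this:

```python
import hmac, hashlib, random, ipaddress

KEY = b"example-secret-key"   # assumption: in practice a properly managed secret

def pseudonymise_ipv4(addr: str, key: bytes = KEY) -> str:
    """Prefix-preserving pseudonymisation: addresses sharing a real prefix
    share the same pseudonymised prefix, and the output is still an IPv4."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = ""
    for i in range(32):
        # Each output bit depends only on the key and the original prefix so
        # far, so common prefixes map to common prefixes.
        flip = hmac.new(key, bits[:i].encode(), hashlib.sha256).digest()[0] & 1
        out += str(int(bits[i]) ^ flip)
    return str(ipaddress.IPv4Address(int(out, 2)))

def shift_timestamp(ts: float, max_shift: float = 300.0) -> float:
    """Randomly shift a flow timestamp to make re-identification harder."""
    return ts + random.uniform(-max_shift, max_shift)

# Two hosts in the same /24 keep a common pseudonymised prefix:
print(pseudonymise_ipv4("192.0.2.10"))
print(pseudonymise_ipv4("192.0.2.99"))
```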
Now, Katharine is going to briefly introduce a tool that we have built for that purpose.
KATHARINE JARMUL: We actually released a tool today, and it's available for you, that gives you the ability to create secure PCAPs. So, who here uses PCAPs or other network data that you potentially need to share with partners, or with somebody whom you don't want to see the actual internal structure of your network, let's say, or of the packets that you captured?
Well, here is one demo, and you can see more about it at the URL below. This demo allows us to take a packet capture and pseudonymise it. We can preserve structures, so things like subnets and relationships between internal IP addresses are maintained, and then this PCAP is secured. It can be de-pseudonymised at any time with a cryptographic key, so you can reveal or decode the information if necessary, if you need to get back to the raw PCAP that you originally had. I hope this helps you when you are securing your network data and thinking about how to share it. And this is really also essential whenever you do data mining, data analysis and machine learning, so that you are protecting the actual internal data structures, and so that if that data or that model gets to somebody else, it is indeed secure.
So final summary is despite the hype machine learning is real. It can solve real network problems. And I hope that you feel a little bit inspired to get started on working with machine learning and network data.
The most important thing I hope is also shown is that you need to define a specific problem. And you need to think about the data engineering, you need to think about how you are going to architect the models themselves and these are all difficult problems. This is not simple. And this is why machine learning is not magic. It is actually just math and engineering.
So, hopefully that is a helpful thing.
And finally, that you can use pseudonymisation as a way to protect the private or proprietary data within your network infrastructure and this pseudonymisation can help protect either the model which might be attacked or also any data sharing that you do with yourself and other trusted partners.
So thank you very much for your time today. Andreas and I will take questions.
Andreas is here for the conference so if you want to chat with him about DCSO and all the things they are working on, then please get in touch.
CHAIR: Thank you. Questions?
AUDIENCE SPEAKER: I am Shane Kerr, I work for Oracle but I'm speaking for myself. This is very interesting stuff. Have you thought about the possibilities that once you start using machine learning to detect attackers that attackers will try to probe your networks in ways that corrupts your training models?
KATHARINE JARMUL: So, I mean, attacks against machine learning models are quite common, actually. But there are usually specific ways that you need to attack a machine learning model. Simply introducing more anomalous traffic is not necessarily going to produce an error. Usually we would talk specifically about model extraction attacks or potentially adversarial attacks. I gave a talk on adversarial examples at the last congress I presented at, if you want to check it out.
SHANE KERR: I guess that's kind of what I expected, which makes me think: is it important that you hide the models that you are using, then, or are there scenarios where an attacker can know the model that you are using and you can still be resistant to attack?
KATHARINE JARMUL: There are a few different ways. If you are using deep learning and you are faced with the problem of adversarial examples, which is potentially the most common form of attack, there are a few different ways to mitigate this. The most important is an approach we call feature squeezing: essentially you need to detect whether the input data is valid before it enters the machine learning space. These attacks usually ‑‑ a short way to describe it ‑‑ essentially use the boundaries to present something that's not possible in reality but that fools the model, because it will accept it as valid data. Having this validity check before the data reaches the machine learning model is one potential way, and there is a lot of other research on it.
AUDIENCE SPEAKER: Thank you very much. I am Abdulkarim from the University of York. I want to thank you for this presentation, because each time I talk to network engineers about machine learning, they always say you don't know what you are talking about, so I am delighted you are talking about this. One question they always ask, and which I find really difficult, is: when do you stop the learning process? Because the learning process takes time, and at times they tell you that the network is so dynamic, and you have to do the learning before you start exploiting whatever you have learnt. So when do you stop the learning process? Thank you.
ANDREAS DEWES: In general, as I said before, the idea is not to train a single model but to have a lot of different models that you train on specific types of anomalies, and you would normally continuously train these models with the data that you are aggregating. Basically, you would, for example, have a labelled dataset that you generate using indicators or rules, then you would use that to bootstrap a machine learning model, you would use it to detect additional traffic as anomalous, and then you would have a manual process using analysts ‑‑ or maybe you could do that semi-automatically ‑‑ to see if these predictions were correct or not and then improve the model with that. It's a combination, I would say, of manually labelling data and using automated approaches to produce pre-labelled datasets. The training of the models would, so to say, happen in parallel.
KATHARINE JARMUL: Sorry, just to quickly jump in, because I really like that question. Our goal, right, as machine learning engineers, is to create a model that generalises well. My response is: well, you are describing a very specific but small, time-sensitive problem, and what we want to do with machine learning, of course, is to find something generalisable so that, as Andreas referred to in the slides, we can notice an attack essentially before it's recognised by malware analysts or other analysts, because it follows a similar pattern.
ANDREAS DEWES: It's important to keep them from over‑fitting those... as they tend to do that.
AUDIENCE SPEAKER: Thank you very much for the talk. I am Cyrus, I'm from Twitch. I sort of didn't catch your general thesis of approach in the talk. Can you talk a little bit about why you are going with deep learning and what is your general thesis in terms of we're approaching the security problem in this way, we're going to approach training in this way, the models in this way, etc.
ANDREAS DEWES: In general we are also open to using other methods; we started with deep learning because it seemed an approach that hadn't been explored so much yet. I mean, there are other models for traffic classification, like Markov chains and other statistical models. What's attractive about deep learning to me is that, compared to other models, you have a very high complexity, which can also be problematic, but the models in that sense are capable of generalising, using a very large parameter space, and you can basically accommodate many different signals in a single model. In that sense the models are more flexible; they become harder to explain, of course, as well, but for traffic characterisation, where you already know what kind of anomaly you want to detect or classify, explaining is less important than detecting something with high fidelity, in my opinion. So deep learning is very well suited for these types of tasks. With this platform we are trying to build a system that allows other approaches to machine learning as well, because we also saw that deep learning is not suited to all of the traffic classification problems that we have.
KATHARINE JARMUL: One quick add-on there is that RNNs, for example, are very good with sequential data, so with something like flow data over a period of time this works very, very well; it's much more difficult to achieve with, potentially, some Bayesian models, but not all. So this is kind of the sweet spot: using RNNs when dealing with sequential data.
CHAIR: I must close the mic. We have no more time.
ANDREAS DEWES: We will be around.
KATHARINE JARMUL: Thank you very much.
(Applause)
CHAIR: The next talk is called 'Technical Debt: an Anycast Story'.
TOM STRICKX: Hi. I am Tom, I work at CloudFlare, I am one of the network engineers ‑‑ technically a network hooligan, because I am a network software engineer, which has this weird side effect that I break things more often than the normal network engineers. I am a contributor to NAPALM automation and to SaltStack.
You can find me on GitHub, and you can find me on Twitter, where I tweet about network-related stuff as well as other personal things.
So, a small summary of CloudFlare. Most of you will probably know who we are and what we do, but we're a pretty big CDN. We provide services for about 7 million zones or domains. We do authoritative DNS services; we're the biggest and the largest. We're also the fastest: we do about 30% of the Internet's queries when it comes to that. We recently, about seven months ago, launched 1.1.1.1, which is a recursive resolver; we also have an IPv6 address, which I don't know by heart. We have about 150 Anycast locations globally. We keep growing, we keep expanding our network, and we keep looking for new places to put our locations to improve the quality of the Internet for local people. We're in about 74 countries. We keep growing. We have a lot of networking devices: these might be routers, these might be switches, these might be all kinds of things.
So, what am I going to talk about today? I'm going to give a brief introduction to what Anycast is; what our technical debt is exactly ‑‑ because we're a company of eight years old, and every company accumulates technical debt, so I'm going to explain what our specific subset of it is; configuration changes using SaltStack; and then how we monitored the changes that we made.
Anycast, all the things. Right.
It's a great thing if you know how to use it. So, with Anycast, basically what we do is: we have a specific subset of the IP addresses that have been assigned to us which we announce globally in all our data centres, so we announce the same IP addresses in over 150 locations, on both IPv4 and IPv6. So we rely heavily on the health of the BGP system as well as some intrinsics of the BGP system.
Our Anycast network is about 250 IPv4 prefixes, and about 15 IPv6. We keep expanding, we keep adding to them. And as I said, we announce them globally in over 150 locations.
But, as I said, we're a company of about eight years old now. We grow, we keep adding things, we keep adding features and products, which means that at some point you need to realise that you made decisions at a previous time that are no longer really relevant or valid for the way that you are currently doing things. This is called technical debt. And we have this ‑‑ I think most companies have it somewhere in their stack ‑‑ this thing where you say: I'd rather not touch this, because if I touch it, it will break.
We had that. Right.
This is a bit of our technical debt. This is an extract of the BGP routing table, the global table, about three months ago. And as you can see, one of the prefixes that I'm showing there has prepends on it ‑‑ two, to be exact, in this case. Prepends had their use for us, because when CloudFlare was a small company, we weren't present in over 150 locations just at once, right; that's a very difficult task, ask Karen. We had few tier 1 providers, so we didn't have the NTTs or the Telias or the Level 3s. We had only about 10 PoPs distributed semi-globally, so what we did is, we started prepending on our Anycast addresses to make sure that we steered traffic to the right location. Because otherwise, for example, Japan traffic might end up on the west coast or the east coast of the US, which is detrimental to your latency; it's not something you want to see happen, especially if you know that you have a PoP in Tokyo ‑‑ we have a presence in Tokyo, so why are you going to San Jose or Seattle? This is straightforward, right.
So we were prepending to make sure that the traffic that we wanted to hit specific locations started hitting us in those specific locations, but then we started growing and adding tier 1 providers. So the use case for these prepends started going away ‑‑ we didn't really need to prepend any more ‑‑ but instead of removing them, we started normalising this by adding these prepends globally for all our locations on all our addresses. So on our Anycast addresses we basically added two prepends in every single location. And, I mean, this fixed it, right? This is now a consistent configuration, this is awesome.
Not really. There is this tiny thing called BGP hijacks, and the basic idea ‑‑ it was wonderful to hear this earlier in the Plenaries, this is the exact case that has been mentioned before ‑‑ is that because we have an AS path of at least three times 13335, it's a lot easier for a malicious party to start hijacking our prefixes, especially if they are peered with enough networks that do not do prefix filtering. You can just start announcing any of our addresses and it's very likely that suddenly that path will be picked, because now the AS path is a single AS number instead of three. And in BGP, one of the major selection criteria is the shortest AS path.
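As a toy illustration of that last point (all ASNs other than 13335 are made up, and real BGP selection involves more criteria than path length): with the prepends in place, a hijacked path via a careless peer easily wins on AS path length; without them, the legitimate path is at least competitive.

```python
def shortest(paths):
    """Toy BGP decision: prefer the shortest AS path, ignoring other criteria."""
    return min(paths, key=len)

# Paths as seen from some network a couple of hops away.
via_legit_prepended = [64500, 64511, 13335, 13335, 13335]   # length 5
via_legit_clean     = [64500, 64511, 13335]                 # length 3
via_hijacker        = [64500, 64999, 64666]                 # hijack via a non-filtering peer

print(shortest([via_legit_prepended, via_hijacker]))  # hijacked path wins (3 < 5)
print(shortest([via_legit_clean, via_hijacker]))      # now it is at worst a tie
```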
One of, like, several incidents that happened regarding BGP hijacks, and which has raised this a bit more into the public view ‑‑ besides Job complaining about this constantly ‑‑ is authoritative DNS targeting. So, for example, as mentioned earlier as well, there was the Ethereum hijack of DNS: one of the Route 53 /24s was hijacked and the destination started returning malicious authoritative A records for an Ethereum wallet. And suddenly, people lost money. Or, one of the other issues that we have encountered is that when we turn up new locations, we also have to make sure that we add those prepends to our Anycast addresses, because if we don't, well, then suddenly our entire global traffic starts landing in a single location.
Unfortunately, I don't think we have ever invented enough compute to handle all of that in a single location, which causes problems.
So, we have been heavily incentivised over the past year to analyse what the problem is, how we can fix it and how we can fix it as quickly as possible without causing any impact.
There are multiple solutions, especially to the BGP hijack question. You have RPKI, which we're invested in and which we're actively deploying in our network; Louis is going to give a Plenary talk about this on Friday, I think.
We can reduce the AS path length, right: instead of prepending everything, we just remove the prepends and that's the end of that. We are heavily peered: we are present in a lot of IXs, we peer with a lot of networks, we have an open peering policy ‑‑ if you want to peer with us, just ask us and we will. That also reduces the chance of being hijacked. Another option is to /24 everything, because then nobody can announce a more specific, so that fixes the problem as well.
Unfortunately, as I said, we have about 250 Anycast prefixes, most of which are aggregated into /20s. If we were to /24 all of those, we'd be announcing many times more prefixes ‑‑ we'd be at around 2,500 IPv4 prefixes. The BGP table is big enough as it is, so I don't think we need to contribute to that by exploding it.
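The arithmetic behind that is simple: each /20 splits into 16 /24s, which the ipaddress module shows directly (the prefix below is just an example, not CloudFlare address space).

```python
import ipaddress

example = ipaddress.ip_network("100.64.0.0/20")       # example /20, not CloudFlare space
print(len(list(example.subnets(new_prefix=24))))      # 16 /24s per /20
```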
So what we picked is: let's reduce the AS path. Let's just remove the AS prepends that we are not using any more; they serve no purpose. We developed a plan to do a staggered deployment in about six stages. We made sure that we were doing it on all our infrastructure, but one of the advantages that we have as a company is that we have different payment plans for the services that we offer, and one of those payment plans is: you don't pay us, it's free. Which gives us this major benefit of testing deployments on our free users ‑‑ like, it's a thing, right, I mean...
So, initially, we did an initial test on unused prefixes and everything went fine, and then we moved to the free prefixes, then we moved to the people that pay us a bit of money. Then we went to the people that pay us a bit more money, and then we went to the people that pay us a lot of money.
We wanted to do this as quickly as possible, because the longer it takes, the longer the window in which customer impact is possible, and that's something that we wanted to avoid. We had extensive internal and external monitoring. So this is the change that we proposed.
It's fairly straightforward, right. We just went to our policies, our export policies, and we just made sure that we removed the prepend.
That's what we did. We did a global rollout and we did it globally within two minutes.
For this, we heavily relied on SaltStack. This is an automation and orchestration platform we use heavily within CloudFlare, for our network devices as well as our servers; it's used by both the SRE teams and the network teams, so that gives us a shared pool of knowledge. It is an Open Source platform, which is very important. It heavily uses Python, Jinja2 and YAML as a data source. It's highly scalable. It's very fast. And it's vendor neutral, thanks to some features that were added as well as NAPALM. And we use it, as I said, across our fleet.
So, one of the things that we did before we rolled out the change was pre-checks: we wanted to make sure that the configuration currently running on our network devices actually matched the idea we had in our heads of what our config looked like. What we did is, we modelled our config and our prefixes and prefix lists in Salt, and then used some additional features of Salt to write our own execution module that basically ran through the motions of: this is what the change would look like, and these are the things that don't actually match what you said it should look like. So, for example, this is one of the cases where we noticed this: one of our prefix lists was missing a /23. Usually that's not really a big deal, but it's something that we should be aware of, right.
And we also noticed that we weren't announcing two prefix lists to our public peers.
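A pre-check of that kind can conceptually be as simple as a set comparison between the modelled prefix lists and what the device reports; here is a minimal sketch with placeholder data (this is not CloudFlare's actual execution module).

```python
import ipaddress

def prefix_list_diff(intended, on_device):
    """Compare the modelled prefix list with what the device reports."""
    want = {ipaddress.ip_network(p) for p in intended}
    have = {ipaddress.ip_network(p) for p in on_device}
    return {"missing_on_device": sorted(str(p) for p in want - have),
            "unexpected_on_device": sorted(str(p) for p in have - want)}

# Placeholder data: in practice 'intended' would come from the Salt-modelled
# data and 'on_device' from the router (for example via a NAPALM getter).
intended  = ["198.51.100.0/22", "203.0.113.0/24", "192.0.2.0/23"]
on_device = ["198.51.100.0/22", "203.0.113.0/24"]

print(prefix_list_diff(intended, on_device))
# {'missing_on_device': ['192.0.2.0/23'], 'unexpected_on_device': []}
```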
This is the actual change; I'm not sure if it is very readable. The config generation was entirely written in Python, and we could do this for both Juniper and Arista devices. It ran concurrently, so it was done globally within roughly two minutes. What you are seeing here is the generated diff on the device as well as the configuration that we loaded on the device ‑‑ because diffs, on Arista devices especially, are not always easy to read ‑‑ so you can also see exactly what config has been loaded on the device.
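For flavour, here is a hedged sketch of what one rollout step looks like with NAPALM directly (the talk's change was orchestrated fleet-wide through Salt; the hostname, credentials and config file below are placeholders):

```python
import napalm

# Placeholder device details; in practice this ran concurrently across the
# whole fleet via Salt rather than one device at a time.
driver = napalm.get_network_driver("junos")
device = driver(hostname="edge1.example.net", username="netops", password="***")

# Hypothetical file containing the policy change (removal of the prepend);
# the exact config syntax depends on the platform and is not shown here.
with open("remove-prepend.conf") as fh:
    change = fh.read()

device.open()
device.load_merge_candidate(config=change)
print(device.compare_config())   # human-readable diff, reviewed before committing
device.commit_config()           # or device.discard_config() to back out
device.close()
```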
To make sure that everything was fine we used internal and external metrics. For the internal ones we used a lot of information we already have available, which is stored in ClickHouse, and then we used Grafana to visualise that. We used NetFlow data and SNMP data. ClickHouse is this great database management system, a bit like MySQL, MariaDB, PostgreSQL, stuff like that. But unlike the other database management systems I have mentioned, it's a column-orientated database management system. This means that it's a lot easier to start aggregating on specific column data, which was very, very good for the use cases that we had.
It's Open Source, and in our current use case we insert about 100 gigabits a second of data into our ClickHouse cluster; we currently have 3 petabytes of data on disk. This is an example of how we use ClickHouse. This is one of the databases that stores our flow data. What we can do is generate this query and then we get flow information, right: it gives us the location, it gives us how many flows we are seeing, how many packets we're seeing, and then it derives from that the rate of packets that we're currently seeing. So what we are doing here, basically, is assessing what our biggest colos are.
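As a hedged sketch of that kind of query from Python ‑‑ the table and column names below are invented, not CloudFlare's schema, and clickhouse-driver is just one available client:

```python
from clickhouse_driver import Client   # pip install clickhouse-driver

client = Client(host="clickhouse.example.net")   # placeholder host

# Hypothetical schema: one row per flow sample with a colo name, a packet
# count and a sample time; rank locations by packet rate over the last hour.
query = """
SELECT
    colo,
    count() AS flows,
    sum(packets) AS packets,
    sum(packets) / 3600 AS pps
FROM netflow.samples
WHERE time > now() - INTERVAL 1 HOUR
GROUP BY colo
ORDER BY pps DESC
LIMIT 10
"""

for colo, flows, packets, pps in client.execute(query):
    print(f"{colo:>8}  flows={flows}  packets={packets}  pps={pps:.0f}")
```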
We also use Prometheus; it's a time series database that we use heavily within the organisation, for network monitoring as well as server monitoring. It's a monitoring platform which is also Open Source.
So here you can see the rollout that we did. And you can clearly see when we started the rollout and when it ended, and this was all using external Looking Glasses.
So, Grafana: it's used for analytics, it does time series visualisation, it has a lot of plug-ins, and it's also Open Source. We love that. We use this to easily visualise: this is the thing, right, this is the thing you need to look at.
These are the internal metrics. On this you can see the traffic that we noticed the moment that we made our change and on top you can see our actual traffic globally. As you can see, there wasn't any change in the amount of traffic that we were receiving at the time. But you can clearly see that there was definitely some changes in where traffic was going or where traffic was landing and where traffic was going out. So, as you can see here in San Jose there was a big drop of outbound traffic, but a lot of smaller locations around the US picked up that slack and just started doing more traffic.
For external metrics, we either stored them in Prometheus or just displayed them in a better format to the network engineer. What we used was RIPE Atlas as well as external Looking Glasses.
So what's RIPE Atlas? It's a great service provided by RIPE that offers a lot of global probes around the world from which you can do pings, traceroutes and DNS queries, and you can do HTTPS requests towards anchors. It has a REST API and we could use that to determine the routing before and after the change.
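A minimal sketch of pulling results from the RIPE Atlas REST API with requests (the measurement ID is a placeholder; a real before/after comparison would use measurements targeting the anycast prefixes):

```python
import requests

MSM_ID = 5001   # placeholder measurement ID

url = f"https://atlas.ripe.net/api/v2/measurements/{MSM_ID}/results/"
results = requests.get(url, params={"format": "json"}, timeout=30).json()

# Each entry is one probe's view; comparing snapshots taken before and after
# the change shows whether traffic moved towards a different location.
for r in results[:5]:
    print(r.get("prb_id"), r.get("from"), r.get("endtime"))
```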
Looking Glasses: we used two. We used RouteViews, which is a great project. Then we also used AS 57335 from Aaron there; he basically built that in 24 hours when I told him we're doing this thing, could you please help ‑‑ and he did. So we had Looking Glasses. We also used IX Looking Glasses. We noticed we needed more of those, and especially more of those with REST APIs: we need to be able to ingest this data and not just type it in manually and then have to do the 'I'm not a robot' thing.
So we collected all this information and put it in Prometheus and then visualised it using Grafana. This is how we used the ‑‑ scraped the metrics.
And this is how we inserted it into Prometheus. Prometheus has this Python client; it's straightforward to insert any kind of data into this database. And that's what it looked like: this is a Grafana visualisation of all the information we ingested from all those Looking Glasses, and what we see here is the location, the prefix that you are looking at, the last hop AS, and the peer that is giving us this information.
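Prometheus is pull-based, so the usual pattern with its official Python client is to expose a metric for Prometheus to scrape rather than push into it; here is a hedged sketch (the metric and label names mirror the dashboard description in the talk but are assumptions):

```python
import time
from prometheus_client import Gauge, start_http_server

AS_PATH_LENGTH = Gauge(
    "looking_glass_as_path_length",
    "AS path length towards our prefix as seen by an external looking glass",
    ["location", "prefix", "last_hop_as"],
)

start_http_server(9100)   # Prometheus scrapes this endpoint

while True:
    # In practice these values would come from scraping the looking glasses.
    AS_PATH_LENGTH.labels("AMS", "198.51.100.0/22", "64511").set(3)
    AS_PATH_LENGTH.labels("SJC", "198.51.100.0/22", "64512").set(5)
    time.sleep(60)
```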
So this is how you can see that the information is combined: this is an overlay image of the Looking Glass information as well as our traffic information. You can see a slight delay, because the Looking Glasses aren't super realtime; there is a bit of a delay of about 30 seconds, because it takes a while to scrape and insert the data into Prometheus. You can still see that it clearly shows when the change was happening, and it also showed us if there were any stragglers or if anything was going to cause any issues. It made it straightforward to figure out what happened.
So, what are the takeaways from this? Like, are you really telling us that you did this and didn't cause any issues for customers?
As far as we know, there was negligible customer impact. Nobody wrote in. People likely noticed that there was an increase in latency, but it wasn't enough for them to actually mail in and ask us what we were doing. We had route fluctuations for about two minutes, and it took us about an hour to entirely complete the change, pre-checks and post-checks included. It took us two days to prepare all of this. We instantly detected and resolved minor issues, both through our request data as well as through the Looking Glasses. We were heavily reliant on Open Source tooling and community, which is the most important bit. Organisations like RIPE have a super important role.
It brings the community together and that makes it stronger as a network.
So, I hope everyone kind of understood what I was trying to get at. If you have any questions, please do not hesitate to ask. Otherwise, thank you.
(Applause)
AUDIENCE SPEAKER: Hi. Thanks for the presentation, very interesting work. I have also worked with Anycast previously, and I think I would be terrified to do exactly what you did; it's very scary. So, I like your approach: you just changed your routes first for some clients and then gradually moved to the other clients later on. We have done some work ‑‑ not me, but people from the group that I work with and other researchers ‑‑ and they developed a tool that actually allows you to test how Anycast is going to change when you change your routes, before doing it for everybody. And it's called ‑‑ Booter ‑‑
TOM STRICKX: Yeah, he worked with us.
AUDIENCE SPEAKER: That one allows you to make an informed choice before ‑‑
TOM STRICKX: Unfortunately, we rolled out the change I think two or three weeks before the thing hit production inside our network. So, he was a bit late unfortunately. Otherwise yes we definitely would have used that. It's a great project, super interesting. And we definitely would have used it but unfortunately it wasn't production ready by the time that we wanted to.
AUDIENCE SPEAKER: All right. Cool. Thanks.
CHAIR: Any more questions? No? Which is good, because we are running slightly over time. Thanks, Tom.
(Applause)
Next up, we have three lightning talks. We start with Vesna and she will present us the results of the Quantum Internet Hackathon. Enjoy.
VESNA MANOJLOVIC: Hi everyone. How many of you have been to some of our previous hackathons, RIPE NCC hackathons? Let me see... good to see you guys.
It's mostly guys. So my name is Vesna and I'm here today to talk about the results of the Quantum Internet Hackathon. First, I want to start by expressing my gratitude to everybody who took part in that hackathon, which is primarily the QuTech department of the Technical University in Delft. They suggested the topic, they brought the quantum expertise, they made this happen, together with Juniper, who was our logistics sponsor, the RIPE NCC, who did the organisation in the back office, and the people who were present at the location, all the participants that were there.
And of course all the families that also took part, supporting everybody who was away for the weekend having fun and coding while they had to stay home and cook and clean, and then welcome them back and nurse them again so they could come to this meeting today. So thank you very much.
What are the hackathons? Well, there are several definitions, just as there are several definitions of a hacker. What we mean is not breaking into each other's computers, and we also do not mean getting some developers to code for you for free, or for pizza, and then locking up that code and not showing it to anybody. No, we do the opposite.
So we actually want to share the results of all the work that is done, so we make the code to be deliberately free software and Open Source.
And we also don't offer any monetary prizes. So the best thing you can do is get a lot of stroopwafels.
So, we have been organising these hackathons at the RIPE NCC. We started by wanting to bring together the network operators, the researchers and the users of RIPE Atlas, and that worked so well that we changed the topics to something more generically interesting to the RIPE community ‑‑ for example, tools for Internet exchanges, IPv6, DNS, routing ‑‑ and then we went on to something more exotic such as the quantum Internet. Now, this was a super, super science-fictiony topic and we didn't really know how well it was going to work out, and it was really wonderful. It happened the weekend just before the RIPE meeting, in a different venue, not in the Okura Hotel. We had 42 people that were supposed to be there; some didn't make it, but I like that number so I just use it everywhere. And we ended up having 8 projects. We used up a lot of stroopwafels, we got T-shirts and we had a lot of stickers. The goals were slightly different this time, since the quantum Internet is a thing of the future, so it's not so much of an immediate interest for the RIPE community, but we wanted to combine the strengths of the RIPE community, of the actual network operators, with the specialists in quantum technology who came initially from the University in Delft ‑‑ but basically there were people from 20 different countries in Amsterdam working on this topic. There was a lot of creativity, a lot of combining of skills; we also had people from a legal background, we had designers, we had students, and it was great to put all these people together in the same room, work very intensively and come up with creative solutions. This cooperation is going to continue in the future; we don't know exactly when and how, but you will hear from us.
So what did we want to work on? Well, we didn't actually have quantum machines on the premises, so we worked on a simulator. For those that are technically inclined, this is the architecture of that simulator, and most of the challenges were actually based on using the simulator, improving it, finding interesting uses for it, and so on. And there were several other challenges suggested.
So, the resulting projects. At the end of this presentation, there is a slide for each one of these projects detailing technicalities and so on. I will not go into that because I can assume that not many people in this room would know what I'm talking about and I also wouldn't know what I am talking about, so I will spare you that.
But, kind of broadly speaking ‑‑ and this is a random order of the projects that we had ‑‑ people worked on quantum consensus: they implemented certain algorithms for leader election. Their goal was a quantum cryptocurrency, and I thought they stopped too short ‑‑ they should have added a few more buzzwords, like quantum cryptocurrency IoT machine learning ‑‑ but they just stopped there.
The next one is even more of a mouthful: it's advertising entanglement capabilities in quantum networks. This is theoretical work. It is going to be submitted to the Internet Research Task Force as a draft by Friday this week, so if you want to join, there is a mailing list and you can see what they are going to submit there.
There is a project about visualisation. There was a project called Next Generation that actually sent 12 pull requests to the SimulaQron code, and they were accepted during the weekend, so those were great improvements to the simulator that we all used.
There were improvements made to the QChat, another application, so the actual chat capability over the quantum network.
The digital signature was also deployed, and there were two more projects whose names are really unpronounceable.
So all of them are already on GitHub, so the code is there, the slides are there and if you are into that, you can go and see it. These are some images of what we went through. So, there was a lot of flipchart activity and we also exchanged many stickers and used a lot of stickers on those laptops too.
So, what can you do? How can you take part? Well, the code is there, just use it. You can talk to me if you want to help organise one of the future hackathons with the RIPE NCC, or talk to QuTech if you are into the quantum business. And we also publish a list of non-commercial hackathon events on RIPE Labs; that was requested at the last RIPE meeting and I am happy to report that we implemented it.
If you want to read more, it's on there.
So, that's the end of my talk.
(Applause)
CHAIR: Comments, questions? No, okay. So we don't have questions, then. Thank you so much and you told us everything we wanted to know.
Then next up is Andy, who will tell us eight ways network engineers use Snabb, which is supposed to be a fast, easy-to-use packet-switching toolkit.
ANDY WINGO: Hi. Thanks for having me. My name is Andy. I work on the Open Source project called Snabb. And so in this talk I'd like to go into just a little brief presentation and then eight quick examples of things that folks are doing with Snabb.
Snabb is what we call a user space networking toolkit, which means it's something that people can use to make programmes that run on commodity servers and process packets in a fast way. And so, if anyone has a Linux system, they can just go and clone and build it, and you won't even have time for coffee because the build completes in less than a minute or so. So I wanted to just give a brief intro to things that people do with Snabb.
I'll start with the dirtiest and get to the more production-ready. One common thing we do with Snabb is just bang a bunch of load onto a system. That's our packetblaster app; you just run it like this. You give it a PCAP file of packets ‑‑ this is one of the ways you can run it ‑‑ and you give it a PCI address, and that shows that this is a user space application in the sense that it tells Linux to forget about that device and it does everything in user space, and that's how it gets fast.
Because we write the drivers ourselves, we can do cool things like filling the NIC's transmit descriptors with a bunch of packets and never retiring them. I know it's impossible to read there, but all these little lines here indicate that we are pushing 64-byte packets at 14 million packets a second on twenty 10-gig interfaces, which is cool.
Also down below you can see there are just two little green lines showing we're using just two cores, one per socket in this case, which is fun stuff.
Additionally, we use Snabb to test systems, for example to find the no‑drop rate of some system you are using: it will take some sort of load and then bisect it across your machine's capabilities to find the point at which we get all the packets back. The simple built‑in test is: are we getting the packets back? But additionally, we can shell out to some script to, say, check registers or something. So it's a nice, useful utility.
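To make the bisection step concrete, here is a minimal sketch of the idea; the `send_at_rate` helper is a hypothetical stand‑in for whatever actually offers the load and counts returned packets, not Snabb's real interface.

```python
# Minimal sketch of the no-drop-rate bisection idea described above.
# send_at_rate(rate_mpps) is a hypothetical callable that offers load at
# the given rate and returns (packets_sent, packets_received).

def no_drop_rate(send_at_rate, low_mpps=0.0, high_mpps=14.88, tolerance=0.01):
    """Binary-search the highest offered load (in Mpps) with zero loss."""
    while high_mpps - low_mpps > tolerance:
        mid = (low_mpps + high_mpps) / 2
        sent, received = send_at_rate(mid)   # offer load, count returns
        if received == sent:                 # no drops: try a higher rate
            low_mpps = mid
        else:                                # drops: back off
            high_mpps = mid
    return low_mpps
```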
Additionally, still staying on the packet‑pushing and testing side of things, we have use‑case‑specific load generation. So in this case we're going to be generating packets that are specific to a lightweight 4over6 use case. Here I just show the help; I'd like you to run it and see, there are quite a few options there. You can programmatically generate traffic. Think of it like Scapy: not as flexible, but fast enough to run in real time. That goes for packet generation and also for packet processing.
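To make the Scapy comparison concrete, here is a hedged Scapy sketch of the kind of lightweight‑4over6 test traffic meant here (IPv4/UDP carried inside IPv6 towards a border router). The addresses, ports and file name are made up for illustration; this is plain Scapy, not Snabb's generator.

```python
# Illustrative lw4o6-style traffic: IPv4/UDP inside IPv6, written to a PCAP
# that could then be fed to a packet blaster. Addresses and ports are made up.
from scapy.all import Ether, IPv6, IP, UDP, Raw, wrpcap

packets = []
for port in range(1024, 1024 + 16):          # walk a small source-port range
    pkt = (Ether()
           / IPv6(src="2001:db8:0:1::2", dst="2001:db8::1")
           / IP(src="192.0.2.1", dst="198.51.100.7")
           / UDP(sport=port, dport=12345)
           / Raw(b"x" * 26))                  # small dummy payload
    packets.append(pkt)

wrpcap("lw4o6-test.pcap", packets)            # feed this PCAP to the blaster
```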
Getting a bit more to the production side of things, there is a Layer 2 VPN using RFC 4664; I can't remember the shorthand abbreviation. And this one was deployed by an engineer at SWITCH, the Swiss academic ISP. It wasn't even an RFC then; he was still developing it. He was developing a service that wasn't even for sale from any vendor, he wrote this himself to solve his needs, and it's in production. You can check that out; it's in a branch there.
Additionally, there is an IPsec implementation, and its author is somewhere here. If you want to talk IPsec, he is over there. And that's, thankfully, funded by the NLnet Foundation.
Just as a little pause here: all this is Open Source, you can check it out and run it, no licensing fees, you can give it a go. And additionally, there are a number of companies that work on it; feel free to talk to me afterwards and I'll point you to a bunch of them.
Network monitoring as well. We have an unsampled NetFlow implementation; it gets about 5 million packets a second on a core or so. To scale beyond that you need to do some parallelism, but that is totally possible, and one user has done this. You can additionally customise the information you take, for example attach AS numbers if you have a nice way to ingest that.
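As a hedged illustration of the "attach AS numbers" idea (not Snabb's actual code): a flow record can be enriched with origin ASNs from whatever prefix table you ingest, for example dumped from a BGP feed. The table and the record below are made up.

```python
# Enrich flow records with origin ASNs via longest-prefix match against an
# ingested prefix-to-ASN table. Purely illustrative, standard library only.
import ipaddress

PREFIX_TO_ASN = {                 # normally ingested from a BGP/RIS dump
    ipaddress.ip_network("193.0.0.0/21"): 3333,
    ipaddress.ip_network("198.51.100.0/24"): 64500,
}

def origin_asn(addr):
    """Return the ASN of the longest matching prefix, or None."""
    ip = ipaddress.ip_address(addr)
    best = None
    for net, asn in PREFIX_TO_ASN.items():
        if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best[1] if best else None

flow = {"src": "193.0.0.10", "dst": "198.51.100.7", "packets": 42}
flow["src_as"] = origin_asn(flow["src"])
flow["dst_as"] = origin_asn(flow["dst"])
print(flow)
```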
Many of you saw at RIPE 76 in Marseille a presentation on a lightweight 4over6 border router implementation. Definitely check out that presentation.
And additionally, there is a small firewall that can also do packet inspection and extraction of interesting information.
So those are eight quick ways that network engineers are using Snabb in practice. Feel free to find me afterwards and pick my brain.
Additionally, we have a talk that goes deeper into Snabb: how it's built up, what components are built in and how you can extend it. That's Thursday, somewhere between 4 and 5:30, in the Open Source track.
So, you should definitely give it a go and questions welcome.
CHAIR: Thank you, Andy.
(Applause)
Do we have any questions about this? Okay. Then thank you very much.
Next up we have Ignas, who will tell us about the current state of VXLAN.
IGNAS BAGDONAS: Hello. So, continuing the festival of VXLAN that has been happening since the beginning of the meeting, RIPE 77; as you may have seen, there are differing preferences for how to name it.
It has been around as a technology for close to a decade. What can be wrong? Everyone is just happily deploying new shiny boxes with it. But first, let's step back a little bit into the story.
VXLAN started as one particular vendor's internal solution to the requirements of one of the product families that they had at the time. It is not a fully IETF‑developed mechanism, in the sense that it was an individual submission. Somebody wrote something up, and at the time it was mostly there to document the current way of thinking. It is not a product of an IETF Working Group, because the Working Group which focusses on those technologies did not exist at the time.
And the actual problems, which will be described in a moment, were known at the time, and that was a conscious decision: we are designing this thing for a very bounded use case in a closed environment, and therefore we are happy with that.
So, what actually happened is, well, it was an accident, but a successful one. It got used and abused in many ways, and today we have what we have.
There are three large areas of problematic aspects with VXLAN. First, it just does not have any indicator of the payload that it is carrying. It doesn't have any ability to indicate that what is being carried is not a client payload but something else. And the problem which prevents solving those two is the inability to extend it.
Looking into the details. Protocol identifier: initially VXLAN was defined to carry Ethernet frames, and it happily does that. And the answer to the problem of what if I want to carry multiple payloads is: just use parallel tunnels. Certainly, from the technical point of view, that works. However, the limiting factor is not in the identifier space, which goes to many millions, but in how many tunnels or tunnelling points the platform can support. So if I want to run two or three sets of parallel tunnels, that effectively limits my platform capacity much faster than I will hit a limit on the protocol level.
Then, whatever you put into one end of the tunnel emerges on the other, and it is assumed that all payload is client generated and client consumed. So, if I want to run something which measures the performance or the liveness of the VXLAN tunnel, there is no easy way to do that. Certainly it's possible, but that means that the client needs to cooperate, and this cannot simply be assumed. So, if you are providing just a set of tunnels, you need to coordinate with your clients how you will determine liveness. Basically, this precludes the usage of a big portion of IP OAM technologies and mechanisms.
And just to top that, it's not extensible. All the fields, while not all of them are used, are pre‑defined, and therefore we cannot talk about vendor‑interoperable extensions. Yes, there are VXLAN extensions, and you might be happily using them without even knowing that your favourite platforms do that. But if we are talking about vendor interoperability, it just does not exist.
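For illustration, the entire VXLAN header (RFC 7348) is eight fixed bytes: a flags octet, a 24‑bit VNI and two reserved fields, with no payload type indicator and no option space, which is exactly the extensibility problem just described. A minimal sketch:

```python
# The fixed 8-byte VXLAN header of RFC 7348 packed in Python. There is no
# payload type field and no room for options.
import struct

def vxlan_header(vni):
    flags = 0x08                            # only the I (valid VNI) bit is defined
    word1 = flags << 24                     # flags(8) + reserved(24)
    word2 = (vni & 0xFFFFFF) << 8           # VNI(24) + reserved(8)
    return struct.pack("!II", word1, word2)

print(vxlan_header(5000).hex())             # prints 0800000000138800
```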
Recognising that this is an important use case, the IETF started looking into this problem in a more coordinated fashion. There was a dedicated Working Group created, Network Virtualisation over Layer 3, and it started looking into a, say, more generic and more end‑to‑end approach to this.
First, this needs to be extensible. It needs to be hardware‑friendly, and it needs to be friendly to the environment over which it is used. It's not only closed data centres that use this type of technology; it's much broader than that. And what we briefly touched on this morning is the security aspects, which did not exist in the original VXLAN; some of that needs to be accommodated as well.
What that Working Group did: there was a multitude of proposals, some fancy, some really funny, and three major ones. The direct backwards‑compatible expansion of VXLAN, which was considered to be too limited; just because of the backwards compatibility it was considered not to be extensible enough. The generic UDP encapsulation, which did too much: it was too flexible and hard to implement, and we're talking about hardware implementations, not only software. Software is easy. Trying to implement the flexible parts of a protocol which doesn't have a regular structure is not a trivial task if we are talking about a hardware implementation. And Geneve (it's not an abbreviation, it's the name of a protocol; the name stands for generic network virtualisation encapsulation), which was considered to be just the right pick.
So, what we have today is that Geneve is the successor of the VXLAN encapsulation. Very briefly, what that protocol allows you: it has extensibility both from a future protocol perspective and from the vendor perspective. There is an ability for vendors to define their private extensions in a way that doesn't clash with others, by using an allocated option class identifier namespace. It allows for the possibility of authenticating the header itself, so the injection attacks that were described will be a little bit harder to implement. It also allows the ability to actually sign and encrypt the payload itself, if really needed. And it has mechanisms for indicating what is being carried inside and whether that's a client payload.
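As a hedged sketch of that extensibility (based on a reading of RFC 8926, not on the slides): the Geneve base header carries a protocol type, and it is followed by TLV options keyed by an option class. The class and type values below are made up for illustration.

```python
# Illustrative Geneve framing: an 8-byte base header with a protocol type
# field, followed by option TLVs keyed by an option class.
import struct

def geneve_option(opt_class, opt_type, data):
    assert len(data) % 4 == 0               # option data is 4-byte aligned
    return struct.pack("!HBB", opt_class, opt_type, len(data) // 4) + data

def geneve_header(vni, protocol_type=0x6558, options=b""):  # 0x6558 = Ethernet payload
    opt_len = len(options) // 4             # option length in 4-byte words
    byte0 = (0 << 6) | opt_len              # version 0 + option length
    byte1 = 0                               # O and C flags cleared
    return (struct.pack("!BBH", byte0, byte1, protocol_type)
            + struct.pack("!I", (vni & 0xFFFFFF) << 8)       # VNI(24) + reserved(8)
            + options)

# Hypothetical vendor option: class 0x0105, type 0x01, 4 bytes of data.
hdr = geneve_header(5000, options=geneve_option(0x0105, 0x01, b"\x00" * 4))
print(len(hdr), hdr.hex())                  # 8-byte base header plus the option
```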
In summary, it works just like VXLAN should have been done from the very beginning, if we had known all the use cases and potential problems that we would see while deploying it.
So, what is important from a design perspective? First, this is only a data plane component. So, in your environment that controls the tunnelling points, the control plane stays the same. You just need to make it aware of the new encapsulation type, and that's it.
From the perspective of the ability to use specialised OAM and validation mechanisms, you now have the ability to use the set of technologies the IETF has developed, BFD and the LSP ping family; all of that is compatible and can be used.
The current status of implementation:
Most of the, say, credible, relevant vendors do have hardware components, either shipping or sampling. System vendors are catching up; the majority of them have announced that products will be available really soon.
Now, the remaining part is that the operations community, and those dealing with the design aspects, need to be aware of these changes.
In summary, do not panic. VXLAN is not going away; it's just starting to go away. And all your new shiny routers which you bought yesterday will still continue to be applicable and usable and will happily forward your packets. What is important is that if you are starting to think about new designs, overlay designs in general, seriously consider looking into something other than VXLAN. You might regret the decision if you make it today, just because of the inability to add the functionality which you might discover you need the second day after you start operating your network.
That's it.
(Applause)
CHAIR: So we start with Peter.
PETER HESSLER: Has this been published as an RFC?
IGNAS BAGDONAS: Not yet. It is in a stable state; it's in the process of being published. But the specification is stable enough that the component vendors are implementing it.
AUDIENCE SPEAKER: Tom Hill, Bytemark. I remember feeling very unsympathetic when certain bloggers were trying to explain that MPLS was far too complicated for them to run in data centres and that VXLAN was heralded as the new, simple protocol that they needed. And it did exactly what it had to do, no more, no less. What I'm quite confused about is why we would add on to this when we already have MPLS transport and, if you need these things that you are proposing to put into Geneve, why not pressure your vendors to actually allow you to use the MPLS tools that we use in service provider cores?
IGNAS BAGDONAS: Right. So, if you want a short answer: you certainly can, you are not forced to. If you want a slightly longer answer, and if you have three hours: so, you know what MPLS is and how it works. Well, I think I also have some clue how it works. Do you believe that the majority of players, not necessarily tier 1, in general terms understand how that technology works and what is needed in order to build a connectivity product, which can instead be built with something slightly less complex and, at the same time, slightly more broken? MPLS is a perfect technology as such. However, the cost of understanding how it works is one aspect. The cost of vendors implementing it in their products and in the product families which are targeted at your favourite use cases is a second aspect.
And while the first is more of a technical discussion, the second is purely product positioning. There is no technology fight, there is no technology battle here; it's purely a business decision of particular vendors, how they see their addressable market being more easily achievable if they go one technology path. You can use any of the tunnelling technologies which you like; a recommendation would be to use the one which your operations people also like and understand.
And it's probably counterproductive to try to push one of them out. MPLS‑based solutions will be there, and there will be users deploying them, even in the same scenarios. There will be others who will be deploying, for example, IP‑only tunnels. There will be others who will be deploying VXLAN derivatives and things like that. The answer is not only a technical one; the answer is mostly an administrative one. In the environment which you are in, are your operations teams ready and able to operate what you design? Designing something on a whiteboard and saying we will use this new shiny protocol is easy; running all of that later is a little bit more complicated.
I have used just five minutes of the three hours. We can continue afterwards.
CHAIR: We would prefer the five‑minute version.
Any more questions? We still have four minutes for questions. Pavel?
AUDIENCE SPEAKER: Just a little comment on this, about the problem with the MPLS control plane. I'm sorry, I am Pavel from Scaleway.
And the complexity of MPLS is just the same as, or even slightly less than, the complexity of EVPN, VXLAN and all that stuff. So it's really comparable, and what we are doing today in our data centre is completely the same as what we have been doing for ages on the backbone. So it's not about complexity; I agree that it's rather commercial and marketing.
IGNAS BAGDONAS: Right. Do we have another three hours? So, yes and no at the same time. First, you cannot compare MPLS in general to VXLAN in general, because those are two very different things. One is a set of control plane and data plane components; the other is just an encapsulation mechanism. If you want to compare, you have to take all the complexity: your BGP, your favourite VPN flavours and all sorts of other things. And in the end, the stack that is on the wire and the stack on the control plane side look, if not the same, then equivalent. The difference is which exact components you are using, how they are implemented, and whether your operations, both the operations teams and the tooling, are ready to use that. That is the main difference. And of course you need to factor in the practical vendor aspect: you might design something which works perfectly from an architectural point of view, but if none of the vendors implement it, you are left with nothing.
CHAIR: Okay. Then, thank you very much.
(Applause)
And, with that, we close the session. As always, please remember to rate the talks, because we, as a Programme Committee, need some input on whether you liked the talks that we selected this time. And, as at the beginning of the session, we have candidates for the PC seats. Please go to the website and vote. You have two votes, please use them.
And now we have half an hour of break. Afterwards, here, we will have a BoF by Shane, and we can talk to him about what we think about our community, how we discuss our community, and what some of our future ideas for our community are.
(Coffee break)
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND.