Dirk Kutscher

Personal web page

Archive for the ‘Blogroll’ Category

Reflexive Forwarding for Information-Centric Networking

without comments

In most Internet (two-party) communication scenarios, we have to deal with connection setup protocols, for example TCP (three-way handshake), TLS (key agreement), and HTTP (leveraging TLS/TCP before its GET-response exchange). The most important concern is to make sure that both parties know that they have successfully established a connection and agree on its parameters.
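As a rough illustration of the round-trip cost these setup protocols add before the first response byte (a back-of-the-envelope sketch, assuming a full TLS 1.2 handshake; TLS 1.3 reduces the TLS part to one round-trip):

```python
# Rough round-trip accounting before the first HTTP response arrives,
# for a traditional TCP + TLS 1.2 + HTTP stack.
tcp_handshake = 1   # SYN / SYN-ACK: client can send after 1 RTT
tls_handshake = 2   # full TLS 1.2 handshake takes ~2 RTTs (TLS 1.3: 1 RTT)
http_request = 1    # GET -> response

total_rtts = tcp_handshake + tls_handshake + http_request
print(total_rtts)   # 4 RTTs before the first byte of the response
```

This is exactly the cost that protocols try to shave off by collapsing handshakes, while the server-side robustness concerns discussed below pull in the opposite direction.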

In client-server communication, there are other, application-layer, requirements as well, for example authenticating and authorizing the peer and checking input parameters. Web applications today typically serve a mix of static and dynamic content, and generating such dynamic content requires a considerable amount of client input (as request parameters), which results in considerable amounts of data (Google: "Request headers today vary in size from ~200 bytes to over 2KB.", SPDY Whitepaper).

When designing connection establishment protocols and their interaction with higher-layer protocols, there are a few, sometimes contradictory, objectives:

  • fast connection setup: calls for minimizing the number of round-trips;
  • reliable connection and security context setup: reliable state synchronization requires a three-way handshake; and
  • robustness against attacks from unauthorized or unwanted clients: could be done by filtering connection attempts, by authentication checks, or other parameter checks on the server.

The goal of minimizing the number of round-trips can conflict with robustness: for example, in a dynamic web content scenario, spawning a server worker thread to process a malicious client request that will have to be declined can be a huge waste of resources and thus make the service susceptible to DoS attacks.

These are general trade-offs in many distributed computing and web-based systems. In Information-Centric Networking (ICN), there can be additional objectives such as maintaining client (consumer) anonymity (to the network) to avoid finger-printing and tracking (ICN does not have source addresses).

Current ICN protocols such as CCNx and NDN have a wide range of useful applications in content retrieval and other scenarios that depend only on a robust two-way exchange in the form of a request and response (represented by an Interest-Data exchange in the case of the two protocols noted above).

A number of important applications however, require placing large amounts of data in the Interest message, and/or more than one two-way handshake. While these can be accomplished using independent Interest-Data exchanges by reversing the roles of consumer and producer, such approaches can be both clumsy for applications and problematic from a state management, congestion control, or security standpoint.

For RICE, Remote Method Invocation for ICN, we developed a corresponding scheme that addresses the different objectives mentioned above.

In draft-oran-icnrg-reflexive-forwarding we have now provided a formal specification of a corresponding Reflexive Forwarding extension to the CCNx and NDN protocol architectures that eliminates the problems inherent in using independent Interest-Data exchanges for such applications. It updates RFC8569 and RFC8609.

The approach that we have taken here is to extend the requirements on ICN forwarding nodes, so in addition to the general state synchronization problems, this Internet Draft also raises the question of the evolvability of core ICN protocols.

Discussion on the ICNRG mailing list.

Written by dkutscher

April 3rd, 2020 at 5:06 pm

Posted in Blogroll,IRTF


Back to Humboldt — or How to Organize your Teaching in covid-19 Times

without comments

Many university-level teachers have switched or will have to switch to online teaching and coaching. My university switched relatively seamlessly a few weeks ago already, and I have received quite a few requests for advice, so let me share some thoughts here.

The TL;DR summary:

  • keep calm and carry on;
  • understand your objectives and teaching methodology;
  • balance technology and didactics concerns;
  • avoid tool chaos;
  • use existing infrastructure;
  • understand scalability requirements and infrastructure constraints;
  • record everything;
  • leverage new possibilities;
  • and if you have to pick one online teaching tool, use BigBlueButton (see below).

First of all, it's interesting to see how different universities in different countries approach the covid-19 crisis. Some have switched immediately to online teaching (or just extended their already existing online courses). Others announced extended Easter breaks, and there are even discussions of just canceling the summer term.

While the prospect of Corona holidays (or just more time for other work) may sound attractive, I would strongly advise against it for two reasons:

  1. Extended breaks, suspended periods of teaching etc., will most likely result in more stress for everybody (professors, students, admin staff) later. In a situation with many uncertainties that may well hurt us more in the end.
  2. The lock-down (in whatever regional variant) is necessary, but it's obvious that you cannot lock down everything (hospitals, food production etc.). Every social and business activity that is locked down will hurt society in some way. There are some activities that cannot continue right now and absolutely have to be suspended to avoid community transmission, and that suspension is already causing enough problems (small shops, artists, but also larger-scale factories). Luckily, there are some professions that can be re-organized and continue in some way -- university-level teaching is one of them. These activities should continue, if only to minimize societal damage.

In my university, the term started on March 1st. Luckily, the executive leadership had been quite up to speed regarding covid-19-related measures in the preceding weeks in terms of communication, sanitation, travel advice etc. When the federal state of Lower Saxony in Germany announced the suspension of presence-based teaching in universities and schools on March 13th, it did not come unexpectedly, and we continued most courses online over the following weeks.

Obviously, not everything went perfectly, and there were a few lessons learned that I will summarize in the following. I do research and teaching in Networked Systems (Computer Science), so there is a certain technology bias here.

Avoid Tool Chaos -- Use Existing Infrastructure as Much as Possible

Most universities are using some kind of learning management or e-learning system such as Moodle. They are never perfect of course, but it is really a good idea to use them as much as possible, because:

  • your students are already enrolled and typically know the system well;
  • Moodle and similar systems provide a ton of collaboration features that you may have ignored so far but that are really useful, such as Wikis, forums, etc. They may not always be as fancy as some individual externally-hosted services -- but think about your priorities in crisis times…
  • Your learning management and e-learning systems are production tools that contribute to your university's core business, so there is a good probability that they are actually well-provisioned and well maintained -- good in times of fast-growing demand.

Of course everyone has their favorite Wiki, shared editing, online collaboration tool etc., but just going for these incurs cost on two sides:

  • You have to select, assess, set up and configure them. When things break because of exploding demand, you have to re-iterate etc.
  • The combinatorial explosion for students that have to deal with all the different preferred tools is significant.

Understand Your Objectives and Teaching Methodology

When presence-based teaching is suspended, many people would probably think: "OK, I have to do Zoom lectures now".

First of all, translating all presence-based activities directly to online lectures will most likely be extremely stressful for both you and your students. You would not believe the fatigue that sets in after a full day of different online courses. So, it's not unreasonable to scale down both the density of individual lectures and their cadence.

Moreover, not everything has to be done in synchronous online meetings. Luckily, there are many ways to share knowledge and engage in discussions that are also used productively in non-covid-19 times, such as shared editing, Wikis, and forums (see above).

Finding a good balance between synchronous and asynchronous teaching/collaboration can also really transform your courses from traditional teaching & examination to something more interesting.

Also, reach out to the didactics experts in your organization (or elsewhere). They may actually have good tips for unconventional methods that were never quite practical but might just be useful now.

In that context, most people would probably agree that good education ("Bildung" in German) is more than programming skills into brains, so the crisis could be a good opportunity to double down on (r)evolving education toward the Humboldtian model of higher education.

Specifically, I am referring to promoting self-determination and responsibility, encouraging self-motivated learning, and combining research and teaching.

So when thinking about methodology and tools, do not only think about the tools that you use -- also think about what you offer to enable students to study, discuss, and research without you. Luckily, most students don't need to be taught about good online collaboration tools, but it may still be useful to provide a space on a reliable platform, at least for kickstarting things.

With respect to tools, personally, I have converged to:

  • asynchronous collaboration tools (forum, Wiki, tests) in Moodle;
  • file sharing, collaborative software development with gitlab;
  • online teaching with BigBlueButton (see below);
  • online discussion (smaller groups, chat rooms for students) with Jitsi Meet (see below).

I have been using many of those before anyway, so no big change.

As a meta-remark, I would also recommend managing everyone's expectations (especially your own): crisis means change for everyone, lots of improvisation, and hopefully lots of volunteer effort. There is really no need to expect (or demand) perfection. While it's good to carry on and make sure students get their education and degrees, nobody will be angry if there is a slow start, some slack in between, and some prioritization of fewer topics -- maybe quite the opposite.

Online Teaching and Tools

Having sorted out asynchronous vs. synchronous collaboration above, there is still a lot to be said about online teaching tools. Again, it's quite important to understand your objectives -- and what the individual tools are actually intended for. Also, when deciding on a particular tool or platform, it does not hurt to understand some technical basics, such as how Internet multimedia works, what scalability means, how infrastructure constraints may affect your experience etc.

It's very natural: when we have worked with some online communication/collaboration tool, and it worked OK-ish, we tend to use it again, sometimes also for unintended purposes. However, it's important to understand two things:

  1. Online teaching can mean different things (making video available vs. interactive online classrooms), and although you can stretch things sometimes, there is not one tool that fits all purposes.
  2. Just the fact that a tool worked once for you, and the user experience was not completely horrible, does not imply that it will work well in your lecture.

Different Forms of Online Teaching

This might be self-explanatory to most, but let me just quickly explain the basics:

  • Lecture Streaming/Broadcasting (on YouTube, Twitch and similar platforms) is great for distributing recorded or live content to audiences. Although there are chat-based feedback mechanisms, it's not the same as virtual classrooms with interactive discussion, collaborative editing etc. Don't get me wrong, I have used YouTube for live lectures myself, and it's OK if you can accept the constraints. I would use it mostly for public, pre-recorded content.
  • General-purpose online multimedia communication (with Skype, Jitsi Meet, Google Hangouts, WebEx, Zoom etc.) is great for discussions in your team, family etc., and sometimes these tools can also scale to larger conference calls, but they are not primarily intended for online teaching. For example, they often lack collaborative editing, integrated document sharing, integration with learning management and e-learning systems etc. Of course, some of them are also quite feature-rich and certainly usable (I have done lectures with tools like that without problems), but it's better to treat them as a fallback -- for example in crisis times, when you need a fast solution.
  • Online Teaching tools (for example BigBlueButton) are specialized multimedia conferencing tools that provide extra functionality to ensure a good teaching and learning experience. For example, they make sure that presentation material sharing works really well (and is not just window sharing in a video stream), they use reasonable resolution and bandwidth settings that balance quality and resource efficiency, somebody has thought about the UX design of the teacher's view, they make it easy to share session recordings, and they integrate with your LMS.

The point here is not that one of these is better than the others -- these are really just different categories for different purposes. If you can, pick the right one for your needs.

Technical Constraints

Believe it or not, thanks to relentless research and engineering efforts by the networking community, namely in the IETF and IRTF, (interactive) real-time multimedia communication is technically a solved problem. Still, sometimes things get screwed up badly -- why?

With most hosted online communication tools, there is really no point in extrapolating one's one-time experience to general applicability in a classroom scenario. Even if tool A worked well in your class today, that does not mean it will work well for your colleague tomorrow. There are different factors that affect, for example, scalability and usability, especially in crisis times.

For example, when talking about scalability, there are two dimensions:

  1. How many participants can you have in one session? Obviously, this also depends on what you do, e.g., is there one video sender or 100? But independent of that, some tools may have design characteristics (video formats and encoding options, protocols, scalability of the server software) that make them support larger crowds better or worse than others.
  2. How many conferences can you have at the same time? Assuming you are a university, this is an interesting question. For hosted systems, it fundamentally depends on the available (or allocated) resources, i.e., servers at your provider.

In other words, there can be systems that are great and a pleasure to use from a software design perspective, but in order to make a credible statement about applicability to your online teaching, you need to consider:

  • Who is hosting the system?
  • How does performance isolation work?
  • How oversubscribed is the service (now and at peak times)?
  • What is the latency between you (your participants) and the server(s), i.e., where are they hosted?

Some conference systems work with static resource allocation, for example one virtual machine per personalized conference server. This can work well, depending on how many VMs are allocated to one physical server. Others may use modern cloud-native auto-scaling -- great in general, but it still depends on how generous the provider is with resource allocation.

The point I am trying to make is that it is often not very helpful to recommend your popular tool if you cannot say anything about the deployment parameters and the particular scaling approach.

Third-party Hosting vs. Self-Hosting

With all these uncertainties with externally hosted services one might ask: isn't it better to run a self-hosted conferencing server (farm), for example a licensed commercial system or an Open-Source system?

Well, this depends a lot on your infrastructure: assuming you are doing interactive online teaching with at least one video stream at a time, with today's technology you would need about 1 Mbit/s per participant, i.e., 100 Mbit/s for a class of one hundred (on average -- it can easily be double or more, depending on video quality). A university has many simultaneous lectures, so you might be reaching 1 Gbit/s with 10 simultaneous lectures already. That's not necessarily too much -- it depends on your institution's internal network and its access to the Internet.
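As a quick back-of-the-envelope check of these numbers (assuming the rough ~1 Mbit/s per participant figure above, which can easily double with higher video quality):

```python
# Back-of-the-envelope bandwidth estimate for self-hosted video conferencing.
# Assumption (from the text): roughly 1 Mbit/s per participant on average.
MBIT_PER_PARTICIPANT = 1.0

def lecture_bandwidth_mbit(participants, per_participant=MBIT_PER_PARTICIPANT):
    """Aggregate server-side bandwidth for one lecture, in Mbit/s."""
    return participants * per_participant

one_class = lecture_bandwidth_mbit(100)   # one class of 100 participants
campus = 10 * one_class                   # ten simultaneous lectures

print(one_class)   # 100.0 Mbit/s
print(campus)      # 1000.0 Mbit/s, i.e., about 1 Gbit/s
```

The figures are averages; a safety margin of 2x or more is prudent when provisioning the uplink.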

Video servers are also relatively resource-hungry. Nothing that cannot be handled by a few powerful servers, but you would have to set them up with a load balancer, maintain them etc.

This can all be done, and your typical sysadmins should be able to do it -- but it's probably not the right choice when your provost tells you that everybody has to switch to online teaching tomorrow.

My Recommendations

I am using a mix of asynchronous and different synchronous collaboration tools, i.e., as mentioned above:

  • Moodle-based management and collaboration (mailing lists, basic Wiki, forum, tests etc.)
  • gitlab (git repository plus Wiki mostly)
  • Self-hosted Jitsi Meet for online meetings
  • University-hosted BigBlueButton for online teaching

In the spirit of full transparency, I am also using WebEx as a fallback (kindly sponsored by Cisco) just to have some redundancy.

I am considering using some form of instant messaging system (probably Jabber) to create a more inclusive, connected community for courses, but have not yet found the time to set this up. I used Slack before for university projects, but I don't want to make it a rule for all courses.

Jitsi Meet

Jitsi Meet is an Open Source video conferencing tool that you can use via Jitsi's public server or host on your own infrastructure. It uses WebRTC (i.e., media streaming in your browser). Communication between your browser and the server is encrypted. The server does not mix video (like some systems do) but uses selective forwarding (i.e., switching the main video stream depending on configuration and on who is currently talking).

It's a great tool that is sometimes underestimated because you don't see the different options when you just use the public service. For example, Jitsi can do recording, YouTube live streaming, and collaborative editing (through Etherpad). There are also ways to run it with load balancers for better scalability and availability, and you can even network the video bridges for a better experience in global, large-scale conferences.

Load of our Jitsi VM with three parallel conferences (lectures, meetings)

There is still work to do with respect to video codecs (at least the transmission rates can be quite high sometimes), the way that recording and streaming are implemented (through a pseudo client that grabs video from a Chrome client), and the usability of the server software (nothing crazy).

My recommendations for running your own Jitsi Meet server:

  • Use the docker installation option that runs the different server components in docker containers and makes the initial setup really easy (including automatic Let's Encrypt certificate installation);
  • Encourage your users to use the desktop client application (instead of running it in a browser). The desktop client contains the same WebRTC code in a package. It works a bit better (performance- and reliability-wise) than your average Chrome or Firefox -- I suspect because of potential feature interaction with add-ons (and I use quite a few).
  • Use the options for muting participants on joining and enforce certain rules for larger meetings (i.e., mute everybody except the main presenter, turn off video unless needed etc.)
  • Jitsi will happily use the best video quality that your camera can produce. Often, that is not needed -- you can configure lesser quality (which can reduce server/network load significantly).
  • The bottleneck in a Jitsi server is typically the network interface, so try to run it on a server with a 10 Gbit/s interface for better results.

In my group, we set up our own server in the week before presence-based teaching got suspended, and it has proven to be super-valuable, especially when many of the centralized server-based systems failed on day one. Some of my colleagues are using it regularly for their courses.


BigBlueButton

BigBlueButton is a web conferencing system designed for online teaching and learning:

  • You can maintain several rooms (say, one per class, each with its own configuration, recordings etc.). Every room has a unique URI -- that's all students have to know.
  • the UI is designed for tracking video, shared material, the participant roster, and chat windows;
  • presenting slides etc. is not done via screen/application sharing but through a dedicated distribution channel: you upload PDFs and other formats to the server, which then distributes them to the clients -- this works much better than screen sharing in a video stream;
  • live multi-user whiteboards;
  • user polling;
  • recordings and replay on the platform; and
  • learning management system integration.

I am really convinced by the overall look and feel of the system and by its performance and scalability. So far, I have used it in lectures with up to 60 participants (on a spare, not high-end server) without any problems. The resource requirements seem lower than Jitsi Meet's, probably also because of a more careful configuration of default video sending rates.

BigBlueButton Management Console (in Firefox)

From a security perspective, BigBlueButton uses encryption for all communication between your browser and the server (but the server still needs to have access to the media, like the Jitsi Meet server).

My recommendations for running your own BigBlueButton server:

  • The client software runs well in a number of browsers (I have tested Safari, Chrome, and Firefox so far). If you want to use WebRTC desktop/application window sharing, make sure you use Firefox or Chrome.
  • For university-scale deployments there is a load balancer for BigBlueButton that allows you to add more servers as you grow.

In summary, I am convinced that BigBlueButton addresses most if not all online teaching requirements. If you find a way to run it (or have it hosted), you should use it.


Record Everything

Many of us had experimented with recording lectures before (for example, for flipped-classroom setups). Now, with the general shift to online lectures, recording essentially becomes a "by-product", i.e., you can just turn it on (if your system supports it).

In the current crisis period, recording is actually not only nice-to-have -- I would even say that it's a crucial feature:

  • Having the possibility to provide access to recorded lectures can remove a lot of pressure in times of distress. There is a lot to process for each of us and just knowing that there will be recordings creates additional assurance.
  • Online teaching sessions are peak-utilization periods. Having videos for asynchronous consumption can help distribute the load because not everybody has to join the simultaneous live stream.
  • Depending on the region your students live in, access networks may not be perfect, especially not if you have to share a low-bandwidth link with another student who is supposed to follow online lectures.
  • In times of lock-downs and travel restrictions, some of your students may actually be out of the country without a chance to return any time soon. They may not even be in the same timezone…
  • Although you would expect that everyone is currently practicing home-sheltering and should have lots of time on their hands, don't forget that crisis times can actually mean real crisis for individual people: they may have to take care of family members or themselves, queue at supermarkets, doctors' offices etc. -- so just because you are sitting in your home office does not mean that everybody is.

In summary, consider recording everything (with participant consent) and make it available whenever possible.

To Zoom or not to Zoom?

Zoom has been getting a lot of bad press recently and I'm getting many questions from colleagues and friends about it.

First of all, Zoom is a modern video conferencing service with excellent scaling properties, so performance is typically good, and it's also very easy to use.

Should you use it?


While some of the recent online articles are hyperbolic and based on an incorrect understanding of how these systems typically work, there are, in my opinion, some strong arguments for staying away from Zoom:

  • Zoom presents itself as a (paid) conferencing service. However, it has turned out that they also work with quite a few infamous tracking systems, i.e., they share data about you (and everyone who is using it) with the online tracking industry. Many websites do that because it's their main business model, and many users weren't aware of it before GDPR at least forced them to obtain your consent. It's not obvious why a commercial conferencing service has to do that, though.
  • For some websites and services, we have gotten used to ubiquitous tracking, and we may either accept it or find ways to contain it (difficult). Personally, some of us may even be OK with tracking. However, it's a different thing for online teaching, where you have a captive audience. By using a tracking-encumbered system in your lecture, you are essentially forcing your students to use it, too -- and to become subjects of tracking and surveillance themselves.
  • The Zoom client can run in a web browser (where expert users may be able to contain the tracking to some extent); however, Zoom tries to push users into installing the standalone application.
  • Unfortunately, Zoom has demonstrated time and again that they do not understand basic system security, for example: the webcam hack, the malware-style installation trick on macOS, the use of a single AES-128 key in ECB mode.

Bruce Schneier has summarized the most critical issues. Zoom is apparently another company that adopted the "grow fast, apologize later" approach and is now trying to surf the covid-19 wave to accelerate their growth at whatever cost.

These models used to be standard in the web and advertising industry. Times are changing though, and as more people understand the problems of opaque, uncontrolled surveillance, aggregation, and unlimited storage, these business ethics will become increasingly unacceptable. In a few years we will look at it bewildered -- like we look at Weinstein-type misogyny in #MeToo times.

I am hoping that Zoom as a company gets the message, but to be honest, I am not confident.

Luckily, for online teaching, we don't have to care because there are better alternatives anyway.

So, make wise choices.


  • 2020-04-04: Fixed formatting and other nits; added Zoom AES-128-ECB vulnerability and links to Citizen Lab's and Bruce Schneier's blog postings about it.

Written by dkutscher

April 2nd, 2020 at 11:01 pm

Information-Centric Networking RFCs on CCNx Published

without comments

The Internet Research Task Force (IRTF) has published two Experimental RFCs specifying the node behavior, message semantics, and the message syntax of the CCNx protocol: RFC 8569 (Content-Centric Networking (CCNx) Semantics) and RFC 8609 (Content-Centric Networking (CCNx) Messages in TLV Format). CCNx is one particular variant of ICN protocols. These specifications document an available Open Source implementation and are intended to encourage additional experiments with Information-Centric Networking technologies.


Information-Centric Networking (ICN) is a class of architectures and protocols that provide "access to named data" as a first-order network service. Instead of host-to-host communication as in IP networks, ICNs typically use location-independent names to identify data objects, and the network provides the service of processing (answering) requests for named data, with the objective of ultimately delivering the requested data objects to a requesting consumer.

Such an approach has profound effects on various aspects of a networking system, including security (by enabling object-based security on a message/packet level), forwarding behavior (name-based forwarding, caching), but also on more operational aspects such as bootstrapping, discovery etc.

The CCNx and NDN variants of ICN are based on a request/response abstraction where consumers (hosts, applications requesting named data) send INTEREST messages into the network that are forwarded by network elements to a destination that can provide the requested named data object. Corresponding responses are sent as so-called DATA messages that follow the reverse INTEREST path.
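As a simplified illustration of this request/response abstraction (a toy sketch only; the class names and the in-process "network" here are invented for illustration and do not reflect the actual CCNx/NDN wire formats):

```python
# Toy sketch of the CCNx/NDN INTEREST/DATA abstraction (not the real wire format).
from dataclasses import dataclass

@dataclass
class Interest:
    name: str            # name of the requested data object

@dataclass
class Data:
    name: str
    content: bytes

class Producer:
    """Answers INTERESTs for named data objects it can provide."""
    def __init__(self, store):
        self.store = store   # name -> content

    def on_interest(self, interest):
        content = self.store.get(interest.name)
        return Data(interest.name, content) if content is not None else None

# A consumer expresses an Interest; the network (here: the producer directly)
# answers with a matching Data message that follows the reverse path.
producer = Producer({"/videos/talk1/seg0": b"first segment"})
data = producer.on_interest(Interest("/videos/talk1/seg0"))
print(data.content)  # b'first segment'
```

In a real deployment, forwarders sit between consumer and producer and route the Interest by name, as described in the "Stateful Forwarding" section below.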

Sometimes ICN has been mis-characterized as a solution for in-network caching, possibly replacing CDNs. While ICN's location-independent access and its object-security approach do indeed enable opportunistic in-network data caching (e.g., for local retransmissions, data sharing), caching is not the main feature -- it is rather a consequence of the more fundamental properties of 1) accessing named data, 2) object security and an integrated trust model, and 3) stateful forwarding.

Accessing Named Data

Each unique data object is named unambiguously in a hierarchical naming scheme and can be validated by means specified by the producer, i.e., the origin source. (Data objects can also optionally be encrypted in different ways.) The naming concept and the object-based validation approach lay the foundation for location-independent operation, because data validity can be ascertained by any node in the network, regardless of where the corresponding message was received from.

The network can generally operate without any notion of location, and nodes (consumers, forwarders) can forward requests for named data objects directly, i.e., without any additional address resolution. Location independence also enables additional features, for example the possibility to replicate and cache named data objects. Opportunistic on-path caching is thus a standard feature in many ICN systems -- typically for enhancing reliability and performance.

Naming data and application-specific naming conventions are naturally important aspects of ICN. It is common for applications to define their own naming conventions (i.e., the semantics of elements in the name hierarchy). Such names can often be derived directly from application requirements; for example, a name like /my-home/living-room/light/switch/main could be relevant in a smart-home setting, and corresponding devices and applications could use such a convention so that controllers can find sensors and actuators in the system with minimal user configuration.
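The hierarchical structure of such names is easy to work with programmatically. A small sketch using the smart-home name from the text (the prefix-based discovery shown here is an illustration of the idea, not a specific CCNx/NDN API):

```python
# Hierarchical ICN names are sequences of components, much like file paths.
name = "/my-home/living-room/light/switch/main"
components = name.strip("/").split("/")
print(components)
# ['my-home', 'living-room', 'light', 'switch', 'main']

# A controller could use a shared name prefix to discover devices in a room:
prefix = "/my-home/living-room"
print(name.startswith(prefix))   # True
```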

Object-Security and Integrated Trust Model

One of the object validation approaches is based on public-key cryptography, where publishers sign objects (parts of messages) and can name the public key in the message, so that a validator can retrieve the corresponding object (containing the public key and a certificate that binds the key to a naming hierarchy). The certificate would be an element of a typical trust hierarchy.

Public-key cryptography and PKI systems are also used in the Internet/Web today. In CCNx/NDN-based ICN, key/certificate retrieval is directly provided by the network itself, i.e., it uses the same INTEREST/DATA protocol, and the system is typically used in a way that allows every object/message to be linked to a trust anchor.
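The key idea -- a message names the key that validates it, and the key is itself a retrievable named object -- can be sketched as follows. Note the heavy simplification: real CCNx/NDN uses public-key signatures, whereas this toy uses a symmetric HMAC purely as a self-contained stand-in; the field names and key store are invented for illustration:

```python
# Toy sketch of object validation with a KeyLocator-style name reference.
# Real CCNx/NDN uses public-key signatures; HMAC is only a stand-in here
# so the example runs with the standard library alone.
import hashlib
import hmac

# In a real system, keys are named objects fetched via INTEREST/DATA:
key_store = {"/example-org/keys/alice": b"alice-secret-key"}

def sign(name, content, key_name):
    key = key_store[key_name]
    tag = hmac.new(key, name.encode() + content, hashlib.sha256).hexdigest()
    # The message names the key used, so a validator knows what to fetch:
    return {"name": name, "content": content, "key_locator": key_name, "sig": tag}

def validate(msg):
    key = key_store[msg["key_locator"]]   # in ICN: retrieved by name
    expected = hmac.new(key, msg["name"].encode() + msg["content"],
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, msg["sig"])

msg = sign("/example-org/blog/post1", b"hello", "/example-org/keys/alice")
print(validate(msg))  # True
```

Any node on the path can perform this validation, which is what makes location-independent caching and forwarding trustworthy.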

Where that trust anchor resides is defined by the application semantics and its naming conventions. Unlike the Internet/Web today, it is not required to link to centralized trust anchors (such as root Certificate Authorities) -- instead it is possible to set up local, decentralized trustworthy networked systems in a permissionless manner.

Stateful Forwarding

In CCNx and NDN, forwarders are stateful, i.e., they keep track of forwarded INTERESTs so they can later match the received DATA messages. Stateful forwarding (in conjunction with the general name-based and location-independent operation) also empowers forwarders to execute individual forwarding strategies and perform optimizations such as in-network retransmissions, multicasting requests (in case there are several opportunities for accessing a particular named data object), etc.

Stateful forwarding enables nodes in the network to perform similar functions as endpoints (i.e., consumers), so that there is no strong distinction between these roles. For example, consumers and forwarders can control INTEREST sending rates to respond to observed network conditions. Adapting in-network transport behavior can thus be achieved naturally, i.e., without brittle, non-transparent middleboxes, TCP proxies, etc.

ICN Scenarios

ICN is a general-purpose networking technology and can thus be applied to many scenarios. The following sections highlight a few particularly interesting ones.

Scalable Media Distribution

The "Accessing Named Data" paradigm also implies that CCNx/NDN-based ICN is fundamentally connectionless. While there can be collections of Named Data Objects that are requested (and transmitted) in a flow-like manner (as a consecutive series, sharing paths), a server (producer) does not have to maintain any client or connection state -- one factor for making servers more scalable.

ICN forwarders can aggregate INTERESTs received from different (for example, downstream) links for the same Named Data Object. Instead of forwarding the second, third, etc. INTEREST for the same object, a forwarder (as part of its forwarding strategy) could decide to just record those INTERESTs (and note the interfaces they have been received from) and later distribute the received object via all of these interfaces.
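The aggregation behavior can be sketched with a toy Pending Interest Table (PIT). This is an illustrative model only, not CCNx/NDN forwarder code; real implementations add INTEREST lifetimes, nonces, and per-prefix strategies.

```python
# Toy Pending Interest Table: only the first INTEREST for a name is
# forwarded upstream; later ones for the same name are merely recorded,
# and the arriving DATA message is fanned out to every recorded face.

class Forwarder:
    def __init__(self):
        self.pit = {}  # name -> set of downstream faces awaiting the data

    def on_interest(self, name, face):
        if name in self.pit:
            self.pit[name].add(face)   # aggregate: do not forward again
            return False               # no upstream transmission needed
        self.pit[name] = {face}
        return True                    # forward this first INTEREST upstream

    def on_data(self, name):
        # Consume the PIT entry and return all faces the DATA goes out on.
        return self.pit.pop(name, set())

fwd = Forwarder()
print(fwd.on_interest("/video/seg/42", "faceA"))  # True  (forwarded)
print(fwd.on_interest("/video/seg/42", "faceB"))  # False (aggregated)
print(fwd.on_data("/video/seg/42"))               # {'faceA', 'faceB'}
```

Applied hop by hop, this recording-and-fanout behavior is what yields the implicit multicast-like distribution tree described below, without any explicit group signaling.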

For live or near-live media distribution, this can provide an additional factor of scalability: 1) fewer INTERESTs hit the producers, and 2) fewer INTEREST and DATA messages are transmitted over the network. Effectively, this behavior implements an implicit multicast-like, tree-based distribution -- without any explicit signaling or (inter-domain) multicast routing.

Finally, in-network caching can further reduce upstream traffic, i.e., by answering requests for currently popular objects from a forwarder cache.

The corresponding gains have been demonstrated in Proof-of-Concept implementations, for example in Cisco's hICN DASH-like video distribution system.

Multi-Access & Multi-Path Networking

Multi-access networking is becoming increasingly important as most mobile devices already provide at least two radio interfaces that can be used simultaneously. For example, Apple's Siri can use Multipath TCP to try to obtain better performance by combining mobile network and WLAN interfaces and by jointly managing the available resources.

ICN communication is inherently multipath in the sense that ICN is not connection-based and that any forwarder can make independent forwarding decisions for multipath INTEREST forwarding. ICN's location independence also enables a multi-destination communication style: Named Data Objects can be replicated in the network, so that the network can provide not only different paths to one producer but paths to many producers, which can further increase network utilization and performance.

These properties, in conjunction with ICN's stateful forwarding model, enable several optimizations over MPTCP's end-to-end control loop (for both window- and rate-based congestion-controlled multipath communication). An example of such an approach has been described by Mahdian et al.

Internet of Things (IoT)

IoT is a broad field, but often refers to 1) networking constrained devices and 2) communicating in local networks (that are not or should not be connected to the Internet on a permanent basis).

In low-power wireless networks with challenged connectivity, frequent power-saving, and potential node mobility, ICN can typically outperform IP-based technology stacks with respect to implementation simplicity, data availability, and performance. The implementation simplicity stems from the ICN model of accessing named data directly, i.e., with integrated security and (in some IoT scenarios) without the need for any resolution infrastructure or application-layer protocols.

The data availability and performance improvements are due to the stateful forwarding and opportunistic caching features, which are useful for multi-hop mesh networks with frequent connectivity changes due to sleep cycles and mobility. Stateful forwarding enables ICN to react more flexibly to changes, and in-network caching can keep data available in the network so that it can be retrieved at some time offset, for example when a sleeping node wakes up and resumes communication with a next-hop node. Gündoğan et al. have performed an extensive analysis comparing NDN with CoAP and MQTT on large-scale IoT testbeds that demonstrated these benefits.

Computing in the Network

Recent advances in platform virtualization, link layer technologies and data plane programmability have led to a growing set of use cases where computation near users or data consuming applications is needed -- for example for addressing minimal latency requirements for compute intensive interactive applications (networked Augmented Reality, AR), for addressing privacy sensitivity (avoiding raw data copies outside a perimeter by processing data locally), and for speeding up distributed computation by putting computation at convenient places in a network topology.

Most application-layer frameworks suffer from being conceived as overlays: they can enable certain forms of optimization (such as function placement and scaling) but typically require centralized orchestration. Running as an overlay means connecting compute functions through protocols such as TCP, requiring some form of resolution system that maps application-layer names to IP addresses, etc.

Approaches such as Named Function Networking (NFN) and Remote Method Invocation for ICN (RICE) have demonstrated how the ICN approach of accessing named data in the network can be extended to accessing dynamic computation results, maintaining all the ICN security and forwarding/caching properties.

In such systems, computing and networking can be integrated in new ways, for example by allowing compute nodes to include knowledge about the ICN network's routing information base and currently observed availability and performance data when making offloading and scaling decisions. Consequently, this enables a promising joint optimization of computing and networking resources that is especially attractive for fine-granular distributed system development.

Also see draft-kutscher-coinrg-dir for a general discussion of Computing in the Network.

The CCNx Specifications

The work on CCN started about 11 years ago in a project led by Van Jacobson at PARC -- in parallel with many other research projects on ICN such as NetInf, PURSUIT, etc. The CCN work later split into two branches: NDN (maintained by the NSF-funded NDN project) and CCNx (maintained by PARC).

In 2016, Cisco acquired the CCNx technology and the software implementations from PARC and continued working on them in research, proof-of-concepts, and trials. The software has been made available as a sub-project of fd.io and is now called CICN, featuring support for fd.io's VPP framework.

This implementation largely follows the specification in the now published CCNx RFCs which are products of the IRTF ICN Research Group.

RFC 8569 describes the core concepts of the Content-Centric Networking (CCNx) architecture and presents a network protocol based on two messages: Interests and Content Objects. It specifies the set of mandatory and optional fields within those messages and describes their behavior and interpretation. This architecture and protocol specification is independent of a specific wire encoding.

RFC 8609 specifies the encoding of CCNx messages in a TLV packet format, including the TLV types used by each message element and the encoding of each value. The semantics of CCNx messages follow the encoding-independent CCNx Semantics specification.

Both of these RFCs have been authored by Marc Mosko, Nacho Solis, and Chris Wood.

More Information

The IRTF ICN Research Group is an international research forum that covers research and experimentation work across the different ICN approaches and projects. Its goal is to promote experimentation and validation activities with ICN technology.

There is also a yearly academic conference under the ACM SIGCOMM umbrella. The 2019 ICN conference takes place from September 24 to 26 in Hong Kong.

Written by dkutscher

July 11th, 2019 at 3:02 pm

Posted in Blogroll,IRTF


Great Expectations


Protocol Design and Socioeconomic Realities

(PDF version)

The Internet & Web as a whole qualify as wildly successful technologies, each empowered by wildly successful protocols per RFC 5218's definition [1]. As the Internet & Web became critical infrastructure and business platforms, most of the originally articulated design goals and features, such as global reach, permissionless innovation, and accessibility [5], got overshadowed by the trade-offs they incur. For example, global reach -- intended to enable global connectivity -- can also imply global reach for infiltration, regime change, and infrastructure attacks by state actors. Permissionless innovation -- motivated by the intention to overcome the lack of innovation options in traditional telephone networks -- has also led to permissionless surveillance and mass-manipulation-based business models that have been characterized as detrimental from a societal perspective.

Most of these developments cannot be directly ascribed to Internet technologies alone. For example, most user surveillance and data extraction technologies are actually based on web protocol mechanisms and particular web protocol design decisions. While it has been documented that some of these technology and standards developments have been motivated by particular economic interests [2], it is unclear whether different Internet design decisions could have led to a different, "better" outcome. Fundamentally, economic drivers in different societies (and on a global scale) cannot be controlled through technology and standards development alone.

This memo thus focuses on specific protocol design and evolution questions, specifically on how technical design decisions relate to socio-economic effects, and aims at providing input for future design discussions, leveraging experience from 50 years of Internet evolution, 30 years of Web evolution, observations of economic realities, and years of Future Internet research.

IP Service Model

The IP service model was clearly designed to provide a minimal layer over different link layer technologies to enable inter-networking at low implementation cost [3]. Starting off as an experiment, looking for feasible initial deployment strategies, this was clearly a reasonable approach. The IP service model of packet-switched end-to-end best-effort communication between hosts (host interfaces) over a network of networks, was implemented by:

  • an addressing scheme that allows specifying source and destination host (interface) addresses in a topologically structured address space; and
  • minimal per-hop behavior (stateless forwarding of individual packets).
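The minimal per-hop behavior amounts to a stateless longest-prefix-match lookup on each packet's destination address, with no per-flow state retained. A small sketch, with purely illustrative routes:

```python
import ipaddress

# Sketch of IP's minimal per-hop behavior: each packet is forwarded
# independently via a longest-prefix match on its destination address.
# The routing table entries below are made up for illustration.

routes = [
    (ipaddress.ip_network("10.0.0.0/8"), "if0"),
    (ipaddress.ip_network("10.1.0.0/16"), "if1"),
    (ipaddress.ip_network("0.0.0.0/0"), "if-default"),
]

def forward(dst):
    """Pick the outgoing interface of the most specific matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [(net.prefixlen, iface) for net, iface in routes if addr in net]
    return max(matches)[1]  # longest prefix wins

print(forward("10.1.2.3"))  # if1
print(forward("10.2.2.3"))  # if0
print(forward("8.8.8.8"))   # if-default
```

Everything beyond this lookup -- reliability, ordering, naming, security -- is left to other layers, which is exactly the minimality the text describes.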

The minimal model implied punting many functions to other layers, encapsulation, and/or "management" services (transport, dealing with names, security). Multicast was not excluded by the architecture, but also not very well supported, so that IP Multicast (and the required inter-domain multicast routing protocols) did not find much deployment outside well-controlled local domains (for example, telco IP TV).

The resulting system of end-to-end transport over a minimal packet forwarding service has served many applications and system implementations well. However, over time, technical application and business requirements have led to additional infrastructure, extensions, and new ways of using Internet technologies, for example:

  • in-network transport performance optimization to provide better control loop localization in mobile networks;
  • massive CDN infrastructure to provide more scalable popular content distribution;
  • (need for) access control, authorization based on IP and transport layer identifiers;
  • user-tracking based on IP and transport layer identifiers; and
  • usage of DNS for localization, destination rewriting, and user tracking.

It can be argued that some of these approaches and developments have also led to some of the centralization/consolidation issues that are discussed today -- especially with respect to CDNs, which are essentially inevitable for any large-scale content distribution (both static and live). Looking at the original designs, the commercial needs understood later, and the outcome today, one could ask: how would a different Internet service model and different network capabilities affect the tussle balance [5] between different actors and interests in the Internet?

For example, a more powerful forwarding service with a more elaborate (and more complex) per-hop behavior could employ (soft-)stateful forwarding, enabling certain forms of in-network congestion control. Some form of caching could make services such as local retransmission and data sharing at the edge a network service function, removing the need for some middleboxes.

Other systems such as the NDN/CCNx variants of ICN employ the principle of accessing named-data in the network, where each packet must be requested by INTEREST messages that are visible to forwarders. Forwarders can aggregate INTERESTs for the same data, and in conjunction with in-network storage, this can implement an implicit multicast distribution service for near-simultaneous transmissions.

In ICN, receiver-driven operation could eliminate certain DoS attack vectors, and the lack of source addresses (due to stateful forwarding) could provide some form of anonymity. The use of expressive, possibly application-relevant names could give the network better visibility -- potentially enabling both more robust access control and (on the negative side) more effective hooks for censoring communication and monitoring user traffic.

This short discussion alone illustrates how certain design decisions can play out in the real world later, and that even small changes in the architecture and protocol mechanisms can shift the tussle balance between actors, possibly in unintended ways. As Clark argued in [3], it is important to understand the corresponding effects of architectural changes, let alone bigger redesign efforts.

The Internet design choices were motivated by certain requirements that were valid at the time -- but may not all still hold today. Today's networking platforms are far more powerful and more programmable. The main applications are totally different, as are the business players and the governance structures. This process of change may continue in the future, which adds another level of difficulty to any change of architecture elements and core protocols. However, this does not mean that we should not try.

Network Address Translation

Network Address Translation (NAT) has been criticized for impeding transport-layer innovation, adding brittleness, and delaying IPv6 adoption. At the same time, NAT was deemed necessary for growing the Internet ecosystem and for enabling local network extensions at the edge without administrative configuration. It also provides a limited form of protection against certain types of attacks. As such, it addressed shortcomings of the system.

The implicit client-initiated port forwarding (the technical reason for the limited attack protection mentioned above) obviously blocks both unwanted and wanted communication, which makes it difficult to run servers at homes, enterprise sites, etc. in a sound way (manual configuration of port forwarding still comes with limitations). This could be seen as one of the drivers for the centralization of servers in data centers ("cloud") that is a concern in some discussions today. [4]

What does this mean for assessing and potentially evolving previous design decisions? The NAT use cases and their technical realization are connected to several trade-offs that impose non-trivial challenges for potential architecture and protocol evolution: 1) Easy extensibility at the edge vs. scalable routing; 2) Threat protection vs. decentralized nature of the system; 3) Interoperability vs. transport innovation.

In a positive light, use cases such as local communication and dynamic Internet extension at the edge (with the associated security challenges) represent interesting requirements that can help finding the right balance in the design space for future network designs.


Pervasive monitoring is an attack [7], and it is important to assess existing protocol and security frameworks with respect to changes in the way the Internet is being used by corporations and state-level actors, and to develop new protocols where needed. QUIC encrypts transport headers in addition to application data, intending to make user tracking and other monitoring attacks harder to mount.

Economically, however, the more important user-tracking use case today is the systematic surveillance of individuals on the web, i.e., through a massive network of tracking, aggregation, and analytics entities [6]. Ubiquitous encryption of transport and application protocols does not prevent this at all -- on the contrary, it makes it more difficult to detect, analyze, and, where needed, prevent user tracking. This does not render connection encryption useless (especially since surveillance in the network and on web platforms complement each other through aggregation and commercial trading of personally identifiable information, PII), but it requires careful consideration of the trade-offs.

For example, perfect protection against on-path monitoring is only effective if it covers the complete path between a user agent and the corresponding application server. This shifts the tussle balance between confidentiality and network control (enterprise firewalls, parental control, etc.) significantly. Specifically for QUIC, which is intended to run in user space, i.e., without the potential for OS control, users may end up in situations where they have to trust the application service providers (who typically control the client side through apps or browsers, as well as parts of the CDN and network infrastructure) to transfer information without leaking PII irresponsibly.

If the Snowden revelations led to a better understanding of the nature and scope of pervasive monitoring and to best current practices for Internet protocol design, what is the adequate response to the continuous revelations of the workings and extent of the surveillance industry? What protocol mechanisms and API should we develop, and what should we rather avoid?

DNS encryption is another example that illustrates the trade-offs. Unencrypted DNS (especially with the EDNS0 client subnet option, depending on prefix length and network topology) can increase the risk of privacy violations through on-path/intermediary monitoring.

DNS encryption can counter certain on-path monitoring attacks -- but it could effectively make the privacy situation for users worse if it is implemented by centralizing servers (so that application service providers, in addition to tracking user behaviour for one application, can now also monitor DNS communication for all applications). This has been recognized in current proposals, e.g., by limiting the scope of DNS encryption to stub-to-resolver communication. While this can be enforced by architectural oversight in standards development, we do not yet know how to enforce it in actual implementations, for example for DNS over QUIC.

Future Challenges: In-Network Computing

Recent advances in platform virtualization, link layer technologies and data plane programmability have led to a growing set of use cases where computation near users or data consuming applications is needed — for example for addressing minimal latency requirements for compute-intensive interactive applications (networked Augmented Reality, AR), for addressing privacy sensitivity (avoiding raw data copies outside a perimeter by processing data locally), and for speeding up distributed computation by putting computation at convenient places in a network topology.

In-network computing has mainly been perceived in four main variants so far: 1) Active Networking, adapting the per-hop-behavior of network elements with respect to packets in flows, 2) Edge Computing as an extension of virtual-machine (VM) based platform-as-a-service to access networks, 3) programming the data plane of SDN switches (leveraging powerful programmable switch CPUs and programming abstractions such as P4), and 4) application-layer data processing frameworks.

Active Networking has not found much deployment due to its problematic security properties and complexity. Programmable data planes can be used in data centers with uniform infrastructure, good control over the infrastructure, and the feasibility of centralized control over function placement and scheduling. Due to the still limited, packet-based programmability model, most applications today are point solutions that can demonstrate benefits for particular optimizations, however often without addressing transport protocol services or data security that would be required for most applications running in shared infrastructure today.

Edge Computing (just like traditional cloud computing) has a fairly coarse-grained (VM-based) computation model and hence typically deploys centralized positioning/scheduling through virtual infrastructure management (VIM) systems. Application-layer data processing frameworks such as Apache Flink, on the other hand, provide attractive dataflow programming models for event-based stream processing and lightweight fault-tolerance mechanisms -- however, systems such as Flink are not designed for dynamic scheduling of compute functions.

Ongoing research efforts (for example in the proposed IRTF COIN RG) have started exploring this space and the potential role that future network and transport layer protocols can play. Is it feasible to integrate networking and computing beyond overlays? What would be a minimal service (like IP today) that has the potential for broad reach, permissionless innovation, and evolution paths that avoid early ossification?


Although the impact of Internet technology design decisions may be smaller than we would like to think, it is nevertheless important to assess the trade-offs of the past and the potential socio-economic effects that different decisions could have in the future. One challenge is the depth of the stack and the interactions across the stack (e.g., the perspective of CDNs addressing shortcomings of the IP service layer, or the perspective of NAT and centralization). The applicability of new technology proposals therefore needs a far more thorough analysis -- beyond proof-of-concepts and performance evaluations.


[1] D. Thaler, B. Aboba; What Makes for a Successful Protocol?; RFC 5218; July 2008

[2] S. Greenstein; How The Internet Became Commercial; Princeton University Press; 2017

[3] David Clark; Designing an Internet; MIT Press; October 2018

[4] Jari Arkko et al.; Considerations on Internet Consolidation and the Internet Architecture; Internet Draft https://tools.ietf.org/html/draft-arkko-iab-internet-consolidation-01; March 2019

[5] Internet Society; Internet Invariants: What Really Matters; https://www.internetsociety.org/internet-invariants-what-really-matters/; February 2012

[6] Shoshana Zuboff; The Age of Surveillance Capitalism; PublicAffairs; 2019

[7] Stephen Farrell, Hannes Tschofenig; Pervasive Monitoring is an Attack; RFC 7258; May 2014

Change Log

  • 2019-06-07: fixed several typos and added clarification regarding EDNS0 client subnet (thanks to Dave Plonka)

Written by dkutscher

June 4th, 2019 at 11:30 am

Posted in Blogroll,Posts
