Dirk Kutscher

Personal web page

Dagstuhl Seminar on Greening Networking: Toward a Net Zero Internet



We (Alexander Clemm, Michael Welzl, Cedric Westphal, Noa Zilberman, and I) organized a Dagstuhl seminar on Green Networking: Toward a Net Zero Internet.

Making Networks Greener

As climate change triggered by CO2 emissions dramatically impacts our environment and our everyday life, the Internet has proved a fertile ground for solutions, such as enabling teleworking and teleconferencing to reduce travel emissions. At the same time, the Internet is itself a significant contributor to greenhouse gas emissions, not least through its own substantial power consumption. It is thus very important to make networks themselves "greener" and to devise less carbon-intensive solutions while continuing to meet increasing network traffic demands and service requirements.

Computer scientists and engineers from world-leading universities and international companies, such as Ericsson, NEC, Netflix, Red Hat, and Telefonica, came together for a Seminar on Green Networking (Toward a Net Zero Internet) at Schloss Dagstuhl – Leibniz Center for Informatics, from September 29th to October 2nd, 2024. Organized by leading Internet researchers from the Hong Kong University of Science and Technology (Guangzhou), the University of Oxford, the University of Oslo, and the University of California, Santa Cruz, the seminar set out to identify and prioritize the most impactful networking improvements for reducing carbon emissions, to define action items for a carbon-aware networking research agenda, and to foster and facilitate research collaboration toward reducing carbon emissions and positively impacting climate change.

Interactions between the Power Grid, Larger Systems, and the Network

In addition to pure networking issues, the seminar also analyzed the impact of larger systems that are built with Internet technologies, such as AI, multimedia streaming, and mobile communication networks. For example, the seminar discussed energy proportionality in networked systems, which would allow systems to adapt their energy consumption to actual changes in utilization, so that savings can be achieved in idle times. Such behavior would require applications and network protocols to adapt better to cost information (such as carbon impact).

Moreover, networked systems can interact with the power grid in different ways, for example by adapting their energy consumption to the current availability and cost of renewable energy. This enables joint planning of the power grid and networks, networked systems, and clouds, achieving maximum efficiency and savings.
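
As a purely illustrative sketch of such grid-aware adaptation (the forecast values and the hourly slot granularity below are invented for this example, not from the seminar), a system with flexible workloads could pick its transfer or compute windows according to a carbon-intensity forecast:

    # Illustrative sketch: schedule a flexible workload into the
    # lowest-carbon window of a (hypothetical) intensity forecast.
    def pick_window(forecast, duration_slots):
        """forecast: list of gCO2/kWh per slot; returns (start, avg)."""
        best_start, best_avg = 0, float("inf")
        for start in range(len(forecast) - duration_slots + 1):
            avg = sum(forecast[start:start + duration_slots]) / duration_slots
            if avg < best_avg:
                best_start, best_avg = start, avg
        return best_start, best_avg

    # Hypothetical hourly forecast (gCO2/kWh) for the next 8 hours:
    forecast = [450, 420, 300, 120, 90, 110, 380, 460]
    start, avg = pick_window(forecast, duration_slots=3)
    print(f"run 3-hour job starting at slot {start} (avg {avg:.0f} gCO2/kWh)")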

The seminar attendees are working with international research and standardization organizations such as the Internet Engineering Task Force (IETF) and ETSI, and it is expected that the seminar will make contributions to future research and standardization agendas in such organizations to bring the Internet to Net Zero emissions.

Organizers

  • Alexander Clemm (Los Gatos, US)
  • Dirk Kutscher (HKUST - Guangzhou, CN)
  • Michael Welzl (University of Oslo, NO)
  • Cedric Westphal (University of California, Santa Cruz, US)
  • Noa Zilberman (University of Oxford, GB)


Written by dkutscher

October 2nd, 2024 at 11:30 am

Networked Metaverse Systems


The term ‘Metaverse’ often denotes a wide range of existing and fictional applications. Nevertheless, there are actual systems today that can be studied and analyzed. However, whereas a considerable body of work has been published on applications and application ideas, there is less work on the technical implementation of such systems, especially from a networked systems perspective.

In a recently published open access journal article, we share some insights into the technical design of Metaverse systems, their key technologies, and their shortcomings, predominantly from a networked systems perspective. For the scope of this study, we define the ‘Metaverse’ as follows: the ‘Metaverse’ encompasses various current and emerging technologies, and the term is used to describe different applications, ranging from Augmented Reality (AR), Virtual Reality (VR), and Extended Reality (XR) to a new form of the Internet or Web. A key feature distinguishing the Metaverse from simple AR/VR is its inherently collaborative and shared nature, which enables interaction and collaboration among users in a virtual environment.

Building on Existing Platforms and Network Stacks

Most current Metaverse systems and designs are built on existing technologies and networks. For example, massively multiplayer online games such as Fortnite use a generalized client-server model. In this model, the server authoritatively manages the game state, while the client maintains a local subset of this state and can predict game flow by executing the same game code as the server on approximately the same data. Servers send information about the game world to clients by replicating relevant actors and their properties. Commercial social VR platforms such as Horizon Worlds and AltspaceVR use HTTPS to report client-side information and synchronize in-game clocks across users.
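
As a minimal sketch of this pattern (all names and the trivial one-dimensional "game code" are hypothetical, not any engine's actual implementation), the client applies inputs locally for immediate feedback and reconciles with the authoritative server snapshot when it arrives:

    # Hypothetical sketch of authoritative-server replication with
    # client-side prediction; the "game code" is a trivial 1-D move.
    def game_code(state, input_cmd):
        return {"player_x": state["player_x"] + input_cmd}

    class Server:
        def __init__(self):
            self.state = {"player_x": 0}
        def apply(self, input_cmd):
            self.state = game_code(self.state, input_cmd)  # authoritative
            return dict(self.state)       # snapshot replicated to clients

    class Client:
        def __init__(self):
            self.predicted = {"player_x": 0}
        def predict(self, input_cmd):
            # Same game code as the server, run locally before confirmation.
            self.predicted = game_code(self.predicted, input_cmd)
        def reconcile(self, snapshot):
            self.predicted = dict(snapshot)  # server state wins

    server, client = Server(), Client()
    client.predict(+1)            # immediate local response
    snapshot = server.apply(+1)   # authoritative update arrives later
    client.reconcile(snapshot)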

Mozilla Hubs, built with A-Frame (a web framework for building virtual reality experiences), uses WebRTC communication with a Selective Forwarding Unit (SFU). The SFU receives multiple audio and video data streams from its peers, then determines and forwards relevant data streams to connected peers. Blockchain or Non-Fungible Token (NFT)-based online games, such as Decentraland, run exclusively on the client side but allow for various data flow models, ranging from local effects and traditional client-server architectures to peer-to-peer (P2P) interactions based on state channels; Upland is built on EOSIO, an open-source blockchain protocol for scalable decentralized applications, and transports data through HTTPS. Connections between peers in Upland are established using TLS or VPN tunnels.
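
The core SFU forwarding decision can be sketched in a few lines (a deliberate simplification with invented names; real SFUs additionally select streams and quality layers per receiver):

    # Sketch of SFU fan-out: forward each incoming media packet to
    # every connected peer except its sender.
    class SelectiveForwardingUnit:
        def __init__(self):
            self.peers = {}               # peer_id -> send callback

        def connect(self, peer_id, send):
            self.peers[peer_id] = send

        def on_media(self, sender_id, packet):
            for peer_id, send in self.peers.items():
                if peer_id != sender_id:  # never echo back to the sender
                    send(packet)

    sfu = SelectiveForwardingUnit()
    sfu.connect("alice", lambda p: print("to alice:", p))
    sfu.connect("bob", lambda p: print("to bob:", p))
    sfu.on_media("alice", "audio-frame-1")   # delivered to bob only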

Many studies have focused on improving various aspects of Metaverse systems. For example, EdgeXAR is a mobile AR framework using edge offloading to enable lightweight tracking with six degrees of freedom (DOF) while reducing offloading delay from the user’s view; SORAS is an optimal resource allocation scheme for the edge-enabled Metaverse, using stochastic integer programming to minimize the total network cost; Ibrahim et al. explore the issue of partial computation offloading for multiple subtasks in an in-network computing environment, aiming to minimize energy consumption and delay. However, these ideas for offloading computation and rendering tasks to edge platforms often conflict with the existing end-to-end transport protocols and overlay deployment models. Recently, a Deep Reinforcement Learning (DRL)-based multipath network orchestration framework for remote healthcare services was presented, automating subflow management to handle multipath networks. Likewise, proposals for scalable multi-party communication would require interdomain multicast services, which are unavailable on today’s Internet.

Disconnect Between High-Level Concepts and Actual Systems

In practice, there is a significant disconnect between high-level Metaverse concepts, ideas for technical improvements, and the systems that are actually developed and partially deployed. A 2022 ACM IMC paper titled Are we ready for metaverse?: a measurement study of social virtual reality platforms analyzes the performance of various social VR systems, pinpointing numerous issues related to performance, communication overhead, and scalability. These issues arise primarily because current systems are built on existing platforms, protocols, and system architectures, which cannot tap into any of the proposed architectural and technical enhancements, such as scalable multi-party communication or the offloading of computation and rendering tasks.

Rather than merely layering ‘the Metaverse’ on top of legacy and not always ideal foundations, we consider the Metaverse a driver for future network and web applications and actively develop new designs to that end. In our article, we take a comprehensive systems approach and technically describe current Metaverse systems, focusing on their networking aspects. We document the requirements and challenges of Metaverse systems and propose a principled approach to system design, based on a thorough understanding of the needs of Metaverse systems, the current constraints and limitations, and the potential solutions that Internet technologies can offer.

Article Overview

  1. We present a technical description of the ‘Metaverse’ based on existing and emerging systems, including a discussion of its fundamental properties, applications, and architectural models.
  2. We comprehensively study relevant enabling technologies for Metaverse systems, including HCI/XR technologies, networking, communications, media encoding, simulation, real-time rendering and AI. We also discuss current Metaverse system architectures and the integration of these technologies into actual applications.
  3. We conduct a detailed requirements analysis for constructing Metaverse systems. We analyze application-specific requirements and identify existing gaps in four key aspects: communication performance, mobility, large-scale operation, and end-system architecture. For each area, we propose candidate technologies to address these gaps.
  4. We propose a research agenda for future Metaverse systems, based on our gap analysis and candidate technologies discussion. We re-assess the fundamental goals and requirements, without necessarily being constrained by existing system architectures and protocols. Based on a comprehensive understanding of what Metaverse systems need and what end-systems, devices, networks and communication services can theoretically provide, we propose specific design ideas and future research directions to realize Metaverse systems that can meet the expectations often articulated in the literature.


Written by dkutscher

September 8th, 2024 at 7:47 am

Posted in Publications


Affordable HPC: Leveraging Small Clusters for Big Data and Graph Computing


In our paper at PCDS-2024, we explore strategies for academic researchers to optimize computational resources within limited budgets, focusing on building small, efficient computing clusters. We analyze the comparative costs of purchasing versus renting servers, guided by market research and economic theories on tiered pricing. The paper offers detailed insights into the selection and assembly of hardware components such as CPUs, GPUs, and motherboards tailored to specific research needs. It introduces methods to mitigate the performance issues caused by PCIe switch bandwidth limitations in order to improve GPU task scheduling. Furthermore, a Graph Neural Network (GNN) framework is proposed to analyze and optimize parallelism in computing networks.

Growing Resource Demands for Large-Scale Machine Learning

Large machine learning (ML) models, such as large language models (LLMs), are becoming increasingly powerful and gradually accessible to end users. However, the growth in the capabilities of these models has led to memory and inference computation demands that exceed the capacities of personal computers and individual servers. To enable users, research teams, and others to utilize and experiment with these models, a distributed architecture is essential.

In recent years, scientific research has shifted from a "wisdom paradigm" to a "resource paradigm." As the number of researchers and the depth of scientific exploration increase, a significant portion of research computing tasks has moved to servers. This shift has been facilitated by the development of computing frameworks and the widespread use of computers, leading to an increased demand for computer procurement.

Despite the abundance of online tutorials for assembling personal computers, information on the establishment of large clusters is relatively scarce. Large Internet companies and multinational corporations usually employ professional architects and engineers or work closely with vendors to optimize their cluster performance. However, researchers often do not have access to these technical details and must rely on packaged solutions from service providers to build small clusters.

Towards Affordable HPC

In our paper "Affordable HPC: Leveraging Small Clusters for Big Data and Graph Computing", we aim to bridge this gap by providing opportunities for researchers with limited funds to build small clusters from scratch. We compiled the necessary technical details and guidelines to enable researchers to assemble clusters independently. In addition, we propose a method to mitigate the performance degradation caused by the bandwidth limitations of PCIe switches, which can help researchers prioritize GPU training tasks effectively.

The paper discusses:

  1. How to build cost-effective clusters: We provide a comprehensive guide for researchers with limited funds, helping them to independently build small clusters and contribute to the development of large models.
  2. Performance optimization: We propose a method to address the performance degradation caused by PCIe switch bandwidth limitations. This method allows researchers to prioritize GPU training tasks effectively, thereby improving overall cluster performance (see the sketch after this list).
  3. GNN for network and neural-network parallelism: We propose a Graph Neural Network (GNN) framework that combines neural networks with parallel network flows in distributed systems. Our aim is to integrate different types of data flows, communication patterns, and computational tasks, thereby providing a novel perspective for evaluating the performance of distributed systems.
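
The paper's actual method is more involved; purely to illustrate the underlying idea (the task names, transfer sizes, and uplink rate below are invented, not from the paper), jobs sharing a PCIe switch uplink can be ordered by their transfer demand so that large host-to-GPU copies contend less with each other:

    # Illustrative only: order GPU jobs on a shared PCIe switch uplink
    # by transfer demand, heaviest first. Not the paper's algorithm.
    UPLINK_GB_PER_S = 16          # assumed shared uplink, ~PCIe 3.0 x16

    tasks = [                     # (name, host-to-GPU transfer GB, compute s)
        ("train_a", 12.0, 300),
        ("train_b", 2.0, 120),
        ("train_c", 30.0, 600),
    ]

    for name, gb, compute_s in sorted(tasks, key=lambda t: -t[1]):
        copy_s = gb / UPLINK_GB_PER_S
        print(f"{name}: ~{copy_s:.1f}s copy, then {compute_s}s compute")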


Written by dkutscher

September 2nd, 2024 at 5:25 am

Next Steps for Content Syndication


This is a follow-up on Mark Nottingham's blog post on What RSS Needs, which I read with some interest.

RSS and Atom have been enabling non-mediated feeds for website updates; such feeds are very useful and were once quite popular, until the Web took a different direction. Mark discusses some areas that should be addressed to revitalize such feeds, based on what we know today: Community, User Agency, Interoperability Tests, Best Practices for Feeds, Browser Integration, Authenticated Feeds, and Publisher Engagement. Check out his blog post for details.

I would like to offer some additional thoughts:

Features that should be maintained from RSS/Atom

Receiver-driven operation

The user device ("client") should generally be in control and fetch updates based on its own schedule and requirements. This fits well with typical web interactions, i.e., HTTP GET. See below for additional ideas in section "Protocol Independence".
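
For example (a minimal sketch using the Python requests library; the feed URL is a placeholder), a client can poll on its own schedule and use HTTP conditional requests, so that an unchanged feed costs little more than a 304 response:

    # Receiver-driven polling with a conditional GET: the server
    # answers 304 Not Modified when the feed is unchanged.
    import requests

    def poll(url, etag=None):
        headers = {"If-None-Match": etag} if etag else {}
        resp = requests.get(url, headers=headers, timeout=10)
        if resp.status_code == 304:
            return None, etag              # nothing new since last poll
        return resp.text, resp.headers.get("ETag")

    body, etag = poll("https://example.net/feed.xml")
    body, etag = poll("https://example.net/feed.xml", etag)  # likely 304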

Aggregation

Aggregation, i.e., combining different input feeds to form a new feed, is a feature of RSS and Atom. This should obviously be maintained. It may need some additional security (authentication) mechanisms – see below under "Data-oriented security".
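
A minimal sketch of the aggregation step (using the Python feedparser library; the feed URLs are placeholders):

    # Merge entries from several input feeds into one combined feed,
    # newest first.
    import feedparser

    def aggregate(feed_urls):
        entries = []
        for url in feed_urls:
            entries.extend(feedparser.parse(url).entries)
        # published_parsed is a time.struct_time (comparable as a tuple);
        # entries without a date sort last.
        entries.sort(key=lambda e: tuple(e.get("published_parsed") or (0,) * 9),
                     reverse=True)
        return entries

    combined = aggregate(["https://example.org/a.xml",
                          "https://example.net/b.xml"])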

User-controlled interaction with feed content

Mark mentioned some features such as feedback from feed readers to content providers, e.g., using so-called "privacy-preserving measurement". This should be made clearly optional, and the user should be offered an explicit opt-in; it should not be the default.

New Ideas

Learn from ActivityPub

In general, it would be good to study ActivityPub and see which features and design elements would be useful. ActivityPub is a decentralized social networking protocol based on the ActivityStreams JSON data format. It does a lot more than one would need for syndication (notably, it is designed for bi-directional updates), but some of its properties are, in my opinion, useful for syndication, too.

Modularization

In RSS, a feed is typically a single XML document that contains a channel with items for the individual updates. When a feed is updated, the entire document is regenerated, and the receiver then has to filter out updates that it has already received. Atom has a feed paging concept that allows clients to navigate through paginated feed entries, but each page is still a standalone document.

To enable better sharing, re-use of feed updates in different contexts, and more scalable distribution, feed updates could be given a more modular structure, similar to what ActivityPub does.

Protocol independence

RSS and Atom are technically not bound to HTTP, although that is of course the dominant way of using them. However, it is theoretically possible to disseminate feed updates through other means, e.g., e-mail, and I think this should be considered for a future syndication system as well.

More specifically, push-based operation should be enabled (beyond e-mail). For example, it should be possible to receive feed updates via broadcast/multicast channels.

Another example may be publish/subscribe-based updates. There is a W3C Recommendation called WebSub that specifies an HTTP-based pub/sub framework for feed updates. I suggest using this as an example, but not necessarily as the only way to do pub/sub and pushed updates.
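
For illustration, a WebSub subscription is just a form-encoded POST to the hub that the feed advertises (all URLs below are placeholders):

    # WebSub subscription request, per the W3C Recommendation.
    import requests

    requests.post(
        "https://hub.example.com/",
        data={
            "hub.mode": "subscribe",
            "hub.topic": "https://example.org/feed.xml",
            "hub.callback": "https://subscriber.example.net/inbox",
            "hub.lease_seconds": "86400",
        },
        timeout=10,
    )
    # The hub then verifies intent via a GET to the callback carrying a
    # hub.challenge value, which the subscriber must echo back; updates
    # are subsequently POSTed to the callback.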

Moreover, it should be possible to use the syndication framework in "local-first" environments, i.e., with non-public-facing servers.

Data-oriented security

These use cases have some security implications. It must be possible to authenticate feed updates independently of the communication channel.
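
A minimal sketch of what this could look like (using Ed25519 from the Python cryptography package; the entry format is invented): the publisher signs each entry, and any receiver holding the publisher's public key can verify it, regardless of whether the entry arrived via HTTP, e-mail, multicast, or an aggregator.

    # Channel-independent authentication: a detached signature over the
    # feed entry travels with the entry itself.
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey,
    )

    publisher_key = Ed25519PrivateKey.generate()
    entry = b'{"id": "urn:example:post-1", "title": "Hello", "content": "..."}'

    signature = publisher_key.sign(entry)

    # verify() raises InvalidSignature on any mismatch.
    publisher_key.public_key().verify(signature, entry)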

Written by dkutscher

August 25th, 2024 at 3:24 pm

Posted in Posts


Nordwest-IX Internet Exchange Point


DE-CIX and EWE TEL opened the new Nordwest-IX Internet exchange point in Oldenburg, Germany on 2024-08-15.

DE-CIX, the largest Internet Exchange in Europe and the second-largest in the world, now has eight locations in Germany: Oldenburg, Berlin, Düsseldorf, Frankfurt, Hamburg, Leipzig, Munich, and the Ruhr region. The company has recently begun to decentralize its IXPs in Germany by opening new IXPs in addition to its main location in Frankfurt.

Can IXPs help with Internet Decentralization?

In the IRTF Research Group on the Decentralization of the Internet (DINRG), we are investigating root causes of, and potential counter-measures against, Internet centralization. There are two aspects of centralization/decentralization with respect to IXPs:

  1. Internet peering happens mostly at public IXPs: locally centralized exchange points in an otherwise logically decentralized network of Autonomous Systems. Big application service providers ("hyperscalers") also engage in so-called "Direct Peering" (or "Private Peering"), where they connect their networks directly to, typically, Internet Service Providers that provide Internet access and can benefit from a direct connection to dominant content/service providers. Often, it is the hyperscaler who benefits most in terms of cost savings. Decentralizing IXPs can provide incentives for such networks to connect at IXPs instead of using direct peering, which is often seen as beneficial because it increases connectivity options and reduces cost and latency.
  2. IP connectivity alone is not a sufficient condition for low latency and decentralization, though, as most hyperscaler applications rely on some form of CDN overlay network. Even with local IP forwarding, CDN proxies may be hosted at central locations. To counter that, it is important to create co-location and local edge-service hosting opportunities at or close to IXPs, which can be a business opportunity for the connected ISPs, such as EWE TEL for Nordwest-IX.

The Internet is evolving, and new technologies might change the role of overlays in the future. For example, technologies such as Media-over-QUIC (MoQ) might lead to massive caching and replication overlay structures that may or may not be shared across applications and hyperscalers. IXPs and co-location data centers can be natural places for operating MoQ relays.

Written by dkutscher

August 15th, 2024 at 6:09 pm

Posted in Posts


IRTF DINRG at IETF-120



We have an exciting agenda for our upcoming IRTF DINRG meeting at IETF-120 (Wednesday, July 24th, 2024, at 09:30 in Vancouver). If you are not attending IETF-120 in person, please consider attending online.

  1. DINRG Chairs’ Presentation: Status, Updates (Chairs, 5 min)
  2. Exploring Decentralized Digital Identity Protocols (Kaliya Young, 20 min)
  3. DNS-Bound Client and Sender Identities (Michael Richardson, 20 min)
  4. Internet Fragmentation (Sheetal Kumar, 20 min)
  5. SOLID: Your Data, Your Choice (Hadrian Zbarcea, 20 min)
  6. Panel discussion: Internet Decentralization – Next Steps (Chairs & Panelists, 30 min)
  7. Wrap-up & Buffer (Chairs, 5 min)


Panel Description

Internet Decentralization – Next Steps

The previous DINRG meetings all had lively open-mic discussions. However, we noticed that those spontaneous conversations, while interesting and insightful, tended to head off to different issues in diverse directions. At this meeting, we will continue and extend the previous discussions by gathering a small group of panelists and starting the discussion with a list of questions collected from the previous meetings. We will have an open mic for the whole audience and will share the list of discussion questions on the DINRG list before the meeting. By gathering a panel and preparing a list of questions, we hope to make the discussions more effective and fruitful, moving toward our overarching goal of identifying an ordered list of issues that DINRG aims to address in the coming years.


Written by dkutscher

July 23rd, 2024 at 12:31 pm

Posted in IRTF


Secure Web Objects: Building Blocks for Metaverse Interoperability and Decentralization

without comments

In our upcoming paper at IEEE Metacom-2024, we propose a data-oriented approach for future Web and Metaverse system designs.

Abstract

This position paper explores how to support the Web's evolution through an underlying data-centric approach that better matches the data-orientedness of modern and emerging applications. We revisit the original vision of the Web as a hypermedia system that supports document composability and application interoperability via name-based data access. We propose the use of secure web objects (SWO), a data-oriented communication approach that can reduce complexity, centrality, and inefficiency, particularly for collaborative and local-first applications such as the Metaverse. SWO are named, signed, application-defined objects that are secured independently of their containers or communication channels, an approach that leverages the results of more than a decade of data-centric networking research. This approach does not require intermediation by the aggregators of identity, storage, and other services that are common today. We present a brief design overview, illustrated through prototypes for two editors of shared hypermedia documents: one for 3D and one for LaTeX. We also discuss our findings and suggest a roadmap for future research.


Written by dkutscher

July 23rd, 2024 at 10:55 am

Networked Systems for Distributed Machine Learning at Scale


On July 3rd, 2024, I gave a talk at the UCL/Huawei Joint Lab Workshop on "Building Better Protocols for Future Smart Networks" that took place on UCL's campus in London.

Talk Abstract

Large-scale distributed machine learning training networks are increasingly facing scaling problems with respect to FLOPS per deployed compute node. Communication bottlenecks can inhibit the effective utilization of expensive GPU resources. The root cause of these performance problems is not insufficient transmission speed or slow servers; it is the structure of the distributed computation and the communication characteristics it incurs. Large machine learning workloads typically exhibit relatively asymmetric, and sometimes centralized, communication structures, such as gradient aggregation and model update distribution. Even when training networks are less centralized, the amount of data that needs to be sent to aggregate several thousand input values through collective communication functions such as AllReduce can lead to incast problems that overload network resources and servers. This talk discusses challenges and opportunities for developing in-network aggregation systems from a distributed computing and networked systems perspective.
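
To illustrate why the communication structure matters (a self-contained toy simulation, not material from the talk): ring AllReduce spreads aggregation over a ring of N workers, so each link carries roughly 2(N-1)/N times the gradient size per round, instead of N full gradients converging on a single aggregator at once, which is the incast pattern.

    # In-process simulation of ring AllReduce with N workers; each list
    # stands in for a worker's gradient, split into N chunks of size 1.
    N = 4
    grads = [[float(i + 1)] * N for i in range(N)]

    # Reduce-scatter: after N-1 steps, worker i holds the full sum of
    # chunk (i+1) % N.
    for step in range(N - 1):
        for i in range(N):
            c = (i - step) % N            # chunk worker i forwards
            grads[(i + 1) % N][c] += grads[i][c]

    # All-gather: circulate the reduced chunks around the ring.
    for step in range(N - 1):
        for i in range(N):
            c = (i + 1 - step) % N
            grads[(i + 1) % N][c] = grads[i][c]

    print(grads[0])   # every worker now holds [10.0, 10.0, 10.0, 10.0]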

Written by dkutscher

July 22nd, 2024 at 3:23 pm

ACM Conext-2024 Workshop on the Decentralization of the Internet



Recent years have witnessed the consolidation and centralization of Internet applications and services, as well as of the underlying infrastructure. This centralization has economic aspects and factors as well as technical ones. The effects are often characterized as detrimental to the original goals of the Internet, such as permissionless innovation, as well as to society at large, due to the amount of (personal) data that is obtained and capitalized on by large platforms.

We are organizing a workshop at ACM CoNEXT-2024 to provide a forum for academic researchers to present and discuss ongoing work on this topic and to create greater awareness of it in the larger community. The workshop solicits work on specific topics including, but not limited to:

  • investigation of the root causes of Internet centralization, and articulation of the impacts of the market economy, architecture and protocol designs, as well as government regulations;
  • measurement of Internet centralization and its consequential societal impacts;
  • characterization and assessment of observed Internet centralization;
  • new research topics and technical solutions for decentralized system and application development;
  • decentralized (cloud-independent) distributed system design;
  • protocols and algorithms for decentralized distributed systems; and
  • decentralized security and trust architectures and protocols for real-world Internet systems.

Submission Instructions

Please see the workshop homepage for details.

Written by dkutscher

May 31st, 2024 at 2:11 pm

RFC 9556: Internet of Things (IoT) Edge Challenges and Functions


Many Internet of Things (IoT) applications have requirements that cannot be satisfied by centralized cloud-based systems (i.e., cloud computing). These include time sensitivity, data volume, connectivity cost, operation in the face of intermittent services, privacy, and security. As a result, IoT is driving the Internet toward edge computing.

We have published RFC 9556, outlining the requirements of the emerging IoT edge and its challenges. It presents a general model and major components of the IoT edge to provide a common basis for future discussions in the Thing-to-Thing Research Group (T2TRG) and other IRTF and IETF groups.

Today, many IoT services leverage cloud computing platforms because they provide virtually unlimited storage and processing power. The reliance of IoT on back-end cloud computing provides additional advantages, such as scalability and efficiency. At the time of writing, IoT systems are fairly static with respect to integrating and supporting computation. It is not that there is no computation, but rather that systems are often limited to static configurations (edge gateways and cloud services).

However, IoT devices generate large amounts of data at the edges of the network. To meet IoT use case requirements, data is increasingly being stored, processed, analyzed, and acted upon close to the data sources. These requirements include time sensitivity, data volume, connectivity cost, resiliency in the presence of intermittent connectivity, privacy, and security; they cannot be addressed by centralized cloud computing. A more flexible approach is necessary to address these needs effectively. This involves distributing computing (and storage) and seamlessly integrating it into the edge-cloud continuum. We refer to this integration of edge computing and IoT as "IoT edge computing". RFC 9556 describes the related background, use cases, challenges, system models, and functional components.
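
As a toy illustration of why such requirements push computation toward the edge (the thresholds and numbers below are invented, not from the RFC): when a deadline is tighter than the time needed to ship a data batch to the cloud and back, processing must happen locally.

    # Illustrative only: decide where to process an IoT data batch,
    # using invented figures for deadline, batch size, and network.
    def place_workload(deadline_ms, batch_mb, uplink_mbps, cloud_rtt_ms):
        upload_ms = batch_mb * 8 / uplink_mbps * 1000
        if cloud_rtt_ms + upload_ms > deadline_ms:
            return "edge"       # cloud round trip cannot meet the deadline
        return "cloud"          # centralized processing is acceptable

    print(place_workload(deadline_ms=50, batch_mb=5, uplink_mbps=10,
                         cloud_rtt_ms=40))   # -> "edge"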

Written by dkutscher

May 7th, 2024 at 11:12 am

Posted in IRTF,Publications
