Dirk Kutscher

Personal web page

COMETS accepted at IEEE TMM

without comments

Our paper on COMETS: Coordinated Multi-Destination Video Transmission with In-Network Rate Adaptation has been accepted for publication by IEEE Transactions on Multimedia (TMM).

Abstract

Large-scale video streaming events attract millions of simultaneous viewers, stressing existing delivery infrastructures. Client-driven adaptation reacts slowly to shared congestion, while server-based coordination introduces scalability bottlenecks and single points of failure. We present COMETS, a coordinated multi-destination video transmission framework that leverages information-centric networking principles such as request aggregation and in-network state awareness to enable scalable, fair, and adaptive rate control. COMETS introduces a novel range-interest protocol and distributed in-network decision process that aligns video quality across receiver groups while minimizing redundant transmissions. To achieve this, we develop a lightweight distributed optimization framework that guides per-hop quality adaptation without centralized control. Extensive emulation shows that COMETS consistently improves bandwidth utilization, fairness, and user-perceived quality of experience over DASH, MoQ, and ICN baselines, particularly under high concurrency. The results highlight COMETS as a practical, deployable approach for next-generation scalable video delivery.

Introduction to COMETS

Large streaming events nowadays attract millions of viewers, and the demand for concurrent video consumption is expanding dramatically. For example, the number of monthly sports streaming viewers has grown from 57 million in 2021 to more than 90 million in 2025, with more than 17% of users participating in multiple streams simultaneously. This explosive growth exposes a fundamental challenge for existing video delivery architectures: how to maintain consistent, fair Quality of Experience (QoE) when thousands of users compete for shared bottleneck resources.

Existing infrastructures are not designed for effective coordination and resource sharing among large numbers of simultaneous viewers. This results in inefficient handling of concurrent requests for the same content segments and insufficient coordination of network resource allocation among users of the shared infrastructure. These inefficiencies lead to redundant data transmission and suboptimal bandwidth utilization, ultimately impairing user QoE through increased network congestion, unstable bitrates, and more frequent buffering, especially during peak usage. To address these challenges, an ideal video delivery system must be coordinated, scalable, and adaptive in order to maximize bandwidth utilization while ensuring a fair, high-quality experience for all users. Such a system should aggregate requests for the same content to eliminate redundancy, make intelligent in-network decisions, and distribute computational load to avoid bottlenecks.

Figure 1: Performance comparison between baseline MoQ and server-optimized MoQ under increasing user load: (a) latency vs. user load; (b) mean bitrate vs. user load.

Current solutions exhibit fundamental limitations with respect to coordination and scalability. Client-adaptive approaches like Dynamic Adaptive Streaming over HTTP (DASH) enable individual clients to select video representations independently. However, their uncoordinated decisions, based on delayed and localized network views, lag behind the actual state of shared network bottlenecks, leading to bandwidth contention and bitrate oscillations. Server-side approaches address these limitations by centralizing adaptation logic, enabling optimal resource allocation through comprehensive network and user demand assessments. However, managing state and control interactions for numerous users introduces scalability challenges, and centralized decision architectures create single points of failure that compromise real-time performance. Our experiments (Figure 1) demonstrate that even state-of-the-art server-optimized Media over QUIC (MoQ) ultimately encounters the same scalability barriers as baseline approaches under high concurrency.

Key Insights

We observe that effective multi-user video streaming requires two properties: (I) aggregation-aware delivery, where identical requests are merged to eliminate redundant transmissions, and (II) distributed coordination, where adaptation decisions are made at points of request convergence rather than at centralized endpoints. This leads us to Information-Centric Networking (ICN). ICN provides inherent advantages for multi-user content distribution through in-network caching and request aggregation in systems like CCNx/NDN. While these features reduce redundant transmissions by merging duplicate requests at forwarders, existing ICN-based solutions focus on hop-by-hop adaptation rather than coordinated multi-user rate adaptation; they suffer from decision lag and fail to ensure efficient convergence toward stable, fair rate allocations (i.e., equitable QoE distribution). To address these limitations, we present COMETS (Coordinated Multi-Destination Video Transmission with In-Network Rate Adaptation), a scalable, ICN-based multi-destination video streaming framework engineered to resolve three challenges in large-scale video delivery: redundant data transmission, lack of scalable coordination, and inefficient system convergence.

Design Philosophy

COMETS is based on three principles that distinguish it from prior work: (I) Group-aware rather than individual optimization. Instead of each client independently selecting bitrates, COMETS groups receivers with similar capabilities and network conditions, then aligns video quality across each group. This transforms the combinatorial complexity of individual decisions into tractable group-level optimization. (II) Proactive rather than reactive adaptation. Unlike existing ICN approaches that react to congestion signals, COMETS uses a distributed Lagrangian framework in which forwarders exchange dual variables (price signals) to anticipate upstream constraints. This enables proactive coordination without centralized state collection. (III) Deployable overlay architecture. COMETS requires no modifications to network infrastructure: it is architecturally flexible and can be deployed as an application-layer overlay over existing Internet protocols (e.g., HTTP/QUIC over UDP), similar to Content Delivery Networks (CDNs) such as Akamai or Cloudflare. It assumes trusted intermediate nodes under the same administrative domain, enabling immediate integration into today's networks without network-layer changes. While COMETS shares MoQ's vision of moving intelligence into the network, it avoids central bottlenecks by enabling per-hop optimization via ICN primitives, and it is deployable over MoQ-capable infrastructures as an overlay.
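The dual-variable ("price") coordination in principle (II) follows the standard Lagrangian decomposition pattern. As a minimal illustration of that pattern (not COMETS' actual algorithm; the variable names, step size, and utility model are invented here), a bottleneck forwarder can raise its price when offered load exceeds capacity, while receivers choose rates that decrease in the total path price:

```python
# Illustrative primal-dual rate coordination on one shared bottleneck
# (hypothetical names and parameters; a sketch, not COMETS itself).

def update_price(price, link_capacity, offered_load, step=0.01):
    """Dual (price) update: raise the price when demand exceeds
    capacity, lower it otherwise; prices stay non-negative."""
    return max(0.0, price + step * (offered_load - link_capacity))

def choose_rate(prices_on_path, utility_slope=1.0):
    """Primal step: with log utility, each receiver picks a rate
    inversely proportional to the total congestion price on its path."""
    total_price = sum(prices_on_path)
    return utility_slope / (total_price + 1e-9)

# Toy run: one bottleneck link of capacity 3 shared by three receivers.
price, capacity = 0.5, 3.0
for _ in range(2000):
    rate = choose_rate([price])
    price = update_price(price, capacity, offered_load=3 * rate)

rate = choose_rate([price])  # converges near the fair share capacity/3
```

With logarithmic utilities, this loop settles at the proportionally fair allocation (here, one unit per receiver) without any node ever collecting global state, which is the property the distributed framework relies on.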

Our Approach. COMETS transforms video streaming from isolated endpoint control into coordinated in-network negotiation, with four key contributions:

Range-interest protocol for coordinated adaptation. We introduce a novel protocol where clients express resolution ranges rather than specific quality levels. This enables forwarders to aggregate requests and optimize resolution assignments across user groups, shifting adaptation logic from endpoints to the network fabric.
Scalable architecture without central bottlenecks. COMETS distributes adaptation logic across forwarders, combining request aggregation with per-hop decision-making.
Distributed optimization with closed-form solutions. We formalize coordinated multi-destination video transmission as a unified Integer Linear Programming (ILP) problem and develop a two-stage distributed algorithm. Unlike prior ICN approaches that rely on heuristics or reactive congestion signals, our method derives analytical closed-form solutions for per-hop quality decisions, enabling proactive, group-aware rate allocation with provable convergence guarantees.
Implementation and evaluation. Through extensive emulation on Mini-NDN with up to 300 concurrent clients, we demonstrate that COMETS achieves consistent QoE scores above 0.7 across all tested scales, while baselines degrade below 0.5 at high concurrency. COMETS maintains near-perfect fairness (Jain's index ≥ 0.93) and achieves optimization convergence within 50 ms, up to 3.7× faster than centralized approaches.
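For reference, the fairness number above is Jain's index, which maps any allocation vector of n users to a value between 1/n (one user gets everything) and 1 (perfect equality):

```python
def jains_index(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(allocations)
    s, sq = sum(allocations), sum(x * x for x in allocations)
    return (s * s) / (n * sq)

# Equal bitrates are perfectly fair; a skewed allocation is not.
print(jains_index([4.0, 4.0, 4.0, 4.0]))  # 1.0
print(jains_index([8.0, 1.0, 1.0, 1.0]))  # ≈ 0.45
```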

References

Yulong Zhang, Ying Cui, Zili Meng, Abhishek Kumar, Dirk Kutscher; COMETS: Coordinated Multi-Destination Video Transmission with In-Network Rate Adaptation; IEEE Transactions on Multimedia; 2026; pre-print: https://arxiv.org/abs/2601.18670

Written by dkutscher

January 28th, 2026 at 4:57 am

Posted in Publications


Report from INET4AI Workshop at CoNEXT-2025

without comments

Organizers

  • Antoine Fressancourt
  • Dirk Kutscher

The 1st Workshop on Inter-networking Challenges for AI (INET4AI), co-located with ACM CoNEXT'25, was held on December 1st, 2025, in Hong Kong. The workshop was inspired by ongoing discussions in the IRTF on research challenges for (inter-)networking technologies for AI workloads.

This full-day workshop explored networking challenges of large-scale distributed AI workloads in environments characterized by node and network heterogeneity as well as dynamically changing resource availability and utilization. During this inaugural edition, researchers from academia (HKUST, ETH Zurich, Politecnico di Milano, University of Napoli, Tsinghua University, TU Munich) and industry (Huawei, AMD, Microsoft, and others) discussed possible solutions to the challenges raised by Internet-scale distributed AI systems, with four workshop paper presentations and three invited talks. In this report, we first summarize the workshop papers and invited talks, and then offer some general remarks on the ongoing efforts in our community to address INET4AI challenges.

Check out the full report.

Program Overview

  • Invited talk — Tommaso Bonato — Uno: A One-Stop Solution for Inter- and Intra-Datacenter Congestion Control and Reliable Connectivity (Paper · Slides).
  • AI4Net paper — Shaked Leibzirer — Self-supervised Application-level Network Traffic Inversion (Paper · Slides).
  • Net4AI paper — German Sviridov — Latency-Optimal Load Balancing For Distributed MoE Inference (Paper).
  • Invited talk — Mingxing Zhang — From Homogeneous to Disaggregated Architectures for Large Model Inference (Slides).
  • Net4AI paper — Jiaheng Xiong — SCALE-CCL: A Scalable Collective Communication Library for Wide-Area Distributed Training (Paper · Slides).
  • Net4AI paper — Giuseppe Aceto — You've got a few GPUs, now what?! — Experimenting with a Nano-Cluster for Distributed Training of AI Models (Paper · Slides).
  • Invited talk — Wenjia Wei — Debriefing the Open Innovation Platform for UnifiedBus (Slides).

Written by dkutscher

December 24th, 2025 at 11:42 am

Nominations for ANRP-2026

without comments

Submit nominations for the 2026 award period of the Applied Networking Research Prize until November 17, 2025: https://www.irtf.org/anrp/



The Applied Networking Research Prize (ANRP) is awarded to recognise the best recent results in applied networking, interesting new research ideas of potential relevance to the Internet standards community, and upcoming people that are likely to have an impact on Internet standards and technologies, with a particular focus on cases where these people or ideas would not otherwise get much exposure or be able to participate in the discussion.

We encourage nominations of researchers with relevant research results, interesting ideas, and new perspectives. The award will offer them the opportunity to present and discuss their work with the engineers, network operators, policy makers, and scientists that participate in the Internet Engineering Task Force (IETF) and its research arm, the Internet Research Task Force (IRTF). Both self- and third-party nominations for this prize are encouraged.

Written by dkutscher

October 1st, 2025 at 7:42 pm

Posted in IRTF

Invited Talk at FNDC: Connecting AI: Inter-Networking Challenges for Distributed Machine Learning

without comments

I gave a talk at the Future Network Development Conference (FNDC) in Nanjing on August 20th, 2025. The title of the talk was Connecting AI: Inter-Networking Challenges for Distributed Machine Learning, and I talked about our recent work on PacTrain, NetSenseML, and some new work on in-network aggregation.

PacTrain is a novel framework that accelerates distributed training by combining pruning with sparse gradient compression. Active pruning of the neural network makes the model weights and gradients sparse. By ensuring global knowledge of the gradient sparsity among all distributed training workers, we can perform lightweight compressed communication without harming accuracy. We show that the PacTrain compression scheme achieves a near-optimal compression strategy while remaining compatible with the all-reduce primitive. Experimental evaluations show that PacTrain improves training throughput by 1.25× to 8.72× compared to state-of-the-art compression-enabled systems for representative vision and language model training tasks under bandwidth-constrained conditions.
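The key property PacTrain exploits, namely that a globally shared sparsity pattern keeps compression compatible with all-reduce, can be illustrated with a small sketch (hypothetical names; a plain NumPy sum stands in for the actual collective):

```python
import numpy as np

# Sketch: when all workers share the same sparsity mask, each worker can
# transmit only the non-zero gradient entries, and summing those dense
# sub-vectors is equivalent to all-reduce on the full gradient.
# (Illustrative only; not PacTrain's implementation.)

def compress(grad, mask):
    """Keep only the entries allowed by the shared mask."""
    return grad[mask]

def decompress(values, mask, size):
    out = np.zeros(size)
    out[mask] = values
    return out

rng = np.random.default_rng(0)
size = 8
mask = np.array([0, 3, 5])           # shared, pruning-induced sparsity
grads = [rng.normal(size=size) for _ in range(4)]
for g in grads:                      # pruned weights produce zero grads
    g[np.setdiff1d(np.arange(size), mask)] = 0.0

# "All-reduce" on compressed payloads (sum stands in for the collective).
reduced = sum(compress(g, mask) for g in grads)
full = decompress(reduced, mask, size)

assert np.allclose(full, sum(grads))  # same result as uncompressed all-reduce
```

The point of the shared mask is that no index metadata needs to travel with each message, so the payload shrinks to exactly the non-zero entries.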

NetSenseML is a novel network adaptive distributed deep learning framework that dynamically adjusts quantization, pruning, and compression strategies in response to real-time network conditions. By actively monitoring network conditions, NetSenseML applies gradient compression only when network congestion negatively impacts convergence speed, thus effectively balancing data payload reduction and model accuracy preservation. Our approach ensures efficient resource usage by adapting reduction techniques based on current network conditions, leading to shorter convergence times and improved training efficiency. Experimental evaluations show that NetSenseML can improve training throughput by a factor of 1.55x to 9.84x compared to state-of-the-art compression-enabled systems for representative DDL training jobs in bandwidth-constrained conditions.

Written by dkutscher

August 21st, 2025 at 5:59 am

INDS Accepted at ACM Multimedia

without comments

Our paper on INDS: Incremental Named Data Streaming for Real-Time Point Cloud Video has been accepted at ACM Multimedia 2025.

Abstract:

Real-time streaming of point cloud video – characterized by high data volumes and extreme sensitivity to packet loss – presents significant challenges under dynamic network conditions. Traditional connection-oriented protocols such as TCP/IP incur substantial retransmission overhead and head-of-line blocking under lossy conditions, while reactive adaptation approaches such as DASH lead to frequent quality fluctuations and a suboptimal user experience. In this paper, we introduce INDS (Incremental Named Data Streaming), a novel adaptive transmission framework that exploits the inherent layered encoding and hierarchical object structure of point cloud data to enable clients to selectively request enhancement layers based on available bandwidth and decoding capabilities. Built on Information-Centric Networking (ICN) principles, INDS employs a hierarchical naming scheme organized by time windows and Groups of Frames (GoF), which enhances cache reuse and facilitates efficient data sharing, ultimately reducing both network and server load. We implemented a fully functional prototype and evaluated it using emulated network scenarios. The experimental results demonstrate that INDS reduces end-to-end delay by up to 80%, boosts effective throughput by 15%–50% across diverse operating conditions, and increases cache hit rates by 20%–30% on average.
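To make the naming idea concrete, here is a hypothetical name layout organized by time window, Group of Frames, and layer; the paper's concrete scheme may differ:

```python
# Hypothetical INDS-style name layout (illustrative, not the paper's
# exact scheme): /inds/<stream>/<time-window>/<gof>/<layer>/<segment>

def make_name(stream, window, gof, layer, segment):
    return f"/inds/{stream}/win={window}/gof={gof}/layer={layer}/seg={segment}"

# A bandwidth-limited client requests only the base layer; a
# better-provisioned client additionally requests enhancement layers.
base = make_name("lobby-scan", window=12, gof=3, layer=0, segment=7)
enh1 = make_name("lobby-scan", window=12, gof=3, layer=1, segment=7)

# Names sharing the window/GoF prefix aggregate well in ICN caches.
assert base.rsplit("/", 2)[0] == enh1.rsplit("/", 2)[0]
```

Because the time-window and GoF components come before the layer component, requests from heterogeneous clients share long name prefixes, which is what drives the cache-hit improvements reported above.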

References

Ruonan Chai, Yixiang Zhu, Xinjiao Li, Jiawei Li, Zili Meng, Dirk Kutscher; INDS: Incremental Named Data Streaming for Real-Time Point Cloud Video; accepted for publication at ACM Multimedia 2025; October 2025

Written by dkutscher

July 7th, 2025 at 11:51 am

ACM CoNEXT-2025 Workshop on Inter-networking challenges for AI

without comments

Generative AI systems are approaching a scalability limit in their development. Due to power density issues, it will soon become infeasible to train large language models with an increasing number of parameters in a single datacenter. While the industry is actively working to scale up AI systems, it is becoming necessary to explore scaled-out, globally distributed systems to train or serve generative AI models.

Moreover, services based on generative AI require stringent quality-of-service levels to meet user demand. These requirements can be met by systems that combine powerful computing instances residing in cloud platforms with localized edge platforms, i.e., heterogeneous and distributed systems.

These challenges may find solutions in approaches adopted by federated learning systems, in which models are trained among several stakeholders. Yet those systems also face scalability issues when dealing with larger models.

The ACM CoNEXT INet4AI workshop aims to discuss the networking challenges raised by the distribution of generative AI workloads at large scale. To that end, we aim to receive contributions from academic researchers, machine learning system developers, and AI infrastructure providers.

Submitted papers must be at most six (6) pages long, excluding references and appendices, in two-column 10pt ACM format. Authors of accepted submissions are expected to present and discuss their work at the workshop. All submissions will be peer-reviewed, and the review process will be double-blind. Per the anonymity guidelines, please prepare your paper in a way that preserves the anonymity of the authors. No information will be shared with third parties.

Please submit your paper using the INET4AI Submission Portal: https://inet4ai25.hotcrp.com.

Written by dkutscher

July 3rd, 2025 at 2:58 pm

AdaptQNet accepted at MobiCom

without comments

Our paper on AdaptQNet: Optimizing Quantized DNN on Microcontrollers via Adaptive Heterogeneous Processing Unit Utilization has been accepted at ACM MobiCom-2025.

Abstract

There is a growing trend of deploying DNNs on tiny microcontrollers (MCUs) to provide inference capabilities in the IoT. While prior research has explored many lightweight techniques to compress DNN models, achieving overall efficiency in model inference requires not only model optimization but also careful use of system resources for execution. Existing studies primarily leverage arithmetic logic units (ALUs) for integer-only computations on a single CPU core; the floating-point units (FPUs) and multi-core capabilities available in many existing MCUs remain underutilized.

To fill this gap, we propose AdaptQNet, a novel MCU neural network system that can determine the optimal precision assignment for different layers of a DNN model. AdaptQNet models the latency of various operators in DNN models across different precisions on heterogeneous processing units. This facilitates the discovery of models that utilize FPU and multi-core capabilities to enhance capacity while adhering to stringent memory constraints. Our implementation and experiments demonstrate that AdaptQNet enables the deployment of models with better accuracy-efficiency trade-off on MCUs.
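As a toy illustration of per-layer precision assignment (the cost model, numbers, and greedy strategy here are invented; AdaptQNet builds measured latency models and its own search), one can upgrade layers to floating point by accuracy gain per extra kilobyte until a memory budget is exhausted:

```python
# Hypothetical per-layer cost table: (name, {precision: (latency_ms,
# mem_kb, accuracy_gain)}). Numbers are made up for illustration.
LAYERS = [
    ("conv1", {"int8": (1.2, 20, 0.0), "fp32": (2.0, 80, 0.4)}),
    ("conv2", {"int8": (2.5, 40, 0.0), "fp32": (4.1, 160, 0.9)}),
    ("fc",    {"int8": (0.6, 10, 0.0), "fp32": (1.0, 40, 0.2)}),
]

def assign_precisions(layers, mem_budget_kb):
    """Greedy sketch: start fully quantized, then upgrade to fp32 the
    layers with the best accuracy gain per extra KB of memory, as long
    as the memory budget permits."""
    choice = {name: "int8" for name, _ in layers}
    used = sum(opts["int8"][1] for _, opts in layers)
    upgrades = sorted(
        layers,
        key=lambda l: l[1]["fp32"][2] / (l[1]["fp32"][1] - l[1]["int8"][1]),
        reverse=True,
    )
    for name, opts in upgrades:
        extra = opts["fp32"][1] - opts["int8"][1]
        if used + extra <= mem_budget_kb:
            choice[name] = "fp32"
            used += extra
    return choice

print(assign_precisions(LAYERS, mem_budget_kb=200))
```

A real system would also fold the latency column into the objective and explore multi-core placement; the sketch only shows why a per-layer, rather than whole-model, precision choice opens up a better accuracy-efficiency trade-off space.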

References

Yansong Sun, Jialuo He, Dirk Kutscher, Huangxun Chen; AdaptQNet: Optimizing Quantized DNN on Microcontrollers via Adaptive Heterogeneous Processing Unit Utilization; The 31st Annual International Conference On Mobile Computing And Networking (MobiCom 2025)

Written by dkutscher

June 22nd, 2025 at 8:17 pm

Posted in Publications


IETF and IRTF Deep-Dive Training in Beijing

without comments

I gave a talk about the Internet Research Task Force (IRTF) at an IETF Standards Culture and Process Deep-Dive Training that took place in Beijing on May 8th, 2025. The training was hosted by the China Internet Network Information Center (CNNIC). My talk explained what the IRTF is, how it works, and how to best contribute to its work.

Resources

Written by dkutscher

May 9th, 2025 at 6:57 am

Posted in IRTF


NetSenseML accepted at Euro-Par

without comments

Our paper on NetSenseML: Network-Adaptive Compression for Efficient Distributed Machine Learning has been accepted at the 31st International European Conference on Parallel and Distributed Computing (Euro-Par 2025).

Abstract:
Training large-scale distributed machine learning models imposes considerable demands on network infrastructure, often resulting in sudden traffic spikes that lead to congestion, increased latency, and reduced throughput, which would ultimately affect convergence times and overall training performance. While gradient compression techniques are commonly employed to alleviate network load, they frequently compromise model accuracy due to the loss of gradient information.

This paper introduces NetSenseML, a novel network adaptive distributed deep learning framework that dynamically adjusts quantization, pruning, and compression strategies in response to real-time network conditions. By actively monitoring network conditions, NetSenseML applies gradient compression only when network congestion negatively impacts convergence speed, thus effectively balancing data payload reduction and model accuracy preservation.

Our approach ensures efficient resource usage by adapting reduction techniques based on current network conditions, leading to shorter convergence times and improved training efficiency. We present the design of the NetSenseML adaptive data reduction function and experimental evaluations show that NetSenseML can improve training throughput by a factor of 1.55x to 9.84x compared to state-of-the-art compression-enabled systems for representative DDL training jobs in bandwidth-constrained conditions.
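The core decision NetSenseML makes, compressing only when the network is actually the bottleneck, can be sketched as a simple rule (the thresholds, units, and function names here are hypothetical, not the paper's controller):

```python
# Illustrative decision rule: compress gradients only when shipping the
# full gradient would not fit inside a training step.
# (Hypothetical parameters; not NetSenseML's actual controller.)

def pick_strategy(measured_gbps, grad_mbytes, step_time_ms,
                  congestion_threshold=0.8):
    """Return 'raw' when the network can ship the full gradient within
    a fraction of the step time; otherwise fall back to compression."""
    transfer_ms = grad_mbytes * 8 / measured_gbps  # MB -> Mbit, / Gbps = ms
    if transfer_ms <= congestion_threshold * step_time_ms:
        return "raw"          # network is not the bottleneck: keep accuracy
    return "compressed"       # shrink the payload to keep training moving

print(pick_strategy(measured_gbps=10.0, grad_mbytes=100, step_time_ms=100))
print(pick_strategy(measured_gbps=0.5, grad_mbytes=100, step_time_ms=100))
```

The design point is that compression is treated as a cost to be paid only under congestion, rather than applied unconditionally, which is what preserves accuracy on well-provisioned links.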

References

Yisu Wang, Xinjiao Li, Ruilong Wu, Huangxun Chen, Dirk Kutscher; NetSenseML: Network-Adaptive Compression for Efficient Distributed Machine Learning; 31st International European Conference on Parallel and Distributed Computing (Euro-Par 2025); August 2025; Preprint, Euro-Par-2025 Proceedings

Trochilus accepted at USENIX ATC

without comments

Our paper on Trochilus, titled Learning-Enhanced High-Throughput Pattern Matching Based on Programmable Data Plane, has been accepted at USENIX ATC-2025. This is joint work with Qing Li's group at Peng Cheng Lab, and the first author is Guanglin Duan.

Abstract:
Pattern matching is critical in various network security applications. However, existing pattern matching solutions struggle to maintain high throughput and low cost in the face of growing network traffic and increasingly complex patterns. Moreover, managing and updating these systems is labor-intensive, requiring expert intervention to adapt to new patterns and threats. In this paper, we propose Trochilus, a novel framework that enables high-throughput and accurate pattern matching directly on programmable data planes, making it highly relevant to modern large-scale network systems. Trochilus innovates by combining the learning ability of model inference with the high-throughput, cost-effective advantages of data plane processing. It leverages a byte-level recurrent neural network (BRNN) to model complex patterns, preserving expert knowledge while enabling automated updates for sustained accuracy. To address the challenge of limited labeled data, Trochilus proposes a semi-supervised knowledge distillation (SSKD) mechanism, converting the BRNN into a lightweight, data-plane-friendly soft multi-view forest (SMF) that can be efficiently deployed as match-action tables. Trochilus minimizes the need for expensive TCAM through a novel entry clustering algorithm, making it scalable to large network environments. Our evaluations show that Trochilus achieves multi-Tbps throughput, supports various pattern sets, and maintains high accuracy through automatic updates.
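The general idea of deploying a distilled model as match-action tables can be illustrated with a toy decision tree compiled into range-match rules (hypothetical features; this is not Trochilus' soft multi-view forest, just the table-deployment principle):

```python
# Toy decision tree over two byte-valued features: f0 (first payload
# byte) and f1 (second payload byte); leaves are classification actions.
def tree_predict(f0, f1):
    if f0 < 128:
        return "benign"
    return "malicious" if f1 < 64 else "benign"

# The leaves enumerated as range-match rules, the way a programmable
# data plane would store them in a (range-key -> action) table.
TABLE = [
    ((0, 127),   (0, 255),  "benign"),
    ((128, 255), (0, 63),   "malicious"),
    ((128, 255), (64, 255), "benign"),
]

def table_predict(f0, f1):
    for (lo0, hi0), (lo1, hi1), action in TABLE:
        if lo0 <= f0 <= hi0 and lo1 <= f1 <= hi1:
            return action

# The table reproduces the tree on every possible input byte pair.
assert all(tree_predict(a, b) == table_predict(a, b)
           for a in range(256) for b in range(256))
```

Tree-structured models compile naturally into such tables because every leaf corresponds to a hyper-rectangle of feature values; keeping the number of rules small (e.g., via entry clustering) is what makes the approach affordable in TCAM.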

References

  • Guanglin Duan, Yucheng Huang, Zhengxin Zhang, Qing Li, Dan Zhao, Zili Meng, Dirk Kutscher, Ruoyu Li, Yong Jiang, Mingwei Xu; Learning-Enhanced High-Throughput Pattern Matching Based on Programmable Data Plane; USENIX ATC 2025; accepted for publication
  • Extended Summary by Peng Cheng Lab

Written by dkutscher

April 27th, 2025 at 5:50 am