Dirk Kutscher

Personal web page

Archive for the ‘Publications’ Category

COMETS accepted at IEEE TMM


Our paper on COMETS: Coordinated Multi-Destination Video Transmission with In-Network Rate Adaptation has been accepted for publication by IEEE Transactions on Multimedia (TMM).

Abstract

Large-scale video streaming events attract millions of simultaneous viewers, stressing existing delivery infrastructures. Client-driven adaptation reacts slowly to shared congestion, while server-based coordination introduces scalability bottlenecks and single points of failure. We present COMETS, a coordinated multi-destination video transmission framework that leverages information-centric networking principles such as request aggregation and in-network state awareness to enable scalable, fair, and adaptive rate control. COMETS introduces a novel range-interest protocol and distributed in-network decision process that aligns video quality across receiver groups while minimizing redundant transmissions. To achieve this, we develop a lightweight distributed optimization framework that guides per-hop quality adaptation without centralized control. Extensive emulation shows that COMETS consistently improves bandwidth utilization, fairness, and user-perceived quality of experience over DASH, MoQ, and ICN baselines, particularly under high concurrency. The results highlight COMETS as a practical, deployable approach for next-generation scalable video delivery.

Introduction to COMETS

Nowadays, large streaming events typically attract millions of viewers, and the demand for concurrent video consumption is expanding dramatically. For example, the number of monthly sports streaming viewers has grown from 57 million in 2021 to more than 90 million in 2025, with more than 17% of users participating in multiple streams simultaneously. This explosive growth exposes a fundamental limitation in existing video delivery architectures: how to maintain consistent, fair Quality of Experience (QoE) when thousands of users compete for shared bottleneck resources.

Existing infrastructures are not designed for effective coordination and resource sharing among large numbers of simultaneous viewers, resulting in inefficient management of concurrent requests for the same content segments and insufficient coordination of network resource allocation among users of the shared infrastructure. These inefficiencies lead to redundant data transmission and suboptimal bandwidth utilization, ultimately impairing user QoE by increasing network congestion, unstable bitrates, and higher incidences of buffering, especially during peak usage scenarios. To address these challenges, an ideal video delivery system must possess coordinated, scalable, and adaptive capabilities to maximize bandwidth utilization while ensuring a fair, high-quality experience for all users. Such a system should aggregate requests for the same content to eliminate redundancy, make intelligent in-network decisions and distribute computational load to avoid bottlenecks.

Figure 1: Performance comparison between baseline MoQ and server-optimized MoQ under increasing user load. (a) Latency vs. user load; (b) mean bitrate vs. user load.

Current solutions exhibit fundamental limitations with respect to coordination and scalability. Client-adaptive approaches like Dynamic Adaptive Streaming over HTTP (DASH) enable individual clients to select video representations independently. However, their uncoordinated decisions, based on delayed and localized network views, lag behind the actual state of shared network bottlenecks, leading to bandwidth contention and bitrate oscillations. Server-side approaches address these limitations by centralizing adaptation logic, enabling optimal resource allocation through comprehensive network and user demand assessments. However, managing state and control interactions for numerous users introduces scalability challenges, and centralized decision architectures create single points of failure that compromise real-time performance. Our experiments (Figure 1) demonstrate that even state-of-the-art server-optimized Media over QUIC (MoQ) ultimately encounters the same scalability barriers as baseline approaches under high concurrency.

Key Insights

We observe that effective multi-user video streaming requires two properties: I). aggregation-aware delivery, where identical requests are merged to eliminate redundant transmissions, and II). distributed coordination, where adaptation decisions are made at points of request convergence rather than at centralized endpoints. This leads us to consider Information-Centric Networking (ICN). ICN provides inherent advantages for multi-user content distribution through in-network caching and request aggregation in systems like CCNx/NDN. While these features reduce redundant transmissions by merging duplicate requests at forwarders, existing ICN-based solutions focus on hop-by-hop adaptation rather than coordinated multi-user rate adaptation, suffering from decision lag and failing to ensure efficient convergence toward stable, fair rate allocations (i.e., equitable QoE distribution). To address these limitations, we present COMETS (Coordinated Multi-Destination Video Transmission with In-Network Rate Adaptation), a scalable, ICN-based multi-destination video streaming framework engineered to resolve challenges in large-scale video delivery: redundant data transmission, lack of scalable coordination, and inefficient system convergence.

Design Philosophy

COMETS is based on three principles that distinguish it from prior work: I). Group-aware rather than individual optimization. Instead of each client independently selecting bitrates, COMETS groups receivers with similar capabilities and network conditions, then aligns video quality across each group. This transforms the combinatorial complexity of individual decisions into tractable group-level optimization. II). Proactive rather than reactive adaptation. Unlike existing ICN approaches that react to congestion signals, COMETS uses a distributed Lagrangian framework in which forwarders exchange dual variables (price signals) to anticipate upstream constraints. This enables proactive coordination without centralized state collection. III). Deployable overlay architecture. COMETS is architecturally flexible and can be deployed as an application-layer overlay network over existing Internet protocols (e.g., HTTP/QUIC over UDP), similar to Content Delivery Networks (CDNs) such as Akamai or Cloudflare. It requires no modifications to network infrastructure and assumes trusted intermediate nodes under the same administrative domain, enabling immediate integration into today's networks without network-layer changes. While COMETS shares MoQ's vision of moving intelligence into the network, it avoids central bottlenecks by enabling per-hop optimization via ICN primitives, and it can be deployed over MoQ-capable infrastructures as an overlay.
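To make the price-signal mechanism more tangible, the following minimal Python sketch shows one way a forwarder could run a projected subgradient update on a per-link dual variable while receiver groups respond to the resulting price. The data structures, step-size schedule, and utility function are illustrative assumptions, not COMETS' actual algorithm.

# Illustrative sketch of a per-hop "price signal" (dual variable) update in the
# spirit of COMETS' distributed Lagrangian coordination. All names and numbers
# are assumptions for exposition, not the paper's implementation.

import math
from dataclasses import dataclass

@dataclass
class LinkState:
    capacity_mbps: float   # outgoing-link capacity at this forwarder
    price: float = 0.0     # dual variable (congestion price) for this link

def update_price(link: LinkState, aggregated_demand_mbps: float, step: float) -> float:
    """Projected subgradient step: raise the price when aggregated demand exceeds
    capacity, lower it (never below zero) when the link is underused."""
    link.price = max(0.0, link.price + step * (aggregated_demand_mbps - link.capacity_mbps))
    return link.price

def pick_group_bitrate(candidates_mbps, utility, price):
    """Each receiver group picks the bitrate maximizing its utility minus the
    congestion cost signaled by upstream forwarders."""
    return max(candidates_mbps, key=lambda r: utility(r) - price * r)

if __name__ == "__main__":
    link = LinkState(capacity_mbps=20.0)
    bitrates = [1.0, 2.5, 5.0, 8.0]            # available representations
    utility = lambda r: math.log(1.0 + r)      # concave QoE proxy
    demand = 3 * bitrates[-1]                  # three groups start at top quality
    for k in range(30):
        price = update_price(link, demand, step=0.05 / (k + 1))  # diminishing step size
        choice = pick_group_bitrate(bitrates, utility, price)
        demand = 3 * choice                    # groups react to the new price
    print(f"final price={link.price:.3f}, per-group bitrate={choice} Mbps")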

Our Approach. COMETS transforms video streaming from isolated endpoint control into coordinated in-network negotiation, with four key contributions:

Range-interest protocol for coordinated adaptation. We introduce a novel protocol where clients express resolution ranges rather than specific quality levels. This enables forwarders to aggregate requests and optimize resolution assignments across user groups, shifting adaptation logic from endpoints to the network fabric (a small illustrative sketch follows this list).
Scalable architecture without central bottlenecks. COMETS distributes adaptation logic across forwarders, combining request aggregation with per-hop decision-making.
Distributed optimization with closed-form solutions. We formalize coordinated multi-destination video transmission as a unified Integer Linear Programming (ILP) problem and develop a two-stage distributed algorithm. Unlike prior ICN approaches that rely on heuristics or reactive congestion signals, our method derives analytical closed-form solutions for per-hop quality decisions, enabling proactive, group-aware rate allocation with provable convergence guarantees.
Implementation and evaluation. Through extensive emulation on Mini-NDN with up to 300 concurrent clients, we demonstrate that COMETS achieves consistent QoE scores above 0.7 across all tested scales, while baselines degrade below 0.5 at high concurrency. COMETS maintains near-perfect fairness (Jain's index ≥ 0.93) and achieves optimization convergence within 50 ms, up to 3.7× faster than centralized approaches.
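As referenced in the first contribution above, the following small Python sketch illustrates the flavor of range interests and their aggregation at a forwarder: clients name a segment together with an acceptable resolution range, and the forwarder only needs one upstream request covering the intersection of pending ranges. The name layout and the intersection rule are assumptions made for exposition, not COMETS' actual wire format.

# Illustrative sketch of "range interests" and their aggregation at a forwarder.
# The naming convention below is an assumption for exposition only.

def make_range_interest(video: str, segment: int, min_res: int, max_res: int) -> str:
    # e.g. /video/bbb/seg=42/res=480..1080
    return f"/video/{video}/seg={segment}/res={min_res}..{max_res}"

def parse_range(name: str):
    lo, hi = name.rsplit("res=", 1)[1].split("..")
    return int(lo), int(hi)

def aggregate(pending: list[str]):
    """Intersect the ranges of pending interests for the same segment; the
    forwarder only forwards one upstream request covering the intersection."""
    ranges = [parse_range(n) for n in pending]
    lo = max(r[0] for r in ranges)
    hi = min(r[1] for r in ranges)
    return (lo, hi) if lo <= hi else None   # empty intersection: split the group

if __name__ == "__main__":
    pending = [
        make_range_interest("bbb", 42, 480, 1080),
        make_range_interest("bbb", 42, 720, 2160),
        make_range_interest("bbb", 42, 360, 1080),
    ]
    print(aggregate(pending))   # -> (720, 1080): one upstream request serves all three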

References

Yulong Zhang, Ying Cui, Zili Meng, Abhishek Kumar, Dirk Kutscher; COMETS: Coordinated Multi-Destination Video Transmission with In-Network Rate Adaptation; IEEE Transactions on Multimedia; 2026; pre-print: https://arxiv.org/abs/2601.18670

Written by dkutscher

January 28th, 2026 at 4:57 am

Posted in Publications


Invited Talk at FNDC: Connecting AI: Inter-Networking Challenges for Distributed Machine Learning


I gave a talk at the Future Network Development Conference (FNDC) in Nanjing on August 20th, 2025. The title of the talk was Connecting AI: Inter-Networking Challenges for Distributed Machine Learning, and I talked about our recent work on PacTrain, NetSenseML, and some new work on in-network aggregation.

PacTrain is a novel framework that accelerates distributed training by combining pruning with sparse gradient compression. Active pruning of the neural network makes the model weights and gradients sparse. By ensuring global knowledge of the gradient sparsity among all distributed training workers, we can perform lightweight compressed communication without harming accuracy. We show that the PacTrain compression scheme achieves a near-optimal compression strategy while remaining compatible with the all-reduce primitive. Experimental evaluations show that PacTrain improves training throughput by 1.25× to 8.72× compared to state-of-the-art compression-enabled systems for representative vision and language model training tasks under bandwidth-constrained conditions.
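The key point is that a globally shared pruning mask makes gradients sparse at identical positions on every worker, so the packed non-zero entries can be summed with a standard all-reduce. A pure-NumPy simulation of that idea, with made-up sizes and no real collective communication, might look like this:

# Minimal sketch of the shared-mask idea behind PacTrain's compression. This is a
# NumPy simulation for exposition; the real system operates on framework gradients
# and collective-communication libraries.

import numpy as np

def pack(grad: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the entries allowed by the globally shared pruning mask."""
    return grad[mask]

def simulated_allreduce(packed_per_worker: list) -> np.ndarray:
    """Dense sum over equally shaped packed vectors, exactly what a real
    all-reduce would compute, since all workers packed the same positions."""
    return np.sum(packed_per_worker, axis=0)

def unpack(packed: np.ndarray, mask: np.ndarray) -> np.ndarray:
    out = np.zeros(mask.shape, dtype=packed.dtype)
    out[mask] = packed
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask = rng.random(1000) < 0.1        # 90% of positions pruned, known to all workers
    grads = [rng.standard_normal(1000) * mask for _ in range(4)]   # 4 workers
    reduced = unpack(simulated_allreduce([pack(g, mask) for g in grads]), mask)
    assert np.allclose(reduced, np.sum(grads, axis=0))
    print(f"communicated {mask.sum()} values per worker instead of {mask.size}")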

NetSenseML is a novel network-adaptive distributed deep learning framework that dynamically adjusts quantization, pruning, and compression strategies in response to real-time network conditions. By actively monitoring network conditions, NetSenseML applies gradient compression only when network congestion negatively impacts convergence speed, thus effectively balancing data payload reduction and model accuracy preservation. Our approach ensures efficient resource usage by adapting reduction techniques based on current network conditions, leading to shorter convergence times and improved training efficiency. Experimental evaluations show that NetSenseML can improve training throughput by a factor of 1.55x to 9.84x compared to state-of-the-art compression-enabled systems for representative DDL training jobs in bandwidth-constrained conditions.

Written by dkutscher

August 21st, 2025 at 5:59 am

INDS Accepted at ACM Multimedia


Our paper on INDS: Incremental Named Data Streaming for Real-Time Point Cloud Video has been accepted at ACM Multimedia 2025.

Abstract:

Real-time streaming of point cloud video – characterized by high data volumes and extreme sensitivity to packet loss – presents significant challenges under dynamic network conditions. Traditional connection-oriented protocols such as TCP/IP incur substantial retransmission overhead and head-of-line blocking under lossy conditions, while reactive adaptation approaches such as DASH lead to frequent quality fluctuations and a suboptimal user experience. In this paper, we introduce INDS (Incremental Named Data Streaming), a novel adaptive transmission framework that exploits the inherent layered encoding and hierarchical object structure of point cloud data to enable clients to selectively request enhancement layers based on available bandwidth and decoding capabilities. Built on Information-Centric Networking (ICN) principles, INDS employs a hierarchical naming scheme organized by time windows and Groups of Frames (GoF), which enhances cache reuse and facilitates efficient data sharing, ultimately reducing both network and server load. We implemented a fully functional prototype and evaluated it using emulated network scenarios. The experimental results demonstrate that INDS reduces end-to-end delay by up to 80%, boosts effective throughput by 15%–50% across diverse operating conditions, and increases cache hit rates by 20%–30% on average.
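To illustrate the naming and incremental-request idea, here is a small Python sketch of how names organized by time window and Group of Frames (GoF) could be constructed, and how a client might plan base-plus-enhancement-layer requests under a bandwidth budget. The name components and numbers are assumptions for exposition, not INDS' exact scheme.

# Illustrative sketch of INDS-style hierarchical naming and incremental layer requests.

def name(stream: str, window: int, gof: int, frame: int, layer: int) -> str:
    return f"/pcv/{stream}/win={window}/gof={gof}/frame={frame}/layer={layer}"

def plan_requests(stream: str, window: int, gof: int, frames: int,
                  layer_sizes_kb: list, budget_kb: float) -> list:
    """Always request the base layer (layer 0) for every frame, then add
    enhancement layers for the whole GoF while the bandwidth budget allows."""
    requests, spent = [], 0.0
    for layer, size in enumerate(layer_sizes_kb):
        cost = size * frames
        if layer > 0 and spent + cost > budget_kb:
            break
        for f in range(frames):
            requests.append(name(stream, window, gof, f, layer))
        spent += cost
    return requests

if __name__ == "__main__":
    reqs = plan_requests("lobby-cam", window=7, gof=3, frames=4,
                         layer_sizes_kb=[40, 60, 120], budget_kb=300)
    print(len(reqs), "interests, e.g.", reqs[0])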

References

Ruonan Chai, Yixiang Zhu, Xinjiao Li, Jiawei Li, Zili Meng, Dirk Kutscher; INDS: Incremental Named Data Streaming for Real-Time Point Cloud Video; accepted for publication at ACM Multimedia 2025; October 2025

Written by dkutscher

July 7th, 2025 at 11:51 am

AdaptQNet accepted at MobiCom


Our paper on AdaptQNet: Optimizing Quantized DNN on Microcontrollers via Adaptive Heterogeneous Processing Unit Utilization has been accepted at ACM MobiCom-2025.

Abstract

There is a growing trend in deploying DNNs on tiny microcontrollers (MCUs) to provide inference capabilities in the IoT. While prior research has explored many lightweight techniques to compress DNN models, achieving overall efficiency in model inference requires not only model optimization but also careful system resource utilization for execution. Existing studies primarily leverage arithmetic logic units (ALUs) for integer-only computations on a single CPU core. Floating-point units (FPUs) and multi-core capabilities available in many existing MCUs remain underutilized.

To fill this gap, we propose AdaptQNet, a novel MCU neural network system that can determine the optimal precision assignment for different layers of a DNN model. AdaptQNet models the latency of various operators in DNN models across different precisions on heterogeneous processing units. This facilitates the discovery of models that utilize FPU and multi-core capabilities to enhance capacity while adhering to stringent memory constraints. Our implementation and experiments demonstrate that AdaptQNet enables the deployment of models with better accuracy-efficiency trade-off on MCUs.
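As a rough illustration of the per-layer precision assignment problem, the following Python sketch exhaustively searches int8 (ALU) versus fp32 (FPU) options per layer under a memory budget. The latency, memory, and accuracy numbers are invented, and AdaptQNet's actual latency models and search procedure are more sophisticated.

# Toy per-layer precision assignment under a memory budget. Numbers and the
# exhaustive search are placeholders for exposition only.

from itertools import product

# (latency_ms, memory_kb, accuracy_gain) per precision option, per layer
LAYERS = {
    "conv1": {"int8": (1.2,  9, 0.00), "fp32": (2.0,  36, 0.40)},
    "conv2": {"int8": (3.1, 18, 0.00), "fp32": (5.4,  72, 0.90)},
    "fc":    {"int8": (0.6, 33, 0.00), "fp32": (1.1, 130, 0.20)},
}
MEMORY_BUDGET_KB = 160

def best_assignment():
    """Exhaustive search (fine for a handful of layers): maximize the accuracy
    proxy subject to the memory budget, breaking ties by lower total latency."""
    best = None
    for choice in product(*([("int8", "fp32")] * len(LAYERS))):
        lat = mem = acc = 0.0
        for layer, prec in zip(LAYERS, choice):
            l, m, a = LAYERS[layer][prec]
            lat, mem, acc = lat + l, mem + m, acc + a
        if mem <= MEMORY_BUDGET_KB and (best is None or (acc, -lat) > (best[0], -best[1])):
            best = (acc, lat, dict(zip(LAYERS, choice)))
    return best

if __name__ == "__main__":
    acc, lat, assignment = best_assignment()
    print(assignment, f"latency={lat:.1f} ms, accuracy gain={acc:.2f}")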

References

Yansong Sun, Jialuo He, Dirk Kutscher, Huangxun Chen; AdaptQNet: Optimizing Quantized DNN on Microcontrollers via Adaptive Heterogeneous Processing Unit Utilization; The 31st Annual International Conference on Mobile Computing and Networking (MobiCom 2025)

Written by dkutscher

June 22nd, 2025 at 8:17 pm

Posted in Publications


NetSenseML accepted at Euro-Par


Our paper on NetSenseML: Network-Adaptive Compression for Efficient Distributed Machine Learning has been accepted at the 31st International European Conference on Parallel and Distributed Computing (Euro-Par 2025).

Abstract:
Training large-scale distributed machine learning models imposes considerable demands on network infrastructure, often resulting in sudden traffic spikes that lead to congestion, increased latency, and reduced throughput, which would ultimately affect convergence times and overall training performance. While gradient compression techniques are commonly employed to alleviate network load, they frequently compromise model accuracy due to the loss of gradient information.

This paper introduces NetSenseML, a novel network-adaptive distributed deep learning framework that dynamically adjusts quantization, pruning, and compression strategies in response to real-time network conditions. By actively monitoring network conditions, NetSenseML applies gradient compression only when network congestion negatively impacts convergence speed, thus effectively balancing data payload reduction and model accuracy preservation.

Our approach ensures efficient resource usage by adapting reduction techniques based on current network conditions, leading to shorter convergence times and improved training efficiency. We present the design of the NetSenseML adaptive data reduction function and experimental evaluations show that NetSenseML can improve training throughput by a factor of 1.55x to 9.84x compared to state-of-the-art compression-enabled systems for representative DDL training jobs in bandwidth-constrained conditions.
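A highly simplified sketch of the adaptive decision is shown below: compare the bandwidth that full gradients would need against the measured network throughput, and only escalate to stronger reduction when the network is the bottleneck. The thresholds and mode names are illustrative assumptions, not NetSenseML's actual policy.

# Toy version of a network-adaptive gradient-reduction decision; thresholds and
# mode names are placeholders for exposition only.

def choose_reduction(measured_gbps: float, required_gbps: float) -> str:
    """Pick a gradient-reduction mode based on how far the network falls short of
    the bandwidth the full-precision gradients would need per step."""
    shortfall = required_gbps / max(measured_gbps, 1e-9)
    if shortfall <= 1.0:
        return "none"                  # network keeps up: send full gradients
    if shortfall <= 2.0:
        return "fp16-quantization"     # mild congestion: cheap 2x reduction
    if shortfall <= 8.0:
        return "top-k-sparsification"  # heavier congestion: send largest entries only
    return "quantize+sparsify"         # severe congestion: stack both reductions

if __name__ == "__main__":
    for bw in (25.0, 10.0, 4.0, 1.0):
        print(f"{bw:5.1f} Gbps available -> {choose_reduction(bw, required_gbps=20.0)}")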

References

Yisu Wang, Xinjiao Li, Ruilong Wu, Huangxun Chen, Dirk Kutscher; NetSenseML: Network-Adaptive Compression for Efficient Distributed Machine Learning; 31st International European Conference on Parallel and Distributed Computing (Euro-Par 2025); August 2025; Preprint, Euro-Par 2025 Proceedings

Trochilus accepted at USENIX ATC


Our paper on Trochilus, titled Learning-Enhanced High-Throughput Pattern Matching Based on Programmable Data Plane, has been accepted at USENIX ATC 2025. This is joint work with Qing Li's group at Peng Cheng Lab, and the first author is Guanglin Duan.

Abstract:
Pattern matching is critical in various network security applications. However, existing pattern matching solutions struggle to maintain high throughput and low cost in the face of growing network traffic and increasingly complex patterns. In addition, managing and updating these systems is labor-intensive, requiring expert intervention to adapt to new patterns and threats. In this paper, we propose Trochilus, a novel framework that enables high-throughput and accurate pattern matching directly on programmable data planes, making it highly relevant to modern large-scale network systems. Trochilus innovates by combining the learning ability of model inference with the high-throughput and cost-effective advantages of data plane processing. It leverages a byte-level recurrent neural network (BRNN) to model complex patterns, preserving expert knowledge while enabling automated updates for sustained accuracy. To address the challenge of limited labeled data, Trochilus proposes a semi-supervised knowledge distillation (SSKD) mechanism, converting the BRNN into a lightweight, data-plane-friendly soft multi-view forest (SMF), which can be efficiently deployed as match-action tables. Trochilus minimizes the need for expensive TCAM through a novel entry cluster algorithm, making it scalable to large network environments. Our evaluations show that Trochilus achieves multi-Tbps throughput, supports various pattern sets, and maintains high accuracy through automatic updates.
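As a rough sketch of the semi-supervised knowledge distillation (SSKD) step, the Python snippet below lets a stand-in "teacher" score unlabeled payloads with soft labels and fits a small forest student on them, the kind of model that can later be compiled into match-action tables. The teacher function, features, and hyperparameters are placeholders, not the paper's BRNN or SMF construction.

# Toy distillation from a teacher scorer to a forest student on unlabeled traffic.
# Requires NumPy and scikit-learn; everything here is illustrative only.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def teacher_score(payloads: np.ndarray) -> np.ndarray:
    """Stand-in for the BRNN teacher: flags payloads containing the byte 0xCC."""
    hits = np.array([(0xCC in p) for p in payloads])
    return np.where(hits, 0.95, 0.05)   # soft probabilities, as a teacher would emit

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    unlabeled = rng.integers(0, 256, size=(2000, 32), dtype=np.uint8)  # 32-byte payloads
    soft = teacher_score(unlabeled)
    student = RandomForestClassifier(n_estimators=8, max_depth=6, random_state=0)
    student.fit(unlabeled, soft > 0.5)   # hard labels derived from the teacher's soft scores
    agreement = (student.predict(unlabeled) == (soft > 0.5)).mean()
    print(f"student/teacher agreement on unlabeled traffic: {agreement:.2%}")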

References

  • Guanglin Duan, Yucheng Huang, Zhengxin Zhang, Qing Li, Dan Zhao, Zili Meng, Dirk Kutscher, Ruoyu Li, Yong Jiang, and Mingwei Xu; Learning-Enhanced High-Throughput Pattern Matching Based on Programmable Data Plane; USENIX ATC 2025; accepted for publication
  • Extended Summary by Peng Cheng Lab

Written by dkutscher

April 27th, 2025 at 5:50 am

Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization accepted at ACM APNET


Our paper on Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization has been accepted by the 9th Asia-Pacific Workshop on Networking (APNET'25).

Abstract:
Hybrid parallelism techniques are crucial for the efficient training of large language models (LLMs). However, these techniques often introduce differentiated computational and communication tasks across nodes. Existing automatic parallel planning frameworks typically fail to consider both node heterogeneity and dynamic changes in network topology simultaneously, limiting their practical performance. In this paper, we address this issue by positioning heterogeneous nodes within dynamic network environments and employing a simulator to identify optimal parallel strategies. Our approach achieves fine-grained workload distribution in scenarios featuring node heterogeneity and complex networks, while also matching state-of-the-art performance in regular topologies and stable network conditions. Moreover, to mitigate the excessively long search times caused by large search spaces in existing frameworks, we propose a strategy pruning technique to rapidly eliminate infeasible parallel configurations. We further accelerate the search process by executing search tasks in parallel within the simulator. Preliminary evaluation results demonstrate that our method significantly improves training performance on heterogeneous nodes, and the proposed dynamic network design offers enhanced adaptability for complex scenarios such as cloud computing environments.
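The strategy-pruning idea can be illustrated with a few lines of Python: enumerate candidate (data, tensor, pipeline) parallel degrees for a given cluster size and discard those that cannot fit a crude per-device memory estimate before any expensive simulation. The cost model below is a placeholder, not the paper's simulator.

# Toy strategy pruning for hybrid-parallel plan search; the memory model and
# numbers are crude placeholders for exposition only.

from itertools import product

def candidate_plans(num_devices: int):
    """All (data, tensor, pipeline) parallel degrees whose product matches the cluster size."""
    for dp, tp, pp in product(range(1, num_devices + 1), repeat=3):
        if dp * tp * pp == num_devices:
            yield dp, tp, pp

def fits_memory(dp: int, tp: int, pp: int, model_gb: float, device_gb: float) -> bool:
    """Crude estimate: parameters are split over tp*pp shards; data parallelism
    replicates the shard, so it does not shrink the per-device footprint."""
    shard_gb = model_gb / (tp * pp)
    return shard_gb * 3.0 <= device_gb   # x3 for weights + gradients + optimizer state

if __name__ == "__main__":
    plans = list(candidate_plans(16))
    survivors = [p for p in plans if fits_memory(*p, model_gb=40.0, device_gb=24.0)]
    print(f"{len(plans)} candidate plans, {len(survivors)} left after pruning, e.g. {survivors[0]}")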

References

Ruilong Wu, Xinjiao Li, Yisu Wang, Xinyu Chen, Dirk Kutscher; Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization; The 9th Asia-Pacific Workshop on Networking (APNET'25); August 2025; doi:10.1145/3735358.3735382

Written by dkutscher

April 24th, 2025 at 8:21 am

ViFusion accepted at ACM ICMR


Our paper on ViFusion: In-Network Tensor Fusion for Scalable Video Feature Indexing has been accepted at the ACM International Conference on Multimedia Retrieval 2025 (CCF-B).

Abstract:
Large-scale video feature indexing in datacenters is critically dependent on efficient data transfer. Although in-network computation has emerged as a compelling strategy for accelerating feature extraction and reducing overhead in distributed multimedia systems, harnessing advanced networking resources at both the switch and host levels remains a formidable challenge. These difficulties are compounded by heterogeneous hardware, diverse application requirements, and complex multipath topologies. Existing methods focus primarily on optimizing inference for large neural network models using specialized collective communication libraries, which often face performance degradation in network congestion scenarios.

To overcome these limitations, we present ViFusion, a communication aware tensor fusion framework that streamlines distributed video indexing by merging numerous small feature tensors into consolidated and more manageable units. By integrating an in-network computation module and a dedicated tensor fusion mechanism within datacenter environments, ViFusion substantially improves the efficiency of video feature indexing workflows. The deployment results show that ViFusion improves the throughput of the video retrieval system by 8–22x with the same level of latency as state-of-the-art systems.
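The following Python sketch illustrates the basic tensor-fusion mechanism: buffer many small feature tensors and hand one concatenated buffer to the transport once a size threshold is reached, amortizing per-message overhead. The threshold and the "send" stand-in are assumptions for exposition, not ViFusion's in-network implementation.

# Toy tensor fuser: consolidate small feature tensors before sending.

import numpy as np

class TensorFuser:
    def __init__(self, flush_bytes: int = 256 * 1024):
        self.flush_bytes = flush_bytes
        self.pending = []
        self.pending_bytes = 0
        self.sent_messages = 0

    def submit(self, tensor: np.ndarray) -> None:
        self.pending.append(tensor.ravel())
        self.pending_bytes += tensor.nbytes
        if self.pending_bytes >= self.flush_bytes:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        fused = np.concatenate(self.pending)   # one large buffer instead of many small sends
        self.sent_messages += 1                # stand-in for handing `fused` to the transport
        self.pending, self.pending_bytes = [], 0

if __name__ == "__main__":
    fuser = TensorFuser()
    for _ in range(1000):                      # 1000 small 512-float feature vectors
        fuser.submit(np.random.rand(512).astype(np.float32))
    fuser.flush()
    print(f"1000 tensors sent in {fuser.sent_messages} fused messages")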

Stay tuned for the pre-print.

References

Yisu Wang, Yixiang Zhu, Dirk Kutscher; ViFusion: In-Network Tensor Fusion for Scalable Video Feature Indexing; The 15th ACM International Conference on Multimedia Retrieval; June 2025; Preprint

Written by dkutscher

April 22nd, 2025 at 3:25 pm

Networked Metaverse Systems: Among the Most Popular IEEE OJCOMS Papers 2024–2025


Our 2024 paper on Networked Metaverse Systems: Foundations, Gaps, Research Directions has been recognized as one of the most popular and impactful papers of the IEEE Open Journal of the Communications Society (OJCOMS) in 2024–2025.


Written by dkutscher

April 7th, 2025 at 8:33 am

Report Published: Greening Networking: Toward a Net Zero Internet (Dagstuhl Seminar 24402)


We have published the report of the Dagstuhl Seminar 24402 on Greening Networking: Toward a Net Zero Internet that took place from September 29th to October 2nd 2024. The seminar discussed the most impactful networking improvements for reducing carbon emissions in three different areas: 1) applications, systems, and stakeholders; 2) network technologies; and 3) lifecycle and control loops. As a major result of the seminar, the following problems and topics for future research were identified: 1) characterizing the Internet footprint on carbon emissions accurately; 2) understanding attributional and consequential accounting of carbon emissions in networked systems; and 3) identifying potential solutions to give network systems more flexibility in better supporting energy grids and connecting to renewable energy sources. One of the concrete results of this seminar is a list of technologies and research opportunities for which we estimated the potential impact and time horizons.


Written by dkutscher

April 7th, 2025 at 7:56 am