INDS Accepted at ACM Multimedia
Our paper on INDS: Incremental Named Data Streaming for Real-Time Point Cloud Video has been accepted at ACM Multimedia 2025.
Abstract:
Real-time streaming of point cloud video – characterized by high data volumes and extreme sensitivity to packet loss – presents significant challenges under dynamic network conditions. Traditional connection-oriented protocols such as TCP/IP incur substantial retransmission overhead and head-of-line blocking under lossy conditions, while reactive adaptation approaches such as DASH lead to frequent quality fluctuations and a suboptimal user experience. In this paper, we introduce INDS (Incremental Named Data Streaming), a novel adaptive transmission framework that exploits the inherent layered encoding and hierarchical object structure of point cloud data to enable clients to selectively request enhancement layers based on available bandwidth and decoding capabilities. Built on Information-Centric Networking (ICN) principles, INDS employs a hierarchical naming scheme organized by time windows and Groups of Frames (GoF), which enhances cache reuse and facilitates efficient data sharing, ultimately reducing both network and server load. We implemented a fully functional prototype and evaluated it using emulated network scenarios. The experimental results demonstrate that INDS reduces end-to-end delay by up to 80%, boosts effective throughput by 15%–50% across diverse operating conditions, and increases cache hit rates by 20%–30% on average.
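To illustrate the idea of hierarchical, layer-aware naming, here is a minimal sketch in Python. The name components and the layer-selection policy are illustrative assumptions, not the paper's actual scheme: names are organized by time window, Group of Frames (GoF), and enhancement layer, and a client picks the highest layer whose cumulative rate fits its bandwidth budget.

```python
# Hypothetical sketch of a hierarchical ICN naming scheme for layered
# point cloud video. Component layout (/prefix/window/GoF/layer/frame)
# is an illustrative assumption, not the paper's exact format.

def make_name(prefix: str, window: int, gof: int, layer: int, frame: int) -> str:
    """Build an NDN-style name: /<prefix>/w<window>/g<gof>/L<layer>/f<frame>."""
    return f"/{prefix}/w{window}/g{gof}/L{layer}/f{frame}"

def request_layers(bandwidth_mbps: float, layer_costs_mbps: list[float]) -> int:
    """Pick the highest enhancement layer whose cumulative rate fits the budget."""
    total, highest = 0.0, 0
    for layer, cost in enumerate(layer_costs_mbps):
        total += cost
        if total > bandwidth_mbps:
            break
        highest = layer
    return highest

# Example: base layer plus one enhancement layer fit into 10 Mbps.
name = make_name("pcv/demo", window=3, gof=12, layer=1, frame=7)
```

Because all frames of a GoF and layer share a common name prefix, intermediate caches can serve repeated requests for popular layers without contacting the producer, which is the cache-reuse effect the abstract describes.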
References
Ruonan Chai, Yixiang Zhu, Xinjiao Li, Jiawei Li, Zili Meng, Dirk Kutscher; INDS: Incremental Named Data Streaming for Real-Time Point Cloud Video; accepted for publication at ACM Multimedia 2025; October 2025
ACM CoNEXT-2025 Workshop on Inter-networking challenges for AI
Generative AI systems are approaching a scalability limit in their development. Due to power-density constraints, it will soon become infeasible to train large language models with ever more parameters in a single datacenter. While the industry is actively pursuing efforts to scale up AI systems, it is becoming necessary to explore scaled-out, globally distributed systems to train or serve generative AI models.
In addition, services based on generative AI demand stringent quality-of-service levels to meet user expectations. Meeting those requirements can be addressed with systems that mix powerful computing instances residing in cloud platforms with localized edge platforms, i.e., heterogeneous and distributed systems.
These challenges may find a solution in approaches adopted by federated learning systems, in which models are trained across several stakeholders. Yet those systems also face scalability issues when dealing with larger models.
The ACM CoNEXT INet4AI workshop aims to discuss the networking challenges raised by distributing generative AI workloads at large scale. To that end, we welcome contributions from academic researchers, machine learning system developers, and AI infrastructure providers.
Submitted papers must be at most six (6) pages long, excluding references and appendices, in two-column 10pt ACM format. Authors of accepted submissions are expected to present and discuss their work at the workshop. All submissions will be peer-reviewed, and the review process will be double-blind. Per the anonymity guidelines, please prepare your paper in a way that preserves the anonymity of the authors. No information will be shared with third parties.
Please submit your paper using the INET4AI Submission Portal: https://inet4ai25.hotcrp.com.
AdaptQNet accepted at MobiCom
Our paper on AdaptQNet: Optimizing Quantized DNN on Microcontrollers via Adaptive Heterogeneous Processing Unit Utilization has been accepted at ACM MobiCom 2025.
Abstract
There is a growing trend of deploying DNNs on tiny microcontrollers (MCUs) to provide inference capabilities in the IoT. While prior research has explored many lightweight techniques to compress DNN models, achieving overall efficiency in model inference requires not only model optimization but also careful system resource utilization for execution. Existing studies primarily leverage arithmetic logic units (ALUs) for integer-only computations on a single CPU core. Floating-point units (FPUs) and multi-core capabilities available in many existing MCUs remain underutilized.
To fill this gap, we propose AdaptQNet, a novel MCU neural network system that can determine the optimal precision assignment for different layers of a DNN model. AdaptQNet models the latency of various operators in DNN models across different precisions on heterogeneous processing units. This facilitates the discovery of models that utilize FPU and multi-core capabilities to enhance capacity while adhering to stringent memory constraints. Our implementation and experiments demonstrate that AdaptQNet enables the deployment of models with better accuracy-efficiency trade-off on MCUs.
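The core search problem can be sketched in a few lines. This is a hypothetical, simplified illustration (not AdaptQNet's actual algorithm): each layer is profiled per precision for latency, memory, and a modeled accuracy gain, and the search picks the assignment that maximizes accuracy within latency and memory budgets. All numbers are made up.

```python
from itertools import product

# per layer: precision -> (latency_ms, memory_kb, accuracy_gain); values
# are illustrative stand-ins for measured per-operator profiles.
LAYERS = [
    {"int8": (1.0, 20, 0.00), "fp16": (1.4, 40, 0.03)},
    {"int8": (2.0, 50, 0.00), "fp16": (3.0, 100, 0.05)},
]

def best_assignment(layers, lat_budget_ms, mem_budget_kb):
    """Exhaustively search per-layer precision choices; return the
    (accuracy_gain, choice) pair with the best gain that fits both budgets."""
    best = None
    for choice in product(*[layer.keys() for layer in layers]):
        lat = sum(layer[p][0] for layer, p in zip(layers, choice))
        mem = sum(layer[p][1] for layer, p in zip(layers, choice))
        acc = sum(layer[p][2] for layer, p in zip(layers, choice))
        if lat <= lat_budget_ms and mem <= mem_budget_kb:
            if best is None or acc > best[0]:
                best = (acc, choice)
    return best
```

A real system would replace exhaustive enumeration with a smarter search, since the space grows exponentially with the number of layers, but the feasibility checks against profiled latency and memory are the essential ingredient.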
References
Yansong Sun, Jialuo He, Dirk Kutscher, Huangxun Chen; AdaptQNet: Optimizing Quantized DNN on Microcontrollers via Adaptive Heterogeneous Processing Unit Utilization; The 31st Annual International Conference on Mobile Computing and Networking (MobiCom 2025)
IETF and IRTF Deep-Dive Training in Beijing
I gave a talk about the Internet Research Task Force (IRTF) at an IETF Standards Culture and Process Deep-Dive Training that took place in Beijing on May 8th, 2025. The training was hosted by the China Internet Network Information Center (CNNIC). My talk explained what the IRTF is, how it works, and how to best contribute to its work.
NetSenseML accepted at Euro-Par
Our paper on NetSenseML: Network-Adaptive Compression for Efficient Distributed Machine Learning has been accepted at the 31st International European Conference on Parallel and Distributed Computing (Euro-Par 2025).
Abstract:
Training large-scale distributed machine learning models imposes considerable demands on network infrastructure, often resulting in sudden traffic spikes that lead to congestion, increased latency, and reduced throughput, which would ultimately affect convergence times and overall training performance. While gradient compression techniques are commonly employed to alleviate network load, they frequently compromise model accuracy due to the loss of gradient information.
This paper introduces NetSenseML, a novel network adaptive distributed deep learning framework that dynamically adjusts quantization, pruning, and compression strategies in response to real-time network conditions. By actively monitoring network conditions, NetSenseML applies gradient compression only when network congestion negatively impacts convergence speed, thus effectively balancing data payload reduction and model accuracy preservation.
Our approach ensures efficient resource usage by adapting reduction techniques to current network conditions, leading to shorter convergence times and improved training efficiency. We present the design of the NetSenseML adaptive data reduction function, and experimental evaluations show that NetSenseML can improve training throughput by a factor of 1.55x to 9.84x compared to state-of-the-art compression-enabled systems for representative DDL training jobs in bandwidth-constrained conditions.
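The adaptive decision can be illustrated with a small sketch. This is a hypothetical policy, not NetSenseML's actual function: given a measured available bandwidth and the rate required to ship uncompressed gradients, pick the smallest compression ratio that avoids congestion, so compression (and its accuracy cost) is only paid when the network is the bottleneck.

```python
def choose_compression(measured_gbps: float, required_gbps: float,
                       ratios=(1, 2, 4, 8)) -> int:
    """Return the smallest compression ratio that fits the gradient
    traffic into the measured bandwidth; 1 means no compression.
    Falls back to the most aggressive ratio if nothing fits."""
    for r in ratios:
        if required_gbps / r <= measured_gbps:
            return r
    return ratios[-1]

# Ample bandwidth: send gradients uncompressed.
# Congested link: compress just enough to keep the pipeline full.
```

The key design point mirrored here is that compression is a function of live measurements rather than a fixed setting, so accuracy is only traded away when uncompressed transmission would actually slow convergence.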
References
Yisu Wang, Xinjiao Li, Ruilong Wu, Huangxun Chen, Dirk Kutscher; NetSenseML: Network-Adaptive Compression for Efficient Distributed Machine Learning; 31st International European Conference on Parallel and Distributed Computing (Euro-Par 2025); August 2025; Preprint
Trochilus accepted at USENIX ATC
Our paper on Trochilus, titled Learning-Enhanced High-Throughput Pattern Matching Based on Programmable Data Plane, has been accepted at USENIX ATC 2025. This is joint work with Qing Li's group at Peng Cheng Lab, and the first author is Guanglin Duan.
Abstract:
Pattern matching is critical in various network security applications. However, existing pattern matching solutions struggle to maintain high throughput and low cost in the face of growing network traffic and increasingly complex patterns. Moreover, managing and updating these systems is labor-intensive, requiring expert intervention to adapt to new patterns and threats. In this paper, we propose Trochilus, a novel framework that enables high-throughput and accurate pattern matching directly on programmable data planes, making it highly relevant to modern large-scale network systems. Trochilus innovates by combining the learning ability of model inference with the high-throughput and cost-effective advantages of data plane processing. It leverages a byte-level recurrent neural network (BRNN) to model complex patterns, preserving expert knowledge while enabling automated updates for sustained accuracy. To address the challenge of limited labeled data, Trochilus proposes a semi-supervised knowledge distillation (SSKD) mechanism, converting the BRNN into a lightweight, data-plane-friendly soft multi-view forest (SMF), which can be efficiently deployed as match-action tables. Trochilus minimizes the need for expensive TCAM through a novel entry clustering algorithm, making it scalable to large network environments. Our evaluations show that Trochilus achieves multi-Tbps throughput, supports various pattern sets, and maintains high accuracy through automatic updates.
References
- Guanglin Duan, Yucheng Huang, Zhengxin Zhang, Qing Li, Dan Zhao, Zili Meng, Dirk Kutscher, Ruoyu Li, Yong Jiang, and Mingwei Xu. Learning-Enhanced High-Throughput Pattern Matching Based on Programmable Data Plane. USENIX ATC 2025; accepted for publication.
- Extended Summary by Peng Cheng Lab
Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization accepted at ACM APNET
Our paper on Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization has been accepted by the 9th Asia-Pacific Workshop on Networking (APNET'25).
Abstract:
Hybrid parallelism techniques are crucial for the efficient training of large language models (LLMs). However, these techniques often introduce differentiated computational and communication tasks across nodes. Existing automatic parallel planning frameworks typically fail to consider both node heterogeneity and dynamic changes in network topology simultaneously, limiting their practical performance. In this paper, we address this issue by positioning heterogeneous nodes within dynamic network environments and employing a simulator to identify optimal parallel strategies. Our approach achieves fine-grained workload distribution in scenarios featuring node heterogeneity and complex networks, while also matching state-of-the-art performance in regular topologies and stable network conditions. Moreover, to mitigate the excessively long search times caused by large search spaces in existing frameworks, we propose a strategy pruning technique to rapidly eliminate infeasible parallel configurations. We further accelerate the search process by executing search tasks in parallel within the simulator. Preliminary evaluation results demonstrate that our method significantly improves training performance on heterogeneous nodes, and the proposed dynamic network design offers enhanced adaptability for complex scenarios such as cloud computing environments.
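The strategy-pruning idea can be made concrete with a small sketch. This is an illustrative simplification under assumed constraints (not the paper's pruning rules): enumerate (data, tensor, pipeline) parallel degrees, and discard configurations whose degrees do not multiply to the device count or whose per-device model shard would exceed memory, before any expensive simulation runs.

```python
from itertools import product

def candidate_strategies(num_gpus: int, mem_per_gpu_gb: float,
                         model_mem_gb: float):
    """Enumerate (dp, tp, pp) parallelism degrees and prune infeasible
    ones early: degrees must multiply to the GPU count, and the model
    shard (split across tp * pp devices) must fit in per-GPU memory."""
    feasible = []
    for dp, tp, pp in product(range(1, num_gpus + 1), repeat=3):
        if dp * tp * pp != num_gpus:
            continue  # does not cover the cluster exactly
        if model_mem_gb / (tp * pp) > mem_per_gpu_gb:
            continue  # shard would not fit in device memory
        feasible.append((dp, tp, pp))
    return feasible
```

Only the surviving configurations would then be handed to the simulator, which is where the reduction in search time comes from; the simulator runs themselves can additionally be executed in parallel, as the abstract notes.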
References
Ruilong Wu, Xinjiao Li, Yisu Wang, Xinyu Chen, Dirk Kutscher; Rethinking Dynamic Networks and Heterogeneous Computing with Automatic Parallelization; The 9th Asia-Pacific Workshop on Networking (APNET'25); August 2025; Preprint
ViFusion accepted at ACM ICMR
Our paper on ViFusion: In-Network Tensor Fusion for Scalable Video Feature Indexing has been accepted at the ACM International Conference on Multimedia Retrieval 2025 (CCF-B).
Abstract:
Large-scale video feature indexing in datacenters is critically dependent on efficient data transfer. Although in-network computation has emerged as a compelling strategy for accelerating feature extraction and reducing overhead in distributed multimedia systems, harnessing advanced networking resources at both the switch and host levels remains a formidable challenge. These difficulties are compounded by heterogeneous hardware, diverse application requirements, and complex multipath topologies. Existing methods focus primarily on optimizing inference for large neural network models using specialized collective communication libraries, which often face performance degradation in network congestion scenarios.
To overcome these limitations, we present ViFusion, a communication aware tensor fusion framework that streamlines distributed video indexing by merging numerous small feature tensors into consolidated and more manageable units. By integrating an in-network computation module and a dedicated tensor fusion mechanism within datacenter environments, ViFusion substantially improves the efficiency of video feature indexing workflows. The deployment results show that ViFusion improves the throughput of the video retrieval system by 8–22x with the same level of latency as state-of-the-art systems.
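The fusion step can be illustrated with a minimal sketch. This is a hypothetical greedy packer, not ViFusion's actual mechanism: many small feature tensors are packed in order into fused buckets no larger than a size threshold, so the network carries a few large transfers instead of many tiny ones.

```python
def fuse_tensors(sizes_kb: list[int], bucket_kb: int = 256) -> list[list[int]]:
    """Greedily pack small tensors (by index) into fused buckets whose
    total size stays at or under bucket_kb; oversized tensors get their
    own bucket. The 256 KB default is an arbitrary illustration."""
    buckets, cur, cur_size = [], [], 0
    for i, size in enumerate(sizes_kb):
        if cur and cur_size + size > bucket_kb:
            buckets.append(cur)       # close the current bucket
            cur, cur_size = [], 0
        cur.append(i)
        cur_size += size
    if cur:
        buckets.append(cur)
    return buckets
```

Each resulting bucket would be transferred (and, in ViFusion's setting, processed by the in-network computation module) as one unit, amortizing per-message overhead across many small feature tensors.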
Stay tuned for the pre-print.
References
Yisu Wang, Yixiang Zhu, Dirk Kutscher; ViFusion: In-Network Tensor Fusion for Scalable Video Feature Indexing; The 15th ACM International Conference on Multimedia Retrieval; June 2025; Preprint
Interview on the IETF Blog
The IETF has recently published an interview with me on the IETF Blog.
Networked Metaverse Systems: Among the Most Popular IEEE OJCOMS Papers 2024–2025
Our 2024 paper on Networked Metaverse Systems: Foundations, Gaps, Research Directions has been recognized as one of the most popular and impactful papers of the IEEE Open Journal of the Communications Society (OJCOMS) 2024–2025.
References
- https://dirk-kutscher.info/publications/networked-metaverse-systems/
- Y. Zhang, D. Kutscher and Y. Cui, "Networked Metaverse Systems: Foundations, Gaps, Research Directions," in IEEE Open Journal of the Communications Society, vol. 5, pp. 5488-5539, 2024, doi: 10.1109/OJCOMS.2024.3426098.