Dirk Kutscher

Personal web page

Archive for the ‘machine learning’ tag

Invited Talk at FNDC: Connecting AI: Inter-Networking Challenges for Distributed Machine Learning

without comments

I gave a talk at the Future Network Development Conference (FNDC) in Nanjing on August 20th, 2025. The title of the talk was Connecting AI: Inter-Networking Challenges for Distributed Machine Learning, and I talked about our recent work on PacTrain, NetSenseML, and some new work on in-network aggregation.

PacTrain is a novel framework that accelerates distributed training by combining pruning with sparse gradient compression. Active pruning of the neural network makes the model weights and gradients sparse. By ensuring global knowledge of the gradient sparsity among all distributed training workers, we can perform lightweight compressed communication without harming accuracy. We show that the PacTrain compression scheme achieves a near-optimal compression strategy while remaining compatible with the all-reduce primitive. Experimental evaluations show that PacTrain improves training throughput by 1.25× to 8.72× compared to state-of-the-art compression-enabled systems for representative vision and language model training tasks under bandwidth-constrained conditions.
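To make the all-reduce compatibility concrete, here is a rough sketch (not the actual PacTrain code; names and sizes are made up): because every worker derives the same sparsity mask from the pruned model, the compressed gradients of all workers index the same positions and can simply be summed element-wise, which is exactly what all-reduce provides.

    # Toy illustration of shared-sparsity gradient compression (hypothetical names)
    import numpy as np

    def make_shared_mask(weights: np.ndarray, keep_ratio: float) -> np.ndarray:
        # Magnitude-based pruning mask; every worker derives the same mask from the same weights.
        k = max(1, int(keep_ratio * weights.size))
        threshold = np.sort(np.abs(weights).ravel())[-k]
        return np.abs(weights) >= threshold

    def compress(grad: np.ndarray, mask: np.ndarray) -> np.ndarray:
        # Keep only masked entries; the index set is implicit and identical on all workers.
        return grad[mask]

    def allreduce_sum(chunks: list) -> np.ndarray:
        # Stand-in for all-reduce: element-wise sum is valid because all chunks share the same indices.
        return np.sum(chunks, axis=0)

    def decompress(summed: np.ndarray, mask: np.ndarray) -> np.ndarray:
        out = np.zeros(mask.shape, dtype=summed.dtype)
        out[mask] = summed
        return out

    # Toy run with 4 workers
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(8, 8))
    mask = make_shared_mask(weights, keep_ratio=0.25)
    worker_grads = [rng.normal(size=(8, 8)) for _ in range(4)]
    avg_grad = decompress(allreduce_sum([compress(g, mask) for g in worker_grads]), mask) / 4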

NetSenseML is a novel network-adaptive distributed deep learning framework that dynamically adjusts quantization, pruning, and compression strategies in response to real-time network conditions. By actively monitoring network conditions, NetSenseML applies gradient compression only when network congestion negatively impacts convergence speed, thus effectively balancing data payload reduction and model accuracy preservation. Our approach ensures efficient resource usage by adapting reduction techniques based on current network conditions, leading to shorter convergence times and improved training efficiency. Experimental evaluations show that NetSenseML can improve training throughput by a factor of 1.55× to 9.84× compared to state-of-the-art compression-enabled systems for representative DDL training jobs under bandwidth-constrained conditions.
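As a simple illustration of the adaptation idea (this is not the NetSenseML implementation; the controller, names, and thresholds below are hypothetical), one can think of a per-step controller that estimates whether the current link can carry the full gradient exchange within a communication budget, and compresses only when it cannot:

    from dataclasses import dataclass

    @dataclass
    class NetworkSample:
        bandwidth_mbps: float   # measured achievable throughput
        rtt_ms: float           # measured round-trip time

    def choose_compression(sample: NetworkSample, grad_size_mb: float, step_budget_ms: float) -> float:
        # Return the fraction of gradient data to transmit (1.0 = no compression).
        # Estimated time to ship the full gradient once over the measured link:
        est_comm_ms = sample.rtt_ms + grad_size_mb * 8.0 / sample.bandwidth_mbps * 1000.0
        if est_comm_ms <= step_budget_ms:
            return 1.0  # the link keeps up: send full gradients and preserve accuracy
        # Otherwise shrink the payload just enough to fit the budget (floor at 1%).
        return max(0.01, step_budget_ms / est_comm_ms)

    fast = NetworkSample(bandwidth_mbps=25_000, rtt_ms=0.5)
    slow = NetworkSample(bandwidth_mbps=1_000, rtt_ms=5.0)
    print(choose_compression(fast, grad_size_mb=100, step_budget_ms=50))  # 1.0
    print(choose_compression(slow, grad_size_mb=100, step_budget_ms=50))  # ~0.06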

Written by dkutscher

August 21st, 2025 at 5:59 am

ACM CoNEXT-2025 Workshop on Inter-networking challenges for AI

without comments

Generative AI systems are approaching a scalability limit in their development. Due to power density issues, it will soon become infeasible to train large language models with an increasing number of parameters in a single datacenter. While the industry is actively pursuing efforts to scale up AI systems, it is becoming necessary to explore the use of scaled-out, globally distributed systems to train or serve generative AI models.

In addition, services based on generative AI require stringent quality-of-service levels to meet user demand. These requirements can be addressed by combining powerful computing instances in cloud platforms with localized edge platforms, i.e., by using heterogeneous and distributed systems.

These questions may find answers in approaches adopted by federated learning systems, in which models are trained across several stakeholders. Yet those systems also face scalability issues when dealing with larger models.

The ACM CoNEXT INet4AI workshop aims to discuss the networking challenges raised by distributing generative AI workloads at large scale. To that end, we welcome contributions from academic researchers, machine learning system developers, and AI infrastructure providers.

Submitted papers must be at most six (6) pages long, excluding references and appendices, in two-column 10pt ACM format. Authors of accepted submissions are expected to present and discuss their work at the workshop. All submissions will be peer-reviewed, and the review process will be double-blind. Per the anonymity guidelines, please prepare your paper in a way that preserves the anonymity of the authors. No information will be shared with third parties.

Please submit your paper using the INET4AI Submission Portal: https://inet4ai25.hotcrp.com.

Written by dkutscher

July 3rd, 2025 at 2:58 pm

AdaptQNet accepted at MobiCom

without comments

Our paper on AdaptQNet: Optimizing Quantized DNN on Microcontrollers via Adaptive Heterogeneous Processing Unit Utilization has been accepted at ACM MobiCom-2025.

Abstract

There is a growing trend in deploying DNNs on tiny microcontrollers (MCUs) to provide inference capabilities in the IoT. While prior research has explored many lightweight techniques to compress DNN models, achieving overall efficiency in model inference requires not only model optimization but also careful utilization of system resources for execution. Existing studies primarily leverage arithmetic logic units (ALUs) for integer-only computations on a single CPU core; the floating-point units (FPUs) and multi-core capabilities available in many existing MCUs remain underutilized.

To fill this gap, we propose AdaptQNet, a novel MCU neural network system that can determine the optimal precision assignment for different layers of a DNN model. AdaptQNet models the latency of various operators in DNN models across different precisions on heterogeneous processing units. This facilitates the discovery of models that utilize FPU and multi-core capabilities to enhance model capacity while adhering to stringent memory constraints. Our implementation and experiments demonstrate that AdaptQNet enables the deployment of models with a better accuracy-efficiency trade-off on MCUs.
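As a rough illustration of per-layer precision assignment (this is not the AdaptQNet algorithm; the latency numbers, accuracy gains, and function names below are invented), a search of this kind can start from a fully quantized model and upgrade individual layers to floating point on the FPU as long as a latency budget holds:

    # Toy precision assignment from a profiled latency table (hypothetical numbers)
    latency_ms = {
        ("conv1", "int8"): 3.0, ("conv1", "fp32"): 7.5,
        ("conv2", "int8"): 4.2, ("conv2", "fp32"): 10.1,
        ("fc",    "int8"): 1.1, ("fc",    "fp32"): 2.0,
    }
    # Proxy for how much accuracy each layer gains from running in fp32 on the FPU.
    fp32_gain = {"conv1": 0.004, "conv2": 0.011, "fc": 0.002}
    layers = ["conv1", "conv2", "fc"]

    def assign_precisions(budget_ms: float) -> dict:
        assignment = {l: "int8" for l in layers}          # start fully quantized (ALU path)
        total = sum(latency_ms[(l, "int8")] for l in layers)
        # Upgrade layers with the best accuracy gain per extra millisecond first.
        ranked = sorted(layers,
                        key=lambda l: fp32_gain[l] / (latency_ms[(l, "fp32")] - latency_ms[(l, "int8")]),
                        reverse=True)
        for l in ranked:
            extra = latency_ms[(l, "fp32")] - latency_ms[(l, "int8")]
            if total + extra <= budget_ms:
                assignment[l] = "fp32"
                total += extra
        return assignment

    print(assign_precisions(budget_ms=12.0))  # e.g. {'conv1': 'int8', 'conv2': 'int8', 'fc': 'fp32'}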

References

Yansong Sun, Jialuo He, Dirk Kutscher, Huangxun Chen; AdaptQNet: Optimizing Quantized DNN on Microcontrollers via Adaptive Heterogeneous Processing Unit Utilization; The 31st Annual International Conference on Mobile Computing and Networking (MobiCom 2025)

Written by dkutscher

June 22nd, 2025 at 8:17 pm

Posted in Publications


Collective Communication: Better Network Abstractions for AI

without comments

We have submitted two new Internet Drafts on Collective Communication:

  1. Kehan Yao, Xu Shiping, Yizhou Li, Hongyi Huang, Dirk Kutscher; Collective Communication Optimization: Problem Statement and Use cases; Internet Draft draft-yao-tsvwg-cco-problem-statement-and-usecases-00; work in progress; October 2023

  2. Kehan Yao, Xu Shiping, Yizhou Li, Hongyi Huang, Dirk Kutscher; Collective Communication Optimization: Requirement and Analysis; Internet Draft draft-yao-tsvwg-cco-requirement-and-analysis-00; work in progress; October 2023

Collective Communication refers to communication between a group of processes in distributed computing contexts, involving interaction types such as broadcast, reduce, and all-reduce. This data-oriented communication model is employed by distributed machine learning and other data processing systems, such as stream processing. Current Internet network and transport protocols (and the corresponding transport-layer security) make it difficult to support these interactions in the network, e.g., for aggregating data on topologically optimal nodes for performance enhancements. These two drafts discuss use cases, problems, and initial ideas for requirements for future system and protocol design for Collective Communication. They will be discussed at IETF-118.
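For readers less familiar with these primitives, here is a minimal single-process sketch of what broadcast, reduce, and all-reduce compute (real systems implement them over libraries such as MPI or NCCL; the function names here are just for illustration):

    from functools import reduce as fold

    def broadcast(root_value, num_workers: int) -> list:
        # Every worker ends up with the root's value.
        return [root_value for _ in range(num_workers)]

    def reduce_to_root(values: list) -> float:
        # All workers' values are combined (here: summed) at a single root.
        return fold(lambda a, b: a + b, values)

    def all_reduce(values: list) -> list:
        # Like reduce, but every worker receives the combined result.
        return broadcast(reduce_to_root(values), len(values))

    grads = [0.1, 0.4, 0.2, 0.3]   # one gradient element per worker
    print(all_reduce(grads))       # every worker sees 1.0

The in-network optimization question the drafts raise is essentially where and how such reduction steps can be performed along the network path.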

Written by dkutscher

October 30th, 2023 at 8:03 am