PacTrain accepted at DAC-2025
Our paper on PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning has been accepted at the Design Automation Conference (DAC 2025) (CCF-A).
Abstract:
Large-scale deep neural networks (DNNs) exhibit excellent performance for various tasks. As DNNs and datasets grow, distributed training becomes extremely time-consuming and demands larger clusters. A main bottleneck is the resulting gradient aggregation overhead. While gradient compression and sparse collective communication techniques are commonly employed to alleviate network load, many gradient compression schemes fail to accelerate the training process while also preserving accuracy. This paper introduces PacTrain, a novel framework that accelerates distributed training by combining pruning with sparse gradient compression. Active pruning of the neural network makes the model weights and gradients sparse.
By ensuring global knowledge of the gradient sparsity among all distributed training workers, we can perform lightweight compression communication without harming accuracy. We show that the PacTrain compression scheme achieves a near-optimal compression strategy while remaining compatible with the all-reduce primitive. Experimental evaluations show that PacTrain improves training throughput by 1.25 to 8.72× compared to state-of-the-art compression-enabled systems for representative vision and language model training tasks under bandwidth-constrained conditions.
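Why does a globally known sparsity pattern matter for all-reduce? The minimal sketch below illustrates the idea under simplifying assumptions: every worker holds the same boolean pruning mask, all-reduce is simulated as an element-wise sum in a single process, and the function names are hypothetical. It is not PacTrain's implementation, only the general principle that a shared mask keeps the compressed buffers aligned.

```python
from typing import List

def compress(grad: List[float], mask: List[bool]) -> List[float]:
    """Keep only the gradient entries of unpruned weights (mask is True)."""
    return [g for g, keep in zip(grad, mask) if keep]

def decompress(values: List[float], mask: List[bool]) -> List[float]:
    """Scatter compact values back into a dense vector; pruned slots stay 0."""
    it = iter(values)
    return [next(it) if keep else 0.0 for keep in mask]

def simulated_allreduce(buffers: List[List[float]]) -> List[float]:
    """Stand-in for all-reduce: element-wise sum across workers.

    A plain sum is only valid because the shared mask gives every worker's
    compact buffer the same length and the same index-to-weight layout."""
    return [sum(column) for column in zip(*buffers)]

if __name__ == "__main__":
    mask = [True, False, True, False, True]  # identical pruning mask on every worker
    worker_grads = [
        [0.1, 0.0, 0.3, 0.0, -0.2],
        [0.2, 0.0, -0.1, 0.0, 0.4],
    ]
    compact = [compress(g, mask) for g in worker_grads]  # 3 values instead of 5
    print(decompress(simulated_allreduce(compact), mask))
```

By contrast, with per-worker top-k sparsification each worker keeps different indices. The compacted buffers then no longer line up for a plain sum, which is why such schemes typically fall back to all-gather.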
Stay tuned for the pre-print.
References
Yisu Wang, Ruilong Wu, Xinjiao Li, Dirk Kutscher; PacTrain: Pruning and Adaptive Sparse Gradient Compression for Efficient Collective Communication in Distributed Deep Learning; Design Automation Conference (DAC) 2025; Preprint
New Internet Draft draft-irtf-icnrg-reflexive-forwarding-00
We updated our Internet Draft draft-irtf-icnrg-reflexive-forwarding-00 on Reflexive Forwarding for CCNx and NDN Protocols.
Current Information-Centric Networking protocols such as CCNx and NDN have a wide range of useful applications in content retrieval and other scenarios that depend only on a robust two-way exchange in the form of a request and response (represented by an Interest-Data exchange in the case of the two protocols noted above). A number of important applications, however, require placing large amounts of data in the Interest message, and/or more than one two-way handshake. While these can be accomplished using independent Interest-Data exchanges by reversing the roles of consumer and producer, such approaches can be both clumsy for applications and problematic from a state management, congestion control, or security standpoint. This specification proposes a Reflexive Forwarding extension to the CCNx and NDN protocol architectures that eliminates the problems inherent in using independent Interest-Data exchanges for such applications. It updates RFC 8569 and RFC 8609.
The recent update includes a generalization of the main protocol specification, so that Reflexive Forwarding can be used in both CCNx and NDN.
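To make the mechanism more concrete, here is a heavily simplified forwarder model. As we understand the approach, the consumer's Interest carries a reflexive name prefix, each forwarder remembers the face on which that Interest arrived, and the producer can then send reflexive Interests under that prefix (for example, to fetch large input parameters that would not fit into the original Interest) back along the reverse path instead of routing them via the FIB. All class, field, and method names are hypothetical. Encoding details, state lifetimes, and state removal are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

def components(name: str) -> List[str]:
    """Split a hierarchical name like "/a/b" into ["a", "b"]."""
    return [c for c in name.split("/") if c]

def is_prefix(prefix: str, name: str) -> bool:
    """Component-wise prefix match ("/a" matches "/a/b" but not "/ab")."""
    p = components(prefix)
    return components(name)[:len(p)] == p

@dataclass
class Interest:
    name: str
    reflexive_prefix: Optional[str] = None  # hypothetical field name

@dataclass
class Forwarder:
    fib: Dict[str, int] = field(default_factory=dict)  # prefix -> face toward producer
    # reflexive prefix -> face on which the consumer's original Interest arrived
    reflexive_state: Dict[str, int] = field(default_factory=dict)

    def lookup_fib(self, name: str) -> Optional[int]:
        """Longest-prefix match (by number of name components) in the FIB."""
        matches = [p for p in self.fib if is_prefix(p, name)]
        if not matches:
            return None
        return self.fib[max(matches, key=lambda p: len(components(p)))]

    def on_interest(self, interest: Interest, in_face: int) -> Optional[int]:
        """Return the outgoing face for an Interest, or None if there is no route."""
        # Reflexive Interests from the producer follow the reverse path.
        for prefix, face in self.reflexive_state.items():
            if is_prefix(prefix, interest.name):
                return face
        # An Interest carrying a reflexive prefix installs reverse-path state
        # and is then forwarded toward the producer as usual.
        if interest.reflexive_prefix is not None:
            self.reflexive_state[interest.reflexive_prefix] = in_face
        return self.lookup_fib(interest.name)

fwd = Forwarder(fib={"/producer": 2})
print(fwd.on_interest(Interest("/producer/rpc", reflexive_prefix="/consumer-r1"), in_face=1))  # 2
print(fwd.on_interest(Interest("/consumer-r1/args"), in_face=2))  # 1
```

The first call forwards the consumer's Interest toward the producer on face 2. The producer's reflexive Interest then returns on face 1, even though no FIB entry exists for its name.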
Invited Talk at Airbus Workshop on Networking Systems
On October 10th, 2024, I was invited to give a talk at the 2nd Airbus Workshop on Networking Systems. The workshop largely discussed connected aircraft scenarios and technologies and featured talks on security and reliability, IoT sensor fusion, and future space and 6G network architectures.

My talk was on Connected Aircraft – Network Architectures and Technologies, and discussed relevant scenarios from my perspective, such as passenger services and new aircraft management applications. For the technology discussion, I focused on large-scale, low-latency multimedia communication over the expected heterogeneous and dynamic aircraft connectivity networks and discussed current and emerging technologies such as Media over QUIC and ICN.
I also introduced the recently established Low-Altitude Systems and Economy Research Institute at HKUST(GZ), a cross-disciplinary research institute for the low-altitude domain (with similar but not identical requirements) and some of our recent projects such as Named Data Microverse.
Networked Metaverse Systems
The term ‘Metaverse’ often denotes a wide range of existing and fictional applications. Nevertheless, there are actual systems today that can be studied and analyzed. However, whereas a considerable body of work has been published on applications and application ideas, there is less work on the technical implementation of such systems, especially from a networked systems perspective.

In a recently published open access journal article, we share some insights into the technical design of Metaverse systems, their key technologies, and their shortcomings, predominantly from a networked systems perspective. For the scope of this study, we define the ‘Metaverse’ as follows. The ‘Metaverse’ encompasses various current and emerging technologies, and the term is used to describe different applications, ranging from Augmented Reality (AR), Virtual Reality (VR), and Extended Reality (XR) to a new form of the Internet or Web. A key feature distinguishing the Metaverse from simple AR/VR is its inherently collaborative and shared nature, enabling interaction and collaboration among users in a virtual environment.

Building on Existing Platforms and Network Stacks
Most current Metaverse systems and designs are built on existing technologies and networks. For example, massively multiplayer online games such as Fortnite use a generalized client-server model. In this model, the server authoritatively manages the game state, while the client maintains a local subset of this state and can predict game flow by executing the same game code as the server on approximately the same data. Servers send information about the game world to clients by replicating relevant actors and their properties. Commercial social VR platforms such as Horizon Worlds and AltspaceVR use HTTPS to report client-side information and synchronize in-game clocks across users.
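The authoritative client-server model with client-side prediction can be sketched in a few lines. This is a generic illustration under simplifying assumptions (one-dimensional positions, a shared deterministic step function, and relevance defined as "within a radius of the viewer"); it does not reflect Fortnite's or any game engine's actual code, and all names are hypothetical. Reconciliation here simply overwrites the client's state, whereas real engines typically re-apply inputs that the server has not yet acknowledged.

```python
from dataclasses import dataclass
from typing import Dict

def step(position: float, velocity_input: float, dt: float = 0.1) -> float:
    """Shared, deterministic game logic run by both server and client."""
    return position + velocity_input * dt

@dataclass
class Server:
    positions: Dict[str, float]  # authoritative state for all actors

    def apply_input(self, actor: str, velocity_input: float) -> None:
        self.positions[actor] = step(self.positions[actor], velocity_input)

    def replicate(self, viewer: str, radius: float) -> Dict[str, float]:
        """Replicate only actors relevant to this viewer (here: within a radius)."""
        me = self.positions[viewer]
        return {a: p for a, p in self.positions.items() if abs(p - me) <= radius}

@dataclass
class Client:
    actor: str
    local: Dict[str, float]  # local subset of the game state

    def predict(self, velocity_input: float) -> None:
        """Run the same step locally so the player does not wait for a round trip."""
        self.local[self.actor] = step(self.local[self.actor], velocity_input)

    def reconcile(self, snapshot: Dict[str, float]) -> None:
        """The server is authoritative: replace local predictions with its view."""
        self.local = dict(snapshot)

server = Server({"alice": 0.0, "bob": 0.5, "carol": 50.0})
client = Client("alice", {"alice": 0.0})
client.predict(1.0)                 # local prediction: alice moves to 0.1
server.apply_input("alice", 1.0)    # server computes the same step
client.reconcile(server.replicate("alice", radius=10.0))
print(sorted(client.local))         # ['alice', 'bob']; carol is not replicated
```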
Mozilla Hubs, built with A-Frame (a web framework for building virtual reality experiences), uses WebRTC communication with a Selective Forwarding Unit (SFU). The SFU receives multiple audio and video data streams from its peers, then determines and forwards relevant data streams to connected peers. Blockchain or Non-Fungible Token (NFT)-based online games, such as Decentraland, run exclusively on the client side but allow for various data flow models, ranging from local effects and traditional client-server architectures to peer-to-peer (P2P) interactions based on state channels. Upland is built on EOSIO, an open-source blockchain protocol for scalable decentralized applications, and transports data through HTTPS. Connections between peers in Upland are established using TLS or VPN tunnels.
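The SFU behavior described above (receive each peer's streams once, then decide which streams each other peer gets) amounts to a forwarding table. The subscription model below is an assumption for illustration and not Mozilla Hubs' implementation. Note that the SFU relays packets unchanged: it selects streams but does not decode or mix them, which distinguishes it from a mixing server (MCU).

```python
from collections import defaultdict
from typing import Dict, Set

class SelectiveForwardingUnit:
    def __init__(self) -> None:
        # subscriber -> set of publishers whose streams it wants to receive
        self.subscriptions: Dict[str, Set[str]] = defaultdict(set)

    def subscribe(self, subscriber: str, publisher: str) -> None:
        self.subscriptions[subscriber].add(publisher)

    def forward(self, publisher: str, packet: bytes) -> Dict[str, bytes]:
        """Return {peer: packet} for every peer subscribed to this publisher.

        The packet is relayed as-is, and the sender never receives
        its own stream back."""
        return {
            peer: packet
            for peer, publishers in self.subscriptions.items()
            if publisher in publishers and peer != publisher
        }

sfu = SelectiveForwardingUnit()
sfu.subscribe("bob", "alice")
sfu.subscribe("carol", "alice")
sfu.subscribe("carol", "bob")
print(sorted(sfu.forward("alice", b"frame")))  # ['bob', 'carol']
print(sorted(sfu.forward("bob", b"frame")))    # ['carol']
```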
Many studies have focused on improving various aspects of Metaverse systems. For example, EdgeXAR is a mobile AR framework that uses edge offloading to enable lightweight tracking with six degrees of freedom (DOF) while reducing the offloading delay from the user’s perspective; SORAS is an optimal resource allocation scheme for the edge-enabled Metaverse that uses stochastic integer programming to minimize the total network cost; Ibrahim et al. explore the issue of partial computation offloading for multiple subtasks in an in-network computing environment, aiming to minimize energy consumption and delay. However, these ideas for offloading computation and rendering tasks to edge platforms often conflict with existing end-to-end transport protocols and overlay deployment models. More recently, a Deep Reinforcement Learning (DRL)-based multipath network orchestration framework for remote healthcare services has been presented, which automates subflow management in multipath networks. However, proposals for scalable multi-party communication would require interdomain multicast services, which are unavailable on today’s Internet.
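A back-of-the-envelope count shows why scalable multi-party communication points toward multicast. The sketch assumes N participants who each send one stream to all others and counts stream copies under direct unicast fan-out, through an SFU-style relay, and with network-layer multicast. These are idealized counts that ignore topology and stream selection.

```python
from typing import Dict

def copies_sent(n: int) -> Dict[str, int]:
    """Idealized stream copies for n participants, each sending to all others."""
    return {
        # every sender transmits one copy per receiver
        "unicast_mesh_total": n * (n - 1),
        # each sender uploads once, but the relay emits one copy per receiver
        "relay_egress": n * (n - 1),
        # each sender transmits once; routers replicate along the delivery tree
        "multicast_sender_total": n,
    }

for n in (4, 16, 64):
    print(n, copies_sent(n))
```

A relay moves the quadratic load from the senders to the relay's egress link but does not remove it; only replication inside the network keeps per-sender cost constant. That is exactly what interdomain multicast would provide.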
Disconnect Between High-Level Concepts and Actual Systems
In practice, there is a significant disconnect between high-level Metaverse concepts, ideas for technical improvements, and systems that are actually developed and partially deployed. A 2022 ACM IMC paper titled Are we ready for metaverse?: a measurement study of social virtual reality platforms analyzes the performance of various social VR systems, pinpointing numerous issues related to performance, communication overhead, and scalability. These issues arise primarily because current systems leverage existing platforms, protocols, and system architectures, which cannot tap into any of the proposed architectural and technical enhancements, such as scalable multi-party communication or the offloading of computation and rendering tasks.
Rather than merely layering ‘the Metaverse’ on top of legacy and not always ideal foundations, we consider Metaverse as a driver for future network and web applications and actively develop new designs to that end. In our article, we take a comprehensive systems approach and technically describe current Metaverse systems, focusing on their networking aspects. We document the requirements and challenges of Metaverse systems and propose a principled approach to system design for these requirements and challenges based on a thorough understanding of the needs of Metaverse systems, the current constraints and limitations, and the potential solutions that Internet technologies offer.
Article Overview

- We present a technical description of the ‘Metaverse’ based on existing and emerging systems, including a discussion of its fundamental properties, applications, and architectural models.
- We comprehensively study relevant enabling technologies for Metaverse systems, including HCI/XR technologies, networking, communications, media encoding, simulation, real-time rendering and AI. We also discuss current Metaverse system architectures and the integration of these technologies into actual applications.
- We conduct a detailed requirements analysis for constructing Metaverse systems. We analyze application-specific requirements and identify existing gaps in four key aspects: communication performance, mobility, large-scale operation, and end system architecture. For each area, we propose candidate technologies to address these gaps.
- We propose a research agenda for future Metaverse systems, based on our gap analysis and candidate technologies discussion. We re-assess the fundamental goals and requirements, without necessarily being constrained by existing system architectures and protocols. Based on a comprehensive understanding of what Metaverse systems need and what end-systems, devices, networks and communication services can theoretically provide, we propose specific design ideas and future research directions to realize Metaverse systems that can meet the expectations often articulated in the literature.
References
- Y. Zhang, D. Kutscher and Y. Cui; Networked Metaverse Systems: Foundations, Gaps, Research Directions; in IEEE Open Journal of the Communications Society, doi: 10.1109/OJCOMS.2024.3426098.
- Tianyuan Yu, Xinyu Ma, Varun Patil, Yekta Kocaogullar, Yulong Zhang, Jeff Burke, Dirk Kutscher, Lixia Zhang; Secure Web Objects: Building Blocks for Metaverse Interoperability and Decentralization; IEEE MetaCom 2024; August 12-14 2024; Hong Kong, China
- Dirk Kutscher, Jeff Burke, Giuseppe Fioccola, Paulo Mendes; Statement: The Metaverse as an Information-Centric Network; 10th ACM Conference on Information-Centric Networking (ACM ICN '23); October 9-10, 2023, Reykjavik, Iceland
- Giuseppe Fioccola, Paulo Mendes, Jeff Burke, Dirk Kutscher; Information-Centric Metaverse; Internet Draft draft-fmbk-icnrg-metaverse-01; Work in Progress; July 2023
Affordable HPC: Leveraging Small Clusters for Big Data and Graph Computing

In our paper at PCDS-2024, we explore strategies for academic researchers to optimize computational resources within limited budgets, focusing on building small, efficient computing clusters. We analyze the comparative costs of purchasing versus renting servers, guided by market research and economic theories on tiered pricing. The paper offers detailed insights into the selection and assembly of hardware components such as CPUs, GPUs, and motherboards tailored to specific research needs. It introduces methods to mitigate the performance issues caused by PCIe switch bandwidth limitations in order to improve GPU task scheduling. Furthermore, a Graph Neural Network (GNN) framework is proposed to analyze and optimize parallelism in computing networks.
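At its core, the purchase-versus-rental comparison is a break-even calculation. The sketch below uses purely hypothetical prices (not figures from the paper) and a deliberately simple model: an owned server costs its purchase price plus a monthly operating cost (power, cooling, hosting), and a rented server costs a flat monthly fee. The paper's analysis, guided by tiered-pricing theory, considers more factors; this only shows the basic trade-off.

```python
import math
from typing import Optional

def break_even_months(purchase_price: float, monthly_rent: float,
                      monthly_operating_cost: float) -> Optional[int]:
    """Months after which owning becomes cheaper than renting.

    Returns None if renting costs no more per month than operating an owned
    machine, in which case buying never pays off in this simple model."""
    monthly_saving = monthly_rent - monthly_operating_cost
    if monthly_saving <= 0:
        return None
    return math.ceil(purchase_price / monthly_saving)

# Hypothetical numbers for illustration only.
print(break_even_months(purchase_price=30_000, monthly_rent=2_500,
                        monthly_operating_cost=500))  # 15
```

If the expected usage period is shorter than the break-even point, or utilization is low, renting remains the cheaper option in this model.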
Growing Resource Demands for Large-Scale Machine Learning
Large machine learning (ML) models, such as large language models (LLMs), are becoming increasingly powerful and more accessible to end users. However, the growth in the capabilities of these models has led to memory and inference computation demands that exceed what personal computers and individual servers can provide. To enable users, research teams, and others to utilize and experiment with these models, a distributed architecture is essential.
In recent years, scientific research has shifted from a “wisdom paradigm” to a “resource paradigm.” As the number of researchers and the depth of scientific exploration increase, a significant portion of research computing tasks has moved to servers. This shift has been facilitated by the development of computing frameworks and widespread use of computers, leading to an increased demand for computer procurement.
Despite the abundance of online tutorials for assembling personal computers, information on the establishment of large clusters is relatively scarce. Large Internet companies and multinational corporations usually employ professional architects and engineers or work closely with vendors to optimize their cluster performance. However, researchers often do not have access to these technical details and must rely on packaged solutions from service providers to build small clusters.
Towards Affordable HPC
In our paper "Affordable HPC: Leveraging Small Clusters for Big Data and Graph Computing", we aim to bridge this gap by providing opportunities for researchers with limited funds to build small clusters from scratch. We compiled the necessary technical details and guidelines to enable researchers to assemble clusters independently. In addition, we propose a method to mitigate the performance degradation caused by the bandwidth limitations of PCIe switches, which can help researchers prioritize GPU training tasks effectively.
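One way to picture the PCIe switch issue: GPUs behind the same switch share its upstream link, so two communication-heavy jobs placed behind one switch slow each other down. The sketch below is a hypothetical greedy placement heuristic and not the method proposed in the paper. It assumes that each job needs one GPU and has a known bandwidth demand, and it places jobs in order of highest demand first, each on a free GPU under the switch with the least bandwidth already committed.

```python
from typing import Dict, List, Tuple

def place_jobs(switches: Dict[str, List[str]],
               jobs: List[Tuple[str, float]]) -> Dict[str, str]:
    """Greedily map jobs to GPUs, spreading bandwidth demand across switches.

    switches: switch id -> GPU ids attached to it
    jobs: (job id, expected PCIe bandwidth demand in GB/s)
    Returns job id -> GPU id; jobs that do not fit are left unplaced."""
    free = {sw: list(gpus) for sw, gpus in switches.items()}
    load = {sw: 0.0 for sw in switches}  # committed demand per switch uplink
    placement: Dict[str, str] = {}
    # Most bandwidth-hungry jobs go first so they get the least loaded switches.
    for job, demand in sorted(jobs, key=lambda j: j[1], reverse=True):
        candidates = [sw for sw in free if free[sw]]
        if not candidates:
            break
        sw = min(candidates, key=lambda s: load[s])
        placement[job] = free[sw].pop(0)
        load[sw] += demand
    return placement

switches = {"sw0": ["gpu0", "gpu1"], "sw1": ["gpu2", "gpu3"]}
jobs = [("a", 12.0), ("b", 10.0), ("c", 2.0), ("d", 1.0)]
print(place_jobs(switches, jobs))
```

In this example, the two heavy jobs "a" and "b" end up behind different switches, while the light jobs fill the remaining GPUs.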
The paper discusses:
- How to build cost-effective clusters: We provide a comprehensive guide for researchers with limited funds, helping them to independently build small clusters and contribute to the development of large models.
- Performance Optimization: We propose a method to address the performance degradation caused by PCIe switch bandwidth limitations. This method allows researchers to prioritize GPU training tasks effectively, thereby improving the overall cluster performance.
- GNN for network and neural network parallelism: We propose a GNN (Graph Neural Network) framework that combines neural networks with parallel network flows in distributed systems. Our aim is to integrate different types of data flows, communication patterns, and computational tasks, thereby providing a novel perspective for evaluating the performance of distributed systems.
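To illustrate what treating a distributed system as a graph means, the sketch below builds a tiny graph of GPUs and a switch, attaches a feature to each node (here, hypothetically, its outgoing traffic in GB/s), and runs one round of mean-aggregation message passing. This is the basic operation that a GNN layer learns to parameterize. The topology, the features, and the fixed 0.5 mixing weight are assumptions; the paper's GNN framework is not reproduced here.

```python
from typing import Dict, List

def message_passing_round(adjacency: Dict[str, List[str]],
                          features: Dict[str, float],
                          self_weight: float = 0.5) -> Dict[str, float]:
    """One GNN-style update: mix each node's feature with its neighbours' mean.

    A real GNN layer would use learned weight matrices and a nonlinearity;
    a fixed scalar weight keeps the example readable."""
    updated = {}
    for node, neighbours in adjacency.items():
        if neighbours:
            neighbour_mean = sum(features[n] for n in neighbours) / len(neighbours)
        else:
            neighbour_mean = 0.0
        updated[node] = self_weight * features[node] + (1 - self_weight) * neighbour_mean
    return updated

# Hypothetical topology: three GPUs attached to one switch.
adjacency = {"sw": ["gpu0", "gpu1", "gpu2"], "gpu0": ["sw"], "gpu1": ["sw"], "gpu2": ["sw"]}
features = {"sw": 0.0, "gpu0": 6.0, "gpu1": 3.0, "gpu2": 0.0}
print(message_passing_round(adjacency, features))
# {'sw': 1.5, 'gpu0': 3.0, 'gpu1': 1.5, 'gpu2': 0.0}
```

After one round, the switch's value reflects the traffic of its attached GPUs. Further rounds propagate such information across larger parts of the topology.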
References
- Ruilong Wu, Yisu Wang, Dirk Kutscher; Affordable HPC: Leveraging Small Clusters for Big Data and Graph Computing; The 1st International Symposium on Parallel Computing and Distributed Systems; September 2024; pre-print: https://arxiv.org/abs/2408.15568
Secure Web Objects: Building Blocks for Metaverse Interoperability and Decentralization
In our upcoming paper at IEEE MetaCom 2024, we propose a data-oriented approach for future Web and Metaverse system designs.
Abstract
This position paper explores how to support the Web's evolution through an underlying data-centric approach that better matches the data-orientedness of modern and emerging applications. We revisit the original vision of the Web as a hypermedia system that supports document composability and application interoperability via name-based data access. We propose the use of secure web objects (SWO), a data-oriented communication approach that can reduce complexity, centrality, and inefficiency, particularly for collaborative and local-first applications, such as the Metaverse and other collaborative applications. SWO are named, signed, application-defined objects that are secured independently of their containers or communication channels, an approach that leverages the results of over a decade of data-centric networking research. This approach does not require intermediation by aggregators of identity, storage, and other services that are common today. We present a brief design overview, illustrated through prototypes for two editors of shared hypermedia documents: one for 3D and one for LaTeX. We also discuss our findings and suggest a roadmap for future research.
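The central property of a secure web object is that the object itself carries its protection and can be verified regardless of which channel or server delivered it. The few lines below sketch this property under assumptions and do not represent the SWO design. They use an HMAC with a shared key from Python's standard library as a stand-in for a public-key signature; a real design needs public-key signatures so that anyone can verify an object without being able to create one. The name format and field layout are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass(frozen=True)
class SecureObject:
    name: str         # application-level name (hypothetical format)
    content: bytes
    signature: bytes  # here an HMAC tag standing in for a real signature

def _signed_portion(name: str, content: bytes) -> bytes:
    """Bind name and content together (length-prefixed) so neither can be swapped."""
    encoded = name.encode()
    return len(encoded).to_bytes(4, "big") + encoded + content

def sign(name: str, content: bytes, key: bytes) -> SecureObject:
    tag = hmac.new(key, _signed_portion(name, content), hashlib.sha256).digest()
    return SecureObject(name, content, tag)

def verify(obj: SecureObject, key: bytes) -> bool:
    expected = hmac.new(key, _signed_portion(obj.name, obj.content), hashlib.sha256).digest()
    return hmac.compare_digest(expected, obj.signature)

key = b"demo-key"  # stand-in for a producer's signing key
obj = sign("/doc/latex/section1", b"\\section{Intro}", key)
# Whether obj came from a cache, a peer, or the origin does not matter:
print(verify(obj, key))  # True
print(verify(SecureObject("/doc/latex/section2", obj.content, obj.signature), key))  # False
```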
References
- Tianyuan Yu, Xinyu Ma, Varun Patil, Yekta Kocaogullar, Yulong Zhang, Jeff Burke, Dirk Kutscher, Lixia Zhang; Secure Web Objects: Building Blocks for Metaverse Interoperability and Decentralization; IEEE MetaCom 2024; pre-print: https://arxiv.org/abs/2407.15221
- Dirk Kutscher; Data-oriented, Decentralized, Daring: Opportunities and Research Challenges for an Information-Centric Web; Lightning Talk at NDNComm 2024; March 2024
- Navin V. Keizer, Onur Ascigil, Michał Król, Dirk Kutscher, and George Pavlou; A Survey on Content Retrieval on the Decentralised Web; ACM Computing Surveys; March 2024; https://doi.org/10.1145/3649132
- Dirk Kutscher, Jeff Burke, Giuseppe Fioccola, Paulo Mendes; Statement: The Metaverse as an Information-Centric Network; 10th ACM Conference on Information-Centric Networking (ACM ICN '23); October 9-10, 2023, Reykjavik, Iceland; https://doi.org/10.1145/3623565.3623761
Networked Systems for Distributed Machine Learning at Scale
On July 3rd, 2024, I gave a talk at the UCL/Huawei Joint Lab Workshop on "Building Better Protocols for Future Smart Networks" that took place on UCL's campus in London.
Talk Abstract
Large-scale distributed machine learning training networks are increasingly facing scaling problems with respect to FLOPS per deployed compute node. Communication bottlenecks can inhibit the effective utilization of expensive GPU resources. The root cause of these performance problems is not insufficient transmission speed or slow servers; it is the structure of the distributed computing and the communication characteristics it incurs. Large machine learning workloads typically exhibit relatively asymmetric, and sometimes centralized, communication structures, such as gradient aggregation and model update distribution. Even when training networks are less centralized, the amount of data that needs to be sent to aggregate several thousand input values through collective communication functions such as AllReduce can lead to incast problems that overload network resources and servers. This talk discusses challenges and opportunities for developing in-network aggregation systems from a distributed computing and networked systems perspective.
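A rough per-node traffic count illustrates the incast problem. Assume N workers, each holding a gradient of S bytes. A single central aggregator must then receive N * S bytes at once. Ring all-reduce spreads the load so that each worker sends about 2(N-1)/N * S bytes (a reduce-scatter phase plus an all-gather phase, each N-1 steps of S/N). An idealized in-network aggregation switch lets each worker send S and receive the aggregated S. The numbers ignore headers, pipelining, and retransmissions, and the example sizes are hypothetical.

```python
from typing import Dict

def per_node_bytes(n_workers: int, gradient_bytes: float) -> Dict[str, float]:
    """Idealized bytes per node for one round of gradient aggregation."""
    s, n = gradient_bytes, n_workers
    return {
        # central aggregator: every worker's gradient converges on one NIC (incast)
        "central_server_ingress": n * s,
        # ring all-reduce: reduce-scatter plus all-gather, each (n-1) steps of s/n
        "ring_allreduce_send_per_worker": 2 * (n - 1) / n * s,
        # in-network aggregation: the switch sums, each worker sends s once
        "in_network_send_per_worker": s,
    }

# Hypothetical example: 1,000 workers, 100 MB gradients.
for k, v in per_node_bytes(1000, 100e6).items():
    print(f"{k}: {v / 1e9:.2f} GB")
```

Ring all-reduce avoids incast but needs 2(N-1) sequential steps, so its latency grows with the number of workers. This is one motivation for aggregating inside the network instead.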
RFC 9556: Internet of Things (IoT) Edge Challenges and Functions
Many Internet of Things (IoT) applications have requirements that cannot be satisfied by centralized cloud-based systems (i.e., cloud computing). These include time sensitivity, data volume, connectivity cost, operation in the face of intermittent services, privacy, and security. As a result, IoT is driving the Internet toward edge computing.
We have published RFC 9556, outlining the requirements of the emerging IoT edge and its challenges. It presents a general model and major components of the IoT edge to provide a common basis for future discussions in the Thing-to-Thing Research Group (T2TRG) and other IRTF and IETF groups.
Today, many IoT services leverage cloud computing platforms because they provide virtually unlimited storage and processing power. The reliance of IoT on back-end cloud computing provides additional advantages, such as scalability and efficiency. At the time of writing, IoT systems are fairly static with respect to integrating and supporting computation. It is not that there is no computation, but that systems are often limited to static configurations (edge gateways and cloud services).
However, IoT devices generate large amounts of data at the edges of the network. To meet IoT use case requirements, data is increasingly being stored, processed, analyzed, and acted upon close to the data sources. These requirements include time sensitivity, data volume, connectivity cost, and resiliency in the presence of intermittent connectivity, privacy, and security, which cannot be addressed by centralized cloud computing. A more flexible approach is necessary to address these needs effectively. This involves distributing computing (and storage) and seamlessly integrating it into the edge-cloud continuum. We refer to this integration of edge computing and IoT as "IoT edge computing". RFC 9556 describes the related background, use cases, challenges, system models, and functional components.
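As a toy illustration of processing data close to its source, the sketch below models a hypothetical edge gateway. The gateway reduces each window of raw sensor readings to a small summary and keeps summaries queued locally while the uplink to the cloud is unavailable. The sketch is not taken from RFC 9556, which describes functions and challenges rather than a specific implementation, and the class and method names are assumptions.

```python
from collections import deque
from statistics import mean
from typing import Deque, Dict, List

class EdgeGateway:
    def __init__(self, window: int) -> None:
        self.window = window
        self.readings: List[float] = []
        self.outbox: Deque[Dict[str, float]] = deque()  # summaries awaiting upload

    def ingest(self, value: float) -> None:
        """Buffer raw readings locally; emit one summary per full window."""
        self.readings.append(value)
        if len(self.readings) == self.window:
            self.outbox.append({"mean": mean(self.readings),
                                "max": max(self.readings),
                                "count": float(len(self.readings))})
            self.readings.clear()

    def flush(self, uplink_available: bool) -> List[Dict[str, float]]:
        """Upload queued summaries if connectivity allows; otherwise keep them."""
        if not uplink_available:
            return []
        sent = list(self.outbox)
        self.outbox.clear()
        return sent

gw = EdgeGateway(window=4)
for v in [20.0, 21.0, 22.0, 23.0, 30.0, 30.0, 30.0, 30.0]:
    gw.ingest(v)
print(gw.flush(uplink_available=False))  # [] (summaries stay queued)
print(gw.flush(uplink_available=True))
```

Eight raw readings leave the gateway as two summaries. This reduces data volume and connectivity cost, and an outage delays delivery instead of losing data.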
Data-oriented, Decentralized, Daring: Opportunities and Research Challenges for an Information-Centric Web
Research and development in ICN has led to different communication patterns such as Sync and API implementations such as CNL. It is now time to think about how to leverage Information-Centric principles for providing better foundations for hypermedia applications in the future web. At NDNComm-2024 I talked about how ICN could possibly help, what could be fruitful future research directions, and why web3 and dweb are not the answer.
Content Retrieval on the Decentralised Web
Trends and Emerging Technologies for Content Retrieval on the Decentralized Web
The control, governance, and management of the web have become increasingly centralised, resulting in security, privacy, and censorship concerns. Decentralised initiatives have emerged to address these issues, beginning with decentralised file systems. These systems have gained popularity, with major platforms serving millions of content requests daily. Complementing the file systems are decentralised search engines and name registry infrastructures, together forming the basis of a decentralised web. We have published a survey paper that analyses research trends and emerging technologies for content retrieval on the decentralised web, encompassing both academic literature and industrial projects.
Challenges
Several challenges hinder the realisation of a fully decentralised web. Achieving comparable performance to centralised systems without compromising decentralisation is a key challenge. Hybrid infrastructures, blending centralised components with verifiability mechanisms, show promise to improve decentralised initiatives. While decentralised file systems have seen more mature deployments, they still face challenges such as usability, performance, privacy, and content moderation. Integrating these systems with decentralised name-registries offers a potential for improved usability with human-readable and persistent names for content. Further research is needed to address security concerns in decentralised name-registries and enhance governance and crypto-economic incentive mechanisms.
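Verifiability is what allows content to be fetched from untrusted or hybrid infrastructure. The sketch below shows the basic content-addressing idea used by decentralised file systems: the identifier is a cryptographic hash of the content, so a client can check any copy it receives, and a name registry maps a human-readable name to such an identifier. Plain hex SHA-256 digests stand in for real identifier formats such as IPFS CIDs, and the registry is a hypothetical in-memory dictionary.

```python
import hashlib
from typing import Dict, Optional

def content_id(data: bytes) -> str:
    """Self-certifying identifier: the hex SHA-256 digest of the content."""
    return hashlib.sha256(data).hexdigest()

def fetch_verified(cid: str, untrusted_store: Dict[str, bytes]) -> Optional[bytes]:
    """Accept data from any provider only if it hashes to the requested id."""
    data = untrusted_store.get(cid)
    if data is None or content_id(data) != cid:
        return None
    return data

# Hypothetical name registry mapping a human-readable name to a content id.
page = b"<h1>hello</h1>"
registry = {"example.site/index": content_id(page)}
honest = {registry["example.site/index"]: page}
malicious = {registry["example.site/index"]: b"<h1>phish</h1>"}
print(fetch_verified(registry["example.site/index"], honest))     # b'<h1>hello</h1>'
print(fetch_verified(registry["example.site/index"], malicious))  # None
```

The hash protects only the content behind a given identifier. If the registry entry itself is compromised, clients will faithfully verify the wrong content, which is why the security of decentralised name registries remains an open research question.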
References
Navin V. Keizer, Onur Ascigil, Michał Król, Dirk Kutscher, and George Pavlou; A Survey on Content Retrieval on the Decentralised Web; ACM Computing Surveys; March 2024; https://doi.org/10.1145/3649132