Archive for the ‘in-network computing’ tag
Towards a Unified Transport Protocol for In-Network Computing in Support of RPC-based Applications
The emerging term In-Network Computing (INC) [inc] refers in particular to using on-path programmable networking devices (e.g., switches and routers between clients and servers) as accelerators or function offloaders to boost throughput, reduce server load, or improve latency, typically in a well-controlled data center network environment.
Some INC implementations evolved from programmable data plane systems and align with the trend of network programmability at large. In recent years, INC has been shown to support many promising applications (e.g., caching, aggregation, and agreement). For example, in distributed machine learning (DML), training nodes produce data (gradients) that needs to be aggregated or reduced, and the result may be distributed to one or multiple consumers. As another example, the NetClone system [netclone] uses in-network forwarders to replicate RPC invocation messages and to perform more informed forwarding based on observed latencies, thereby accelerating RPC communication.
While it is possible to achieve this kind of operation purely with end-to-end communication between worker nodes, performance can be dramatically improved by offloading both the operation processing and the data dissemination to nodes in the network. These in-network processors are often conceived as semi-transparent performance enhancing on-path elements, i.e., they are not the actual endpoints in transport protocol sessions and would intercept packets with application data and potentially generate new data that they would have to transmit.
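As a rough illustration of this offloading idea, here is a minimal Python sketch (not any specific system's design; the `InNetworkAggregator` and `GradientFragment` names are made up for illustration) of an on-path aggregator that buffers gradient fragments per training step and emits a single reduced result once all workers have contributed:

```python
# Minimal sketch of an on-path aggregation element (illustrative only).
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class GradientFragment:
    step: int            # training iteration this fragment belongs to
    worker_id: int       # sending worker
    values: List[float]  # gradient values

@dataclass
class InNetworkAggregator:
    num_workers: int
    pending: Dict[int, List[GradientFragment]] = field(default_factory=dict)

    def on_fragment(self, frag: GradientFragment) -> Optional[List[float]]:
        """Buffer fragments per step; emit the reduced result once all workers arrived."""
        self.pending.setdefault(frag.step, []).append(frag)
        if len(self.pending[frag.step]) < self.num_workers:
            return None
        frags = self.pending.pop(frag.step)
        # Element-wise sum; a real element would forward this to the consumer(s).
        return [sum(vals) for vals in zip(*(f.values for f in frags))]

# Usage: three workers, one aggregation point
agg = InNetworkAggregator(num_workers=3)
for wid, grad in enumerate([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]):
    reduced = agg.on_fragment(GradientFragment(step=1, worker_id=wid, values=grad))
    if reduced is not None:
        print(reduced)  # approximately [0.9, 1.2]: one downstream message instead of three
```

The point of the sketch is the traffic pattern: the consumer receives one reduced message per step instead of one message per worker.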
In our Internet Draft draft-song-inc-transport-protocol-req-01.txt, we discuss this problem and formulate requirements for the design of future transport protocols in this space.
References
- Collective Communication: Better Network Abstractions for AI
- Computing in the Network – Lessons Learned and New Opportunities
- [I-D.yao-tsvwg-cco-problem-statement-and-usecases] Yao, K., Shiping, X., Li, Y., Huang, H., and D. KUTSCHER, "Collective Communication Optimization: Problem Statement and Use cases", Work in Progress, Internet-Draft, draft-yao-tsvwg-cco-problem-statement-and-usecases-00, 23 October 2023, https://datatracker.ietf.org/doc/html/draft-yao-tsvwg-cco-problem-statement-and-usecases-00.
- [inc] Klenk, B., et al., "An In-Network Architecture for Accelerating Shared-Memory Multiprocessor Collectives", ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), 2020, <https://dx.doi.org/10.1109/ISCA45697.2020.00085>
- [netclone] Kim, G., "NetClone: Fast, Scalable, and Dynamic Request Cloning for Microsecond-Scale RPCs", In Proceedings of the ACM SIGCOMM 2023 Conference (ACM SIGCOMM '23), Association for Computing Machinery, New York, NY, USA, 195-207, 2023, <https://dl.acm.org/doi/10.1145/3603269.3604820>
Computing in the Network – Lessons Learned and New Opportunities
The Internet is a distributed system that enables distributed computing applications, from client-server web applications to collaborative multi-media applications. The evolution of both compute server and network infrastructure platforms has fueled the development of new approaches for building more programmable networks and of application support functions in the network.
At the same time, new applications such as IoT data processing, distributed machine learning, decomposed application architectures such as microservices, and distributed computing frameworks introduce new opportunities for the development of more principled approaches towards Computing in the Network.
In my invited talk at AINTEC-2023, I reviewed some promising use cases, highlighted recent relevant research results and discussed several research challenges for conceiving Computing in the Network from an Internet perspective, for example discussing the meaning of "end-to-end communication" and "permissionless innovation" in the light of these new developments.
From "In-Network Computing"...
"In-Network Computing" is a popular but also relatively poorly defined term that comes up a lot in recent research studies. I discussed the different facets such as traditional networked computing, middlebox-like packet processing, active networking, programmable dataplane, Network Functions Virtualization and Service Function Chaning as depicted in the figure below.
In general, we can distinguish two main directions:
- Computing on the Network: general distributed computing using Internet technologies for communication, such as the Web and related overlay networks such as CDNs.
- Middlebox-like packet processing: intercepting, manipulating, generating, and steering packets has been applied to production networks in data centers and telco networks, often as a performance enhancing approach.
What about Programmable Data Plane?
Programmable Data Plane approaches such as the P4 programming language are often used to implement certain elements of either of these two categories, for example traffic steering, load balancing, etc. There are some point solutions for more application-layer-oriented functionalities such as NetCache, support for distributed consensus protocols, support for distributed machine learning training, etc., but these typically operate under very specific assumptions and are often at odds with end-to-end semantics and security. One example of a productive use of the Programmable Data Plane, in my opinion, was the SIGCOMM-2023 paper on NetClone: Fast, Scalable, and Dynamic Request Cloning for Microsecond-Scale RPCs by Gyuyeong Kim. In this work, programmable switches were used to implement request forwarding strategies based on relatively simple packet meta-information and observed performance, i.e., without requiring application-layer knowledge.
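To make that idea concrete, here is a toy Python model of latency-aware request cloning (an assumption-laden sketch of the general mechanism, not the paper's P4 implementation): the forwarder keeps a simple latency estimate per replica, clones each request to the two currently fastest ones, and delivers only the first reply.

```python
# Toy model of latency-aware request cloning (illustrative; not NetClone's P4 code).
import heapq
import random

class CloningForwarder:
    def __init__(self, replicas):
        # Exponentially weighted latency estimate per replica, in milliseconds.
        self.ewma = {r: 1.0 for r in replicas}

    def pick_targets(self, k=2):
        """Choose the k replicas with the lowest observed latency."""
        return heapq.nsmallest(k, self.ewma, key=self.ewma.get)

    def observe(self, replica, latency_ms, alpha=0.2):
        """Update the latency estimate from a completed request."""
        self.ewma[replica] = (1 - alpha) * self.ewma[replica] + alpha * latency_ms

    def forward(self, request_id):
        targets = self.pick_targets()
        replies = []
        for r in targets:
            latency = random.uniform(0.1, 2.0)  # stand-in for the real server response time
            self.observe(r, latency)
            replies.append((latency, r))
        # Keep the first (fastest) reply and suppress the duplicate,
        # so the client sees exactly one response per request.
        latency, winner = min(replies)
        return request_id, winner, latency

fwd = CloningForwarder(["s1", "s2", "s3"])
print(fwd.forward(request_id=42))
```

The key property is that the forwarder only needs per-packet metadata (request id, replica id, timing), not application-layer state.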
... To "Computing in the Network"
There are many relevant use cases of distributed computing that can benefit from (and urgently need) support from networking and where distributing processing, aggregation etc. with awareness of network topologies, current utilization etc. would make a real difference. We have earlier built such a system and called it Compute-First Networking: Distributed Computing meets ICN (see https://dirk-kutscher.info/publications/distributed-computing-icn/ for background).
I talked about relevant applications such as distributed stream processing and distributed machine learning. Today, these systems are typically run on the network but could definitely benefit from better support and from better awareness of the network – so I asked the question whether there is the possibility for a confluence of existing and emerging capabilities of modern hardware and the requirements of relevant distributed computing applications.
Questions I raised included:
- How can we conceive such a confluence?
- How can we support distributed computing without giving up layering and principles such as the end-to-end principle?
- What features do we need from transport protocols to support diverse use cases?
Distributed Machine Learning
Distributed machine learning, e.g., federated learning, is an application that is currently perceived as a major driver for in-network computing. Large-scale training networks are expected to enable higher degrees of parallelization and handling of larger model sizes. How would we run such workloads as distributed systems, within data centers but potentially also across the Internet?
It is important to understand the performance requirements of such systems. Initial systems were built with bespoke High-Performance Computing (HPC) architectures and communication technologies such as InfiniBand. Such systems used in-network aggregation functions and defined corresponding architectures such as SHArP.
Today's data center systems employ RDMA and RDMA over Ethernet (RoCE) as a low-layer abstraction for efficient packet-based communication on layer 2, without addressing higher-layer transport and system design aspects.
Collective Communications
In parallel computing architectures, the Message Passing Interface (MPI) is typically used to provide efficient and portable inter-process communication for high-performance computing. One of the concepts developed in MPI is Collective Communication, a set of bespoke data aggregation and distribution patterns for different data-oriented distributed computing scenarios (illustrated by a short API-level sketch after this list), such as:
- Broadcasting, e.g., for distributing configuration data or common ML models
- Scattering: a single process sends distinct pieces of data to each process
- Gathering: one process collecting and combining data pieces from other processes
- All-to-all communications: every process sends data to every other process
- Reduction: collect data from all processes, aggregate and send result
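The following sketch shows what these patterns look like at the API level, using mpi4py as one common MPI binding (this assumes an MPI runtime and would be launched with, e.g., `mpirun -n 4 python collectives.py`):

```python
# Collective communication patterns at the API level (mpi4py; requires an MPI runtime).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Broadcast: rank 0 distributes one object (e.g., a common model) to all ranks.
model = comm.bcast({"layers": 4} if rank == 0 else None, root=0)

# Scatter: rank 0 sends a distinct chunk to each rank.
chunk = comm.scatter([[i] * 2 for i in range(size)] if rank == 0 else None, root=0)

# Gather: rank 0 collects one piece from every rank.
pieces = comm.gather(sum(chunk), root=0)

# All-to-all: every rank sends one element to every other rank.
exchanged = comm.alltoall([rank] * size)

# Reduction: sum across all ranks, result delivered to every rank (allreduce).
total = comm.allreduce(rank, op=MPI.SUM)

if rank == 0:
    print(model, pieces, exchanged, total)
```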
Today's Collective Communication implementations realize these patterns on top of different underlying networks and inter-process communication facilities. For GPU-based Collective Communication in today's networks, ring-based communication is often applied, leading to considerable inefficiencies with respect to communication overhead and idle times of the different processors. See this presentation from Tencent at the recent AIDC side meeting at IETF-118. Other implementations use peer-to-peer communication models.
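A back-of-the-envelope model makes the scaling issue visible (a simplified analytical sketch with illustrative parameter values, ignoring any overlap of computation and communication): a ring allreduce over N workers takes 2*(N-1) steps, so the per-step latency term grows linearly with the ring size even though per-worker traffic stays roughly constant at about twice the gradient size.

```python
# Simplified cost model for ring allreduce (illustrative numbers, no compute/comm overlap).
def ring_allreduce_cost(num_workers: int, gradient_bytes: int,
                        link_gbps: float, step_latency_us: float):
    steps = 2 * (num_workers - 1)                  # reduce-scatter + allgather phases
    bytes_per_step = gradient_bytes / num_workers  # each step moves one chunk per link
    transfer_us = steps * bytes_per_step * 8 / (link_gbps * 1e3)  # Gbit/s -> bit/us
    latency_us = steps * step_latency_us           # per-step latency adds up linearly
    return steps, transfer_us + latency_us

# 100 MiB gradient, 100 Gbit/s links, 10 us per-step latency
for n in (8, 64, 512):
    steps, total_us = ring_allreduce_cost(n, 100 * 2**20, 100, 10)
    print(f"{n:4d} workers: {steps:5d} steps, ~{total_us / 1e3:.1f} ms")
```

As the ring grows, the step count (and thus the idle time spent waiting for neighbors) starts to dominate, which is one motivation for tree-based or in-network aggregation alternatives.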
Collective Communication in the Network
From a networking perspective, the question is how to better map collective communication to Internet technology-based networked systems, avoiding unnecessary duplication, providing typical transport protocol features such as reliability and congestion control, and enabling an optimal placement of corresponding aggregation functions.
This incurs a set of challenges, such as:
- Transport
- Reliability: underlying network lacks communication reliability
- Application data units instead of packets
- Blocking & non-blocking communication modes
- Security (potentially)
- Multi-destination delivery
- IP-Multicast possibly not the best fit
- Computing in the Network Framework
- Generic operations as primitives (at least per application domain)
- Stringent performance requirements
- Control, Optimizations, Management
- Topology and utilization awareness
- Scheduling communication and computation for optimal performance
We discussed these challenges in two recently submitted Internet Drafts on Transport for Collective Communications, and I discussed these issues in more detail during the talk.
Data-Oriented Collective Communications
I proposed the direction of data-oriented Collective Communication and discussed how concepts from Information-Centric distributed computing could possibly be employed to achieve efficient and practical multi-destination transport, reliability and congestion control, and flexible placement of aggregation functions with a name-based identity scheme (a rough naming sketch follows the feature list below).
Promising features would include:
- Data-oriented communication model
- Locator-less model conducive to data production and consumption at different places in the network (computing)
- Multi-destination delivery included
- In-network retransmission and caching could help with reliability and performance
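As a rough illustration of the naming idea (not an ICN protocol implementation; the /job/step/operation name layout and the `NamedDataStore` helper are assumptions for illustration), the sketch below addresses the reduced result of a training step by name, so any node that holds or can compute it can answer, and repeated requests are served from a cache:

```python
# Name-based access to a reduced result (illustrative sketch only).
from typing import Callable, Dict, List, Optional

class NamedDataStore:
    def __init__(self):
        self.cache: Dict[str, List[float]] = {}
        self.producers: Dict[str, Callable[[], List[float]]] = {}

    def publish(self, name: str, producer: Callable[[], List[float]]) -> None:
        """Register a function that can produce the named data on demand."""
        self.producers[name] = producer

    def interest(self, name: str) -> Optional[List[float]]:
        """Satisfy a request by name: cache hit, or compute once and cache.
        In-network caching and retransmission would play the same role."""
        if name in self.cache:
            return self.cache[name]
        if name in self.producers:
            self.cache[name] = self.producers[name]()
            return self.cache[name]
        return None  # would be forwarded further upstream in a real network

store = NamedDataStore()
worker_grads = {f"/trainjob7/step42/grad/worker{i}": [0.1 * i, 0.2 * i] for i in range(3)}
store.publish("/trainjob7/step42/allreduce/sum",
              lambda: [sum(v) for v in zip(*worker_grads.values())])

# Two consumers ask for the same named result; the second one is served from the cache.
print(store.interest("/trainjob7/step42/allreduce/sum"))
print(store.interest("/trainjob7/step42/allreduce/sum"))
```

Names, not node addresses, identify the data, which is what makes flexible placement of the aggregation point possible.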
However, I also mentioned some challenges:
- Receiver-driven transport results in polling – efficient enough?
- RDMA-like communication unexplored
- Security concept: data-oriented security good – unclear whether it can be afforded
- Exact scheduling may be at odds with current ICN system design – more work needed
In summary, this seems to be a rich field for future systems research. Distributed machine learning drives the development of new concepts for communication and computing. It clearly needs efficient multi-destination communication and an efficient mapping of MPI-inspired Collective Communication. The current abstractions do not fit well, and pure IP packet-level communication is too limited. Connection-oriented transport seems to be at odds with the communication semantics, which makes data-oriented communication attractive. Such an approach could work with a name-based approach, i.e., without addresses, which is conducive to data production and consumption. Certainly, the challenging performance requirements call for more research and possibly an evolution of current ICN protocols.
References
- [CFN-ICN] Compute-First Networking: Distributed Computing meets ICN
- [DISTCOMPICN] Distributed Computing in ICN
- [IETFCollectiveCommunications] Collective Communication: Better Network Abstractions for AI
- [IETF118AIDC] Side meeting at IETF-118 on AI in Data Centers
- [IETF118CC] Side meeting at IETF-118 on Collective Communications
- [NETCLONE] NetClone: Fast, Scalable, and Dynamic Request Cloning for Microsecond-Scale RPCs
- [RoCE] RDMA over Ethernet (RoCE)
- [SHARP] Richard L. Graham, Devendar Bureddy, Pak Lui, Hal Rosenstock, Gilad Shainer, Gil Bloch, Dror Goldenberg, Mike Dubman, Sasha Kotchubievsky, Vladimir Koushnir, Lion Levi, Alex Margolin, Tamir Ronen, Alexander Shpiner, Oded Wertheim, and Eitan Zahavi. 2016. Scalable hierarchical aggregation protocol (SHArP): a hardware architecture for efficient data reduction. In Proceedings of the First Workshop on Optimization of Communication in HPC (COM-HPC '16). IEEE Press, 1–10.
Directions for Computing in the Network
We have updated our Internet Draft on Directions for Computing in the Network.
In-network computing can be conceived in many different ways -- from active networking, data plane programmability, running virtualized functions, service chaining, to distributed computing.
This memo proposes a particular direction for Computing in the Networking (COIN) research and lists suggested research challenges.
This is now an adopted COINRG work item.
Link to draft: draft-irtf-coin-dir.
ACM CoNEXT Workshop on Emerging In-Network Computing Paradigms (ENCP)
Edge- and, more generally, in-network computing is receiving a lot of attention in research and industry fora. The ability to decentralize computing, to achieve low-latency communication to distributed application logic, and the potential for privacy-preserving analytics are just a few examples that motivate a new approach for looking at computing and networking.
What are the interesting research questions from a networking and distributed computing perspective? In-network computing can be conceived in many different ways – from active networking, data plane programmability, running virtualized functions, service chaining, to distributed computing. What abstractions do we need to program, optimize, and to manage such systems? What is the relationship to cloud networking?
These questions will be discussed at the first workshop on Emerging In-Network Computing (ENCP) that takes place at ACM CoNEXT-2019 on December 9th in Orlando.
We have received many interesting submissions and were able to put together a really interesting program that covers both Network Programmability and In-Network Computing Architectures and Protocols. Check out the full program here.
Many thanks to my co-organizers Spyros Mastorakis and Abderrahmen Mtibaa, to our steering committee members Jon Crowcroft, Satyajayant (Jay) Misra, and Dave Oran, and to our great Technical Program Committee for putting this together.