The number of cores in modern processors keeps growing and has already reached hundreds of cores per socket. This trend, combined with improvements in virtualization technologies, results in many independent tenants sharing a single server. Efficient link and processor utilization in environments where thousands of tenants share an I/O device introduces unique challenges and requires rethinking existing hardware architectures. In addition, the high-bandwidth networks available in modern servers require significant CPU involvement to handle transport protocol operations. Datacenter providers need to reduce this involvement, which can be achieved by performing common protocol operations in hardware.
This thesis presents an analysis of current I/O address translation schemes, studies their scalability in hyper-tenant setups, and describes a new trace-based simulator. It also introduces the architecture of scalable hardware units that perform operations common to multiple network protocols, offloading a portion of the networking stack from software on a general-purpose processor to hardware. Finally, this thesis covers an I/O interface for communication between a hardware testbench and a host machine.