Best Practices for Paravirtualization Enhancements from Intel® Virtualization Technology: EPT and VT-d

Overview

Paravirtualization is a technique for increasing the performance of virtualized systems by reducing, relative to full virtualization, the proportion of hardware resources that the virtual machine monitor (VMM) must dynamically emulate in software. Traditional emulation typically relies on binary translation, in which a software process within the VMM traps privileged hardware operations issued by the guest OSs and translates them into a form compatible with the host OS. That translation consumes computation, which can introduce substantial processing overhead and reduce the overall performance and scalability of the environment.

Paravirtualization removes the need for binary translation by building a software interface into the VMM that presents the virtual machines (VMs) with appropriate drivers and other elements in place of the dynamically emulated hardware. Although paravirtualization typically requires modifying guest operating systems, Intel VT enables Xen, VMware, and other virtualization environments to run many unmodified guest OSs.

Hardware page table virtualization provides a hardware assist to memory virtualization, which includes partitioning physical memory and allocating it among VMs. Memory virtualization presents each VM with a contiguous address space that is not necessarily contiguous in the underlying physical memory. The guest OS stores the mapping between its virtual and physical memory addresses in page tables.

Because guest OSs do not have native direct access to physical system memory, the VMM must perform another level of memory virtualization to accommodate multiple VMs simultaneously: it must map the addresses that the guest page tables treat as physical onto actual physical memory. To accelerate this additional layer of memory virtualization, both Intel and AMD have announced hardware-assist technologies. Intel's is called Extended Page Tables (EPT), and AMD's is called Nested Page Tables (NPT). The two technologies are very similar at a conceptual level.
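
The two layers of mapping described above can be sketched as two successive lookups. This is a minimal model, not how hardware stores page tables: it assumes 4 KiB pages and uses flat Python dictionaries keyed by page number, and the names `guest_page_table`, `vmm_map`, and `translate` are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages throughout

# Hypothetical mappings keyed by page number.
guest_page_table = {0x10: 0x2, 0x11: 0x3}  # guest OS: guest-linear -> guest-physical
vmm_map = {0x2: 0x7A, 0x3: 0x15}           # VMM: guest-physical -> host-physical


def translate(addr, table):
    """Map one address through a page-number table, keeping the page offset."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"no mapping for page {page:#x}")  # a fault in real hardware
    return table[page] * PAGE_SIZE + offset


def guest_linear_to_host_physical(addr):
    guest_physical = translate(addr, guest_page_table)  # the mapping the guest OS controls
    return translate(guest_physical, vmm_map)           # the VMM's additional layer


print(hex(guest_linear_to_host_physical(0x10123)))  # 0x7a123
```

The guest never sees the second lookup; EPT and NPT move that second lookup into hardware instead of leaving it to VMM software.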

Intel Virtualization Technology for Directed I/O (Intel VT-d) supports remapping of direct memory access (DMA) transfers and device-generated interrupts, which helps to improve the isolation of I/O devices. VMMs can use VT-d to assign an I/O resource directly to a specific VM, so an unmodified guest OS can access that resource directly without the VMM having to provide emulated device drivers for it. Moreover, a device assigned to a particular VM is not accessible to other VMs, and the device cannot access the memory of those other VMs.

This virtualization of interrupts and DMA transfers prevents a device under the control of one VM from accessing the memory space controlled by another VM. Under VT-d, the I/O memory management unit (IOMMU) maintains a record of which physical memory regions are mapped to which I/O devices, allowing it to control access to those memory locations on the basis of which I/O device requests the access.

For a more in-depth discussion of VT-d and its potential benefits to software products, see the article "Intel® Virtualization Technology for Directed I/O (VT-d): Enhancing Intel platforms for efficient virtualization of I/O devices."

Intel VT for x86-based Intel® Architecture (VT-x) provides a hardware assist for virtualizing the CPU and the memory subsystem in systems based on 32-bit Intel® processors or Intel® 64 architecture (formerly Intel® EM64T) that support Intel VT. For a more complete discussion of the architecture and processes that underlie this hardware assist, see the Intel VT Platform Technology Site.

Best Practices: Obtaining the Benefits of EPT and VT-d

For the most part, application software does not need to change to take advantage of hardware page table virtualization and VT-d. The following benefits are immediately available:

Reduced complexity: There is no longer any need for page-table shadowing in software, and it is possible to avoid I/O emulation for direct-mapped I/O devices.
Improved performance: Hardware page-table walkers remove the VM exits that software shadow paging requires, and eliminating shadow page tables also saves the memory they occupy. VT-d also gives the VMM the option to direct-map I/O devices to VMs, when desired.
Improved functionality: The guest OS has direct access to modern physical device functions (for the direct-mapped case).
Enhanced reliability and protection: Device DMA is constrained by translation tables, and DMA misfires are logged and reported to software.

Because these benefits are relatively simple to obtain, software vendors should consider recommending to their customers Intel® architecture-based servers that support the latest versions of Intel VT as enterprise virtualization platforms, which deliver benefits in performance, scalability, reliability, and security with little or no application change.

Best Practices: VT-d DMA Remapping

Intel VT-d DMA remapping reduces VM exits for assigned devices. Each DMA request specifies a requester-ID and an address, and the remapping hardware transforms the request into a physical memory access, using a software-programmed table structure in memory.

DMA remapping hardware enforces isolation by validating that the requester-ID is allowed to access the address and translates the device-provided address into a physical memory address. The hardware can cache frequently used translations in a TLB-like structure, and software may dynamically update remapping tables for efficient re-direction. DMA remapping is applicable to all DMA sources, and it works with existing device hardware.
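
The validate, translate, cache, and log steps above can be modeled in a few lines. This is a simplified sketch, not the VT-d table format: the per-requester dictionaries stand in for the real multi-level structures, and `DmaRemapper`, `iotlb`, and `fault_log` are hypothetical names. The requester-ID encoding (8-bit bus, 5-bit device, 3-bit function) follows the PCI convention.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def requester_id(bus, device, function):
    """PCI requester-ID: 8-bit bus, 5-bit device, 3-bit function."""
    return (bus << 8) | (device << 3) | function


@dataclass
class DmaRemapper:
    # requester-ID -> {device page -> host-physical page}; programmed by the VMM
    tables: dict = field(default_factory=dict)
    # cache of recent translations, keyed by (requester-ID, device page)
    iotlb: dict = field(default_factory=dict)
    # blocked requests, recorded for software to inspect
    fault_log: list = field(default_factory=list)

    def translate(self, rid, dev_addr):
        page, offset = divmod(dev_addr, PAGE_SIZE)
        key = (rid, page)
        if key not in self.iotlb:  # cache miss: consult the in-memory table
            domain = self.tables.get(rid)
            if domain is None or page not in domain:
                self.fault_log.append((rid, dev_addr))  # not permitted: block and log
                return None
            self.iotlb[key] = domain[page]
        return self.iotlb[key] * PAGE_SIZE + offset

    def unmap(self, rid, page):
        """Software updates the table and drops any stale cached translation."""
        self.tables.get(rid, {}).pop(page, None)
        self.iotlb.pop((rid, page), None)


# Hypothetical setup: a NIC assigned to one VM, a disk controller to another.
nic = requester_id(bus=3, device=0, function=0)
disk = requester_id(bus=4, device=0, function=0)
iommu = DmaRemapper(tables={nic: {0x0: 0x100}, disk: {0x0: 0x200}})

nic_hpa = iommu.translate(nic, 0x10)       # allowed: lands in the NIC's VM memory
disk_bad = iommu.translate(disk, 0x5000)   # page 0x5 is not mapped for the disk
iommu.unmap(nic, 0x0)                      # dynamic table update
nic_after_unmap = iommu.translate(nic, 0x10)
```

Because the table is selected by requester-ID, the NIC can only ever reach host page 0x100 in this model, which is the isolation property the text describes; the unmap step shows why software must invalidate cached translations when it updates a table.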

Best Practices: VT-d Interrupt Remapping

Intel VT-d interrupt remapping reduces interrupt virtualization overhead for assigned devices. Each interrupt request specifies a requester-ID and an interrupt-ID, and the remapping hardware transforms the request into a physical interrupt, using a software-programmed Interrupt Remap Table structure in memory.

Interrupt remapping hardware enforces isolation by validating that the interrupt-ID comes from an allowed requester-ID, and it generates interrupts with attributes taken from the remap structure. The hardware can cache frequently used interrupt-remap structures, and software may dynamically update remap entries for efficient interrupt re-direction. Interrupt remapping is applicable to all interrupt sources, including legacy interrupts delivered through I/O APICs and message-signaled interrupts such as MSI, MSI-X, and MSI-v. This process works with existing device hardware.
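
A minimal model of the lookup and validation described above follows. It is a sketch under simplifying assumptions: `RemapEntry` keeps only a source ID, a vector, and a destination, while real Interrupt Remap Table entries carry additional fields defined in the VT-d specification, and `InterruptRemapper` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemapEntry:
    """Simplified interrupt-remap entry; real entries carry more fields."""
    source_id: int    # requester-ID permitted to use this entry
    vector: int       # vector to deliver
    destination: int  # target processor (APIC ID)


class InterruptRemapper:
    def __init__(self):
        self.table = {}   # interrupt-ID -> RemapEntry, programmed by the VMM
        self.cache = {}   # recently used entries
        self.faults = []  # blocked requests

    def program(self, interrupt_id, entry):
        self.table[interrupt_id] = entry
        self.cache.pop(interrupt_id, None)  # invalidate any cached copy

    def remap(self, requester_id, interrupt_id):
        entry = self.cache.get(interrupt_id)
        if entry is None:
            entry = self.table.get(interrupt_id)
            if entry is not None:
                self.cache[interrupt_id] = entry
        if entry is None or entry.source_id != requester_id:
            self.faults.append((requester_id, interrupt_id))
            return None  # not from an allowed requester: blocked
        # The physical interrupt takes its attributes from the remap entry.
        return entry.destination, entry.vector


ir = InterruptRemapper()
ir.program(5, RemapEntry(source_id=0x300, vector=0x41, destination=2))
first = ir.remap(0x300, 5)    # delivered as vector 0x41 to CPU 2
spoofed = ir.remap(0x400, 5)  # a different device reusing interrupt-ID 5
ir.program(5, RemapEntry(source_id=0x300, vector=0x41, destination=6))
moved = ir.remap(0x300, 5)    # re-directed to CPU 6 without touching the device
```

The last two lines illustrate the re-direction point in the text: changing one table entry moves the interrupt to another processor, and the device keeps sending the same interrupt-ID.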

Best Practices: Comparing EPT with Shadow-Page Tables

To compare EPT with earlier Intel VT-x implementations that rely on shadow page tables, it is first necessary to define some terminology:

Guest-linear address: the address produced by guest software

Guest-physical address: the translation of a guest-linear address, produced by the page tables maintained (or desired) by the guest operating system

Host-physical address: the actual address used to access memory

Addressing with Shadow Page Tables	Addressing with EPT
Guest maintains guest page tables that map guest-linear to guest-physical	Guest maintains page tables that map guest-linear to guest-physical
VMM maintains active page tables that map guest-linear to host-physical	VMM maintains extended page tables that map guest-physical to host-physical
CPU uses only active page tables	CPU uses both sets of tables
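
The practical difference between the two columns can be sketched as follows. In this simplified model (hypothetical dictionaries keyed by page number, 4 KiB pages assumed), a shadow table is a precomputed composition of the guest's table and the VMM's mapping, so it goes stale as soon as the guest edits its own page table; this is why a shadowing VMM must intercept guest page-table updates, CR3 changes, and INVLPG. An EPT-style walk composes both tables at translation time instead.

```python
# Hypothetical page-number mappings (4 KiB pages assumed).
guest_pt = {0x10: 0x2}        # guest-linear page -> guest-physical page (guest-owned)
ept = {0x2: 0x7A, 0x3: 0x15}  # guest-physical page -> host-physical page (VMM-owned)


def build_shadow(guest_pt, ept):
    """Shadow paging: the VMM precomputes guest-linear -> host-physical."""
    return {gl: ept[gp] for gl, gp in guest_pt.items() if gp in ept}


def ept_walk(gl_page, guest_pt, ept):
    """EPT: the CPU consults both tables on each walk, so nothing can go stale."""
    return ept[guest_pt[gl_page]]


shadow = build_shadow(guest_pt, ept)  # the only table the CPU uses under shadowing

# The guest remaps linear page 0x10 to guest-physical page 0x3.
guest_pt[0x10] = 0x3

stale = shadow[0x10]                   # still 0x7A until the VMM traps and resyncs
fresh = ept_walk(0x10, guest_pt, ept)  # 0x15 immediately
```

The cost moves rather than disappears: the shadow table makes each hardware walk cheap but requires exits to keep it synchronized, while EPT avoids those exits at the price of a longer walk.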

Best Practices: Identifying the Balance Between Overheads With and Without Hardware Page Table Virtualization

Hardware page table virtualization eliminates the exit overhead caused by guest and monitor MMU (Memory Management Unit) page faults, CR3 (Control Register 3) changes, and INVLPG (Invalidate Translation Look-Aside Buffer Entry) instructions, but it adds overhead to page walks and TLB (Translation Look-Aside Buffer) fills.
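
The added page-walk overhead can be quantified with a short calculation. This sketch assumes 4-level paging in both the guest and the EPT structures and counts the worst case, a TLB miss with no hits in any paging-structure cache; in practice those caches make the full worst case less frequent. Each guest paging-structure entry lives at a guest-physical address, so reading it costs a full EPT walk plus the read itself, and the final guest-physical data address needs one more EPT walk. The function names are illustrative only.

```python
def native_walk_refs(levels):
    """Memory reads for a native page walk: one entry per paging level."""
    return levels


def nested_walk_refs(guest_levels, ept_levels):
    """Worst-case memory reads for a guest page walk under EPT.

    Each guest paging-structure entry sits at a guest-physical address, so
    reaching it costs a full EPT walk plus the read of the entry itself.
    The final guest-physical address of the data needs one more EPT walk.
    """
    return guest_levels * (ept_levels + 1) + ept_levels


print(native_walk_refs(4))     # 4
print(nested_walk_refs(4, 4))  # 24
```

Equivalently, the nested count is (n + 1)(m + 1) - 1 for n guest levels and m EPT levels, which is why TLB misses are more expensive under hardware page table virtualization even though the exits disappear.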

Because of this trade-off, some applications benefit more than others from hardware page table virtualization. Applications with high levels of process creation and memory allocation see the most benefit, because those activities generate many of the page faults, CR3 changes, and TLB invalidations that would otherwise cause exits.

Best Practices: Characterizing Workloads with Regard to VT-x Exit Transitions

As mentioned above, the performance benefit of hardware page table virtualization is tied to the prevalence of VT-x exit transitions. Specifically, an application workload that generates these events at a higher rate is, roughly speaking, more likely to benefit from hardware page table virtualization.

VT-x exit transitions from the guest to the monitor occur when guest code accesses or modifies privileged virtualized state or executes privileged instructions, and when certain external events (such as external interrupts) occur. Frequent exits are caused by page faults, external interrupts, control register reads and writes, and I/O instructions. Page fault exits include guest page table faults, MMU faults, APIC (Advanced Programmable Interrupt Controller) reads and writes, and device MMIO (memory-mapped I/O) reads and writes.
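
One rough way to apply this characterization is to tally exits by cause and compute the share attributable to shadow paging. The sketch below uses hypothetical string labels, not real VMX exit-reason names, and a made-up trace. It counts only shadow-paging page faults, CR3 writes, and INVLPG as removable; APIC and MMIO accesses to emulated devices are kept separate because they generally still need VMM involvement.

```python
from collections import Counter

# Hypothetical exit labels (not real VMX exit-reason names). These are the
# causes that shadow paging introduces and hardware page tables remove.
PAGING_EXITS = {"shadow_page_fault", "cr3_write", "invlpg"}


def paging_exit_share(exit_reasons):
    """Fraction of recorded exits that hardware page table virtualization would remove."""
    counts = Counter(exit_reasons)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[r] for r in PAGING_EXITS) / total


# Hypothetical trace from an instrumented VMM.
trace = (["shadow_page_fault"] * 6 + ["cr3_write"] * 2
         + ["external_interrupt", "io_instruction"])
share = paging_exit_share(trace)  # 8 of 10 exits are paging-related
```

A high share suggests, but does not guarantee, a larger benefit, since the longer page walks discussed earlier offset part of the savings.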

Using an instrumented VMM, it is possible to detect the exit transitions that occur during code execution. Comparing this data across application builds during development gives additional insight into how hardware page table virtualization under Intel VT contributes to performance in virtual environments.
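
When comparing builds, raw exit counts are misleading if the two runs did different amounts of work, so normalizing by a unit of work first is a reasonable choice. The sketch below assumes the instrumented VMM yields per-reason counts and that a work metric such as completed transactions is available; the function names, reason labels, and numbers are hypothetical.

```python
def exits_per_unit(exit_counts, work_units):
    """Normalize exit counts by work done so that runs of different length compare fairly."""
    return {reason: n / work_units for reason, n in exit_counts.items()}


def compare_builds(old_counts, old_work, new_counts, new_work):
    """Per-reason change in exits per unit of work (negative means fewer exits)."""
    old = exits_per_unit(old_counts, old_work)
    new = exits_per_unit(new_counts, new_work)
    return {r: new.get(r, 0.0) - old.get(r, 0.0) for r in sorted(set(old) | set(new))}


# Hypothetical counts from two builds, each run for 1,000 transactions.
deltas = compare_builds({"cr3_write": 5000, "io_instruction": 1000}, 1000,
                        {"cr3_write": 2000, "io_instruction": 1500}, 1000)
```

Here the newer build would trade fewer CR3 exits for more I/O exits, and only the former are affected by hardware page table virtualization.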

Best Practices: Using VT-d to Manage Access to Restricted Resources

Enterprise administration often requires placing security and management agents on user machines, where those agents should be inaccessible both to users and to unauthorized code. For example, restricting access to an intrusion-detection agent in this way could prevent the user from removing it and malicious code from compromising it.

DMA mapping under VT-d makes it possible for agents to be placed in a dedicated service-partition VM, the memory pages of which are accessible only by specific DMA devices (such as NICs specified by IT). Thus, access to the service partition can be controlled by system administrators, effectively isolating the security and management agents from the user.

Best Practices: Avoiding Bounce Buffer Conditions

Some I/O devices have limited DMA addressability that prevents them from accessing high memory. To move data between such a device and buffers in high memory, software may use bounce buffers: a bounce buffer is a device-accessible memory area that temporarily holds data being copied between the device and a memory area the device cannot reach. This copying introduces significant overhead. System software using VT-d can instead use DMA remapping to overcome the device's addressability limitations, directing the data to high memory without resorting to bounce buffers.
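
The difference between the two approaches can be sketched for a device that can only generate 32-bit DMA addresses. This is a simplified model: the IOMMU mapping is a flat dictionary from I/O virtual address (IOVA) page to physical page, 4 KiB pages are assumed, and `map_high_buffer`, the buffer address, and the IOVA base are hypothetical values chosen for illustration.

```python
PAGE_SIZE = 4096
DMA_LIMIT = 1 << 32  # assumption: the device can only generate 32-bit DMA addresses


def bounce_bytes_copied(buffer_phys, length):
    """Without remapping: bytes the CPU must copy through a low bounce buffer."""
    return 0 if buffer_phys + length <= DMA_LIMIT else length


def map_high_buffer(buffer_phys, length, iommu_map, iova_base):
    """With DMA remapping: map a low IOVA range onto the high physical pages,
    so the device DMAs straight into the buffer and nothing is copied."""
    first = buffer_phys // PAGE_SIZE
    last = (buffer_phys + length - 1) // PAGE_SIZE
    npages = last - first + 1
    if iova_base % PAGE_SIZE or iova_base + npages * PAGE_SIZE > DMA_LIMIT:
        raise ValueError("IOVA range must be page-aligned and device-reachable")
    for i in range(npages):
        iommu_map[iova_base // PAGE_SIZE + i] = first + i
    return iova_base + buffer_phys % PAGE_SIZE  # address handed to the device


high_buf = 0x1_2345_6000  # hypothetical buffer above 4 GiB
iommu_map = {}
device_addr = map_high_buffer(high_buf, 8192, iommu_map, iova_base=0x1000_0000)
```

The device only ever sees `device_addr`, which is within its reach; the IOMMU supplies the high physical address on each access, so the per-transfer copy counted by `bounce_bytes_copied` disappears.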

Conclusion

This article is part of a series of guides that identify best practices for the use of common products and technologies with Intel VT to support virtualized enterprise workloads. The entire series is introduced in the companion article, "Intel® Virtualization Technology: Best Practices for Software Vendors," which provides a general introduction to virtualization best practices, as well as links to each guide in the series.

For More Information

Intel VT Platform Technology Site provides an overview and description of the technology, as well as links to resources that help you identify potential benefits and the products that deliver them.
Intel® Virtualization Technology for Directed I/O is an Intel® Technology Journal discussion of current I/O virtualization techniques.
Wikipedia Article on Paravirtualization provides history, background, and resources related to paravirtualization technology.