Xen

Context

  • Xen is very influential in both academia and industry, in large part because the authors made it open source.

Goal

A little different from VM/370 and VMware, Xen focuses on servers. With this in mind, we need:
1. Isolation
2. Performance Isolation
3. Multiple Operating Systems
4. Scalability


Implementation

What makes x86 so hard to virtualize?

  1. x86 doesn't trap on all sensitive instructions executed in user mode → we cannot directly use the trap-and-emulate technique applied by IBM

  2. The same instruction might behave differently when run at user level vs. at a privileged level.

  3. x86 (and ARM) has a hardware-managed TLB


CPU

Paravirtualization:

Privileged instructions in the operating system are replaced by "hypercalls" → we trap to the hypervisor.
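The idea can be sketched in a few lines. This is a toy simulation, not Xen's actual hypercall interface; all class and method names here are illustrative.

```python
# Toy sketch of paravirtualization (names are illustrative, not Xen's API).
# Instead of executing a privileged instruction directly, the guest kernel
# is modified at the source level to call into the hypervisor ("hypercall"),
# which validates and performs the operation on the guest's behalf.

class Hypervisor:
    def __init__(self):
        self.interrupts_enabled = {}  # per-guest virtual CPU state

    def hypercall_set_interrupts(self, guest_id, enabled):
        # The hypervisor virtualizes the request instead of letting the
        # guest touch the real hardware interrupt flag.
        self.interrupts_enabled[guest_id] = enabled
        return True

class ParavirtGuestKernel:
    def __init__(self, guest_id, hypervisor):
        self.guest_id = guest_id
        self.hv = hypervisor

    def disable_interrupts(self):
        # An unmodified kernel would run the privileged `cli` instruction
        # here; the paravirtualized kernel issues a hypercall instead.
        return self.hv.hypercall_set_interrupts(self.guest_id, False)

hv = Hypervisor()
guest = ParavirtGuestKernel("dom1", hv)
guest.disable_interrupts()
print(hv.interrupts_enabled["dom1"])  # False
```

The key point is that the guest kernel's source is modified ahead of time, so no runtime instruction scanning is needed.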

Binary rewriting:

Look at a block of instructions: if it contains only user-level, unprivileged instructions, VMware lets the CPU execute the block directly and then jump back to the VMware software. VMware then moves on to the next block.

If the block contains privileged instructions, VMware rewrites the binary so that it jumps back to VMware. When the CPU executes down to the rewritten instruction, control returns to VMware.

Static binary rewriting is very hard to do, so it is most likely done dynamically.
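The scan-and-rewrite step above can be sketched as a toy translator. This is only an illustration of the idea; the instruction names and the `PRIVILEGED` set are made up, and real binary translation operates on machine code, not strings.

```python
# Toy dynamic binary translation (illustrative only).
# Scan a basic block: unprivileged instructions pass through unchanged,
# privileged ones are rewritten into a trap back to the monitor (VMM).

PRIVILEGED = {"cli", "sti", "mov_cr3"}  # hypothetical privileged opcodes

def translate_block(block):
    out = []
    for insn in block:
        if insn in PRIVILEGED:
            out.append(("trap_to_vmm", insn))  # rewritten: return to the VMM
        else:
            out.append(("direct", insn))       # safe to run natively
    return out

block = ["add", "load", "mov_cr3", "store"]
print(translate_block(block))
# [('direct', 'add'), ('direct', 'load'),
#  ('trap_to_vmm', 'mov_cr3'), ('direct', 'store')]
```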


Memory

Usually, the operating system is responsible for mapping virtual addresses to physical addresses: it creates and maintains the page tables. On x86, the operating system points the MMU at these tables (by writing the page table base into a register).

The physical memory that the hypervisor gives to each operating system is not contiguous and does not start at address 0 (which is the assumption operating systems were written with).

Solution: Virtualize physical memory

Physical memory virtualization:
Add another layer of abstraction between physical memory and the actual DRAM memory.

// TODO: Add a diagram

Virtual memory → Physical memory (the guest's view) → Machine memory (the actual DRAM)
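The two-level chain above can be written out as a composition of lookups. A minimal sketch at page granularity, with made-up page numbers:

```python
# Toy two-level address translation (page granularity, illustrative values).
# The guest OS maps virtual -> "physical" pages; the hypervisor maps the
# guest's "physical" pages -> machine (DRAM) pages.

guest_page_table = {0: 5, 1: 7}  # virtual page -> guest-physical page
p2m = {5: 42, 7: 13}             # guest-physical page -> machine page

def translate(vpage):
    ppage = guest_page_table[vpage]  # the guest's translation
    mpage = p2m[ppage]               # the hypervisor's translation
    return mpage

print(translate(0))  # 42
```

The problem discussed next is that the x86 MMU only performs one of these lookups in hardware, not both.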

Warning

The TLB is hardware-managed in x86, and it only maps virtual to physical addresses. Now that we have three layers instead of two, this becomes a problem.

Since the page table walked by the MMU must map virtual addresses to hardware (machine) addresses, while the page tables maintained by the operating system only map virtual to physical, we cannot hand the operating system's page tables to the MMU directly.

The Exokernel approach:

Note

This has to be done via modifying the operating system's source code.

The hypervisor gives available pages to the operating systems, and the operating systems construct their own page tables using these pages. Whenever the operating system wants to update a page table, it makes a hypercall, and the hypervisor verifies and applies the update to the "machine page" table.
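The verification step can be sketched as follows. This is a toy model of the idea (often called direct paging), with illustrative names; the real Xen interface differs.

```python
# Toy sketch of hypervisor-validated page table updates (names illustrative).
# The guest builds its own page tables out of machine frames the hypervisor
# handed it, but every update goes through a hypercall so the hypervisor
# can verify the guest only maps frames it actually owns.

class Hypervisor:
    def __init__(self):
        self.owner = {}  # machine frame -> domain that owns it

    def grant_frames(self, dom, frames):
        for f in frames:
            self.owner[f] = dom

    def hypercall_mmu_update(self, dom, page_table, vpage, mframe):
        if self.owner.get(mframe) != dom:
            raise PermissionError("frame not owned by this domain")
        page_table[vpage] = mframe  # hypervisor installs the mapping
        return True

hv = Hypervisor()
hv.grant_frames("dom1", [100, 101])
pt = {}
hv.hypercall_mmu_update("dom1", pt, 0, 100)      # allowed
try:
    hv.hypercall_mmu_update("dom1", pt, 1, 999)  # frame dom1 doesn't own
except PermissionError as e:
    print("rejected:", e)
print(pt)  # {0: 100}
```

Because the hypervisor validates every update, the guest's page tables can map directly to machine addresses and be used by the MMU as-is.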

The L4 approach: Shadow Pages

Note

The page tables maintained in the operating system are not written to the MMU; they exist only for the operating system's own correctness. The page tables maintained by the hypervisor are the ones written into the MMU/TLB.

VMware maintains extra (shadow) page tables in its hypervisor. While doing binary rewriting, VMware scans the code; if the operating system is about to change its page tables, VMware lets it make the change but inserts a callback to VMware that propagates the change into the shadow page tables maintained in the hypervisor.
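The propagation step can be sketched with two tables and a write hook. A toy model, not VMware's implementation; page numbers are made up.

```python
# Toy shadow page table (illustrative). The guest's table maps
# virtual -> guest-physical; the hypervisor's shadow table maps
# virtual -> machine and is the one the MMU actually uses. When the
# rewritten guest code updates its table, a callback keeps the
# shadow table in sync.

p2m = {5: 42, 7: 13}  # guest-physical page -> machine page

guest_pt = {}   # what the guest believes the MMU uses
shadow_pt = {}  # what the MMU really uses

def guest_set_pte(vpage, ppage):
    guest_pt[vpage] = ppage
    # Callback inserted by the binary rewriter: propagate to the shadow.
    shadow_pt[vpage] = p2m[ppage]

guest_set_pte(0, 5)
guest_set_pte(1, 7)
print(shadow_pt)  # {0: 42, 1: 13}
```

Note the guest never sees machine addresses: it reads back its own table and finds exactly what it wrote, while the MMU silently uses the shadow.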


I/O

Xen has a special trusted domain (Dom 0). Xen's control utilities run in Dom 0 as user-level programs rather than inside the hypervisor. Dom 0 is usually just a normal Linux, and it is allowed direct access to hardware I/O.

The operating systems running in the other virtual machines are not trusted. For legacy reasons, most operating systems already ship with an Ethernet driver, so we write an emulated Ethernet device for each guest. Since we wrote these drivers, they are Xen-aware: the emulated device delivers packets from the other domains to Dom 0, which uses the real drivers in its own operating system to transfer the packets to the actual hardware.
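The guest-to-Dom 0 path above can be sketched as a frontend/backend pair sharing a queue. A toy model with illustrative names; in real Xen the queue would be a shared-memory ring between domains.

```python
# Toy sketch of split-driver I/O (names illustrative).
# A guest's "frontend" driver queues packets; Dom 0's "backend" drains
# the queue and hands the packets to the real NIC driver.

from collections import deque

class SharedRing:  # stands in for a shared-memory ring between domains
    def __init__(self):
        self.q = deque()

class FrontendNIC:  # Xen-aware emulated NIC driver inside a guest
    def __init__(self, ring):
        self.ring = ring

    def send(self, packet):
        self.ring.q.append(packet)  # no hardware access: just enqueue

class Dom0Backend:  # runs in Dom 0, which has real hardware access
    def __init__(self, ring):
        self.ring = ring
        self.wire = []  # stands in for the physical NIC

    def drain(self):
        while self.ring.q:
            self.wire.append(self.ring.q.popleft())

ring = SharedRing()
FrontendNIC(ring).send(b"hello")
backend = Dom0Backend(ring)
backend.drain()
print(backend.wire)  # [b'hello']
```

Every packet crosses a domain boundary, which is the source of the control overhead noted below.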

There is a lot of control overhead because we keep switching between virtual machines.


Hardware optimizations

Hardware support to make CPU more efficient

Add one extra, more privileged mode below ring 0 (root mode, sometimes described as "ring −1"), and modify the hardware accordingly. The hypervisor runs in root mode, and the operating system keeps running in ring 0.

Hardware support to make memory more efficient

Add nested page tables to the MMU/TLB so the hardware itself supports the extra layer of address translation (virtual → physical → machine).
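Nested paging removes the need for shadow tables, but the hardware walk gets more expensive: every level of the guest's table holds a guest-physical address that itself must be translated through the host's levels. A commonly cited worst-case count, assuming no TLB or walk caches:

```python
# Worst-case memory accesses for a nested ("two-dimensional") page walk:
# a g-level guest table combined with an h-level host table costs
# (g + 1) * (h + 1) - 1 accesses, since each of the g guest levels plus
# the final data reference is translated through all h host levels.

def nested_walk_accesses(guest_levels, host_levels):
    return (guest_levels + 1) * (host_levels + 1) - 1

print(nested_walk_accesses(4, 4))  # 24 (4-level x 4-level, as on x86-64)
```

So nested paging trades the software cost of shadow-table maintenance for a longer hardware walk on a TLB miss.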

Hardware support to make I/O more efficient


Takeaway

My Question

Can we just do everything through the hypervisor?