Skip to content

GMS

Context

Settings: you have a cluster of machines take, you want to make use of distributed resources to make them more efficient


Goal

  1. high probability evict M oldest page (the order can be arbitrary) at the end of an epoch → randomized

Implementation

Core Assumptions

  1. Turst/Sharing
  2. Multiplexing
  3. Idleness
  4. Reliable network (and faster than reading from Disks)



Memory Hierarchy

Register
Cache
DRAM
        ← global memeory (inserted by GMS for optimization)
Hard Disk



Core Mechnisms

  1. Find page()
  2. Put page()
    • ideal: run LRU algorithm on the entire cluster (Not feasible because collecting the information gives a large overhead)
    • when even the global memory is full
  3. Get page()



Achieve Goal 1
At the beginning of the epoch, since we know every single page in the global memory and their time, we know the M oldest pages and their locations. We then use this as the weight for randomization. When an active node needs to evict a page from the global memory, it picks idle nodes with global memory randomly using weight mentioned above. For example, an idle node hold 50% of the M oldest pages, then it will have 50% probability being selected to evict a page. Active nodes do need not to coordinate their decisions; decisions are made locally. But the sum of all local decisions over the time should achieve Goal 1 due to randomization.

Note

This is a precise algorithm. It's more of an approximation, but it is good enough and close enough to our higher level goal.

Also, the pages evicted could be more than M, but it should be relatively close to M (if calculations are done properly), so in short words, in algorithm works well.



When a Machine goes from idle to active

When a machine goes from idle to active, GMS will start replacing global pages with local pages as page fault occurs. However, there are two choices for GMS right now, it can either just abandon these global pages or send them over the network to other idle machines.

What should we do when swapping out global pages in an active machine? abandon vs storing in another idle machine

This is determined by whether the page that is about to be evicted x is one of the M oldest pages mentioned above.

  1. If x is one of the M oldest pages, then just evict x
  2. If x is not one of the M oldest pages, then evict x and replace it on another idle machine with a page y that is one of the M oldest pages for this epoch



Software Managed TLB vs Hardware Managed TLB

//TODO ask Prof



Further optimization with idle CPU resource




Takeaway

Back to top