EyeQ | Chapter09's space

[1] V. Jeyakumar, M. Alizadeh, and D. Mazieres, “EyeQ: practical network performance isolation for the multi-tenant cloud,” presented at the Proceedings of the 4th …, 2012.

Problem:

应该有一个类似于Oktopus的virtual abstraction来描述tenant所需求的网络性能模型
如何实现这样的abstraction，并保证abstraction对每一个tenant都可靠？
- 合适的拓扑结构，high bisection bandwidth
- traffic load balancing schemes, e.g. leverage ECMP
- bandwidth arbitration mechanism. 带宽分配控制机制
- server-to-server congestion control mechanisms

Goal & Discussion:

为什么不采用exact rate?
- 在packet level做bandwidth guarantee 需要网络中switch，per-endpoint queue做支持。
- 为了通用性和扩展性，EyeQ采用average rate guarantees
–> 如何使得结果趋于精确：使用时间间隔越小越好

//interval太大，导致短暂的burst被忽略，从而无法保证带宽

–> 达到小的时间间隔：分布反馈式的拥塞控制。基于Rate Control Protocol, 达到100s of microseconds
Fat Network or Flat Network?
- Flat Network 中交换机压力太大，因为没有flow汇聚的地方，拥塞到处都可能发生
- Fat Network中层次比较多，其中通过TCP，ECMP这些协议的特性，可以知道一般拥塞会发生在edge

Method:

Design:
- 发现资源竞争的机制+限制flow速度的机制
- senders: detect + Weighted Fair Queueing
- receivers: measures rate every 200μs and uses this information to rate limit flows.
  - using per-endpoint rate meters.
- REM: per-endpoint rate meters. (a byte counter that periodically (200μs) tracks the endpoint’s receive rate)
  - IPID 143 –> HACK
  - 16-bit feedback / 10kB receive data
  - $C_i$ 根据Bi的比例和receiver的C来计算
  - clocked by incoming packets?
  - 一个包到达以后开始计时，$20\mu s$中到达的包有多少，以此来测算link rate
- SEM: per-destination rate limiters.
  - root WRR scheduler + rate limiters
  - per-dst traffic 根据HACK，100 ms内没收到HACK，限速1Mb/s
  - 计算$R_i$ 的Rate Control Loop:
    - receive rate $y_i$ + $C_i$
    - $ R_i \gets R_i(1-\alpha\cdot { {y_i-C_i} \over C_i}) $
    - 通过 $z[n+1] = b*z[n](1-z[n])$ 来证明该算法收敛, worst 30 iterations
  - 如果network core 出现拥塞，ECN能够让EyeQ缓解in-network的拥塞
- 对Network的要求：
  - the sum of bandwidth guarantees < access links bandwidth
  - the sum of bandwidth guarantees < Spine bandwidth
  - 剩下的bandwidth，以best-effort分配给没有guarantee的VM
Implementation:
- 1900 lines of C code and about 700 lines of header files.
- REM
  - 1 byte的counter将成为overhead，因此
  - 用200 microsecond的原因
- per-core queue
  - split the ideal rate limiter’s FIFO queue into a per-CPU queue
  - per-queue token bucket
Evaluation
- 16 x server, with 4 core 双线程
- memcached + UDP interference

Seawall
Oktupos
SecondNet
Gatekeeper
- sender rate limiter + receiver feedback + congestion on edge + fully-provisioned network core
- Gatekeeper从high level上基本和EyeQ一样，但是EyeQ的工作相对来说更solid。EyeQ评价Gatekeeper为：lacks details on the system design: rate control mechanism, bandwidth guarantees at short timescales. 另外还说gatekeeper的evaluation做得不好：Gatekeeper’s evaluation is limited to static scenarios with long lived flows. 最后实现上，Gatekeeper使用的是htb而不如EyeQ使用的per-core queue.
FairCloud
NetShare
Netlord

Misc: