[1] V. Jeyakumar, M. Alizadeh, and D. Mazieres, “EyeQ: practical network performance isolation for the multi-tenant cloud,” presented at the Proceedings of the 4th …, 2012.

Problem:

  1. 应该有一个类似于Oktopus的virtual abstraction来描述tenant所需求的网络性能模型
  2. 如何实现这样的abstraction,并保证abstraction对每一个tenant都可靠?
    • 合适的拓扑结构,high bisection bandwidth
    • traffic load balancing schemes, e.g. leverage ECMP
    • bandwidth arbitration mechanism. 带宽分配控制机制

    • server-to-server congestion control mechanisms

Goal & Discussion:

  1. 为什么不采用exact rate?
    • 在packet level做bandwidth guarantee 需要网络中switch,per-endpoint queue做支持。
    • 为了通用性和扩展性,EyeQ采用average rate guarantees

    –> 如何使得结果趋于精确:使用时间间隔越小越好

    //interval太大,导致短暂的burst被忽略,从而无法保证带宽

    –> 达到小的时间间隔:分布反馈式的拥塞控制。基于Rate Control Protocol, 达到100s of microseconds

  2. Fat Network or Flat Network?

    • Flat Network 中交换机压力太大,因为没有flow汇聚的地方,拥塞到处都可能发生
    • Fat Network中层次比较多,其中通过TCP,ECMP这些协议的特性,可以知道一般拥塞会发生在edge

Method:

  1. Design:

    EyeQ

    • 发现资源竞争的机制+限制flow速度的机制

    • senders: detect + Weighted Fair Queueing

    • receivers: measures rate every 200μs and uses this information to rate limit flows.
      • using per-endpoint rate meters.
    • REM: per-endpoint rate meters. (a byte counter that periodically (200μs) tracks the endpoint’s receive rate)
      • IPID 143 –> HACK
      • 16-bit feedback / 10kB receive data
      • $C_i$ 根据Bi的比例和receiver的C来计算
      • clocked by incoming packets?
      • 一个包到达以后开始计时,$20\mu s$中到达的包有多少,以此来测算link rate
    • SEM: per-destination rate limiters.

      • root WRR scheduler + rate limiters
      • per-dst traffic 根据HACK,100 ms内没收到HACK,限速1Mb/s
      • 计算$R_i$ 的Rate Control Loop:
        • receive rate $y_i$ + $C_i$
        • $ R_i \gets R_i(1-\alpha\cdot { {y_i-C_i} \over C_i}) $
        • 通过 $z[n+1] = b*z[n](1-z[n])$ 来证明该算法收敛, worst 30 iterations
      • 如果network core 出现拥塞,ECN能够让EyeQ缓解in-network的拥塞
    • 对Network的要求:
      • the sum of bandwidth guarantees < access links bandwidth
      • the sum of bandwidth guarantees < Spine bandwidth
      • 剩下的bandwidth,以best-effort分配给没有guarantee的VM
  2. Implementation:
    • 1900 lines of C code and about 700 lines of header files.
    • REM
      • 1 byte的counter将成为overhead,因此
      • 用200 microsecond的原因
    • per-core queue
      • split the ideal rate limiter’s FIFO queue into a per-CPU queue
      • per-queue token bucket
  3. Evaluation
    • 16 x server, with 4 core 双线程
    • memcached + UDP interference

Related:

  • Seawall
  • Oktupos
  • SecondNet
  • Gatekeeper
    • sender rate limiter + receiver feedback + congestion on edge + fully-provisioned network core
    • Gatekeeper从high level上基本和EyeQ一样,但是EyeQ的工作相对来说更solid。EyeQ评价Gatekeeper为:lacks details on the system design: rate control mechanism, bandwidth guarantees at short timescales. 另外还说gatekeeper的evaluation做得不好:Gatekeeper’s evaluation is limited to static scenarios with long lived flows. 最后实现上,Gatekeeper使用的是htb而不如EyeQ使用的per-core queue.
  • FairCloud
  • NetShare
  • Netlord

Misc: