A Dynamic Congestion Management System for InfiniBand Networks

Authors

  • Fabrice Mizero, University of Virginia, Charlottesville
  • Malathi Veeraraghavan, University of Virginia, Charlottesville
  • Qian Liu, University of New Hampshire, Durham
  • Robert D. Russell, University of New Hampshire, Durham
  • John M. Dennis, National Center for Atmospheric Research, Boulder

DOI: https://doi.org/10.14529/jsfi160201

Abstract

While InfiniBand's link-by-link flow control helps avoid packet loss, it unfortunately causes the effects of congestion to spread through a network: flows whose paths do not even pass through congested ports can suffer reduced throughput. We propose a Dynamic Congestion Management System (DCMS) to address this problem. Without per-flow information, the DCMS uses switch-port performance counters to detect the onset of congestion and to determine whether or not victim flows are present. If victim flows are present, the DCMS takes actions that aggressively reduce the sending rates of the congestion-causing (contributor) flows. In the absence of victim flows, the DCMS instead allows the contributor flows to maintain high sending rates and finish as quickly as possible. Our results show that dynamic congestion management can enable a network to serve both contributor flows and victim flows effectively. The DCMS solution operates within the constraints of the InfiniBand standard.
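The two-way decision described above (throttle contributors hard when victims exist, otherwise let them run) can be illustrated with a toy sketch. It is not the authors' algorithm: the abstract does not say which counters, thresholds, or victim test the DCMS uses. The sketch assumes per-port counter deltas of the kind exposed by InfiniBand's PortXmitWait and PortXmitData counters, and everything else (`PortSample`, `classify`, `decide`, `WAIT_THRESHOLD`, `UTIL_THRESHOLD`, the "root" vs "backpressured" labels) is hypothetical. Its victim test is deliberately crude: it treats any congested but underutilized port as a sign that victim flows may be present, even though such a port could in practice carry only contributor traffic.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"                          # no congestion detected
    LET_CONTRIBUTORS_RUN = "relaxed"       # congestion, but no victims: keep rates high
    THROTTLE_CONTRIBUTORS = "aggressive"   # possible victims: cut contributor rates hard


@dataclass
class PortSample:
    """Hypothetical counter deltas for one switch egress port over one polling interval."""
    name: str
    xmit_wait_delta: int   # increase in a PortXmitWait-style counter (ticks)
    xmit_data_delta: int   # increase in a PortXmitData-style counter, assumed already in bytes
    interval_ticks: int    # length of the interval, in the same tick unit as xmit_wait_delta
    capacity_bytes: int    # bytes the link could have carried during the interval


WAIT_THRESHOLD = 0.5   # hypothetical: port counts as congested if it waited >= 50% of the interval
UTIL_THRESHOLD = 0.9   # hypothetical: a congested port at >= 90% utilization is treated as a root


def classify(port: PortSample) -> str:
    """Label a port as 'ok', 'root' (saturated hot spot) or 'backpressured'."""
    if port.interval_ticks <= 0 or port.capacity_bytes <= 0:
        raise ValueError("interval_ticks and capacity_bytes must be positive")
    wait_frac = port.xmit_wait_delta / port.interval_ticks
    util = port.xmit_data_delta / port.capacity_bytes
    if wait_frac < WAIT_THRESHOLD:
        return "ok"
    if util >= UTIL_THRESHOLD:
        # Waiting a lot while running near line rate: likely a congestion root.
        return "root"
    # Waiting a lot while underutilized: stalled by backpressure from downstream.
    return "backpressured"


def decide(ports: list[PortSample]) -> Action:
    """Crude stand-in for the DCMS decision: throttle only if victims may be present."""
    labels = {classify(p) for p in ports}
    if "backpressured" in labels:
        # Congestion has spread beyond a saturated port; assume victim flows may be affected.
        return Action.THROTTLE_CONTRIBUTORS
    if "root" in labels:
        # Congestion is confined to saturated ports; let contributors finish quickly.
        return Action.LET_CONTRIBUTORS_RUN
    return Action.NONE
```

Under these assumptions, a fabric where only the hot-spot port is saturated yields the relaxed action, so contributor flows keep their rates; as soon as another port shows backpressure, the sketch switches to aggressive throttling, mirroring the two behaviors the abstract describes.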

Published

2016-09-19

How to Cite

Mizero, F., Veeraraghavan, M., Liu, Q., Russell, R. D., & Dennis, J. M. (2016). A Dynamic Congestion Management System for InfiniBand Networks. Supercomputing Frontiers and Innovations, 3(2), 5–20. https://doi.org/10.14529/jsfi160201