Energy-efficient Algorithms for Ultrascale Systems


  • Jesus Carretero, Computer Science and Engineering Dept., Engineering School, University Carlos III of Madrid, Madrid
  • Salvatore Distefano, Politecnico di Milano, Milano
  • Dana Petcu, West University of Timisoara, Timisoara
  • Daniel Pop, West University of Timisoara, Timisoara
  • Thomas Rauber, University of Bayreuth, Bayreuth
  • Gudula Rünger, Technical University Chemnitz, Chemnitz
  • David E. Singh, Computer Science and Engineering Dept., Engineering School, University Carlos III of Madrid, Madrid



The prospects of reaching Exascale or Ultrascale Computing are closely tied to the energy consumed in processing applications. For physical as well as economic reasons, energy consumption has to be reduced significantly to make Ultrascale Computing possible. Research into energy-saving hardware mechanisms has already led to the energy-aware hardware systems available today. However, hardware mechanisms can only reduce energy consumption if software exploits them in a way that actually results in energy-efficient computing. In the software area, too, a multitude of research approaches towards energy saving exists. These approaches and results are often isolated, either at the system software level or at the application organization level, reflecting the expertise of the corresponding research group. The challenge of reducing energy consumption dramatically enough to make Ultrascale Computing possible is so ambitious that a concerted action combining all these software levels and research efforts seems reasonable. In this article, we present the current research efforts and results related to energy in the diverse areas of software. Moreover, we conclude with open problems and questions concerning energy-related techniques, with an emphasis on the application algorithmic side.
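The claim that hardware mechanisms save energy only when software exploits them deliberately can be illustrated with frequency scaling, used here as an assumed example of such a mechanism. The sketch below uses a deliberately simplified, hypothetical model that is not taken from the article: a CPU-bound task needs W cycles, so its runtime is t(f) = W/f, and the processor draws P(f) = p_static + c·f³ (static power plus a dynamic term, assuming supply voltage scales linearly with frequency). Energy is then E(f) = P(f)·t(f) = p_static·W/f + c·W·f², which is minimized at f* = (p_static / (2c))^(1/3). The function names and parameter values are illustrative only.

```python
"""Toy energy model for a CPU-bound task under frequency scaling.

Hypothetical assumptions, for illustration only:
  * runtime t(f) = W / f   (W = work in cycles, f = frequency in Hz)
  * power   P(f) = p_static + c * f**3   (dynamic power ~ f * V^2, V ~ f)
"""


def runtime(work_cycles: float, freq_hz: float) -> float:
    # A purely CPU-bound task: cycles divided by cycles per second.
    return work_cycles / freq_hz


def power(freq_hz: float, p_static: float, c: float) -> float:
    # Static (leakage) power plus a cubic dynamic power term.
    return p_static + c * freq_hz ** 3


def energy(work_cycles: float, freq_hz: float, p_static: float, c: float) -> float:
    # Energy = average power x execution time.
    return power(freq_hz, p_static, c) * runtime(work_cycles, freq_hz)


def energy_optimal_freq(p_static: float, c: float) -> float:
    # Setting dE/df = -p_static*W/f^2 + 2*c*W*f = 0 gives f^3 = p_static / (2c).
    return (p_static / (2.0 * c)) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Made-up parameters: 20 W static power, 60 W dynamic power at 3 GHz.
    P_STATIC = 20.0
    C = 60.0 / (3e9) ** 3
    WORK = 3e9  # cycles

    f_opt = energy_optimal_freq(P_STATIC, C)
    for f in (1e9, f_opt, 3e9):
        print(f"{f / 1e9:.2f} GHz: {energy(WORK, f, P_STATIC, C):.1f} J")
```

With these made-up parameters the model gives 80 J at 3 GHz and roughly 55 J near 1.65 GHz, while dropping to 1 GHz raises energy again (about 67 J) because static power is paid over a longer runtime. Neither the fastest nor the slowest setting is optimal, so the software side has to choose the operating point.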

How to Cite

Carretero, J., Distefano, S., Petcu, D., Pop, D., Rauber, T., Rünger, G., & Singh, D. E. (2015). Energy-efficient Algorithms for Ultrascale Systems. Supercomputing Frontiers and Innovations, 2(2), 77–104.
