Performance Reduction For Automatic Development of Parallel Applications For Reconfigurable Computer Systems

Authors

  • Alexey I. Dordopulo, Scientific Research Centre of Supercomputers and Neurocomputers, Co., Ltd.
  • Ilya I. Levin, Southern Federal University

DOI:

https://doi.org/10.14529/jsfi200201

Abstract

In this paper, we review a suboptimal methodology for mapping a task information graph onto the architecture of a reconfigurable computer system. Performance reduction methods make it possible to solve computational problems whose hardware costs exceed the available hardware resource. We proved theorems concerning the properties of sequential reductions. We consider three types of reduction: by the number of basic subgraphs, by the number of computing devices, and by data width. On the basis of the proved theorems and corollaries, we developed a methodology of reduction transformations of a task information graph for its automatic adaptation to the architecture of a reconfigurable computer system. We estimated the maximum number of transformations that the suggested methodology requires for a balanced reduction of the performance and hardware costs of applications for reconfigurable computer systems.

Published

2020-07-21

How to Cite

Dordopulo, A. I., & Levin, I. I. (2020). Performance Reduction For Automatic Development of Parallel Applications For Reconfigurable Computer Systems. Supercomputing Frontiers and Innovations, 7(2), 4–23. https://doi.org/10.14529/jsfi200201