The paper suggests and describes two architectures for parallel data sort. The first architecture is applicable to large data sets and it combines three stages of data processing: data sorting in hardware (in a Field-Programmable Gate Arrays – FPGA), merging preliminary sorted blocks in hardware (in the FPGA), and merging large subsets received from the FPGA in general-purpose software. Dataexchange between the FPGA anda general-purpose computeris organized throughafast Peripheral Component Interconnect (PCI) expressbus.The second architectureis applicabletosmalldatasetsandit enablessortingtobedoneatthetimeofdata acquisition,i.e.as soon as the last data item is received, the sorted items can be transferred immediately. The results of experiments clearly demonstrate the advantages of the proposed architectures that permit the reduction of the required hardware resources and increasing throughput compared to the results reported in publications and software functions targeted to data sorting.
1. Knuth, D. E. The Art of Computer Programming. Sorting and Searching, Vol. III. Addison-Wesley, 2011.
2. Sklyarov, V. and Skliarova, I. High-performance implementation of regular and easily scalable sorting networks on an FPGA. Microprocess. Microsyst., 2014, 38(5), 470–484.
https://doi.org/10.1016/j.micpro.2014.03.003
3. Mueller, R., Teubner, J., and Alonso, G. Sorting networks on FPGAs. Int. J. Very Large Data Bases, 2012, 21(1), 1–23.
https://doi.org/10.1007/s00778-011-0232-z
4. Ortiz, J. and Andrews, D. A configurable high-throughput linear sorter system. In Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing. IEEE, 2010, 1–8.
https://doi.org/10.1109/ipdpsw.2010.5470730
5. Zuluaga, M., Milder, P., and Puschel, M. Computer generation of streaming sorting networks. In Proceedings of the 49th Design Automation Conference. ACM, New York, 2012, 1245–1253.
https://doi.org/10.1145/2228360.2228588
6. Singh, S. and Greaves, D. J. Kiwi: synthesis of FPGA circuits from parallel programs. In Proceedings of the 16th IEEE International Symposium on Field-Programmable Custom Computing Machines. IEEE, 2008, 3–12.
https://doi.org/10.1109/fccm.2008.46
7. Che, S., Li, J., Sheaffer, J. W., Skadron, K., and Lach, J. Accelerating compute-intensive applications with GPUs and FPGAs. In Proceedings of the 2008 Symposium on Application Specific Processors. IEEE, 2008, 101–107.
https://doi.org/10.1109/SASP.2008.4570793
8. Chamberlain, R. D. and Ganesan, N. Sorting on architecturally diverse computer systems. In Proceedings of the 3rd International Workshop on High-Performance Reconfigurable Computing Technology and Applications. ACM, New York, 2009, 39–46.
https://doi.org/10.1145/1646461.1646466
9. Mueller, R. Data Stream Processing on Embedded Devices. Ph.D. thesis, ETH, Zurich, 2010.
10. Koch, D. and Torresen, J. FPGASort: a high performance sorting architecture exploiting run-time reconfiguration on FPGAs for large problem sorting. In Proceedings of the 19th ACM/SIGDA International Symposium on Field Programmable Gate Arrays. ACM, New York, 2011, 45–54.
https://doi.org/10.1145/1950413.1950427
11. Sklyarov, V., Skliarova, I., Mihhailov, D., and Sudnitson, A. Implementation in FPGA of address-based data sorting. In Proceedings of the 21st International Conference on Field-Programmable Logic and Applications. IEEE, 2011, 405–410.
https://doi.org/10.1109/fpl.2011.81
12. Kipfer, P. and Westermann, R. GPU Gems 2, Improved GPU Sorting. http://http.developer.nvidia.com/GPUGems2/ gpugems2_chapter46.html, 2005 (accessed 08.06.2016).
13. Gapannini, G., Silvestri, F., and Baraglia, R. Sorting on GPU for large scale datasets: a thorough comparison. Inf. Process. Manage, 2012, 48(5), 903–917.
https://doi.org/10.1016/j.ipm.2010.11.010
14. Ye, X., Fan, D., Lin, W., Yuan, N., and Ienne, P. High performance comparison-based sorting algorithm on many-core GPUs. In Proceedings of the 2010 IEEE International Symposium on Parallel & Distributed Processing. IEEE, 2010, 1–10.
15. Satish, N., Harris, M., and Garland, M. Designing efficient sorting algorithms for manycore GPUs. In Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing. IEEE, 2009, 1–10.
https://doi.org/10.1109/IPDPS.2009.5161005
16. Cederman, D. and Tsigas, P. A practical quicksort algorithm for graphics processors. In Proceedings of the 16th Annual European Symposium on Algorithms. Springer-Verlag, Berlin, Heidelberg, 2008, 246–258.
https://doi.org/10.1007/978-3-540-87744-8_21
17. Grozea, C., Bankovic, Z., and Laskov, P. FPGA vs. multi-core CPUs vs. GPUs: hands-on experience with a sorting application. In Facing the Multicore-Challenge (Keller, R., Kramer, D., and Weiss, J. P., eds). Springer-Verlag, Berlin, Heidelberg, 2010, 105–117.
https://doi.org/10.1007/978-3-642-16233-6_12
18. Edahiro, M. Parallelizing fundamental algorithms such as sorting on multi-core processors for EDA acceleration. In Proceedings of the 2009 Asia and South Pacific Design Automation Conference. IEEE, 2009, 230–233.
https://doi.org/10.1109/ASPDAC.2009.4796485
19. Aj-Haj Baddar, S. W. and Batcher, K. E. Designing Sorting Networks. A New Paradigm. Springer, 2011.
https://doi.org/10.1007/978-1-4614-1851-1
20. Marcelino, R., Neto, H. C., and Cardoso, J. M. P. A comparison of three representative hardware sorting units. In Proceedings of the 35th Annual IEEE Conference on Industrial Electronics. IEEE, 2009, 2805–2810.
https://doi.org/10.1109/iecon.2009.5415409
21. Batcher, K. E. Sorting networks and their applications. In Proceedings of the AFIPS Spring Joint Computer Conference. ACM, New York, 1968, 307–314.
https://doi.org/10.1145/1468075.1468121
22. Lacey, S. and Box, R. A fast, easy sort: a novel enhancement makes a bubble sort into one of the fastest sorting routines. Byte, 1991, 16(4), 315–320.
23. Sklyarov, V. and Skliarova, I. Fast regular circuits for network-based parallel data processing. Adv. Electr. Comput. Eng., 2013, 13(4), 47–50.
https://doi.org/10.4316/AECE.2013.04008
24. Xilinx, Inc. Zynq-7000 all programmable SoC. Technical Reference Manual. https://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf , 2016 (accessed 01.02.2017).
25. Xilinx, Inc. VC707 Evaluation Board for the Virtex-7 FPGA User Guide. https://www.xilinx.com/support/documentation/boards_and_kits/vc707/ug885_VC707_Eval_Bd.pdf , 2016 (accessed 01.02.2017).
26. Digilent, Inc. Nexys4 DDR FPGA Board Reference Manual. https://reference.digilentinc.com/_media/nexys4-ddr: nexys4ddr_rm.pdf, 2016 (accessed 08.06.2016).