System Software for Data-Intensive Science

Chapter in: Advanced Software Technologies for Post-Peta Scale Computing

Abstract

Storage performance is a critical issue for supercomputers that support data-intensive science. To scale storage bandwidth with the number of compute nodes, we assume a node-local, scale-out storage architecture. Because each compute node has its own local storage, the number of storage devices grows with the number of compute nodes, and the aggregate storage bandwidth grows accordingly. Our research targets a distributed file system for this node-local storage architecture, an operating system for compute nodes, and runtime systems that use the distributed file system over node-local storage for workflow systems, MapReduce, MPI-IO, and batch job schedulers.
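The scalability argument above can be made concrete with a back-of-the-envelope model. The Python sketch below is illustrative only. The bandwidth figures (`PER_NODE_BW_GBPS`, `SHARED_FS_BW_GBPS`) and the function names are hypothetical, not values or interfaces from this chapter, and the model assumes ideal linear scaling with no network contention or data-placement effects.

```python
# Hypothetical figures for illustration only; not measurements from this chapter.
PER_NODE_BW_GBPS = 2.0      # assumed bandwidth of one node-local storage device (GB/s)
SHARED_FS_BW_GBPS = 500.0   # assumed fixed bandwidth of a shared parallel file system (GB/s)


def node_local_bandwidth(num_nodes: int, per_node_bw: float = PER_NODE_BW_GBPS) -> float:
    """Aggregate bandwidth when every compute node accesses its own local storage.

    Assumes ideal scaling: each added node contributes one more device.
    """
    return num_nodes * per_node_bw


def shared_bandwidth_per_node(num_nodes: int, shared_bw: float = SHARED_FS_BW_GBPS) -> float:
    """Bandwidth available to each node when all nodes share one fixed-capacity file system."""
    return shared_bw / num_nodes


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"{n:>6} nodes: node-local total = {node_local_bandwidth(n):>8.1f} GB/s, "
              f"shared per node = {shared_bandwidth_per_node(n):6.3f} GB/s")
```

The linear term holds only if each node mostly reads and writes its own local storage; remote accesses consume network bandwidth that this simple model ignores.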

Author information

Correspondence to Osamu Tatebe.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Tatebe, O., Oyama, Y., Tanaka, M., Ohtsuji, H., Takatsu, F., Li, X. (2019). System Software for Data-Intensive Science. In: Sato, M. (ed.) Advanced Software Technologies for Post-Peta Scale Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-1924-2_6

  • DOI: https://doi.org/10.1007/978-981-13-1924-2_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1923-5

  • Online ISBN: 978-981-13-1924-2

  • eBook Packages: Computer Science, Computer Science (R0)
