Abstract
Storage performance is a bottleneck for supercomputers that support data-intensive science. To scale storage bandwidth with the number of compute nodes, we assume a node-local, scale-out storage architecture. The number of local storage devices grows with the number of compute nodes, so the aggregate storage bandwidth scales accordingly. Our research targets are a distributed file system for this node-local storage architecture, an operating system for compute nodes, and runtime systems that use the distributed file system on node-local storage for workflow systems, MapReduce, MPI-IO, and batch job schedulers.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Tatebe, O., Oyama, Y., Tanaka, M., Ohtsuji, H., Takatsu, F., Li, X. (2019). System Software for Data-Intensive Science. In: Sato, M. (eds) Advanced Software Technologies for Post-Peta Scale Computing. Springer, Singapore. https://doi.org/10.1007/978-981-13-1924-2_6
DOI: https://doi.org/10.1007/978-981-13-1924-2_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1923-5
Online ISBN: 978-981-13-1924-2
eBook Packages: Computer Science, Computer Science (R0)