Abstract
Checkpointing can save and recover applications when faults occur and is becoming critical to large information systems. Unfortunately, existing checkpoint tools have several limitations: they are not transparent to applications, they ignore file system state, and they support cluster checkpointing poorly. We present a lightweight cluster checkpoint based on virtual containers. First, a virtual container, the IPG (Isolated Process Group), wraps all target applications together and produces checkpoints transparently and completely. Second, each IPG has an independent namespace built on an exclusively owned LV (Logical Volume), which can be checkpointed synchronously with the IPG’s memory to guarantee consistency. Finally, distributed applications can be deployed across many IPGs, and a cluster checkpoint protocol orchestrates all IPGs to produce global checkpoints. Experiments and evaluation results indicate that running applications in IPGs introduces no overhead, and that our prototype system is more stable than library-based checkpoint tools.
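The abstract does not spell out the cluster checkpoint protocol, so the following is only a minimal sketch of one common way to coordinate such a checkpoint, assuming a freeze-then-snapshot scheme. The `IPG` class, its `freeze`, `snapshot`, and `resume` methods, and the `global_checkpoint` function are hypothetical names introduced for illustration, and plain dictionaries stand in for process memory and for files on the Logical Volume. In-flight messages between IPGs are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class IPG:
    """Hypothetical stand-in for one Isolated Process Group and its Logical Volume."""
    name: str
    memory: dict = field(default_factory=dict)  # stand-in for process state
    volume: dict = field(default_factory=dict)  # stand-in for files on the LV
    frozen: bool = False
    healthy: bool = True                        # set False to simulate a failed node

    def freeze(self) -> bool:
        # Stop the group so memory and volume cannot change during the snapshot.
        if not self.healthy:
            return False
        self.frozen = True
        return True

    def snapshot(self) -> dict:
        # Capture memory and volume at the same frozen instant,
        # so the two parts of the checkpoint agree with each other.
        assert self.frozen, "snapshot requires a frozen IPG"
        return {"memory": dict(self.memory), "volume": dict(self.volume)}

    def resume(self) -> None:
        self.frozen = False


def global_checkpoint(ipgs):
    """Return {IPG name: snapshot} if every IPG took part, otherwise None."""
    frozen = []
    try:
        # Phase 1: freeze every IPG; give up if any one cannot be frozen.
        for ipg in ipgs:
            if not ipg.freeze():
                return None
            frozen.append(ipg)
        # Phase 2: snapshot all IPGs while the whole cluster is quiescent.
        return {ipg.name: ipg.snapshot() for ipg in ipgs}
    finally:
        # Let the applications continue whether or not a checkpoint was produced.
        for ipg in frozen:
            ipg.resume()
```

The all-or-nothing return value reflects the idea of a global checkpoint: a partial set of snapshots would not describe a consistent cluster state, so the sketch discards it rather than keeping some IPGs' checkpoints and not others.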
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Xiang, Xj., Yu, Hl., Shu, Jw. (2009). Virtual Container Based Consistent Cluster Checkpoint. In: Ślęzak, D., Kim, Th., Stoica, A., Kang, BH. (eds) Control and Automation. CA 2009. Communications in Computer and Information Science, vol 65. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10741-2_9
DOI: https://doi.org/10.1007/978-3-642-10741-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10742-9
Online ISBN: 978-3-642-10741-2
eBook Packages: Computer Science, Computer Science (R0)