Virtual Container Based Consistent Cluster Checkpoint

  • Conference paper
Control and Automation (CA 2009)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 65)


Abstract

Checkpointing saves application state so that applications can be recovered after a fault, and it is becoming critical to large information systems. Unfortunately, existing checkpoint tools have limitations: they are not transparent to applications, they ignore file system state, and they support cluster checkpointing poorly. We present a lightweight, virtual-container-based cluster checkpoint. First, a virtual container, the IPG (Isolated Process Group), wraps all target applications together and produces checkpoints transparently and completely. Second, each IPG has an independent namespace built on an exclusively owned LV (Logical Volume), which can be checkpointed synchronously with the IPG’s memory to guarantee consistency. Finally, distributed applications can be deployed across many IPGs, and a cluster checkpoint protocol orchestrates all IPGs to produce global checkpoints. Experiments and evaluation results show that running applications in IPGs introduces no overhead, and that our prototype system is more stable than library-based checkpoint tools.
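The abstract does not spell out the cluster checkpoint protocol, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a simple stop-the-world coordinator: every IPG is frozen, its memory and its volume are captured together, and the global checkpoint is accepted only if every IPG succeeds. The class `IPG`, the function `global_checkpoint`, and the `fail_on_snapshot` hook are hypothetical names introduced for illustration; a real protocol would also have to handle messages in transit between nodes, which this toy model ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IPG:
    """Toy stand-in for an Isolated Process Group (hypothetical model)."""
    name: str
    memory: Dict[str, int] = field(default_factory=dict)  # process state
    volume: Dict[str, str] = field(default_factory=dict)  # files on the IPG's LV
    frozen: bool = False
    fail_on_snapshot: bool = False  # hook to simulate a fault during checkpoint

    def freeze(self) -> None:
        self.frozen = True

    def resume(self) -> None:
        self.frozen = False

    def snapshot(self) -> dict:
        # Memory and volume are copied while the IPG is frozen, so the two
        # halves of the image describe the same instant.
        if not self.frozen:
            raise RuntimeError("IPG must be frozen before it is snapshotted")
        if self.fail_on_snapshot:
            raise OSError(f"snapshot failed on {self.name}")
        return {"memory": dict(self.memory), "volume": dict(self.volume)}

    def restore(self, image: dict) -> None:
        self.memory = dict(image["memory"])
        self.volume = dict(image["volume"])


def global_checkpoint(ipgs: List[IPG]) -> Optional[Dict[str, dict]]:
    """Return one image per IPG, or None if any IPG failed to checkpoint."""
    for ipg in ipgs:
        ipg.freeze()
    try:
        return {ipg.name: ipg.snapshot() for ipg in ipgs}
    except OSError:
        return None  # all-or-nothing: a partial set is not a global checkpoint
    finally:
        for ipg in ipgs:
            ipg.resume()  # always let the applications continue
```

For example, after `ckpt = global_checkpoint([a, b])`, calling `a.restore(ckpt["a"])` rolls both the memory and the volume of `a` back to the same point in time, which is the consistency property the abstract attributes to checkpointing the LV together with the IPG's memory.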




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

**ang, Xj., Yu, Hl., Shu, Jw. (2009). Virtual Container Based Consistent Cluster Checkpoint. In: Ślęzak, D., Kim, Th., Stoica, A., Kang, BH. (eds) Control and Automation. CA 2009. Communications in Computer and Information Science, vol 65. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10741-2_9


  • DOI: https://doi.org/10.1007/978-3-642-10741-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10742-9

  • Online ISBN: 978-3-642-10741-2

  • eBook Packages: Computer Science, Computer Science (R0)
