Abstract
Database migration is an important problem for companies that deal with big data. Migration is not only costly but also carries serious security risks. For some institutions, the primary goal is to reduce the cost of the migration operation, which manifests itself in application testing. For others, minimizing security risk is the most important goal, especially when the data involved is sensitive. In the literature, the database migration problem has been studied from the perspective of test cost minimization. In this paper, we focus on an orthogonal measure: security risk minimization. We associate security with the number of shifts needed to complete the migration task. Ideally, the migration is completed in as few shifts as possible, so that the risk of data exposure is minimized. We provide a formal framework for studying the database migration problem from the perspective of security risk minimization (shift minimization) and establish the computational complexity of several models within this framework. For the NP-hard models, we develop memetic algorithms that produce solutions within \(10\%\) and \(7\%\) of the optimal in \(95\%\) of the instances, in under 8 and 82 seconds, respectively.
References
Acikalin, U.U., & Caskurlu, B. (2022). Multilevel memetic hypergraph partitioning with greedy recombination. In GECCO ’22: Genetic and Evolutionary Computation Conference, Companion Volume, Boston, Massachusetts, USA, July 9 - 13, 2022, pp. 168–171. ACM
Acikalin, U.U., Caskurlu, B., Wojciechowski, P., & Subramani, K. (2021). New results on test-cost minimization in database migration. In International Symposium on Algorithmic Aspects of Cloud Computing, pp. 38–55, Springer
Azeroual, O., & Jha, M. (2021). Without data quality, there is no data migration. Big Data and Cognitive Computing, 5(2), 24.
Barhate, S., & Dhore, M. (2015). Data migration issues in cloud computing: a survey. International Journal of Electronics, Communication and Soft Computing Science & Engineering (IJECSCSE), 360
Chang, J., Gabow, H. N., & Khuller, S. (2014). A model for minimizing active processor time. Algorithmica, 70(3), 368–405.
Chatterjee, A., & Segev, A. (1991). Data manipulation in heterogeneous databases. SIGMOD Record, 20(4), 64–68.
Chon, H. D., Agrawal, D., & El Abbadi, A. (2002). Data management for moving objects. IEEE Data Eng. Bull., 25(2), 41–47.
Chu, G., Stuckey, P.J., Schutt, A., Ehlers, T., Gange, G., & Francis, K. (2020). Chuffed, a lazy clause generation solver. https://github.com/chuffed/chuffed
Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms, Third Edition (3rd ed.). The MIT Press.
Dell’Amico, M., Díaz, J. C. D., & Iori, M. (2012). The bin packing problem with precedence constraints. Operations Research, 60(6), 1491–1504.
Drumm, C., Schmitt, M., Do, H. H., & Rahm, E. (2007). Quickmig: automatic schema matching for data migration projects. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, CIKM 2007, Lisbon, Portugal, November 6-10, 2007, pp. 107–116
Epstein, L., Favrholdt, L. M., & Levin, A. (2011). Online variable-sized bin packing with conflicts. Discrete Optimization, 8(2), 333–343.
Epstein, L., & Levin, A. (2008). An APTAS for generalized cost variable-sized bin packing. SIAM Journal on Computing, 38(1), 411–428.
Epstein, L., & Levin, A. (2008). On bin packing with conflicts. SIAM Journal on Optimization, 19(3), 1270–1298.
Even, G., Levi, R., Rawitz, D., Schieber, B., Shahar, S., & Sviridenko, M. (2008). Algorithms for capacitated rectangle stabbing and lot sizing with joint set-up costs. ACM Transactions on Algorithms (TALG), 4(3), 1–17.
Falkenauer, E. (1996). A hybrid grouping genetic algorithm for bin packing. Journal of Heuristics, 2(1), 5–30.
Falkenauer, E., & Delchambre, A. (1992). A genetic algorithm for bin packing and line balancing. In: Proceedings of the 1992 IEEE International Conference on Robotics and Automation, Nice, France, May 12-14, 1992, IEEE Computer Society, pp. 1186–1192
Ferrandina, F., Meyer, T., Zicari, R., Ferran, G., & Madec, J. (1995). Schema and database evolution in the O2 object database system. In: VLDB’95, Proceedings of 21th International Conference on Very Large Data Bases, September 11-15, 1995, Zurich, Switzerland, pp. 170–181.
Friesen, D. K., & Langston, M. A. (1986). Variable sized bin packing. SIAM journal on computing, 15(1), 222–230.
Gandhi, R., Halldórsson, M. M., Kortsarz, G., & Shachnai, H. (2004). Improved results for data migration and open shop scheduling. In: Automata, Languages and Programming: 31st International Colloquium, ICALP 2004, Turku, Finland, July 12-16, 2004. Proceedings, pp. 658–669
Gandhi, R., & Mestre, J. (2009). Combinatorial algorithms for data migration to minimize average completion time. Algorithmica, 54(1), 54–71.
Garey, M. R., & Johnson, D. S. (1979). Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman.
Garey, M. R., Johnson, D. S., Simons, B. B., & Tarjan, R. E. (1981). Scheduling unit-time tasks with arbitrary release times and deadlines. SIAM Journal on Computing, 10(2), 256–269.
Goldman, R., McHugh, J., & Widom, J. (1999). From semistructured data to XML: migrating the lore data model and query language. In: ACM SIGMOD Workshop on The Web and Databases, WebDB 1999, Philadelphia, Pennsylvania, USA, June 3-4, 1999. Informal Proceedings, pp. 25–30
Golubchik, L., Khuller, S., Kim, Y. A., Shargorodskaya, S., & Wan, Y. J. (2004). Data migration on parallel disks. In: Algorithms - ESA 2004, 12th Annual European Symposium, Bergen, Norway, September 14-17, 2004, Proceedings, pp. 689–701
Hall, J., Hartline, J. D., Karlin, A. R., Saia, J., & Wilkes, J. (2001). On algorithms for efficient data migration. In: Proceedings of the Twelfth Annual Symposium on Discrete Algorithms, January 7-9, 2001, Washington, DC, USA, pp. 620–629
Hirofuchi, T., Ogawa, H., Nakada, H., Itoh, S., & Sekiguchi, S. (2009). A live storage migration mechanism over WAN for relocatable virtual machine services on clouds. In: 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, CCGrid 2009, Shanghai, China, 18-21 May 2009, pp. 460–465.
Jansen, K. (1999). An approximation scheme for bin packing with conflicts. Journal of combinatorial optimization, 3(4), 363–377.
Jensen, M., Schwenk, J., Gruschka, N., & Iacono, L. L. (2009). On technical security issues in cloud computing. In: IEEE International Conference on Cloud Computing, CLOUD 2009, Bangalore, India, 21-25 September, 2009, pp. 109–116
Karmarkar, N., & Karp, R. M. (1982). An efficient approximation scheme for the one-dimensional bin-packing problem. In: 23rd Annual Symposium on Foundations of Computer Science (sfcs 1982), IEEE, pp. 312–320
Kelarev, A., Seberry, J., Rylands, L., & Yi, X. (2017). Combinatorial algorithms and methods for security of statistical databases related to the work of Mirka Miller. In: Combinatorial Algorithms - 28th International Workshop, IWOCA 2017, Newcastle, NSW, Australia, July 17-21, 2017, Revised Selected Papers, pp. 383–394
Khuller, S., Kim, Y. A., & Wan, Y. J. (2003). Algorithms for data migration with cloning. In: Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 9-12, 2003, San Diego, CA, USA, pp. 27–36
Liu, Q., Cheng, H., Tian, T., Wang, Y., Leng, J., Zhao, R., Zhang, H., & Wei, L. (2021). Algorithms for the variable-sized bin packing problem with time windows. Computers & Industrial Engineering, 155, 107175.
Lougee-Heimer, R. (2003). The common optimization interface for operations research: Promoting open-source software in the operations research community. IBM Journal of Research & Development, 47(1), 57–66.
Mens, T., Demeyer, S., Hainaut, J.-L., Cleve, A., Henrard, J., & Hick, J.-M. (2008). Migration of legacy information systems. Software Evolution, 105–138.
Murgolo, F. D. (1987). An efficient approximation scheme for variable-sized bin packing. SIAM Journal on Computing, 16(1), 149–161.
Narayanan, D., Thereska, E., Donnelly, A., Elnikety, S., & Rowstron, A. I. T. (2009). Migrating server storage to SSDs: analysis of tradeoffs. In: Proceedings of the 2009 EuroSys Conference, Nuremberg, Germany, April 1-3, 2009, pp. 145–158
Nethercote, N., Stuckey, P. J., Becket, R., Brand, S., Duck, G. J., & Tack, G. (2007). MiniZinc: Towards a standard CP modelling language. In: Principles and Practice of Constraint Programming–CP 2007: 13th International Conference, CP 2007, Providence, RI, USA, September 23-27, 2007. Proceedings, pp. 529–543. Springer
Otto, A., Otto, C., & Scholl, A. (2013). Systematic data generation & test design for solution algorithms on the example of salbpgen for assembly line balancing. European Journal of Operational Research, 228(1), 33–45.
Patil, S., Roy, S., Augustine, J., Redlich, A., Lodha, S., Vin, H. M., Deshpande, A., Gharote, M. S., & Mehrotra, A. (2010). Minimizing testing overheads in database migration lifecycle. In: COMAD, Citeseer, p. 191
Perron, L., & Furnon, V. (2022). OR-Tools. Google.
Quiroz-Castellanos, M., Cruz-Reyes, L., Torres-Jimenez, J., Gómez, C., Huacuja, H. J. F., & Alvim, A. C. (2015). A grouping genetic algorithm with controlled gene transmission for the bin packing problem. Computers and Operations Research, 55, 52–64.
Quiroz-Castellanos, M., Cruz Reyes, L., Torres-Jiménez, J., Santillán, C. G., Fraire Huacuja, H. J., & Alvim, A. C. F. (2015). A grouping genetic algorithm with controlled gene transmission for the bin packing problem. Computers & OR, 55, 52–64.
Saranya, N., Brindha, R., Aishwariya, N., Kokila, R., Matheswaran, P., & Poongavi, P. (2021). Data migration using etl workflow. In: 2021 7th International Conference on Advanced Computing & Communication Systems (ICACCS), vol. 1, IEEE, pp. 1661–1664
Scholl, A., Klein, R., & Jürgens, C. (1997). Bison: A fast hybrid procedure for exactly solving the one-dimensional bin packing problem. Computers & OR, 24(7), 627–645.
Schulte, C., Lagerkvist, M., & Tack, G. (2006). Gecode. Software download & online material at the website: http://www.gecode.org, 11–13
Sianipar, J., Sukmana, M., & Meinel, C. (2018). Moving sensitive data against live memory dumping, Spectre and Meltdown attacks. In: 2018 26th International Conference on Systems Engineering (ICSEng), IEEE, pp. 1–8
Singh, A., & Gupta, A. K. (2007). Two heuristics for the one-dimensional bin-packing problem. OR Spectrum, 29(4), 765–781.
Subramani, K., Caskurlu, B., & Acikalin, U. U. (2019). Security-aware database migration planning. In: International Symposium on Algorithmic Aspects of Cloud Computing, Springer, pp. 103–121
Subramani, K., Caskurlu, B., & Velasquez, A. (2018). Minimization of testing costs in capacity-constrained database migration. In: Algorithmic Aspects of Cloud Computing - 4th International Symposium, ALGOCLOUD 2018, Helsinki, Finland, August 20-21, 2018, Revised Selected Papers, pp. 1–12.
Syswerda, G. (1991). A study of reproduction in generational and steady-state genetic algorithms. In: Foundations of Genetic Algorithms, vol. 1. Elsevier, pp. 94–101
Wang, J., & Lochovsky, F. H. (2003). Data extraction and label assignment for web databases. In: Proceedings of the Twelfth International World Wide Web Conference, WWW 2003, Budapest, Hungary, May 20-24, 2003, pp. 187–196
Wee, T., & Magazine, M. J. (1982). Assembly line balancing as generalized bin packing. Operations Research Letters, 1(2), 56–58.
Wojciechowski, P., Subramani, K., Velasquez, A., & Caskurlu, B. (2021). Algorithmic analysis of priority-based bin packing. In: Conference on Algorithms and Discrete Applied Mathematics, pp. 359–372, Springer
Zhao, X., Lin, Q., Chen, J., Wang, X., Yu, J., & Ming, Z. (2016). Optimizing security and quality of service in a real-time database system using multi-objective genetic algorithm. Expert Syst. Appl., 64, 11–23.
Zuckerman, D. (2006). Linear degree extractors and the inapproximability of max clique and chromatic number. In Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, pp. 681–690
Acknowledgements
This research was supported in part by the Defense Advanced Research Projects Agency through grant HR001123S0001-FP-004.
Ethics declarations
Conflicts of interest
We declare that we have no conflict of interest.
Additional information
A preliminary version of this work appeared in the proceedings of the 5th International Symposium on Algorithmic Aspects of Cloud Computing (ALGOCLOUD 2019) [49].
About this article
Cite this article
Acikalin, U.U., Caskurlu, B. & Subramani, K. Security-Aware Database Migration Planning. Constraints 28, 472–505 (2023). https://doi.org/10.1007/s10601-023-09351-6
DOI: https://doi.org/10.1007/s10601-023-09351-6