Abstract
We reflect on our ongoing journey in the educational Cybersecurity Assessment Tools (CATS) Project to create two concept inventories for cybersecurity. We identify key steps in this journey and important questions we faced. We explain the decisions we made and discuss the consequences of those decisions, highlighting what worked well and what might have gone better.
The CATS Project is creating and validating two concept inventories—conceptual tests of understanding—that can be used to measure the effectiveness of various approaches to teaching and learning cybersecurity. The Cybersecurity Concept Inventory (CCI) is for students who have recently completed any first course in cybersecurity; the Cybersecurity Curriculum Assessment (CCA) is for students who have recently completed an undergraduate major or track in cybersecurity. Each assessment tool comprises 25 multiple-choice questions (MCQs) of various difficulties that target the same five core concepts, but the CCA assumes greater technical background.
Key steps include defining project scope, identifying the core concepts, uncovering student misconceptions, creating scenarios, drafting question stems, developing distractor answer choices, generating educational materials, performing expert reviews, recruiting student subjects, organizing workshops, building community acceptance, forming a team and nurturing collaboration, adopting tools, and obtaining and using funding.
Creating effective MCQs is difficult and time-consuming, and cybersecurity presents special challenges. Because cybersecurity issues are often subtle, where the adversarial model and details matter greatly, it is challenging to construct MCQs for which there is exactly one best but non-obvious answer. We hope that our experiences and lessons learned may help others create more effective concept inventories and assessments in STEM.
Notes
1. Science, Technology, Engineering, and Mathematics (STEM).
2. Personal correspondence with Melissa Dark (Purdue).
3. As an experiment, select answers to your favorite cybersecurity certification exam, looking only at the answer choices and not at the question stems. If you can score significantly better than random guessing, the exam is defective.
4. Contact Alan Sherman (sherman@umbc.edu).
References
Bechger, T.M., Maris, G., Verstralen, H., Beguin, A.A.: Using classical test theory in combination with item response theory. Appl. Psychol. Meas. 27, 319–334 (2003)
Brame, C.J.: Writing good multiple-choice test questions (2019). https://cft.vanderbilt.edu/guides-sub-pages/writing-good-multiple-choice-test-questions/. Accessed 19 Jan 2019
Brown, B.: Delphi process: a methodology used for the elicitation of opinions of experts. Rand Corporation, Santa Monica, CA, USA, September 1968
U.S. Department of Labor Bureau of Labor Statistics. Information Security Analysts. Occupational Outlook Handbook, September 2019
George Washington University Arlington Center. Cybersecurity education workshop, February 2014. https://research.gwu.edu/sites/g/files/zaxdzs2176/f/downloads/CEW_FinalReport_040714.pdf
Chi, M.T.H.: Methods to assess the representations of experts’ and novices’ knowledge. In: Cambridge Handbook of Expertise and Expert Performance, pp. 167–184 (2006)
Ciampa, M.: CompTIA Security+ Guide to Network Security Fundamentals, Loose-Leaf Version, 6th edn. Course Technology Press, Boston (2017)
CompTIA. CASP (CAS-003) certification study guide: CompTIA IT certifications
International Information System Security Certification Consortium: Certified information systems security professional. https://www.isc2.org/cissp/default.aspx. Accessed 14 Mar 2017
Global Information Assurance Certification (GIAC): GIAC certifications: the highest standard in cyber security certifications. https://www.giac.org/
Cybersecurity and Infrastructure Security Agency. The National Initiative for Cybersecurity Careers & Studies. https://niccs.us-cert.gov/featured-stories/take-cybersecurity-certification-prep-course
Epstein, J.: The calculus concept inventory: measurement of the effect of teaching methodology in mathematics. Not. Am. Math. Soc. 60(8), 1018–1025 (2013)
Ericsson, K.A., Simon, H.A.: Protocol Analysis. MIT Press, Cambridge (1993)
Evans, D.L., Gray, G.L., Krause, S., Martin, J., Midkiff, C., Notaros, B.M., Pavelich, M., Rancour, D., Reed, T., Steif, P., Streveler, R., Wage, K.: Progress on concept inventory assessment tools. In: 33rd Annual Frontiers in Education (FIE 2003), p. T4G-1, December 2003
CSEC2017 Joint Task Force: Cybersecurity Curricula 2017. Technical report, December 2017
Gibson, D., Anand, V., Dehlinger, J., Dierbach, C., Emmersen, T., Phillips, A.: Accredited undergraduate cybersecurity degrees: four approaches. Computer 52(3), 38–47 (2019)
Glaser, B.G., Strauss, A.L., Strutzel, E.: The discovery of grounded theory: strategies for qualitative research. Nurs. Res. 17(4), 364 (1968)
Goldman, K., Gross, P., Heeren, C., Herman, G.L., Kaczmarczyk, L., Loui, M.C., Zilles, C.: Setting the scope of concept inventories for introductory computing subjects. ACM Trans. Comput. Educ. 10(2), 1–29 (2010)
Hake, R.: Interactive-engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses. Am. J. Phys. 66(1), 64–74 (1998)
Hake, R.: Lessons from the physics-education reform effort. Conserv. Ecol. 5, 07 (2001)
Hambleton, R.K., Jones, R.J.: Comparison of classical test theory and item response theory and their applications to test development. Educ. Meas. Issues Pract. 12, 253–262 (1993)
Herman, G.L., Loui, M.C., Zilles, C.: Creating the digital logic concept inventory. In: Proceedings of the 41st ACM Technical Symposium on Computer Science Education, pp. 102–106, January 2010
Hestenes, D., Wells, M., Swackhamer, G.: Force concept inventory. Phys. Teach. 30, 141–166 (1992)
**deel, M.: I just don’t get it: common security misconceptions. Master’s thesis, University of Minnesota, June 2019
Association for Computing Machinery (ACM) Joint Task Force on Computing Curricula and IEEE Computer Society: Computer Science Curricula: Curriculum Guidelines for Undergraduate Degree Programs in Computer Science. Association for Computing Machinery, New York (2013)
Jorion, N., Gane, B., James, K., Schroeder, L., DiBello, L., Pellegrino, J.: An analytic framework for evaluating the validity of concept inventory claims. J. Eng. Educ. 104(4), 454–496 (2015)
Kittur, A., Chi, E.H., Suh, B.: Crowdsourcing user studies with Mechanical Turk. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 453–456, April 2008
Litzinger, T.A., Van Meter, P., Firetto, C.M., Passmore, L.J., Masters, C.B., Turns, S.R., Gray, G.L., Costanzo, F., Zappe, S.E.: A cognitive study of problem solving in statics. J. Eng. Educ. 99(4), 337–353 (2010)
Mestre, J.P., Dufresne, R.J., Gerace, W.J., Hardiman, P.T., Touger, J.S.: Promoting skilled problem-solving behavior among beginning physics students. J. Res. Sci. Teach. 30(3), 303–317 (1993)
NIST. NICE framework. http://csrc.nist.gov/nice/framework/. Accessed 8 Oct 2016
Offenberger, S., Herman, G.L., Peterson, P., Sherman, A.T., Golaszewski, E., Scheponik, T., Oliva, L.: Initial validation of the cybersecurity concept inventory: pilot testing and expert review. In: Proceedings of Frontiers in Education Conference, October 2019
Olds, B.M., Moskal, B.M., Miller, R.L.: Assessment in engineering education: evolution, approaches and future collaborations. J. Eng. Educ. 94(1), 13–25 (2005)
Parekh, G., DeLatte, D., Herman, G.L., Oliva, L., Phatak, D., Scheponik, T., Sherman, A.T.: Identifying core concepts of cybersecurity: results of two Delphi processes. IEEE Trans. Educ. 61(11), 11–20 (2016)
DiBello, L.V., James, K., Jorion, N., Schroeder, L., Pellegrino, J.W.: Concept inventories as aids for instruction: a validity framework with examples of application. In: Proceedings of Research in Engineering Education Symposium, Madrid, Spain (2011)
Pellegrino, J., Chudowsky, N., Glaser, R.: Knowing What Students Know: The Science and Design of Educational Assessment. National Academy Press, Washington DC (2001)
Peterson, P.A.H, **deel, M., Straumann, A., Smith, J., Pederson, A., Geraci, B., Nowaczek, J., Powers, C., Kuutti, K.: The security misconceptions project, October 2019. https://secmisco.blogspot.com/
Porter, L., Zingaro, D., Liao, S.N., Taylor, C., Webb, K.C., Lee, C., Clancy, M.: BDSI: a validated concept inventory for basic data structures. In: Proceedings of the 2019 ACM Conference on International Computing Education Research, ICER 2019, pp. 111–119. Association for Computing Machinery, New York (2019)
Scheponik, T., Golaszewski, E., Herman, G., Offenberger, S., Oliva, L., Peterson, P.A.H., Sherman, A.T.: Investigating crowdsourcing to generate distractors for multiple-choice assessments (2019). https://arxiv.org/pdf/1909.04230.pdf
Scheponik, T., Golaszewski, E., Herman, G., Offenberger, S., Oliva, L., Peterson, P.A.H., Sherman, A.T.: Investigating crowdsourcing to generate distractors for multiple-choice assessments. In: Choo, K.K.R., Morris, T.H., Peterson, G.L. (eds.) National Cyber Summit (NCS) Research Track, pp. 185–201. Springer, Cham (2020)
Scheponik, T., Sherman, A.T., DeLatte, D., Phatak, D., Oliva, L., Thompson, J., Herman, G.L.: How students reason about cybersecurity concepts. In: IEEE Frontiers in Education Conference (FIE), pp. 1–5, October 2016
Offensive Security. Penetration Testing with Kali Linux (PWK). https://www.offensive-security.com/pwk-oscp/
Sherman, A.T., DeLatte, D., Herman, G.L., Neary, M., Oliva, L., Phatak, D., Scheponik, T., Thompson, J.: Cybersecurity: exploring core concepts through six scenarios. Cryptologia 42(4), 337–377 (2018)
Sherman, A.T., Oliva, L., Golaszewski, E., Phatak, D., Scheponik, T., Herman, G.L., Choi, D.S., Offenberger, S.E., Peterson, P., Dykstra, J., Bard, G.V., Chattopadhyay, A., Sharevski, F., Verma, R., Vrecenar, R.: The CATS hackathon: creating and refining test items for cybersecurity concept inventories. IEEE Secur. Priv. 17(6), 77–83 (2019)
Sherman, A.T., Oliva, L., DeLatte, D., Golaszewski, E., Neary, M., Patsourakos, K., Phatak, D., Scheponik, T., Herman, G.L., Thompson, J.: Creating a cybersecurity concept inventory: a status report on the CATS Project. In: 2017 National Cyber Summit, June 2017
Thompson, J.D., Herman, G.L., Scheponik, T., Oliva, L., Sherman, A.T., Golaszewski, E., Phatak, D., Patsourakos, K.: Student misconceptions about cybersecurity concepts: analysis of think-aloud interviews. J. Cybersecurity Educ. Res. Pract. 2018(1), 5 (2018)
Walker, M.: CEH Certified Ethical Hacker All-in-One Exam Guide, 1st edn. McGraw-Hill Osborne Media, USA (2011)
Wallace, C.S., Bailey, J.M.: Do concept inventories actually measure anything? Astron. Educ. Rev. 9(1), 010116 (2010)
West, M., Herman, G.L., Zilles, C.: PrairieLearn: mastery-based online problem solving with adaptive scoring and recommendations driven by machine learning. In: 2015 ASEE Annual Conference and Exposition, ASEE Conferences, Seattle, Washington, June 2015
Acknowledgments
We thank the many people who contributed to the CATS project as Delphi experts, interview subjects, Hackathon participants, expert reviewers, student subjects, and former team members, including Michael Neary, Spencer Offenberger, Geet Parekh, Konstantinos Patsourakos, Dhananjay Phatak, and Julia Thompson. Support for this research was provided in part by the U.S. Department of Defense under CAE-R grants H98230-15-1-0294, H98230-15-1-0273, H98230-17-1-0349, H98230-17-1-0347; and by the National Science Foundation under UMBC SFS grants DGE-1241576, 1753681, and SFS Capacity Grants DGE-1819521, 1820531.
A A Crowdsourcing Experiment
To investigate how subjects respond to the initial and final versions of the CCA switchbox question (see Sects. 3.5–3.7), we polled 100 workers on Amazon Mechanical Turk (AMT) [27]. We also explored the strategy of generating distractors through crowdsourcing, continuing one of our previous research ideas [38, 39]. Specifically, we sought responses from another 200 workers to the initial and final stems without providing any alternatives.
On March 29–30, 2020, at separate times, we posted four separate tasks on AMT. Tasks 1 and 2 presented the CCA switchbox test item with alternatives, for the initial and final versions of the test item, respectively. Tasks 3 and 4 presented the CCA switchbox stem with no alternatives, for the initial and final versions of the stem, respectively.
For Tasks 1–2, we solicited 50 workers each, and for Tasks 3–4, we solicited 100 workers each. For each task we sought human workers who had graduated from college with a major in computer science or a related field. We offered a reward of $0.25 per completed valid response and set a time limit of 10 min per task.
Expecting computer bots and human cheaters (people who do not make a genuine effort to answer the question), we included two control questions (see Fig. 4) in addition to the main test item. Because we expected many humans to lie about their college major, we constructed the first control question to detect subjects who were not human or who knew nothing about computer science: all computer science majors should be very familiar with the binary number system, and we expect that most bots are unable to perform the visual, linguistic, and cognitive processing required to answer Question 1. Because \(11 + 62 = 73 = 64 + 8 + 1\), the answer to Question 1 is 1001001.
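As a quick sanity check, the base conversion behind Question 1 can be verified with a few lines of Python (illustrative only; not part of the study):

```python
# Check the arithmetic behind control Question 1:
# 11 + 62 = 73 in decimal, and 73 = 64 + 8 + 1, i.e., 1001001 in binary.
total = 11 + 62
assert total == 64 + 8 + 1      # place-value decomposition of 73
binary = format(total, "b")     # built-in base-2 conversion
assert binary == "1001001"
print(binary)                   # -> 1001001
```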
We received a total of 547 responses, of which we deemed 425 (78%) valid. As often happens with AMT, we received more responses than we had solicited, because some workers responded directly to our SurveyMonkey form, without payment, bypassing AMT. More specifically, we received 40, 108, 194, and 205 responses for Tasks 1–4, respectively, of which 40 (100%), 108 (100%), 120 (62%), and 157 (77%) were valid. In this filtering, we considered a response valid if and only if it was non-blank and appeared to answer the question in a meaningful way. We excluded responses that did not pertain to the subject matter (e.g., song lyrics) or that did not appear to reflect genuine effort (e.g., “OPEN ENDED RESPONSE”).
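These tallies are internally consistent, as a short check confirms (the numbers below are simply those reported above):

```python
# (received, valid) response counts per task, as reported in the text.
tasks = {1: (40, 40), 2: (108, 108), 3: (194, 120), 4: (205, 157)}
received = sum(r for r, _ in tasks.values())
valid = sum(v for _, v in tasks.values())
assert (received, valid) == (547, 425)
print(f"overall validity rate: {valid / received:.0%}")   # 78%
rates = {t: v / r for t, (r, v) in tasks.items()}         # per-task: 1.00, 1.00, 0.62, 0.77
```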
Only ten (3%) of the valid responses included a correct answer to Question 1. Only 27 (7%) of the workers with valid responses stated that they majored in computer science or a related field, of whom only two (7%) answered Question 1 correctly. Thus, the workers who responded to our tasks are very different from the intended population for the CCA.
Originally, we had planned to deem a response valid if and only if it included answers to all three questions, with correct answers to each of the two control questions. Instead, we continued to analyze our data adopting the extremely lenient definition described above, understanding that the results would not be relevant to the CCA.
We processed results from Tasks 3–4 as follows, separately for each task. First, we discarded the invalid responses. Second, we identified which valid responses matched existing alternatives. Third, we grouped the remaining (non-matching) valid new responses into equivalence classes. Fourth, we refined a canonical response for each of these equivalence classes (see Fig. 8). The most time-consuming steps were filtering the responses for validity and refining the canonical responses.
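In outline, these four steps resemble the following sketch. The normalization and matching here are crude textual stand-ins for the manual, semantic judgments actually performed, and all identifiers are illustrative:

```python
from collections import defaultdict

def normalize(text: str) -> str:
    """Crude textual stand-in for the manual judgment used to compare responses."""
    return " ".join(text.lower().split())

def process(responses, existing_alternatives, is_valid):
    # Step 1: discard invalid responses.
    valid = [r for r in responses if is_valid(r)]
    # Step 2: separate responses that match an existing alternative.
    alt_keys = {normalize(a) for a in existing_alternatives}
    matched = [r for r in valid if normalize(r) in alt_keys]
    remaining = [r for r in valid if normalize(r) not in alt_keys]
    # Step 3: group the remaining responses into equivalence classes
    # (here, keyed by normalized text; the real grouping was semantic).
    classes = defaultdict(list)
    for r in remaining:
        classes[normalize(r)].append(r)
    # Step 4: pick a canonical response per class (here, the most common
    # raw wording; the authors refined these wordings by hand).
    canonical = {key: max(members, key=members.count)
                 for key, members in classes.items()}
    return matched, canonical
```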
Especially considering that the AMT worker population differs from the target CCA audience, it is important to recognize that the best distractors for the CCA are not necessarily the most popular responses from AMT workers. For this reason, we examined all responses. Figure 8 includes two examples, \(t*\) for the initial version (0.7%) and \(t*\) for the final version (1.2%), of interesting unpopular responses. Also, the alternatives should satisfy various constraints, including being non-overlapping (see Sect. 3.7), so one should not automatically choose the four most popular distractors.
Figures 5, 6, 7 and 8 summarize our findings. Figure 5 does not reveal any dramatic change in responses between the initial and final versions of the test item. It is striking how strongly the workers prefer Distractor E over the correct choice A. Perhaps the workers, who do not know network security, find Distractor E the most understandable and logical choice. Also, its broad wording may be appealing. In Sect. 3.7, we explain how we reworded Distractor E to ensure that Alternative A is now unarguably better than E.
Figure 6 shows the popularity of worker-generated responses to the stem, when workers were not given any alternatives. After making Fig. 6, we realized it contains a mistake: responses t2 (initial version) and t1 (final version) are correct answers, which should have been matched and grouped with Alternative A. We nevertheless include this figure because it is informative. Especially for the initial version of the stem, it is notable how much more popular the two worker-generated responses t2 (initial version) and t1 (final version) are than our distractors. This finding is not unexpected because, in the final version, we had intentionally chosen Alternative A over t1 to make the correct answer less obvious (see Sect. 3.7). These data support our belief that subjects would find t1 more appealing than Alternative A.
Figure 7 is the corrected version of Fig. 6, with responses t2 (initial version) and t1 (final version) grouped with the correct answer A. Although new distractor t1 (initial version) was very popular, its broad, nebulous form makes it unlikely to contribute useful information about student conceptual understanding. For the initial version of the stem, we identified seven equivalence classes of new distractors (excluding the new alternate correct answers); for the final version, we identified 12.
Because the population of these workers differs greatly from the intended audience for the CCA, these results should not be used to make any inferences about the switchbox test item for the CCA’s intended audience. Nevertheless, our experiment illustrates some notable facts about crowdsourcing with AMT. (1) Collecting data is fast and inexpensive: we collected all of our responses within 24 hours, paying a total of less than $40 in worker fees (we did not pay for any invalid responses). (2) We had virtually no control over the selection of workers, and almost all of them seemed ill-suited for the task (they did not answer Question 1 correctly); nevertheless, several of the open-ended responses reflected a thoughtful understanding of the scenario. (3) Tasks 3–4 did produce distractors (new and old) of note for the worker population, illustrating the potential of crowdsourcing to generate distractors, provided the selected population can be adequately controlled. (4) Even when the worker population differs from the desired target population, their responses can be useful if they inspire test developers to improve test items and to think of effective distractors.
Despite our disappointment that most workers answered Question 1 incorrectly, the experience helped us refine the switchbox test item. Reflecting on the data, the problem development team met and made improvements to the scenario and distractors (see discussions near the ends of Sects. 3.5 and 3.7). Even though the AMT workers represent a different population than our CCA target audience, their responses helped direct our attention to potential ways to improve the test item.
We strongly believe in the potential of crowdsourcing to help generate distractors and improve test items. Being able to verify workers’ credentials assuredly (e.g., cryptographically) would greatly enhance the value of AMT.
© 2021 Springer Nature Switzerland AG
Sherman, A.T., et al. (2021). Experiences and Lessons Learned Creating and Validating Concept Inventories for Cybersecurity. In: Choo, K.-K.R., Morris, T., Peterson, G.L., Imsand, E. (eds.) National Cyber Summit (NCS) Research Track 2020. NCS 2020. Advances in Intelligent Systems and Computing, vol. 1271. Springer, Cham. https://doi.org/10.1007/978-3-030-58703-1_1
Print ISBN: 978-3-030-58702-4
Online ISBN: 978-3-030-58703-1