Abstract
We reflect on our ongoing journey in the educational Cybersecurity Assessment Tools (CATS) Project to create two concept inventories for cybersecurity. We identify key steps in this journey and important questions we faced. We explain the decisions we made and discuss the consequences of those decisions, highlighting what worked well and what might have gone better.
The CATS Project is creating and validating two concept inventories—conceptual tests of understanding—that can be used to measure the effectiveness of various approaches to teaching and learning cybersecurity. The Cybersecurity Concept Inventory (CCI) is for students who have recently completed any first course in cybersecurity; the Cybersecurity Curriculum Assessment (CCA) is for students who have recently completed an undergraduate major or track in cybersecurity. Each assessment tool comprises 25 multiple-choice questions (MCQs) of various difficulties that target the same five core concepts, but the CCA assumes greater technical background.
Key steps include defining project scope, identifying the core concepts, uncovering student misconceptions, creating scenarios, drafting question stems, developing distractor answer choices, generating educational materials, performing expert reviews, recruiting student subjects, organizing workshops, building community acceptance, forming a team and nurturing collaboration, adopting tools, and obtaining and using funding.
Creating effective MCQs is difficult and time-consuming, and cybersecurity presents special challenges. Because cybersecurity issues are often subtle, where the adversarial model and details matter greatly, it is challenging to construct MCQs for which there is exactly one best but non-obvious answer. We hope that our experiences and lessons learned may help others create more effective concept inventories and assessments in STEM.
Notes
1. Science, Technology, Engineering, and Mathematics (STEM).
2. Personal correspondence with Melissa Dark (Purdue).
3. As an experiment, select answers to your favorite cybersecurity certification exam, looking only at the answer choices and not at the question stems. If you can score significantly better than random guessing, the exam is defective.
4. Contact Alan Sherman (sherman@umbc.edu).
References
Bechger, T.M., Maris, G., Verstralen, H., Beguin, A.A.: Using classical test theory in combination with item response theory. Appl. Psychol. Meas. 27, 319–334 (2003)
Brame, C.J.: Writing good multiple-choice test questions (2019). https://cft.vanderbilt.edu/guides-sub-pages/writing-good-multiple-choice-test-questions/. Accessed 19 Jan 2019
Brown, B.: Delphi process: a methodology used for the elicitation of opinions of experts. Rand Corporation, Santa Monica, CA, USA, September 1968
U.S. Department of Labor Bureau of Labor Statistics. Information Security Analysts. Occupational Outlook Handbook, September 2019
George Washington University Arlington Center. Cybersecurity education workshop, February 2014. https://research.gwu.edu/sites/g/files/zaxdzs2176/f/downloads/CEW_FinalReport_040714.pdf
Chi, M.T.H.: Methods to assess the representations of experts’ and novices’ knowledge. In: Cambridge Handbook of Expertise and Expert Performance, pp. 167–184 (2006)
Ciampa, M.: CompTIA Security+ Guide to Network Security Fundamentals, Loose-Leaf Version, 6th edn. Course Technology Press, Boston (2017)
CompTIA. CASP (CAS-003) certification study guide: CompTIA IT certifications
International Information System Security Certification Consortium: Certified information systems security professional. https://www.isc2.org/cissp/default.aspx. Accessed 14 Mar 2017
Global Information Assurance Certification (GIAC): GIAC certifications: the highest standard in cyber security certifications. https://www.giac.org/
Cybersecurity and Infrastructure Security Agency. The National Initiative for Cybersecurity Careers & Studies. https://niccs.us-cert.gov/featured-stories/take-cybersecurity-certification-prep-course
Epstein, J.: The calculus concept inventory: measurement of the effect of teaching methodology in mathematics. Not. Am. Math. Soc. 60(8), 1018–1025 (2013)
Ericsson, K.A., Simon, H.A.: Protocol Analysis. MIT Press, Cambridge (1993)
Evans, D.L., Gray, G.L., Krause, S., Martin, J., Midkiff, C., Notaros, B.M., Pavelich, M., Rancour, D., Reed, T., Steif, P., Streveler, R., Wage, K.: Progress on concept inventory assessment tools. In: 33rd Annual Frontiers in Education (FIE 2003), p. T4G-1, December 2003
CSEC2017 Joint Task Force: Cybersecurity Curricula 2017. Technical report, December 2017
Gibson, D., Anand, V., Dehlinger, J., Dierbach, C., Emmersen, T., Phillips, A.: Accredited undergraduate cybersecurity degrees: four approaches. Computer 52(3), 38–47 (2019)
Glaser, B.G., Strauss, A.L., Strutzel, E.: The discovery of grounded theory: strategies for qualitative research. Nurs. Res. 17(4), 364 (1968)
Goldman, K., Gross, P., Heeren, C., Herman, G.L., Kaczmarczyk, L., Loui, M.C., Zilles, C.: Setting the scope of concept inventories for introductory computing subjects. ACM Trans. Comput. Educ. 10(2), 1–29 (2010)
Hake, R.: Interactive-engagement versus traditional methods: a six-thousand-student survey of mechanics test data for introductory physics courses. Am. J. Phys. 66(1), 64–74 (1998)
Hake, R.: Lessons from the physics-education reform effort. Conserv. Ecol. 5, 07 (2001)
Hambleton, R.K., Jones, R.J.: Comparison of classical test theory and item response theory and their applications to test development. Educ. Meas. Issues Pract. 12, 253–262 (1993)
Herman, G.L., Loui, M.C., Zilles, C.: Creating the digital logic concept inventory. In: Proceedings of the 41st ACM Technical Symposium on Computer Science Education, pp. 102–106, January 2010
Hestenes, D., Wells, M., Swackhamer, G.: Force concept inventory. Phys. Teach. 30, 141–166 (1992)
**deel, M.: I just don’t get it: common security misconceptions. Master’s thesis, University of Minnesota, June 2019
Association for Computing Machinery (ACM) Joint Task Force on Computing Curricula and IEEE Computer Society: Computer Science Curricula: Curriculum Guidelines for Undergraduate Degree Programs in Computer Science. Association for Computing Machinery, New York (2013)
Jorion, N., Gane, B., James, K., Schroeder, L., DiBello, L., Pellegrino, J.: An analytic framework for evaluating the validity of concept inventory claims. J. Eng. Educ. 104(4), 454–496 (2015)
Kittur, A., Chi, E.H., Suh, B.: Crowdsourcing user studies with Mechanical Turk. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 453–456, April 2008
Litzinger, T.A., Van Meter, P., Firetto, C.M., Passmore, L.J., Masters, C.B., Turns, S.R., Gray, G.L., Costanzo, F., Zappe, S.E.: A cognitive study of problem solving in statics. J. Eng. Educ. 99(4), 337–353 (2010)
Mestre, J.P., Dufresne, R.J., Gerace, W.J., Hardiman, P.T., Touger, J.S.: Promoting skilled problem-solving behavior among beginning physics students. J. Res. Sci. Teach. 30(3), 303–317 (1993)
NIST. NICE framework. http://csrc.nist.gov/nice/framework/. Accessed 8 Oct 2016
Offenberger, S., Herman, G.L., Peterson, P., Sherman, A.T., Golaszewski, E., Scheponik, T., Oliva, L.: Initial validation of the cybersecurity concept inventory: pilot testing and expert review. In: Proceedings of Frontiers in Education Conference, October 2019
Olds, B.M., Moskal, B.M., Miller, R.L.: Assessment in engineering education: evolution, approaches and future collaborations. J. Eng. Educ. 94(1), 13–25 (2005)
Parekh, G., DeLatte, D., Herman, G.L., Oliva, L., Phatak, D., Scheponik, T., Sherman, A.T.: Identifying core concepts of cybersecurity: results of two Delphi processes. IEEE Trans. Educ. 61(11), 11–20 (2016)
DiBello, L.V., James, K., Jorion, N., Schroeder, L., Pellegrino, J.W.: Concept inventories as aids for instruction: a validity framework with examples of application. In: Proceedings of Research in Engineering Education Symposium, Madrid, Spain (2011)
Pellegrino, J., Chudowsky, N., Glaser, R.: Knowing What Students Know: The Science and Design of Educational Assessment. National Academy Press, Washington DC (2001)
Peterson, P.A.H, **deel, M., Straumann, A., Smith, J., Pederson, A., Geraci, B., Nowaczek, J., Powers, C., Kuutti, K.: The security misconceptions project, October 2019. https://secmisco.blogspot.com/
Porter, L., Zingaro, D., Liao, S.N., Taylor, C., Webb, K.C., Lee, C., Clancy, M.: BDSI: a validated concept inventory for basic data structures. In: Proceedings of the 2019 ACM Conference on International Computing Education Research, ICER 2019, pp. 111–119. Association for Computing Machinery, New York (2019)
Scheponik, T., Golaszewski, E., Herman, G., Offenberger, S., Oliva, L., Peterson, P.A.H., Sherman, A.T.: Investigating crowdsourcing to generate distractors for multiple-choice assessments (2019). https://arxiv.org/pdf/1909.04230.pdf
Scheponik, T., Golaszewski, E., Herman, G., Offenberger, S., Oliva, L., Peterson, P.A.H., Sherman, A.T.: Investigating crowdsourcing to generate distractors for multiple-choice assessments. In: Choo, K.K.R., Morris, T.H., Peterson, G.L. (eds.) National Cyber Summit (NCS) Research Track, pp. 185–201. Springer, Cham (2020)
Scheponik, T., Sherman, A.T., DeLatte, D., Phatak, D., Oliva, L., Thompson, J., Herman, G.L.: How students reason about cybersecurity concepts. In: IEEE Frontiers in Education Conference (FIE), pp. 1–5, October 2016
Offensive Security. Penetration Testing with Kali Linux (PWK). https://www.offensive-security.com/pwk-oscp/
Sherman, A.T., DeLatte, D., Herman, G.L., Neary, M., Oliva, L., Phatak, D., Scheponik, T., Thompson, J.: Cybersecurity: exploring core concepts through six scenarios. Cryptologia 42(4), 337–377 (2018)
Sherman, A.T., Oliva, L., Golaszewski, E., Phatak, D., Scheponik, T., Herman, G.L., Choi, D.S., Offenberger, S.E., Peterson, P., Dykstra, J., Bard, G.V., Chattopadhyay, A., Sharevski, F., Verma, R., Vrecenar, R.: The CATS hackathon: creating and refining test items for cybersecurity concept inventories. IEEE Secur. Priv. 17(6), 77–83 (2019)
Sherman, A.T., Oliva, L., DeLatte, D., Golaszewski, E., Neary, M., Patsourakos, K., Phatak, D., Scheponik, T., Herman, G.L., Thompson, J.: Creating a cybersecurity concept inventory: a status report on the CATS Project. In: 2017 National Cyber Summit, June 2017
Thompson, J.D., Herman, G.L., Scheponik, T., Oliva, L., Sherman, A.T., Golaszewski, E., Phatak, D., Patsourakos, K.: Student misconceptions about cybersecurity concepts: analysis of think-aloud interviews. J. Cybersecurity Educ. Res. Pract. 2018(1), 5 (2018)
Walker, M.: CEH Certified Ethical Hacker All-in-One Exam Guide, 1st edn. McGraw-Hill Osborne Media, USA (2011)
Wallace, C.S., Bailey, J.M.: Do concept inventories actually measure anything? Astron. Educ. Rev. 9(1), 010116 (2010)
West, M., Herman, G.L., Zilles, C.: PrairieLearn: mastery-based online problem solving with adaptive scoring and recommendations driven by machine learning. In: 2015 ASEE Annual Conference and Exposition, ASEE Conferences, Seattle, Washington, June 2015
Acknowledgments
We thank the many people who contributed to the CATS project as Delphi experts, interview subjects, Hackathon participants, expert reviewers, student subjects, and former team members, including Michael Neary, Spencer Offenberger, Geet Parekh, Konstantinos Patsourakos, Dhananjay Phatak, and Julia Thompson. Support for this research was provided in part by the U.S. Department of Defense under CAE-R grants H98230-15-1-0294, H98230-15-1-0273, H98230-17-1-0349, H98230-17-1-0347; and by the National Science Foundation under UMBC SFS grants DGE-1241576, 1753681, and SFS Capacity Grants DGE-1819521, 1820531.
A A Crowdsourcing Experiment
To investigate how subjects respond to the initial and final versions of the CCA switchbox question (see Sects. 3.5–3.7), we polled 100 workers on Amazon Mechanical Turk (AMT) [27]. We also explored the strategy of generating distractors through crowdsourcing, continuing one of our previous research ideas [38, 39]. Specifically, we sought responses from another 200 workers to the initial and final stems without providing any alternatives.
On March 29–30, 2020, at separate times, we posted four separate tasks on AMT. Tasks 1 and 2 presented the CCA switchbox test item with alternatives, for the initial and final versions of the test item, respectively. Tasks 3 and 4 presented the CCA switchbox stem with no alternatives, for the initial and final versions of the stem, respectively.
For Tasks 1–2, we solicited 50 workers each, and for Tasks 3–4, we solicited 100 workers each. For each task we sought human workers who had graduated from college with a major in computer science or a related field. We offered a reward of $0.25 per completed valid response and set a time limit of 10 min per task.
Expecting computer bots and human cheaters (people who do not make a genuine effort to answer the question), we included two control questions (see Fig. 4) in addition to the main test item. Because we expected many humans to lie about their college major, we constructed the first control question to detect subjects who were not human or who knew nothing about computer science: all computer science majors should be very familiar with the binary number system, and we expect that most bots are unable to perform the visual, linguistic, and cognitive processing required to answer Question 1. Because \(11 + 62 = 73 = 64 + 8 + 1\), the answer to Question 1 is 1001001.
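As a quick sanity check, the base conversion behind Question 1 can be verified with a few lines of Python (illustrative only; not part of the study):

```python
# Check the arithmetic behind control Question 1:
# 11 + 62 = 73 in decimal, and 73 = 64 + 8 + 1, i.e., 1001001 in binary.
total = 11 + 62
assert total == 64 + 8 + 1      # place-value decomposition of 73
binary = format(total, "b")     # built-in base-2 conversion
assert binary == "1001001"
print(binary)                   # -> 1001001
```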
We received a total of 547 responses, of which we deemed 425 (78%) valid. As often happens with AMT, we received more responses than we had solicited, because some workers responded directly to our SurveyMonkey form, without payment, bypassing AMT. More specifically, we received 40, 108, 194, and 205 responses for Tasks 1–4, respectively, of which 40 (100%), 108 (100%), 120 (62%), and 157 (77%) were valid. In this filtering, we considered a response valid if and only if it was non-blank and appeared to answer the question in a meaningful way. We excluded responses that did not pertain to the subject matter (e.g., song lyrics) or that did not appear to reflect genuine effort (e.g., “OPEN ENDED RESPONSE”).
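These tallies are internally consistent, as a short check confirms (the numbers below are simply those reported above):

```python
# (received, valid) response counts per task, as reported in the text.
tasks = {1: (40, 40), 2: (108, 108), 3: (194, 120), 4: (205, 157)}
received = sum(r for r, _ in tasks.values())
valid = sum(v for _, v in tasks.values())
assert (received, valid) == (547, 425)
print(f"overall validity rate: {valid / received:.0%}")   # 78%
rates = {t: v / r for t, (r, v) in tasks.items()}         # per-task: 1.00, 1.00, 0.62, 0.77
```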
Only ten (3%) of the valid responses included a correct answer to Question 1. Only 27 (7%) of the workers with valid responses stated that they majored in computer science or a related field, of whom only two (7%) answered Question 1 correctly. Thus, the workers who responded to our tasks are very different from the intended population for the CCA.
Originally, we had planned to deem a response valid if and only if it included answers to all three questions, with correct answers to each of the two control questions. Instead, we continued to analyze our data adopting the extremely lenient definition described above, understanding that the results would not be relevant to the CCA.
We processed results from Tasks 3–4 as follows, separately for each task. First, we discarded the invalid responses. Second, we identified which valid responses matched existing alternatives. Third, we grouped the remaining (non-matching) valid new responses into equivalence classes. Fourth, we refined a canonical response for each of these equivalence classes (see Fig. 8). The most time-consuming steps were filtering the responses for validity and refining the canonical responses.
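In outline, these four steps resemble the following sketch. The normalization and matching here are crude textual stand-ins for the manual, semantic judgments actually performed, and all identifiers are illustrative:

```python
from collections import defaultdict

def normalize(text: str) -> str:
    """Crude textual stand-in for the manual judgment used to compare responses."""
    return " ".join(text.lower().split())

def process(responses, existing_alternatives, is_valid):
    # Step 1: discard invalid responses.
    valid = [r for r in responses if is_valid(r)]
    # Step 2: separate responses that match an existing alternative.
    alt_keys = {normalize(a) for a in existing_alternatives}
    matched = [r for r in valid if normalize(r) in alt_keys]
    remaining = [r for r in valid if normalize(r) not in alt_keys]
    # Step 3: group the remaining responses into equivalence classes
    # (here, keyed by normalized text; the real grouping was semantic).
    classes = defaultdict(list)
    for r in remaining:
        classes[normalize(r)].append(r)
    # Step 4: pick a canonical response per class (here, the most common
    # raw wording; the authors refined these wordings by hand).
    canonical = {key: max(members, key=members.count)
                 for key, members in classes.items()}
    return matched, canonical
```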
Especially considering that the AMT worker population differs from the target CCA audience, it is important to recognize that the best distractors for the CCA are not necessarily the most popular responses from AMT workers. For this reason, we examined all responses. Figure 8 includes two examples, \(t*\) for the initial version (0.7%) and \(t*\) for the final version (1.2%), of interesting unpopular responses. Also, the alternatives should satisfy various constraints, including being non-overlapping (see Sect. 3.7), so one should not automatically choose the four most popular distractors.
Figures 5, 6, 7 and 8 summarize our findings. Figure 5 does not reveal any dramatic change in responses between the initial and final versions of the test item. It is striking how strongly the workers prefer Distractor E over the correct choice A. Perhaps the workers, who do not know network security, find Distractor E the most understandable and logical choice. Also, its broad wording may be appealing. In Sect. 3.7, we explain how we reworded Distractor E to ensure that Alternative A is now unarguably better than E.
Figure 6 shows the popularity of worker-generated responses to the stem, when workers were not given any alternatives. After making Fig. 6, we realized it contains a mistake: responses t2 (initial version) and t1 (final version) are correct answers, which should have been matched and grouped with Alternative A. We nevertheless include this figure because it is informative. Especially for the initial version of the stem, it is notable how much more popular the two worker-generated responses t2 (initial version) and t1 (final version) are than our distractors. This finding is not unexpected because, in the final version, we had intentionally chosen Alternative A over t1 to make the correct answer less obvious (see Sect. 3.7). These data support our belief that subjects would find t1 more appealing than Alternative A.
Figure 7 is the corrected version of Fig. 6, with responses t2 (initial version) and t1 (final version) grouped with the correct answer A. Although new distractor t1 (initial version) was very popular, its broad, nebulous form makes it unlikely to contribute useful information about student conceptual understanding. For the initial version of the stem, we identified seven equivalence classes of new distractors (excluding the new alternate correct answers); for the final version, we identified 12.
Because the population of these workers differs greatly from the intended audience for the CCA, these results should not be used to make any inferences about the switchbox test item for the CCA’s intended audience. Nevertheless, our experiment illustrates some notable facts about crowdsourcing with AMT. (1) Collecting data is fast and inexpensive: we collected all of our responses within 24 hours, paying a total of less than $40 in worker fees (we did not pay for any invalid responses). (2) We had virtually no control over the selection of workers, and almost all of them seemed ill-suited for the task (they did not answer Question 1 correctly); nevertheless, several of the open-ended responses reflected a thoughtful understanding of the scenario. (3) Tasks 3–4 did produce distractors (new and old) of note for the worker population, illustrating the potential of crowdsourcing to generate distractors, provided the selected population can be adequately controlled. (4) Even when the worker population differs from the desired target population, their responses can be useful if they inspire test developers to improve test items and to think of effective distractors.
Despite our disappointment that most workers answered Question 1 incorrectly, the experience helped us refine the switchbox test item. Reflecting on the data, the problem development team met and made improvements to the scenario and distractors (see discussions near the ends of Sects. 3.5 and 3.7). Even though the AMT workers represent a different population than our CCA target audience, their responses helped direct our attention to potential ways to improve the test item.
We strongly believe in the potential of crowdsourcing to help generate distractors and improve test items. Being able to verify workers’ credentials assuredly (e.g., cryptographically) would greatly enhance the value of AMT.
© 2021 Springer Nature Switzerland AG
Sherman, A.T., et al. (2021). Experiences and Lessons Learned Creating and Validating Concept Inventories for Cybersecurity. In: Choo, K.-K.R., Morris, T., Peterson, G.L., Imsand, E. (eds.) National Cyber Summit (NCS) Research Track 2020. NCS 2020. Advances in Intelligent Systems and Computing, vol. 1271. Springer, Cham. https://doi.org/10.1007/978-3-030-58703-1_1
Print ISBN: 978-3-030-58702-4
Online ISBN: 978-3-030-58703-1