Abstract
Purpose
This study investigated the concordance of five publicly available Large Language Models (LLM) with the treatment recommendations of a multidisciplinary tumor board for complex breast cancer patient profiles.
Methods
Five LLM, including three versions of ChatGPT (version 4 and 3.5, with data access until September 2021 and January 2022), Llama2, and Bard, were prompted to produce treatment recommendations for 20 complex breast cancer patient profiles. LLM recommendations were compared to the recommendations of a multidisciplinary tumor board (gold standard) across surgical, endocrine, and systemic treatment, radiotherapy, and genetic testing.
Results
GPT4 demonstrated the highest concordance (70.6%) for invasive breast cancer patient profiles, followed by GPT3.5 September 2021 (58.8%), GPT3.5 January 2022 (41.2%), Llama2 (35.3%), and Bard (23.5%). When precancerous lesions of ductal carcinoma in situ were included, the ranking was identical, with lower overall concordance for each LLM (GPT4 60.0%, GPT3.5 September 2021 50.0%, GPT3.5 January 2022 35.0%, Llama2 30.0%, Bard 20.0%). GPT4 achieved full concordance (100%) for radiotherapy. Alignment was lowest for genetic testing, with concordance ranging from 55.0% (GPT3.5 January 2022, Llama2, and Bard) to 85.0% (GPT4).
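The concordance figures above can be understood as the share of cases in which an LLM matched the tumor board across all queried treatment domains. The following is a minimal illustrative sketch of such a calculation; the domain names and case encoding are assumptions for demonstration, not the study's actual data structures or scoring protocol.

```python
# Illustrative concordance calculation: a case counts as concordant only if
# the LLM recommendation matches the tumor board in every queried domain.
# Domain names and case encodings are hypothetical.

DOMAINS = ["surgery", "endocrine", "systemic", "radiotherapy", "genetics"]

def concordance(llm_recs, board_recs):
    """Percentage of cases where the LLM matches the board in all domains."""
    assert len(llm_recs) == len(board_recs) and board_recs
    hits = sum(
        all(llm[d] == board[d] for d in DOMAINS)
        for llm, board in zip(llm_recs, board_recs)
    )
    return round(100 * hits / len(board_recs), 1)
```

Under this per-case, all-domain definition, 12 fully concordant cases out of 17 invasive profiles would yield the reported 70.6%.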
Conclusion
This early feasibility study is the first to compare different LLM in breast cancer care with regard to changes in accuracy over time, i.e., with access to more data or through technological upgrades. Methodological advancement, i.e., the optimization of prompting techniques, and technological development, i.e., enabling data input control and secure data processing, are necessary in the preparation of large-scale and multicenter studies to provide evidence on their safe and reliable clinical application. At present, safe and evidenced use of LLM in clinical breast cancer care is not yet feasible.
This early feasibility study demonstrates that publicly available Large Language Models (LLM), especially GPT4, increasingly align with the decision-making of a multidisciplinary tumor board regarding high complexity breast cancer treatment choices. It indicates that methodological advancement, i.e., the optimization of prompting techniques, and technological development, i.e., enabling data input control and secure data processing, are necessary in the preparation of large-scale and multicenter studies. At present, safe and evidenced use of LLM in clinical breast cancer care is not yet feasible.
Introduction
In Germany, invasive breast cancer is the most prevalent cancer affecting women, with the annual national incidence exceeding 70,000 cases [1]. The implementation of a comprehensive nationwide mammography screening program between 2005 and 2009 resulted in an initial peak in detected breast cancer cases. Subsequently, the increased efforts led to a consistent reduction in the incidence of advanced tumors and a gradual decline in primary disease. Despite these improvements, a high disease burden of breast cancer persists, and given the aging demographic, a future increase in breast cancer incidence is anticipated. This development is accompanied by an intensifying shortage of healthcare professionals and care capacity [1, 2]. In addition, extensive research continuously expands the spectrum of treatment modalities, encompassing surgical interventions, endocrine therapy, radiotherapy, both neoadjuvant and adjuvant chemotherapy, and genetic testing for hereditary breast and ovarian cancer syndromes [3]. Moreover, the swift advancement of diagnostic and treatment technologies, including the increasing adoption of next-generation sequencing, genetic arrays for the prediction of disease prognosis or chemotherapy benefit, and the use of precision-targeted therapies such as antibody-drug conjugates, shapes a transformative phase in gynecological oncology [4, 5]. This development is marked by an abundance of evidenced knowledge and health data, which increasingly overwhelms practitioners in terms of complexity [6]. There is growing optimism that technological innovations will bridge the gap between scientific possibilities and practical healthcare delivery by supporting caregivers and enabling more individualized and effective treatment strategies in an environment with high volumes of data [7].
High expectations are set on artificial intelligence-based clinical decision support tools to augment physicians' decision-making in order to keep pace with this rapid development [8, 9]. Historically, the cumbersome digitization of German healthcare has led to a gap between technological capabilities and current practices, which continues to widen [10]. A nationwide survey conducted by the Commission Digital Medicine of the German Association of Gynecology and Obstetrics (DGGG) revealed high heterogeneity in digital infrastructure within the field of gynecology, characterized by low interoperability and outdated systems, leading to dissatisfaction among healthcare providers [11]. In contrast, most gynecology specialists are optimistic that digitization could ease their growing workloads and enhance patient care, and they foresee the adoption of smart algorithms to assist in patient treatment [12]. In the meantime, it has become normal for patients to assess new symptoms digitally before visiting a doctor, i.e., using online search engines and dedicated app-based symptom checkers [13,14,15]. This includes the recent widespread availability of Large Language Models (LLM), with tech-savvy individuals increasingly turning to public chatbots for health-related inquiries [9, 16]. This shift toward relying on easily accessible online resources, evolving from simple Google keyword searches to consulting advanced tools like ChatGPT, highlights a new reality that likewise demands the promotion of scientific evidence for the medical use of LLM.
The emergence of publicly available LLM in artificial intelligence has opened a new field in medical research, which still lacks defined methodological guard rails and best practices. Preliminary proof-of-concept analyses have indicated potential in using these models as supplementary tools in tumor boards [16,17,18,19,20,21]. In breast cancer care, a few preliminary assessments have explored the accuracy of LLM in supporting decision-making through the evaluation of brief clinical scenarios [22, 23], but also high complexity cases [24]. The most recent literature increasingly challenges the consistency of LLM, highlighting significant changes in explanatory value over short intervals and emphasizing the necessity of their ongoing monitoring [31]. The quality of AI-generated abstracts has reached a level of medical appropriateness at which experts find it challenging to distinguish them from specialist-written content in a blinded review process [32].
With regard to tumor board decision-making, Lukac et al. and Sorin et al. retrospectively compared the answers of GPT3.5 (version September 2021) to the past treatment recommendations of a single tumor board [22, 23]. These studies represent initial explorations of the technology rather than definitive benchmarks for evaluating the capabilities of ChatGPT3.5. Their experiments included only the LLM ChatGPT3.5, involved a constrained and unstructured collection of patient profiles with restricted health data, and utilized a short and limited prompting strategy. Additionally, their assessments were based on a self-developed scoring system. Notably, the studies omitted genetic testing for most cases, which is a crucial factor in the characterization of breast cancer. Both preliminary assessments inferred from their findings that the advice given by language model-based systems could align with that of a tumor board, but refrained from definitive statements about the specific performance level of LLM in their conclusions. Our research builds on the findings of Lukac et al. and Sorin et al. and seeks to extend them in a systematic manner [22, 23]. Therefore, we confirmed GPT3.5's potential for managing high complexity cases by employing a standardized prompting model and using comprehensive health data profiles as described in the methodology [24]. This subsequent early feasibility study (EFS) provides further insights by comparing different LLM versions and monitoring development over time, with access to more data and technological upgrades. It matches a generic observation by Eriksen et al. of superior performance by GPT4 for diagnosing complex clinical cases and confirms this finding in the field of breast cancer care [33]. Furthermore, it confirms the most recently raised critique of LLM regarding a persisting challenge in answer consistency in the field of breast cancer treatment [25]. This relates to the deterioration in GPT3.5's accuracy over the observation period.
This points toward the possible issue that extending data access to uncontrolled sources used for decision-making does not necessarily improve LLM accuracy but may instead lead to confusion in the models.
Limitations and implications for methodological and technological development of LLM application in breast cancer care
By monitoring the evolution of LLM, this study shows that especially the update to the GPT4 algorithm enables increasing alignment with the recommendations of the MTB. It indicates that technological applicability is rapidly developing toward the maturity needed to provide clinical decision support, even for complex decision-making in breast cancer care. Nevertheless, the study also underlines that clinical use of LLM is not yet feasible at present. Several unresolved regulatory hurdles and missing evidence on the peculiarities of clinical application should preclude their current use in clinical care. The current level of evidence regarding the use of LLM in breast cancer therapy leaves crucial questions unanswered, which can also be derived from this study.
The initially chosen prompting model only required the LLM to indicate whether or not chemotherapy should be given. However, the recommendation of multi-gene assays to assess disease prognosis and predict chemotherapy benefit was not queried. Given the increasing use of such tests and their growing clinical relevance, future prompting models should include a query on the need for multi-gene assays to assess the necessity of chemotherapy. This finding underscores the methodological need to develop sophisticated prompting models tailored to the specifics of the oncologic entity being investigated in order to improve the consistency of LLM answers.
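To make this concrete, a structured prompting model of the kind proposed here could enumerate each treatment domain explicitly, including the multi-gene assay query that the initial model lacked. The template below is purely illustrative: the wording, field names, and domain list are assumptions for demonstration and do not reproduce the study's actual standardized prompting model [24].

```python
# Hypothetical structured prompt template for a breast cancer case.
# Wording and fields are illustrative, not the study's actual prompt.

PROMPT_TEMPLATE = (
    "You are supporting a breast cancer multidisciplinary tumor board.\n"
    "Patient profile: {profile}\n"
    "For each item below, state a single recommendation:\n"
    "1. Surgical treatment\n"
    "2. Endocrine therapy\n"
    "3. Systemic (neoadjuvant/adjuvant) therapy\n"
    "4. Radiotherapy\n"
    "5. Genetic testing for hereditary breast and ovarian cancer\n"
    "6. Multi-gene assay to assess chemotherapy benefit (yes/no)\n"
)

def build_prompt(profile: str) -> str:
    """Fill the structured template with a patient profile string."""
    return PROMPT_TEMPLATE.format(profile=profile)
```

Enumerating the domains in a fixed order would also make the answers directly comparable across LLM versions and over time, which is the kind of consistency the study calls for.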
Furthermore, the study uses the recommendations of a single MTB as the gold standard for assessing concordance in LLM decision-making. Large-scale observational studies, conducted by several international study groups, have revealed notable disparities in breast cancer treatment choices and outcomes [34, 35]. There is often considerable scope for decision-making on available treatment options, such as varying intensities of chemotherapy regimens, which reflects the diversity in national standards and respective guidelines. This issue also explains the rather moderate results for DCIS profiles in this study. The LLM consistently recommended endocrine therapy, as, for example, suggested in a meta-analysis by Yan et al. from 2020 [36]. In contrast, the MTB in this study decided against endocrine therapy in the DCIS cases, a decision taken in interdisciplinary discourse and within the decision-making scope of the German guidelines. However, as a dedicated EFS with a small group of 20 cases, no conclusions should be drawn regarding LLM accuracy for different cancer subtypes, stages of the disease, i.e., precancerous or advanced metastasized illness, or treatment options. Hence, to ensure the evidence-based and safe use of LLM in breast cancer care, these open questions must be adequately addressed by further research. Subsequent studies should incorporate larger study populations and multicenter study designs to expand findings from a preclinical simulation environment into clinical care.
At a technological level, a lack of control over the sources used for decision-making and a lack of security in the processing of health data have so far prevented the use of LLM in clinical care. The deterioration in GPT3.5's accuracy over the observation period, which appears to be connected to the extension of data access, underlines how uncontrolled and enlarged input of sources can contribute to confusion in the models. It remains unclear which sources the open LLM use for decision-making, a problem also visible in the moderate DCIS results, as the evidence the LLM use to recommend endocrine therapy cannot be derived from their responses. In alignment with the Explainable AI approach, the technological application should offer the possibility of gaining control over the sources used for decision-making while ensuring security in the processing of personal health data, i.e., by limiting processing to local servers.
Opportunities for breast cancer care
Considering the findings of the national survey conducted by the Commission Digital Medicine of the German Association of Gynecology and Obstetrics (DGGG), 61.4% of specialists agree or strongly agree that intelligent algorithms will support clinicians in treating patients, and the majority support the perception that this will improve patient care (65.1% agree or strongly agree) and help reduce the increasing workload (78.4% agree or strongly agree) [12]. These expectations are accompanied by the aforementioned intensified care complexity due to the rapid increase in evidence-based knowledge and case load in gynecological oncology [4, 5]. From this perspective, easily accessible and user-friendly publicly available LLM may provide a prospective solution for breaking down prevailing barriers [37]. As presented in this study, clinical use of LLM is not yet feasible. Nevertheless, the controlled and evidence-based adaptation of LLM, i.e., the optimization of prompting techniques or the enabling of data input control and secure data processing, offers potential for LLM to bring value to patients in clinical breast cancer care.
Conclusion
This early feasibility study demonstrates that publicly available LLM, especially GPT4, increasingly align with the decision-making of a multidisciplinary tumor board, and it confirms that decision consistency remains a major issue for the application of LLM in breast cancer care. The findings underline that clinical use of LLM is not yet feasible. Nevertheless, the study gathers insights that could inform potential modifications to LLM application. Methodological advancement, i.e., the optimization of prompting techniques, and technological development, i.e., enabling data input control and secure data processing, are necessary in the preparation of large-scale and multicenter studies. These will subsequently provide further essential evidence on the safe and reliable application of LLM in breast cancer care to maximize benefits for providers and patients alike.
Data availability
Datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
References
Brustdrüse–C 50 (2023) In: Robert Koch Institut (ed) Krebs in Deutschland für 2019/2020, 14th edition, Berlin, pp 78–81. https://doi.org/10.25646/11357
European Commission (2021) Europe’s beating cancer plan. https://health.ec.europa.eu/system/files/2022-02/eu_cancer-plan_en_0.pdf. Accessed 20 Dec 2023
German Guideline Program in Oncology (German Cancer Society, German Cancer Aid, AWMF) (2021) Interdisciplinary evidence-based practice guideline for early detection, diagnosis, treatment and follow-up of breast cancer, long version 4.4, AWMF registration number: 032/045OL. https://www.leitlinienprogramm-onkologie.de/fileadmin/user_upload/S3_Guideline_Breast_Cancer.pdf. Accessed 20 Dec 2023
Tarawneh TS, Rodepeter FR, Teply-Szymanski J et al (2022) Combined focused next-generation sequencing assays to guide precision oncology in solid tumors: a retrospective analysis from an institutional molecular tumor board. Cancers (Basel). https://doi.org/10.3390/cancers14184430
Santa-Maria CA, Wolff AC (2023) Antibody-drug conjugates in breast cancer: searching for magic bullets. J Clin Oncol 41(4):732–735. https://doi.org/10.1200/JCO.22.02217
Bhattacharya T, Brettin T, Doroshow JH et al (2019) AI meets exascale computing: advancing cancer research with large-scale high performance computing. Front Oncol 2(9):984. https://doi.org/10.3389/fonc.2019.00984
Barker AD, Lee JSH (2022) Translating “Big Data” in oncology for clinical benefit: progress or paralysis. Cancer Res 82:2072–2075. https://doi.org/10.1158/0008-5472.CAN-22-0100
Poon H (2023) Multimodal generative AI for precision health. NEJM AI. https://doi.org/10.1056/AI-S2300233
Goldberg C (2023) Patient portal. NEJM AI. https://doi.org/10.1056/AIp2300189
Thiel R, Deimel L, Schmidtmann D et al (2018) Gesundheitssystem-Vergleich Fokus Digitalisierung #SmartHealthSystems: Digitalisierungsstrategien im internationalen Vergleich. https://www.bertelsmann-stiftung.de/fileadmin/files/Projekte/Der_digitale_Patient/VV_SHS-Gesamtstudie_dt.pdf. Accessed 20 Dec 2023
Pfob A, Griewing S, Seitz K et al (2023) Current landscape of hospital information systems in gynecology and obstetrics in Germany: a survey of the commission Digital Medicine of the German Society for Gynecology and Obstetrics. Arch Gynecol Obstet 308:1823–1830. https://doi.org/10.1007/s00404-023-07223-1
Pfob A, Hillen C, Seitz K et al (2023) Status quo and future directions of digitalization in gynecology and obstetrics in Germany: a survey of the commission Digital Medicine of the German Society for Gynecology and Obstetrics. Arch Gynecol Obstet. https://doi.org/10.1007/s00404-023-07222-2
Millenson ML, Baldwin JL, Zipperer L et al (2018) Beyond Dr. Google: the evidence on consumer-facing digital tools for diagnosis. Diagnosis (Berl) 5(3):95–105. https://doi.org/10.1515/dx-2018-0009
Pergolizzi J Jr, LeQuang JAK, Vasiliu-Feltes I et al (2023) Brave new healthcare: a narrative review of digital healthcare in American medicine. Cureus. https://doi.org/10.7759/cureus.46489
Knitza J, Muehlensiepen F, Ignatyev Y et al (2022) Patient’s perception of digital symptom assessment technologies in rheumatology: results from a multicentre study. Front Public Health. https://doi.org/10.3389/fpubh.2022.844669
Betzler BK, Chen H, Cheng CY et al (2023) Large language models and their impact in ophthalmology. Lancet Digit Health 5(12):e917–e924. https://doi.org/10.1016/S2589-7500(23)00201-7
Buhr CR, Smith H, Huppertz T et al (2023) ChatGPT versus consultants: blinded evaluation on answering otorhinolaryngology case–based questions. JMIR Med Educ 9:e49183. https://doi.org/10.2196/49183
Massey PA, Montgomery C, Zhang AS (2023) Comparison of ChatGPT–3.5, ChatGPT-4, and orthopaedic resident performance on orthopaedic assessment examinations. J Am Acad Orthop Surg 31(23):1173–1179. https://doi.org/10.5435/JAAOS-D-23-00396
Roos J, Kasapovic A, Jansen T et al (2023) Artificial intelligence in medical education: comparative analysis of ChatGPT, Bing, and medical students in Germany. JMIR Med Educ. https://doi.org/10.2196/46482
Takagi S, Watari T, Erabi A et al (2023) Performance of GPT-3.5 and GPT-4 on the Japanese Medical Licensing Examination: comparison study. JMIR Med Educ. https://doi.org/10.2196/48002
Schopow N, Osterhoff G, Baur D (2023) NLP applications in clinical practice: a comparative study and augmented systematic review with ChatGPT (Preprint). JMIR Med Inform. https://doi.org/10.2196/48933
Lukac S, Dayan D, Fink V et al (2023) Evaluating ChatGPT as an adjunct for the multidisciplinary tumor board decision-making in primary breast cancer cases. Arch Gynecol Obstet 308:1831–1844. https://doi.org/10.1007/s00404-023-07130-5
Sorin V, Klang E, Sklair-Levy M et al (2023) Large language model (ChatGPT) as a support tool for breast tumor board. NPJ Breast Cancer. https://doi.org/10.1038/s41523-023-00557-8
Griewing S, Gremke N, Wagner U et al (2023) Challenging ChatGPT 3.5 in senology—an assessment of concordance with breast cancer tumor board decision making. J Pers Med. https://doi.org/10.3390/jpm13101502
Chen L, Zaharia M, Zou J (2023) How is ChatGPT's behavior changing over time? (preprint). arXiv. https://doi.org/10.48550/arXiv.2307.09009
U.S. Food and Drug Administration (2013) Investigational Device Exemptions (IDEs) for early feasibility medical device clinical studies, including certain First in Human (FIH) studies: guidance for industry and Food and Drug Administration staff. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/investigational-device-exemptions-ides-early-feasibility-medical-device-clinical-studies-including. Accessed 5 Mar 2024
Innovative Health Initiative (2023) Improving patient access to innovative medical technologies in the European Union. https://heuefs.eu/wp-content/uploads/2024/01/HEU-EFS_consortium_press_release.pdf. Accessed 5 Mar 2024
Rao A, Pang M, Kim J et al (2023) Assessing the utility of ChatGPT throughout the entire clinical workflow: development and usability study. J Med Internet Res 25:e48659. https://doi.org/10.2196/48659
Rao A, Kim J, Kamineni M et al (2023) Evaluating GPT as an adjunct for radiologic decision making: GPT-4 versus GPT-3.5 in a breast imaging pilot. J Am Coll Radiol. https://doi.org/10.1016/j.jacr.2023.05.003
Haver HL, Ambinder EB, Bahl M et al (2023) Appropriateness of breast cancer prevention and screening recommendations provided by ChatGPT. Radiology. https://doi.org/10.1148/radiol.230424
Choi HS, Song JY, Shin KH et al (2023) Developing prompts from large language model for extracting clinical information from pathology and ultrasound reports in breast cancer. Radiat Oncol J 41:209–216. https://doi.org/10.3857/roj.2023.00633
Gao CA, Howard FM, Markov NS et al (2023) Comparing scientific abstracts generated by ChatGPT to real abstracts with detectors and blinded human reviewers. NPJ Digit Med. https://doi.org/10.1038/s41746-023-00819-6
Eriksen AV, Möller S, Ryg J (2023) Use of GPT-4 to diagnose complex clinical cases. NEJM AI. https://doi.org/10.1056/aip2300031
van Walle L, Verhoeven D, Marotti L, Ponti A, Tomatis M, Rubio IT, EUSOMA Working Group (2023) Trends and variation in treatment of early breast cancer in European certified breast centres: an EUSOMA-based analysis. Eur J Cancer 192:113244. https://doi.org/10.1016/j.ejca.2023.113244
Derks MGM, Bastiaannet E, Kiderlen M et al (2018) Variation in treatment and survival of older patients with non-metastatic breast cancer in five European countries: a population-based cohort study from the EURECCA Breast Cancer Group. Br J Cancer 119:121–129. https://doi.org/10.1038/s41416-018-0090-1
Yan Y, Zhang L, Tan L et al (2020) Endocrine therapy for Ductal Carcinoma In Situ (DCIS) of the breast with Breast Conserving Surgery (BCS) and Radiotherapy (RT): a meta-analysis. Pathol Oncol Res 26:521–531. https://doi.org/10.1007/s12253-018-0553-y
Gottlieb S, Silvis L (2023) How to safely integrate large language models into health care. JAMA Health Forum 4(9):e233909. https://doi.org/10.1001/jamahealthforum.2023.3909
Acknowledgements
We would like to thank our colleague Julia Greenfield from the Institute of Digital Medicine for her support in the linguistic revision of the final manuscript.
Funding
Open Access funding enabled and organized by Projekt DEAL. The authors declare that no funds, grants or other support were received during the preparation of this manuscript. S.G. was supported by the Clinician Scientist program (SUCCESS-program) of Philipps-University of Marburg and the University Hospital of Giessen and Marburg.
Author information
Authors and Affiliations
Contributions
S Griewing Protocol, project development, data collection, data management, data analysis, manuscript writing, manuscript editing. J Knitza Project development, manuscript editing, data analysis. J Boekhoff Protocol, project development, data analysis. C Hillen Manuscript editing, data analysis. F Lechner Protocol, project development. U Wagner Manuscript editing, project development. S Kuhn Manuscript editing, project development. M Wallwiener Manuscript editing, project development. All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Sebastian Griewing, Johannes Knitza and Jelena Boekhoff. The first draft of the manuscript was written by Sebastian Griewing and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Ethics approval
This is an observational study. The Philipps-University Marburg Research Ethics Committee has confirmed that no ethical approval is required (23–300 ANZ).
Consent to participate
Does not apply.
Consent to publish
Does not apply.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Griewing, S., Knitza, J., Boekhoff, J. et al. Evolution of publicly available large language models for complex decision-making in breast cancer care. Arch Gynecol Obstet 310, 537–550 (2024). https://doi.org/10.1007/s00404-024-07565-4