Introduction

Used as a proxy for the relevance of academic work, citation counts are among the most frequently employed measures of impact in science. Publications with higher citation counts are deemed more influential than those with lower ones (Cronin, 1984). Citation counts unclutter the mass of publications, separating the wheat from the chaff. Additionally, they form the basis for the field of citation analysis, in which bibliographic references of research publications are used to form networks that can be explored using graph-analytical methods to uncover hidden relations and the flow of knowledge (De Bellis, 2009). Frequently used techniques include author networks, co-author networks, co-citation networks, and bibliographic coupling (Zupic & Čater, 2015). For several decades, citation analysis has been used to map the landscape of scientific disciplines, to observe knowledge transfer across disciplinary boundaries, and to assess the impact of publications (Gläser et al., 2017). Traditional analyses based on citation counts, however, provide only a rather shallow image of the scientific landscape (Moravcsik, 1973; Zhu et al., 2015). Most importantly, they do not differentiate between the various types of citations. Yet citations differ based on the citing author's motivation, the thematic context of a citation, or a citation's argumentative relation to its subject (Amsterdamska & Leydesdorff, 1989; Cronin, 1984). Typically, all citations are assumed to be of equal weight. While this simplification might be useful for some purposes, it prevents researchers from gaining a deeper and more subtle understanding of the scientific impact generated by an individual paper, an author, a department, or an entire field. Traditional citation analyses are limited by their inability to discern the level of intellectual indebtedness (Ding et al., 2014; Phelan, 1999).

Unpacking the precise nature of citations, and doing so at scale using automated approaches, is hence important to make impact metrics more meaningful for performance assessments and more informative for the research community (Angrosh et al., 2010). Some researchers have begun to address this question by breaking down citation counts by journal and scientific field. Others have focused on the syntactic features of citations, the argumentative context (i.e. whether the author's reference to a previous work is appreciative or disparaging), and the author's motives (Ding et al., 2014; Hernández-Alvarez & Gomez, 2016). While certainly valuable, these efforts fall short of helping us understand the thematic contribution of a cited work to the citing work and hence its impact. For example, some large survey-based studies may be cited for a small methodological refinement, other studies may be cited for historical reasons or because a term was coined in them, and still others to serve as cautionary tales. Existing methodological approaches are ill-equipped to reveal the thematic context of citations. While manual coding of citations using qualitative techniques might seem a solution at first glance, such an approach is limited in its practical value, given the large number of citations typically examined. Scalable methods to uncover the context of citations are thus needed.

The present study breaks new ground in this regard. We develop and test a novel automated method to extract the context of citations, rather than simply their numbers or locations. We aim to uncover the thematic contributions of academic publications through their absorption and actual use by the scientific community. For this purpose, we employ a combination of state-of-the-art techniques from computational linguistics and data mining in order to analyze the textual environment of every citation in a given set of publications. As part of this process, we identify and extract the text environments of all citations to the focal publication. We then apply text-mining techniques to compute the set of topics a focal publication is being cited for. This topic map provides a unique impact profile for one or more publications that depicts their direct impact within a defined scientific community.

To demonstrate our method, we apply it to a document collection known as the AIS Senior Scholars' Basket, comprising eight leading journals from the field of Information Systems (IS). The IS discipline benefits particularly from such efforts, as it is still searching for its place in the scientific landscape and for a concise identity (Grover, 2012). Developments in IS call for regular assessment of its intellectual structure, its boundaries, and its relation to its reference disciplines. Impact plays a key role in this assessment. Our study, therefore, also provides novel insights into the context of citations within the IS field. Ultimately, the proposed method may be used to simplify impact assessment in science in general. Authors of literature reviews, for instance, will be able to assess with greater ease how certain articles impact the content of others. Search engines for publication data may automatically inform users about what publications are cited for instead of simply displaying citation counts. Further, the analysis we propose can be combined straightforwardly with metadata such as publication dates to plot knowledge absorption curves over time. A more far-reaching application may consist of comparing and contrasting a publication's computed impact profile, depicting its absorption, with its positioning as reflected in the set of keywords the authors select for it. It could also be used for the evaluation of funding initiatives in academia, by comparing generated citation profiles before and after a funding initiative. Using our novel method, researchers can thus not only compute impact profiles, but also use them as inputs to subsequent analyses to unearth novel patterns in knowledge diffusion within and across scientific fields.

In the upcoming section, we briefly outline the evolution of citation analysis. We then explain our methodological approach in detail. We showcase the feasibility and value of our approach with exemplary citation profiles based on data from the IS field. Subsequently, we use our approach to trace the impact of the Technology acceptance model (TAM), a key concept in IS research, within the IS literature. Thereafter, we contrast the analytical approach presented in this article with text-based methods for mapping scientific disciplines. We conclude by summarizing our findings and sketching ideas for further research.

Evolution of citation analysis

Not all citations are based on the same type of intellectual involvement with the cited article. While some citations are listed in a bibliography because the original sources truly influenced the present work, others basically constitute name-dropping without any actual impact, and still others are noted purely with disapproval. Concerns about this situation have led to two research streams in citation analysis. The first stream is, above all, theory-driven, whereas the second stream is data-driven and does not rely on any theoretical framework but draws heavily on computer science and linguistics. An overview of different theories of citation and open research questions can be found in the overview article by Leydesdorff and Milojevic (2001). NMF is a matrix factorization method related to Singular value decomposition (SVD). Whereas SVD factorizes a matrix into the product of three matrices, NMF decomposes it into two factor matrices and imposes one additional constraint: the factor matrices may contain only non-negative values. NMF has become popular due to its inherent clustering property and thus its ability to automatically extract sparse and easily interpretable factors. The constrained factorization cannot be calculated exactly; it is commonly formulated as a constrained non-linear optimization problem and approximated numerically. The algorithm generates topics defined by wordlists representative of them, as well as the strength of association between each text and each topic in the form of non-negative weights. The wordlists are also called topic descriptors.
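To make the factorization concrete, the following minimal sketch applies NMF to a handful of toy citation environments using scikit-learn and prints the resulting topic descriptors. The example data, the TF-IDF weighting, and all parameter values are illustrative assumptions, not the exact configuration used in our study.

```python
# Minimal NMF sketch on toy citation environments (illustrative data and parameters).
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

citation_environments = [
    "perceived usefulness and perceived ease of use predict the intention to use",
    "structural equation modeling was applied to validate the measurement model",
    "the theory of reasoned action explains attitudes and behavioral intention",
    "items and scales were adapted from prior instruments to measure usefulness",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(citation_environments)  # non-negative document-term matrix

nmf = NMF(n_components=2, init="nndsvd", random_state=0)
W = nmf.fit_transform(X)   # document-topic weights (non-negative)
H = nmf.components_        # topic-term weights (non-negative)

# Topic descriptors: the highest-weighted terms of each topic.
terms = vectorizer.get_feature_names_out()
for t, row in enumerate(H):
    top = [terms[i] for i in row.argsort()[::-1][:5]]
    print(f"Topic {t}: {', '.join(top)}")
```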

We conducted a comprehensive comparison of text clustering algorithms such as Latent semantic analysis (LSA), Probabilistic latent semantic analysis (PLSA), Latent Dirichlet allocation (LDA), NMF, and SVD as well as various evaluation metrics. Based on the results, we chose NMF over other frequently employed topic modeling algorithms such as LDA. In our study, NMF outperformed all other tested clustering procedures for datasets with only a small number of clusters: it was faster, exhibited less result scattering across runs, and showed comparable clustering accuracy (Chen et al., 2019). Hence, unlike LDA, which would require multiple executions with the same parameter settings and a separate procedure to select the best run due to result scattering, NMF did not require such repeated runs. As our approach requires high speed in several use cases, but usually only a small number of clusters has to be extracted, NMF was the topic modeling algorithm of choice. As we operated within an unsupervised setting, the number of clusters k in a dataset, i.e. the number of topics, was not known and had to be determined in the process. Therefore, we needed an evaluation criterion to determine the optimal number of clusters.

The primary objective of our newly developed instrument is to facilitate the identification of a publication's absorption in the scientific community. Consequently, we favor an evaluation metric that focuses on topic interpretability. We tested several topic modeling evaluation metrics regarding their relation to optimal clustering results. We found that, for certain ranges of k, some of the metrics did correlate with our definition of an optimal clustering result. Among them was the topic coherence measure \({C}_{NPMI}\), which performed well on datasets with a small number of clusters (Bouma, 2009). Such a word-based topic coherence measure scores a single topic by measuring the degree of semantic similarity between high-scoring words in the topic. \({C}_{NPMI}\) uses the normalized pointwise mutual information (NPMI) to measure the semantic similarity of word pairs based on a reference corpus. The calculated score for each word pair within a topic is aggregated per topic and finally averaged across all topics; this way, the measure can be used to assess the overall quality of a topic model. Coherence measures help to distinguish between topics that are semantically interpretable and topics that are artifacts of statistical inference. In our setup, \({C}_{NPMI}\) made it possible to automatically optimize the number of topics extracted from a set of citation environments by computing a clustering using NMF for k = 2, k = 3, …, k = 20 clusters. The clustering yielding the highest \({C}_{NPMI}\) value was then deemed to be the most appropriate one. In the end, we obtained a clustering of the citation environments of our unit of analysis. Each cluster represented a topic and was characterized by a list of representative words, also called topic descriptors.
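A sketch of the corresponding model-selection loop is given below: for each candidate k, an NMF clustering is computed and scored with a \({C}_{NPMI}\) coherence measure as implemented in gensim, and the k with the highest coherence is retained. The tokenization, the use of the citation environments themselves as the NPMI reference corpus, and all parameter values are simplifying assumptions rather than our exact setup.

```python
# Sketch: choose the number of topics k by maximizing C_NPMI coherence.
from gensim.corpora import Dictionary
from gensim.models.coherencemodel import CoherenceModel
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

def top_terms(components, terms, topn=10):
    # Topic descriptors: highest-weighted terms of each NMF component.
    return [[terms[i] for i in row.argsort()[::-1][:topn]] for row in components]

def best_nmf_by_coherence(citation_environments, k_min=2, k_max=20):
    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(citation_environments)
    terms = vectorizer.get_feature_names_out()

    # Reference corpus for NPMI: here simply the citation environments themselves,
    # tokenized in exactly the same way as by the vectorizer.
    analyzer = vectorizer.build_analyzer()
    tokenized = [analyzer(doc) for doc in citation_environments]
    dictionary = Dictionary(tokenized)

    best = None
    for k in range(k_min, k_max + 1):
        nmf = NMF(n_components=k, init="nndsvd", random_state=0).fit(X)
        topics = top_terms(nmf.components_, terms)
        coherence = CoherenceModel(topics=topics, texts=tokenized,
                                   dictionary=dictionary,
                                   coherence="c_npmi").get_coherence()
        if best is None or coherence > best[0]:
            best = (coherence, k, nmf, topics)
    return best  # (coherence score, k, fitted model, topic descriptors)
```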

Result visualization

After the clustering procedure, several topic descriptors were obtained. Each topic descriptor contained several terms representative of the topic. The lists were then processed further to generate a graphical depiction of the impact profile. We chose a tree-map-like visualization (Shneiderman, 1992). Tree-map visualizations were originally developed to depict hierarchical dependencies. However, they also provide an intuitive picture of impact profiles. In this profile, each colored area designates the impact of the analyzed publication(s) on a specific topic. The larger its area relative to the other areas, the stronger the impact on that topic. The topics themselves are specified by lists of characteristic words shown in the colored areas, arranged according to their importance. It is important to stress that the resulting impact profiles are created in a fully automated manner. These visualizations allow scholars to quickly review the impact of one or more publications of interest. The automatic unpacking of thematic impact in this way greatly simplifies the assessment of the relevance of a paper for one's own research. It may hence serve as a useful feature of databases for scientific publications or search engines. Accordingly, tree-map-like visualizations of citation profiles may be integrated into lists such as those of search results for each article. Such visualizations could also support the assessment of candidates in academic hiring committees or portray the impact of entire institutions at a glance. A detailed illustration of the entire process is shown in Fig. 2.
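As a rough illustration of such an impact profile, the snippet below renders a treemap from invented topic sizes and descriptors using the squarify package on top of matplotlib; the data and styling are placeholders only, not output of our pipeline.

```python
# Treemap-style impact profile with invented topic sizes and descriptors.
import matplotlib.pyplot as plt
import squarify

topic_sizes = [142, 97, 88, 67]          # number of citation environments per topic
topic_labels = [
    "Topic 0\nusefulness, ease, use",
    "Topic 1\nconstruct, item, scale",
    "Topic 2\nintention, behavioral, predict",
    "Topic 3\nreasoned, action, attitude",
]

plt.figure(figsize=(8, 5))
squarify.plot(sizes=topic_sizes, label=topic_labels, pad=True, alpha=0.8)
plt.axis("off")
plt.title("Impact profile (illustrative data)")
plt.show()
```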

Fig. 2 Detailed process outline to unpack the thematic context of citations. (Color figure online)

Results

Our corpus included 7,382 full-text articles from the AIS Senior Scholars' Basket, nearly all the articles published in the eight journals from their first issue up to and including 2019. We filtered out some texts such as book reviews, errata, and any supplementary materials; most of these texts are less than one page long and lack bibliographies, so they are of no use for our analyses. An overview of the included journals and the coverage is given in Table 1. Due to license restrictions, we were not able to acquire a small number of the earliest articles from the Information Systems Journal and the Journal of the Association for Information Systems. We identified a total of 397,077 entries in all bibliographies and 542,503 in-text references and their corresponding citation environments within the set of articles. The number of extracted citation environments may exceed the number of bibliographic entries because references can be used multiple times in an article. The citation environments were categorized into groups using the topic modeling algorithm, according to their similarity based on the co-occurrence of words within the citation's surroundings. In the end, the algorithm produced topic descriptors, i.e. lists containing terms characterizing each cluster. These lists were then used to create a tree-map impact profile. In this section, we present three use cases for our method. The first two cases show citation profiles focusing on two units of interest: publications and authors. In the following, we present the ten most-cited publications and authors within our article collection and, for each, discuss one result with the help of a visualization as an example of a citation profile. After that, we show how our approach can be utilized for the analysis of a concept instead of one or more articles. We do so with the help of an impact analysis of the Technology acceptance model (TAM).
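For illustration, the sketch below shows one simple way to obtain citation environments: locate in-text author-year references with a regular expression and keep a fixed window of words around each match. The marker pattern and window size are assumptions for the example; they are not the exact heuristics used in our pipeline.

```python
# Illustrative extraction of citation environments around in-text author-year references.
import re

# Matches simple forms such as "(Davis, 1989)" or "(Davis et al., 2003)"; real
# bibliographies require considerably more robust reference identification.
CITATION_PATTERN = re.compile(r"\(([A-Z][A-Za-z-]+(?: et al\.)?,? \d{4}[a-z]?)\)")

def extract_citation_environments(full_text, window=40):
    """Return (reference string, surrounding words) pairs for every in-text citation found."""
    environments = []
    for match in CITATION_PATTERN.finditer(full_text):
        before = full_text[:match.start()].split()[-window:]
        after = full_text[match.end():].split()[:window]
        environments.append((match.group(1), " ".join(before + after)))
    return environments
```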

Table 1 Coverage of full-text articles from the AIS Senior Scholars' Basket

Impact analysis of a most-cited publication

The ten most-cited publications within our article collection are listed in Table 2. As our first exemplary use case, we graphically illustrate the impact profile of the most-cited paper, “User Acceptance of Computer Technology: A Comparison of Two Theoretical Models,” by Fred D. Davis, Richard Bagozzi, and Paul R. Warshaw, published in Management Science in 1989, Vol. 35, No. 8, pp. 982–1003. We identified 176 papers from the corpus that cited the article. The article received a total of 394 in-text references yielding an equal number of extracted citation environments. Figure 3 shows the result of the overall process for the article. In this impact profile, every colored area represents one topic and contains words representative of the topic. The relative importance of each word in a topic is denoted as a percentage. The relative area of each rectangle corresponds to the number of mentions in the text associated with the topic, indicated by the number in round brackets. The colors and order of the areas, as well as the topic numbers, were assigned randomly. As its title suggests, the article discusses two theoretical models of user acceptance of information technology, one being the Technology acceptance model (TAM) and the other the Theory of reasoned action (TRA). Topic 0 and Topic 2 represent the contribution of the paper to the further development of the TAM, whereas Topic 3 denotes its contribution to the TRA. Topic 1 shows the article’s use as a source for its constructs, items, and scales. From this aggregated view, Topics 0 and 2 seem similar; in fact, it may be necessary to look at the citation environments directly to tell the precise difference. Since machines are unable to understand real meaning—they merely exploit statistical features of texts—the uncovering of fine-grained differences at this level is left to the user, as is the labeling of each topic. To assist with this, we devised an interactive variant of the impact profile showing the extracted text snippets as well as bibliographic information on the originating publication. We can make this variant available upon request.

Table 2 Most cited publications within our set of documents from the AIS Senior Scholars' Basket
Fig. 3 Impact profile of the article “User acceptance of computer technology: A comparison of two theoretical models”. (Color figure online)

Impact analysis of a most-cited author

Instead of focusing on individual articles, it is also possible to aggregate them and thus calculate citation profiles on a higher level. One obvious variant would be to aggregate them by author. Of course, this may be taken further and extended to departments, faculties, or entire institutions. Other types of aggregation may yield valuable insights, too, e.g. grouping articles by country, journal, or discipline. In this section, we aggregate by author. The ten most-cited authors within our set of articles are listed in Table 3. We chose to graphically illustrate the impact profile of the second most-cited author, Detmar Straub. Similar analyses of the other top 10 IS authors can be found in the appendix to this article. We identified 1063 references to him and 2194 corresponding citation environments. Figure 4 depicts his impact profile. In the following, we provide exemplary citations by topic to support our interpretation. The references can be obtained from the list of citation environments and their source articles that our procedure produces. Topics 4 and 5 emerge as the most impactful ones. Topic 4 shows his influence on the further development of structural equation modeling and regression analysis (Gefen & Straub, 2005; Gefen et al., 2000). Topic 5 illustrates his impact on the development of the IS discipline (Straub, 1989; Straub et al., 2004). Topic 0 reflects his contributions to the field of IT outsourcing (Ang & Straub, 1998; Koh et al., 2004). Topics 1 and 2 are linked to his work on technology adoption, technology acceptance, and trust (Karahanna & Straub, 1999; Karahanna et al., 1999). Finally, Topic 3 refers to Straub's research on computer abuse in organizations and IS security (Straub, 1990; Straub & Welke, 1998). Visualized in this way, an impact profile allows for a straightforward identification of the topics on which a scientist has a significant influence. It could be a valuable addition to author profiles in library services, where the information may be further supplemented with the referenced and referencing articles.
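A minimal sketch of the author-level aggregation described above: pool all citation environments whose cited reference names the focal author and feed the pooled texts into the same clustering pipeline as before. The record schema (dictionaries with 'text' and 'cited_reference' keys) is a hypothetical simplification.

```python
# Author-level aggregation of citation environments (hypothetical record schema).
def environments_for_author(environments, author):
    """Collect the texts of all citation environments whose cited reference names the author."""
    return [env["text"] for env in environments
            if author.lower() in env["cited_reference"].lower()]

# Usage sketch: pooled = environments_for_author(all_environments, "Straub"),
# then cluster 'pooled' with the NMF procedure described earlier.
```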

Table 3 Most cited authors within our set of documents from the AIS Senior Scholars' Basket
Fig. 4 Impact profile of Detmar Straub. (Color figure online)

Impact analysis of the technology acceptance model

In the following, we showcase the potential of our approach for an entire area of research. This procedure differs from the previous analyses in that it does not start from an impact profile but from a search in the citation environments. In this way, it enables the analysis of the impact of a specific topic and serves as a starting point for identifying impact profiles of publications that shape the topic in question. This approach is therefore particularly suitable for integration into search engines for academic literature or library services.

To support the presentation of our results and to facilitate the interpretation of the result visualization, we briefly introduce the Technology Acceptance Model. It is one of the most prominent theories in Information Systems (Davis, 1986). Drawing on the Theory of reasoned action (TRA) (Fishbein, 1975), the TAM explains the adoption of new technology on the individual and organizational levels. According to the TAM, “perceived ease of use,” “perceived usefulness,” “attitude toward use,” and “behavioral intention” predict the actual usage of new technology. Since its inception, the TAM has been the basis of numerous further developments, such as the Unified Theory of Acceptance and Use of Technology (Venkatesh et al., 2003). The TAM has also been acknowledged in many fields beyond IS. However, the large amount of literature in IS makes it difficult to assess its impact manually. Hence, it serves as an ideal showcase to demonstrate the usefulness of our method. As the main purpose of this article is not to analyze the TAM but to present a new instrument for impact analysis, we will not provide an in-depth analysis here.

Usually, as shown by Mortenson and Vidgen (2016), among others, impact is assessed based on the number of citations an article has received. Table 4 shows the 10 most-cited articles obtained from the Thomson Reuters Web of Science using the search term “technology acceptance model,” sorted in descending order of citation count. The articles may be further aggregated by journal or author to identify the titles and researchers that have the highest impact on the matter in question. This data usually constitutes the basis for a quantitative or qualitative impact analysis or a literature review. However, this procedure implies that the analysis of impact is based on articles that contain certain subject-specific keywords in their title or abstract and that have been cited frequently; it is not based on articles that have been cited for the respective topic. Our newly developed method may be utilized to enhance such impact analyses. Instead of searching for keywords in titles or abstracts, we searched for keywords in the citation environments of the citing articles. This way, we identified all articles that had been cited in the direct context of the desired keywords. We are hence able to unveil articles that contribute to the respective topic but do not necessarily focus on it and may use diverging terminology. This way, we base our analysis on the topical indebtedness of citing to cited work.
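A sketch of this keyword-driven variant is shown below: filter the citation environments for the key term and rank the cited works by how often they appear in that context. The record schema is again a hypothetical simplification for illustration.

```python
# Rank cited works by the number of citation environments mentioning a key term
# (hypothetical record schema with 'text' and 'cited_reference' keys).
from collections import Counter

def most_cited_for_term(environments, term="technology acceptance model", top_n=10):
    term = term.lower()
    counts = Counter(env["cited_reference"]
                     for env in environments
                     if term in env["text"].lower())
    return counts.most_common(top_n)
```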

Table 4 Most cited TAM publications, ordered by citation count

Table 5 provides an overview of the ten most-cited articles that have been referenced (up to 2019) in direct proximity to the key term “technology acceptance model.” Half of these articles also appear in Table 4; the other publications in Table 5, marked in bold, do not. These publications are directly related to the discourse on the TAM within IS, but they do not necessarily mention the TAM. Especially interesting in this regard, and illustrative of the strength of our method, is the article by Ajzen (1991), “The Theory of Planned Behavior.” The article does not mention the TAM at all, so it does not show up in the traditional impact analysis (Table 4). However, it is directly related to the discourse on the TAM within IS. The Theory of planned behavior (TPB) is an alternative model to the TAM and was designed to predict behavior across many settings (Ajzen, 1985); it has often been applied to the uptake of information technology. According to the TPB, behavior is determined by intention, and intention is predicted by the factors “attitude toward the behavior,” “subjective norms,” and “perceived behavioral control.” The TPB addresses behavioral intentions (for example, to adopt IT innovations) and is hence regularly mentioned along with, and compared to, the TAM. This is clearly discernible from the citation environments of the article “The Theory of Planned Behavior” and may be explored further via the article's citation profile, shown in Fig. 5. Topic 1 of the citation profile represents the discussions in which the TAM and the TPB are compared or related to each other, supporting our interpretation. The other two topics denote the use of the TPB in several scenarios.

Fig. 5 Impact profile of the article “The theory of planned behavior”. (Color figure online)

Table 5 Most cited publications with TAM in their citation environments within our set of documents from the AIS Senior Scholars' Basket

Our exemplary analysis of the discourse on the TAM within IS illustrates how our method can extend traditional citation analysis by identifying research that truly impacts a topic of interest. Our method helps researchers to identify impactful literature with ease. We also developed an interactive variant of the illustration. It may serve as a starting point to dive deeper into the literature on the concept and can display citation profiles for the contained articles. Again, we can make this variant available upon request. Accordingly, we propose enhancing search engines for academic literature and library services in this way.

Comparison with established content-based approaches

In the following, we compare our approach with other established content-based methods to unveil the scientific impact of authors or publications and discuss the merits of our approach. In this section, however, we only compare approaches that operate on the actual texts of publications. In general, these approaches can all be subsumed under science mapping (Gläser et al., 2017), that is, the thematic exploration and delineation of scientific disciplines. We do not include citation analytical approaches that are based solely on the graph or network structure established from the references, without taking into account the publications' texts.

Established content-based methods from science mapping usually rely on abstracts or full texts of the citing publications and group them into meaningful clusters or construct semantic networks of characteristic terms, generating a map of the topics that appear in the abstracts or full texts (van Eck & Waltman, 2010). This is different from the topics for which a publication or author is cited. At the heart of this distinction is a more precise definition of impact. While methods based on abstracts or full texts of citing publications primarily reveal the topics in which an author or a publication is used, our approach reveals the topics for which an author or a publication is cited. The two approaches are therefore not to be considered substitutive but complementary; they each measure something different. Of course, there are also cases in which both analysis methods may produce identical results: namely, when the topics found in the citation environments correspond to the topics in the abstracts or full texts. To further explicate this important difference, we contrast both analytical approaches in the following three examples: a clustering of the citation environments and a clustering of the abstracts of the citing papers. Both approaches are based on the dataset already presented here and use the same clustering and visualization technique to ensure proper comparability.

The left depiction in Fig. 6 shows the citation profile of the author Christian M. Ringle, who is well known for his works on structural equation modeling, computed from the 227 citation environments found in the publications in our dataset. The right depiction is a treemap representation of the clustering of the 142 abstracts of publications citing his papers. It is clearly evident that the thematic diversity in the illustration on the right is much greater than in the illustration on the left. This discrepancy is due to the fact that the left figure summarizes the topics for which the author is cited, whereas the right figure summarizes the topics in which the author is cited. Thus, the left figure represents the direct impact of the articles of Christian M. Ringle in the scientific community whereas the right figure reflects the thematic diversity of the publications the author is referenced in. To show that such a picture does not only emerge in the case of mainly methodological contributions, we replicate this analysis to assess the impact of an influential theory article. Figure 7 depicts the citation profile and the clustering of citing articles’ abstracts of the article “Firm Resources and Sustained Competitive Advantage” published in Journal of Management by Jay Barney in 1991. Again, the side-by-side comparison of the two results shows the different focus of the analytical approaches. Our citation profile represents the themes the work is cited for and the clustering of abstracts shows the themes it is cited in. In the appendix to this paper, we included similar side-by-side comparisons of the top 10 IS authors from our dataset.

Fig. 6 Side-by-side comparison of Christian M. Ringle's citation profile (left) and a clustering of the abstracts of papers citing his publications (right). (Color figure online)

Fig. 7 Side-by-side comparison of the citation profile of the article “Firm resources and sustained competitive advantage” (left) and a clustering of the abstracts of papers citing this publication (right). (Color figure online)

Finally, we return to the paper mentioned at the beginning of this section to provide a somewhat more complex example of the method's application and to discuss the relation between the two depictions in more detail. Figure 8 shows the citation profile of the article “User acceptance of computer technology: A comparison of two theoretical models” published in Management Science by Fred D. Davis et al. (1989), already presented in Fig. 3, alongside the clustering of the citing articles' abstracts. Again, the side-by-side comparison of the two results shows the different focus of the analytical approaches. As mentioned above, some topics do not seem to be clearly distinguishable in the citation profile shown on the left-hand side. In this case, it is necessary to take a direct look at the citation environments to be able to interpret the results. Not surprisingly, all three topics 0, 2, and 3 have some connection to the Technology acceptance model (TAM), the Theory of reasoned action (TRA), and the constructs underlying these models. Topic 0, however, seems more ambiguous and less clearly defined. An examination of the citation environments underlying this topic reveals that the articles in this topic embed the TAM in the broader context of IT adoption and IT acceptance, whereas in Topics 2 and 3, the articles instead reference the model directly along with related theories such as the TRA and/or its constitutive constructs. A different thematic picture emerges in the right depiction, showing the treemap representation of the clustering of the abstracts of papers citing the focal article by Davis et al. (1989). The thematic map is much more diverse and mainly shows areas of application of the TAM. A closer look at the citing articles' abstracts reveals that the articles use the TAM directly, for example in the context of trust and online applications (topic 0) and online/virtual communities (topic 8). It is widely used in the context of IT implementation and change management (topic 6) and even more generally in the context of IS evolution, where the TAM is discussed as one of the most prominent IS theories, e.g., related to the nature of theory in IS research (topic 5).

Fig. 8 Side-by-side comparison of the citation profile of the article “User acceptance of computer technology: A comparison of two theoretical models” (left) and a clustering of the abstracts of papers citing this publication (right). (Color figure online)

In summary, the combined depiction helps to unearth the specific nature of the article's theoretical contribution (left) and the thematic scope of its application (right). Such a comparison can be of value in that it allows researchers to distinguish between the direct intellectual impact of an article and the theory proposed therein on the one hand and its areas of application on the other hand. Importantly, a sole focus on the clustering results on the right might lead researchers to underestimate the article's significant contribution to subsequent scale and construct development as well as to broader theoretical discussions centered on the TAM, TRA, and TPB. In cases such as this, methods exclusively based on abstracts or full texts of citing publications risk revealing application areas only. The approach we advance complements such efforts by helping to unearth articles' intellectual contribution to scholarly conversations. Overall, this example demonstrates that the method presented is indeed capable of extracting the topics for which a publication is cited, as opposed to the topics in which a publication is used.

Conclusion

Impact assessment of extant research is central to all academic activities. Researchers new to a field must quickly identify the most relevant literature for the research question at hand. Academic disciplines are concerned about the impact of their journals and of the specific topics they examine. Hiring committees demand informative and meaningful academic profiles. And policy makers are interested in assessing the effectiveness of their measures. However, the increasing volume of academic output makes keeping track of the impact of research a challenging endeavor, especially if one relies on manual assessment only.

Contribution to research and practice

In this paper, we propose a new method to enhance citation analysis by focusing on the thematic context of citations. Our method differs from previous content-based citation analysis approaches, which employ classification schemas to identify the reasons for and functions of citations (Ding et al., 2014). Our method automatically derives and visualizes impact profiles by exploring the topical references in the full texts of the citing works. In an experiment to demonstrate the method, we obtained a profile of a focal paper's impact and absorption within the scientific community. The data generated can be used to visualize an impact profile, as shown in the previous section. This way, our method may assist in impact assessments of publications in a scalable and automated fashion. The keyword set characterizing each topic in the citation profiles may then serve as a starting point for further exploration of academic literature. Temporal citation profiles may help uncover strengthening and weakening topics in a scientific field in general or the changing influence of a paper or paper set in particular. Another use case may be contrasting the authors' positioning of a paper with its actual absorption within the community by comparing the authors' keywords and the obtained thematic citation profile. All in all, the proposed approach has the potential to serve as a starting point for a considerable number of meaningful follow-up analyses. In our case study of the Technology Acceptance Model (TAM), we show how to exploit our approach to facilitate impact analysis of a concept within an entire field of research. Here, we searched for the topic of interest within the citation environments instead of in article metadata, as is usually done. We identified articles that have been cited in the direct context of the TAM and thus were able to reveal articles that contribute to the discourse around the TAM but do not mention it by name or discuss it explicitly. Apart from the use cases we presented to enhance citation analysis, our method might also prove useful for funding bodies, ranking agencies, and, more broadly, anyone interested in the impact of scholars, research groups, or institutions. It may be used to make funding activities more targeted and more accessible to systematic impact assessment. It also makes it possible to compare the impact of scientific institutions on a thematic level, which might render university rankings more insightful.

Contribution to IS research

From the perspective of an IS researcher, this study offers an effective way to identify significant contributions to IS topics. It gives a preliminary insight into how impact analyses may be improved. And it may begin to help IS researchers to better consolidate a generally accepted body of IS knowledge based on the most significant IS research in a particular area. Following up on Walstrom and Leonard (2000) and Whitley and Galliers (2007), it also facilitates the development of reading lists on artifacts, concepts, and theories for scholars. A search engine could integrate our approach to allow researchers to perform analyses similar to our case study. Future work could expand our showcase analyses to contribute to a better understanding of the IS field. Additionally, our selection of journals may be complemented with additional titles in the future to capture an even clearer portrait of impact within the IS community.

Limitations and future research

A general limitation of our approach stems from the fact that the full texts of all citing papers are necessary to compute the impact profile for one or more publications. It might not always be possible to obtain the data due to license restrictions or the technical effort necessary for its acquisition. However, we expect the ongoing process of digitalization to ease the problem in the future. Some publishers already provide machine-readable full-text data and/or reference data. Additionally, more and more articles are made available open access. In order to further explore the possible application areas of the presented method, and also to point out the differences to other methods, we plan to undertake a more comprehensive comparison of methods for determining impact, including approaches that are not based on full texts or abstracts but on graph-analytical techniques. The approach itself may be further developed in many respects, and we have identified several areas for improvement. The citation matching and reference identification procedures we employed yield acceptable results, but an increase in reference identification accuracy will further enhance analytical capabilities, as the quality and generalizability of the analyses depend on the references identified. The text preprocessing may be further enhanced to account for synonyms. The presentation of the results may be improved by optimizing font sizes and color schemes. Additional visualization techniques may enhance usability (Chaney & Blei, 2012; Chuang et al., 2012; Sievert & Shirley, 2014). Automatic labelling of the derived thematic contexts may greatly enhance the comprehensibility of the visualizations (Chang et al., 2009; Chuang et al., 2014; Hindle et al., 2013; Mei et al., 2007; Nolasco & Oliveira, 2016). Another useful extension would be to track changes of thematic impact over time, similar to what is already possible in science mapping tools (van Eck & Waltman, 2010). This would be feasible by including the timestamps available in the publication metadata and would make it possible to plot knowledge absorption curves over time. Finally, complementing our approach with already-existing semantic and syntactic content-based citation analysis techniques, especially those analyzing the argumentative context (Bertin, 2008; Ritchie et al., 2008; Valenzuela et al., 2015), would connect the dots between the distinct current developments in citation analysis.
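As a rough sketch of the temporal extension just described, the snippet below groups citation environments by topic and by publication year of the citing paper and plots cumulative counts as knowledge absorption curves. The record schema ('topic' and 'year' keys) is an assumption for illustration, not the structure of our actual data.

```python
# Knowledge absorption curves: cumulative citation environments per topic over time
# (hypothetical record schema with 'topic' and 'year' keys).
from collections import defaultdict
import matplotlib.pyplot as plt

def plot_absorption_curves(environments):
    per_topic_year = defaultdict(lambda: defaultdict(int))
    for env in environments:
        per_topic_year[env["topic"]][env["year"]] += 1

    for topic in sorted(per_topic_year):
        years = sorted(per_topic_year[topic])
        cumulative, total = [], 0
        for year in years:
            total += per_topic_year[topic][year]
            cumulative.append(total)
        plt.plot(years, cumulative, label=f"Topic {topic}")

    plt.xlabel("Year of citing publication")
    plt.ylabel("Cumulative citation environments")
    plt.legend()
    plt.show()
```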