1 Introduction

Computational economics is a rapidly evolving field that combines computer science, mathematics, and economics to study complex economic systems. Historically, advances in computational technologies facilitated a paradigm shift in economic methodologies, a transition further evidenced by the establishment of specialized journals and conferences catering to this niche. Kendrick (1993) commented on the popularity of computational methods in economics and highlighted the extent of the field's dependency on them. The discipline, which commenced with rudimentary statistical analyses, has evolved toward precise, nuanced research questions that command rigorous scholarly attention. With greater support and resources, it has clear potential to redefine and expand the very horizons of economics. As the field continues to grow, it becomes increasingly important to elucidate emergent research trends and themes. Such a comprehensive understanding serves not only as a beacon for subsequent investigations but also propels the field toward systematic and targeted advancements.

Analyzing research topics and themes reveals emerging themes and changes in research focus over time, outlining areas of new development, especially in technology. It maps the existing research landscape, identifying dense and sparse areas of research, which helps researchers locate areas with little literature and suggests directions for future work. It also enhances scholarly communication by giving a field a thematic structure that allows multitudes of research articles to be categorized, navigated, and referenced, and it promotes effective collaboration across the interdisciplinary academic community.

Second, topic analysis assists policy and decision-making by presenting a systematic survey of the field, guiding policymakers and industry leaders to ground decisions in research trends. It also carries educational and pedagogical value, offering educators and students an overview of the field's development that can be drafted into curricula and teaching materials. Lastly, by pointing out inter-field relationships, topic analysis fosters interdisciplinary study, particularly between computational economics and other fields such as computer science, mathematics, and finance, in developing innovative solutions to complicated economic problems.

To achieve this, we employ topic modeling, a robust machine-learning technique adept at unearthing latent themes within textual corpora. In the current academic landscape, topic modeling is gaining unprecedented traction, particularly for research trend analysis. The pivotal factor behind this is the exponential growth of research literature, which challenges effective data management and information extraction but opens myriad opportunities for insightful exploration. Topic models have been utilized to understand underlying themes, temporal evolutions, and latent connections among diverse research areas. This methodology has shown great potential to streamline research analysis, enhance the quality of insights, and provide a nuanced perspective, all of which directly aid forward-looking academic endeavors and strategic scholarly decision-making.

In this study, we employ three popular topic modeling algorithms, namely, latent semantic analysis (LSA), latent Dirichlet allocation (LDA), and BERTopic. LSA is based on the distributional hypothesis, which states that words occurring in similar contexts tend to have similar meanings. LSA uses singular value decomposition to condense a large TF-IDF weighted document-term matrix into a document-topic matrix and a topic-term matrix. LDA is a generalization of the probabilistic rendition of LSA, pLSA, which eschews dimensionality reduction in favor of probabilistic modeling; it marked a significant improvement over previous work by supporting inference on unseen documents. LDA assumes that a document is generated from a probability distribution over topics, that each topic is a probability distribution over the words in the corpus vocabulary, and that this generative process can be reverse-engineered. A noteworthy drawback, however, is its inability to account for correlation between topics, owing to the bag-of-words approach, which is inherently a poor representation of real-world textual data. More recently, BERTopic was introduced as a topic modeling technique based on the popular pre-trained transformer language model BERT together with TF-IDF; it creates clusters that produce easy-to-decipher topics and keywords describing each topic. BERTopic follows a sequence of steps to discover the latent topics and create their representations: text embedding, followed by dimensionality reduction, clustering, tokenization, and finally a weighting scheme.
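As a concrete illustration of the LSA step just described, the sketch below builds a TF-IDF weighted document-term matrix and factors it with truncated SVD. The three toy documents and the two-component setting are illustrative assumptions, not the paper's configuration.

```python
# Minimal LSA sketch: TF-IDF document-term matrix condensed by SVD
# into document-topic and topic-term factors (toy data, 2 components).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "agent based models simulate market dynamics",
    "equilibrium analysis of computational markets",
    "neural networks forecast asset prices",
]

# Build the TF-IDF weighted document-term matrix.
tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)

# SVD condenses the matrix into the two factor matrices.
svd = TruncatedSVD(n_components=2, random_state=42)
doc_topic = svd.fit_transform(X)   # document-topic matrix
topic_term = svd.components_       # topic-term matrix

# Inspect the top terms of each latent topic.
terms = tfidf.get_feature_names_out()
for k, row in enumerate(topic_term):
    top = [terms[i] for i in row.argsort()[::-1][:3]]
    print(f"Topic {k}: {top}")
```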

There has been very limited examination of the literature encompassing the economic applications of computer science, and none has focused on a temporal analysis. To the best of our knowledge, [1] is the only similar study that aims to model the themes of research in computational economics. However, that work analyzes only a small segment of the available research and is restricted to a single data source, an approach that may introduce selection bias. Further, the study utilizes only the LDA algorithm to understand the thematic landscape of the domain, which, although the most widely used for its ease of application, has faced critique for its purely statistical treatment of textual data. Our study aims to bridge this gap in the existing literature by conducting a comprehensive temporal analysis of research themes in computational economics. By employing multiple topic modeling techniques (LSA, LDA, and BERTopic), we provide a robust comparison and identify the most effective method for capturing thematic trends. This approach not only validates the robustness of our findings but also offers nuanced insights into the field's evolution, informing both current research directions and future scholarly endeavors.

We hypothesize that the trends in this research community follow global trends in economics, which constantly change, adapting to and from new discoveries and historical events; similar patterns have been observed in analyses of scientific literature on transportation, computational linguistics, and finance. By applying topic models to a comprehensive collection of computational economics research, we aim to identify salient research topics, monitor their temporal trajectories, and discern emerging research directions. Through rigorous comparison, we unveil the nuanced dynamics within the academic discourse of computational economics. The primary objectives of this study are:

  1. To Identify Key Research Themes: Using advanced topic modeling techniques, we aim to systematically identify and categorize key research themes within the field of computational economics.

  2. To Compare Topic Modeling Methods: By employing LSA, LDA, and BERTopic, we seek to compare the effectiveness of these methods in capturing distinct, relevant, and coherent research themes.

  3. To Analyze Temporal Trends: We aim to conduct a temporal analysis of the identified research themes to understand shifts in research focus and emerging trends over the past two decades.

  4. To Provide Actionable Insights: Based on our findings, we intend to offer actionable recommendations for researchers, companies, and policymakers to optimize research and development strategies and formulate evidence-based policies.

The remainder of this paper is organized as follows. Section 2 presents a review of related literature. In Sect. 3, we introduce the three topic models and various measures for studying the temporal trends of topics. Section 4 describes the paper-abstract dataset extracted via web scraping. In Sect. 5, we detail the experimentation, Sect. 6 discusses the extracted topics and their temporal trends, and Sect. 7 concludes.

2 Related work

There have been several studies aiming to analyze scientific literature with varying approaches. Many of them have based their investigations on citation networks [2,3,4]. Lately, a critical standpoint has emerged that challenges sole reliance on citation-based investigations: references to an article or book neither cover the expanse of available literature nor delve into the content of the publications. Critics [5] argue that major indexing sources operate on the questionable assumption that authors' citations are accurate, and Worrall and Cohn [6] likewise express concern about the quality of citations, manipulation by academics, and indexing criteria.

The primary limitation of citation-based studies lies in their partial representation of the vast landscape of scholarly work. They focus on the interconnectivity between papers and cannot encapsulate the depth of content within the publications. Further concerns about bias in indexing criteria and inaccurate citations hamper the reliability of these investigations.

In response to these limitations, topic modeling has emerged as a valuable methodology. Topic models have been applied to scientific literature to uncover themes and trends within a particular field, a methodological advancement that offers researchers the ability to dissect large volumes of textual data in an automated and meaningful way. It allows researchers to explore the relatedness of articles in terms of their terms, phrases, and even context.

The latent Dirichlet allocation method [7] shows promising results in identifying relevant topics, and several studies have employed LDA effectively to uncover insights from research. For instance, through a case study on predicting stock prices, [8] extract, describe, and structure 14 topics and map how the field's focus has evolved over time. Additionally, [9] induce topic clusters from the ACL Anthology, using topic models to analyze historical trends from 1978 to 2006 and proposing a model that incorporates diversity using topic entropy. An interdisciplinary perspective is added by [10], where the authors analyze co-occurrences across three fields, computational linguistics, education, and linguistics, and classify the core topics of the research fields; they implement a semi-supervised Naïve Bayes approach, using the categorization provided by publishers and manually adding similar labels for the linguistics literature, while adapting unsupervised LDA to generate topics for the other two fields. Onan [11] used this method in biomedical literature analysis to classify information into research domains and subdomains. An analysis of 1560 articles published between 2010 and 2015 by [12] showed a bifurcation of research between technological and methodological domains, with Big Data publications not effectively channeling advanced techniques into marketing advantages. Another study leverages structural topic modeling on sentiment analysis articles sourced from the Web of Science to conduct a comprehensive bibliometric review. Text analysis using topic models on abstracts from (bio)medical literature was conducted by [13] to understand how big data definitions are articulated within this domain.

While topic models like LDA offer a more nuanced approach, they are not without limitations. The interpretability of the derived topics and the challenge of reproducibility across studies remain areas of critical consideration. Moreover, these works analyze a static snapshot of the literature, which limits their usefulness to researchers who consult them after significant additions to the scholarly record of their field. As the results of such studies are not ready for direct consumption by academia, researchers have lately begun expanding the scope of literature analysis with topic models, developing methodologies and frameworks that can assist researchers in reviewing publications.

3.2.3 Temporal analysis

The topics discovered by each model were qualitatively compared based on their distinctiveness, relevance, and coherence. Based on its keywords, each topic was intuitively associated with a subdomain of economics or computation and labeled accordingly. The cosine inter-topic similarity was also calculated.

The temporal analysis of the topics discovered by the best-performing algorithm proceeds by computing, for each year, the distribution of topics across all articles. The first step is to create a year-topic count matrix, which stores the frequency of each topic in each year. Using this matrix, the topic distribution is calculated as in Eq. (1), which gives the distribution \(\theta \) of a particular topic \( K \) within the corpus at a specific time \( T \):

$$\begin{aligned} \theta _K^{[T]} = \frac{n_K^{[T]}}{ \sum _{i=0}^k n_i^{[T]} } \end{aligned}$$
(1)

In this formula, \(n_K^{[T]}\) is the number of articles assigned to topic \(K\) in year \(T\) (an entry of the year-topic count matrix), and the denominator sums these counts over all topics in that year, so that \(\theta _K^{[T]}\) is the share of that year's articles belonging to topic \(K\).

To further investigate the popularity of these topics over time, the increase index \(r_K\) was calculated as the ratio of a topic's popularity in a recent timeframe to its popularity in an earlier one, as defined in Eq. (2). An increase index greater than 1 implies that the topic's popularity has grown from the first timeframe to the second, while a value less than 1 indicates a decreasing trend. The classification of topics as hot or cold is based purely on this measure.

$$\begin{aligned} r_K = \frac{\sum _{t=2017}^{2022} \theta _K^{[t]}}{ \sum _{t=2000}^{2004} \theta _K^{[t]}} \end{aligned}$$
(2)
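To make the two measures concrete, the sketch below computes the per-year distribution of Eq. (1) and the increase index of Eq. (2) from a year-topic count matrix with NumPy. The counts are synthetic stand-ins for the study's real document-topic assignments; the 14-column shape merely mirrors the final topic count.

```python
import numpy as np

# Hypothetical year-topic count matrix: rows are years 2000..2022,
# columns are 14 topics; entries count articles assigned per topic.
years = np.arange(2000, 2023)
rng = np.random.default_rng(0)
counts = rng.integers(0, 20, size=(len(years), 14))

# Eq. (1): per-year topic distribution theta_K^[T].
theta = counts / counts.sum(axis=1, keepdims=True)

# Eq. (2): increase index r_K comparing 2017-2022 against 2000-2004.
late = theta[(years >= 2017) & (years <= 2022)].sum(axis=0)
early = theta[(years >= 2000) & (years <= 2004)].sum(axis=0)
r = late / early

hot = np.where(r > 1)[0]   # topics trending upward
cold = np.where(r < 1)[0]  # topics trending downward
print("hot topics:", hot, "cold topics:", cold)
```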

4 Corpora

Abstracts from two distinguished sources within the computational economics domain, the ACM SIGecom's Economics and Computation Conference proceedings and the Computational Economics Journal by Springer, were systematically analyzed and organized chronologically by publication year. The encompassed articles are delineated in Table 3. The Computational Economics Journal comprises over 1700 articles published between 1988 and 2023, while the annual Economics and Computation Conference has contributed more than 1500 papers from 1999 to 2022. Both sources are highly specialized and reputable within the field of computational economics, ensuring that the collected data is highly relevant to our research objectives. They provide a comprehensive collection of research articles spanning several decades, offering both depth and breadth in content and temporal coverage, and the consistent, structured format of their articles facilitates effective data extraction and analysis.

Upon initial collection, a segment of the retrieved content did not qualify as genuine research publications but consisted of non-research notifications such as Editorials, Publisher's Notes, and Calls for Papers. These were discerned through careful examination of titles and content and were excluded from the dataset owing to their irrelevance to the research objectives. Additionally, during the exploratory data analysis phase, articles lacking abstracts were identified and removed from the dataset. Encoding discrepancies detected during this phase had resulted in inaccurate character rendering within abstracts, particularly on import for topic modeling; these were rectified following a thorough review of the abstracts.

The finalized corpus for analysis comprised 2978 paper abstracts, encapsulating a broad spectrum of research themes within computational economics. After tokenization, the corpus contained 11,947 unique tokens, reflecting the diverse lexicon employed across the abstracts. The compilation of a comprehensive corpus with a large count of unique tokens was instrumental in ensuring a rich and diversified dataset conducive to the ensuing analysis and topic modeling.

Table 3 Articles for the study

5 Experimentation

Before modeling topics, the dataset underwent preprocessing, which included conversion of all text to lowercase, tokenization, and removal of stop words along with other frequently occurring keywords.
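The snippet below sketches this preprocessing step with Gensim; the extended stop-word set is a hypothetical example, not the paper's actual list of frequent keywords.

```python
# Minimal preprocessing sketch: lowercase, tokenize, drop stop words
# and (assumed) domain-frequent keywords.
from gensim.utils import simple_preprocess
from gensim.parsing.preprocessing import STOPWORDS

# Domain-frequent words to drop alongside standard stop words (assumed).
extra_stopwords = {"paper", "model", "results", "using"}
stopwords = STOPWORDS.union(extra_stopwords)

def preprocess(abstract: str) -> list[str]:
    # simple_preprocess lowercases and tokenizes in one pass.
    return [tok for tok in simple_preprocess(abstract) if tok not in stopwords]

tokens = preprocess("This paper studies equilibrium pricing in markets.")
print(tokens)  # ['studies', 'equilibrium', 'pricing', 'markets']
```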

In order to discern the themes within the computational economics literature, we leveraged the LSI (LSA) and LDA models from the Gensim library for Python. The optimal number of topics for each model was determined through the coherence score, a metric indicative of topic quality. Following an evaluation across a topic range of 0 to 50, it was established that both the LDA and LSA models attained peak coherence scores with the topic number parameter fixed at 12. The Gensim models were then applied to the corpus with this optimized topic-number setting.
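A runnable sketch of this coherence-driven search is shown below; the toy corpus, the reduced sweep range, and the choice of the c_v coherence measure are illustrative assumptions rather than the paper's exact setup.

```python
# Coherence-driven selection of the topic count with Gensim (toy data).
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

texts = [
    ["market", "equilibrium", "price", "dynamics"],
    ["agent", "simulation", "market", "behavior"],
    ["neural", "network", "asset", "price", "forecast"],
    ["game", "theory", "mechanism", "design"],
    ["auction", "allocation", "mechanism", "bidding"],
    ["portfolio", "risk", "asset", "optimization"],
]

dictionary = Dictionary(texts)
bow = [dictionary.doc2bow(t) for t in texts]

best_k, best_score = None, float("-inf")
for k in range(2, 6):  # the paper's sweep ran up to 50 topics
    lda = LdaModel(corpus=bow, id2word=dictionary, num_topics=k, random_state=42)
    score = CoherenceModel(
        model=lda, texts=texts, dictionary=dictionary, coherence="c_v"
    ).get_coherence()
    if score > best_score:
        best_k, best_score = k, score

print(f"best k = {best_k}, coherence = {best_score:.3f}")
```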

We incorporated the BERTopic library to delve deeper into the topic analysis. As BERTopic does not require the number of topics as an input, we did not perform such a search for it. Across multiple runs of the model, the number and content of the topics stayed relatively consistent, and we used the results of the final run for the comparison. The BERTopic algorithm was applied in its base configuration, which uses sentence transformers, enabling us to harness contextual word embeddings to discover more subtle interrelationships among words and topics. For a more refined and robust representation, OpenAI's Generative Pretrained Transformers (GPT) were used to fine-tune the generated topics and create more sophisticated, human-decipherable descriptions.
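The sketch below shows a base-configuration BERTopic run. It uses the 20 Newsgroups dataset as stand-in data rather than the paper's abstract corpus, and it omits the GPT-based label refinement step.

```python
# Base-configuration BERTopic run on stand-in documents.
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Stand-in documents; the study used ~3000 computational economics abstracts.
docs = fetch_20newsgroups(
    subset="train", remove=("headers", "footers", "quotes")
).data[:500]

# Defaults: sentence-transformer embeddings, UMAP reduction,
# HDBSCAN clustering, then class-based TF-IDF topic representations.
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info().head())  # one row per discovered topic
print(topic_model.get_topic(0))             # top words for topic 0
```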

The investigative study combined the capabilities of LDA, LSA, BERTopic, and GPT models to distill and refine the themes in the computational economics literature. Through these techniques, our objective was to conduct a thorough and insightful examination of the evolving research contours within this field.

Table 4 Topics identified

6 Discussion

This section focuses on the findings of the above experiments and their significance.

6.1 Results

We present an in-depth comparison of the results of the three algorithms used in our study and incorporate these results into our subsequent exploration and temporal analysis.

6.1.1 Comparison of topic models

The three models rest on different principles and mechanisms and hence offer complementary views when compared. LSA is based on the distributional hypothesis, according to which words appearing in similar contexts tend to have similar meanings. It uses singular value decomposition (SVD) to reduce the dimensionality of the term-document matrix while retaining the latent structure in the data; this projection into a lower-dimensional space helps capture synonymy and polysemy. LDA is a generative probabilistic model that assumes documents are mixtures of topics and topics, in turn, are mixtures of words. It defines a generative process for document creation, enabling it to infer the latent topic structure from observed data, and places Dirichlet priors on these mixtures to enforce sparsity and guide the interpretation of topics. In contrast to LSA, LDA explicitly models the distribution of topics in documents and of words in topics. BERTopic, finally, uses pre-trained BERT embeddings to capture the semantic meaning of texts and combines dimensionality reduction, clustering, and topic modeling: it creates dense embeddings, reduces their dimensionality with UMAP, clusters them with HDBSCAN, and labels topics with TF-IDF. One of BERTopic's strengths is finding nuanced semantic relationships and producing relevant, coherent, and interpretable topics (Table 4).

Topic models are generally evaluated using coherence and perplexity. Coherence measures the quality of the topics generated by a model: it assesses the degree of semantic similarity between the high-scoring words in a topic, aiming to determine whether the words within a topic are related. We compare the coherence of the three models in Table 5. Perplexity measures how well a probabilistic model predicts a sample of data, quantifying the uncertainty in predicting the next word or document; it applies to probabilistic models like LDA. Since BERTopic and LSA are not probabilistic models, a comparison using perplexity scores is not possible.

We also compare the results of the models on three human-judged criteria: the distinctiveness of topics, the relevance of topics, and the coherence of top words.

The distinctiveness of topics refers to how easily distinguishable each topic is from the other topics. In terms of distinctiveness, LSA performs the worst among the three models. The topics generated by LSA tend to be more generic and not easily distinguishable. On the other hand, LDA generates topics that are more distinct and easier to differentiate. However, BERTopic generates the most distinct topics out of the three models. The topics generated by BERTopic are more specific and have a clear separation between them.

The relevance of topics refers to how well the topics are related to the field of computational economics. In terms of relevance, all three models perform reasonably well. However, BERTopic generates topics that are more closely related to the field; they tend to be more specific and relevant. LSA generates topics that are less specific and often less relevant to the field. LDA also generates topics relevant to computational economics, though a few of its topics are unrelated to the field.

The top n words in each topic are the most important words that help identify and describe it. Here, LDA tends to generate the most coherent sets of top words: they are closely related to each other and form a cohesive description of the topic. LSA's top words are less coherent, while BERTopic's topics mix coherent and less coherent word sets. Since the topics identified by BERTopic are, overall, the most distinct, relevant, and coherent, they were used for all further analysis.

We identify 14 topics from the research paper abstracts in computational economics. The topic set identified by BERTopic is of superior quality in terms of distinctiveness, relevance to computational economics, and coherence. This can be attributed to its accounting for the context of words, which lets it better capture their semantic meaning. In comparison, LDA relies on a bag-of-words approach, which limits its ability to capture context, and LSA produced noisy and less meaningful topics, as it is known to struggle with sparsity.

Table 5 Coherence score for topic models
Table 6 Most similar topics for topics

Several factors underlie the considerably better performance of BERTopic. BERTopic leverages sentence-transformer embeddings, which capture the semantic meaning of sentences more effectively than the bag-of-words representation in LDA or the SVD projection in LSA, resulting in more coherent and meaningful topics. The use of UMAP for dimensionality reduction effectively projects high-dimensional data into a lower-dimensional space while preserving global structure, which yields better clustering and topic formation. HDBSCAN, a density-based clustering technique, identifies clusters of varying shapes and sizes and handles noise in the data effectively. Finally, c-TF-IDF token weighting ensures that the relevance of tokens reflects their contextual importance within the corpus, enhancing the distinctiveness and specificity of the generated topics.
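This modular pipeline can be made explicit when constructing the model. The parameter values below are illustrative assumptions, not the paper's configuration.

```python
# BERTopic with each pipeline component passed in explicitly.
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
umap_model = UMAP(n_neighbors=15, n_components=5, metric="cosine", random_state=42)
hdbscan_model = HDBSCAN(min_cluster_size=10, metric="euclidean", prediction_data=True)

topic_model = BERTopic(
    embedding_model=embedding_model,  # dense semantic embeddings
    umap_model=umap_model,            # dimensionality reduction
    hdbscan_model=hdbscan_model,      # density-based clustering
    # c-TF-IDF weighting is applied internally to build topic keywords
)
# topics, probs = topic_model.fit_transform(abstracts)  # run on the corpus
```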

6.1.2 Topic discovery

We labeled the topics by examining their top words: reviewing the top n words of each topic reveals common themes or keywords, and a few keywords representing the topic's main idea were chosen as its label. These topics and their labels are presented as word clouds in Table 2.

To further explore the relationships between topics, we constructed a similarity matrix, presented in Table 6, which reports the cosine similarity between each pair of topics, indicating how comparable they are based on their shared features.
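The computation behind such a matrix is straightforward, as the sketch below shows. The topic-term weight matrix here is hypothetical; in the BERTopic case, the rows would come from its c-TF-IDF topic representations.

```python
# Pairwise cosine similarity between topic vectors (hypothetical data).
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical topic-term weight matrix: 4 topics x 6 vocabulary terms.
topic_vectors = np.array([
    [0.9, 0.1, 0.0, 0.3, 0.0, 0.0],
    [0.8, 0.2, 0.1, 0.2, 0.0, 0.1],
    [0.0, 0.0, 0.7, 0.1, 0.9, 0.2],
    [0.1, 0.6, 0.0, 0.8, 0.1, 0.0],
])

sim = cosine_similarity(topic_vectors)  # full similarity matrix
np.fill_diagonal(sim, 0.0)              # ignore self-similarity
most_similar = sim.argmax(axis=1)       # closest topic for each topic
print(most_similar)
```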

Fig. 1 Topic distribution over time

Fig. 2 Hot topics

Fig. 3 Cold topics

6.1.3 Topic distribution over time

We generated a plot of the temporal distribution of articles assigned to each topic, from the earliest year in the dataset to the latest. The result, shown in Fig. 1, reveals how the proportion of articles assigned to each topic has changed over time, providing insight into the shifting trends and patterns of topic prevalence within the computational economics research landscape. The graph shows the proportions of all 14 topics from 2000 to 2022, stacked bottom to top starting with Topic 0.
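A minimal sketch of such a stacked proportion plot is given below; the `theta` matrix is a synthetic stand-in (as in the earlier snippet), not the study's real distributions.

```python
# Stacked proportion plot in the style of Fig. 1 (synthetic data).
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2000, 2023)
rng = np.random.default_rng(0)
counts = rng.integers(0, 20, size=(len(years), 14))
theta = counts / counts.sum(axis=1, keepdims=True)

# Stack Topic 0 at the bottom through Topic 13 at the top.
plt.stackplot(years, theta.T, labels=[f"Topic {k}" for k in range(14)])
plt.xlabel("Year")
plt.ylabel("Proportion of articles")
plt.legend(loc="upper left", fontsize="x-small", ncol=2)
plt.show()
```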

To further investigate the popularity of these topics over time, the increase index \(r_K\) was calculated as the change in a topic's popularity between two timeframes. A value greater than 1 implies that the topic's popularity grew from the first timeframe to the second, while a value less than 1 indicates a decreasing trend. It is important to note that the increase index captures the relative change in popularity over time but ignores both the absolute magnitude of a topic's proportion and its overall significance within the field of computational economics. Categorizing topics would therefore benefit from a more comprehensive assessment that also takes qualitative factors into account, such as the overall prominence of the topic within the research community, its impact on the field, the level of interest and engagement from researchers, and its relevance to current trends and challenges in computational economics.

Figure 2 depicts the proportion over time for Hot Topics, which show a clearly discernible upward trend; these topics have experienced appreciable growth in popularity and garnered significant attention within the field. Cold Topics, illustrated in Fig. 3, exhibit a noticeable downward trend, having witnessed a steady decline in prominence and attracted less interest among researchers than in the initial years.

In Fig. 3, Topics 0, 1, and 2 make up the bulk of the corpus. This agrees with the fact that these three topics, Market Analysis, Equilibrium Analysis, and Financial Modeling, are general domains of computational economics rather than very specialized areas of research, and so attract far more researchers. From the perspective of topic modeling, it is also easier for an algorithm to recognize such broad topics and assign more documents to them.

Topics 10 and 12 have an extremely high increase index because their initial frequency is very low, so even a slight increase produces a large index; they have therefore been excluded from this analysis. Topic 2, Equilibrium Analysis, shows a slight decline but remains very popular, with a high proportion each year.

Topics like Topic 3, Game Theory, and Topic 4, Allocation Mechanisms, have gained popularity. The increase in game theory research can be attributed to the development of new game-theoretic models and techniques, such as mechanism design, repeated games, and evolutionary game theory, which have expanded the scope of game theory and its applications. The study of game theory has also been fuelled by the recognition that many real-world problems can be modeled as games and that game theory provides a powerful tool for analyzing and understanding these problems, leading to increased interest and investment in game theory research and education. Topic 4, Allocation Mechanisms, has gained popularity over the past two decades due to the increasing prevalence of electronic markets and the need for efficient allocation of resources: with the growth of online platforms and e-commerce, the demand for algorithms that can efficiently match buyers and sellers has increased.

Topics like Topic 9, Asset Pricing, and Topic 10, Fair Division, have lost popularity. No available source confirms a decline of asset pricing as a research domain; in fact, asset pricing remains an active and popular research area in finance and economics, with many recent publications and ongoing work. The decrease captured here may reflect shifts in the academic interests of researchers who target the particular sources we used. Topic 10, Fair Division, by contrast, is a well-established area in which many of the key questions have already been answered; researchers may be less motivated to continue working in it as fewer novel research questions remain, and other areas of economics and related fields may have gained more attention and funding, shifting the research focus.

6.2 Trends and suggestions for future

Based on the analysis of the emerging trends in computational economics research, specifically some of the Hot Topics, we hope to make beneficial recommendations that future researchers can utilize.

The domain of dynamic mechanism design can be explored further, delving deeper into the design of efficient and adaptable mechanisms and the application of dynamic mechanisms in real-world scenarios. Game theory continues to be a relevant and evolving field, and researchers can focus on new dimensions and complexities of games, including analyzing strategic interactions in different economic contexts and developing novel solutions and equilibrium refinements. The study of efficient resource allocation offers opportunities to explore new allocation models and fairness considerations, as well as the impact of constraints that can contribute to more efficient and equitable allocation mechanisms. Researchers can also investigate different matching markets, study preference structures, analyze the dynamics of matching processes, and propose improvements to them.

In this study, we set out to discover emerging themes that have gained popularity over the past two decades. Researchers can benefit from exploring these trends and the causes contributing to them, and considerably more insight could be drawn from the analysis by an experienced subject-matter expert with deeper domain knowledge of the individual topics.

6.3 Proposed decision-making framework

The evolving field of computational economics benefits significantly from the systematic identification and analysis of research themes. In response to the necessity for a structured approach that supports both companies and governments, we propose a comprehensive framework leveraging advanced topic modeling insights.

Advanced topic modeling techniques such as LSA, LDA, and BERTopic are applied to identify and classify research themes across a large corpus of computational economics literature, with the strengths of each technique brought to bear on the identification and categorization of themes. The thematic analysis aids in labeling key research areas, categorizing them into broader domains, such as Market Analysis, Equilibrium Analysis, and Financial Modeling, or specific areas, such as Game Theory and Allocation Mechanisms. Trend analysis via temporal methods captures shifts in research focus, highlighting emerging trends and declining areas of interest.

Systematic data collection from high-quality sources ensures an extensive and representative dataset; here, sources such as the ACM SIGecom's Economics and Computation Conference proceedings and the Computational Economics Journal by Springer were considered. To ensure data quality and filter out noise, significant preprocessing steps such as normalization, tokenization, and the removal of non-research documents from the corpus were carried out. Model evaluation compares multiple topic modeling algorithms, with BERTopic emerging as the best choice given the distinctiveness, relevance, and coherence of its topics.

Visualizing the research themes and their interrelations through word clouds, similarity matrices, and document-topic mappings assists in understanding the thematic structure and its evolution over time. Interactive dashboards let users dynamically search thematic trends and the underlying topic distributions, while insightful reports summarize the essential findings of the topic analysis, offering a view of the whole research landscape.

Corporations can gain value from topic modeling insights to optimize R&D strategies, evolving them to align with trending topics and high-impact areas; they can also identify potential areas for innovation and investment and benchmark against industry peers for competitive analysis. On the governance side, such insights can inform policy formulation that fosters innovation and supports emerging research areas in computational economics, guide the strategic allocation of research funding toward areas that are high impact yet underexplored, and shape regulatory frameworks that enable the expansion and application of research in computational economics.

7 Conclusion

In this paper, our main objective was to perform a temporal analysis of research publications in computational economics using topic models. We applied and compared three popular techniques: latent semantic analysis, latent Dirichlet allocation, and BERTopic. The topics discovered by the three algorithms were discussed and compared on the basis of their distinctiveness, relevance, and coherence, and BERTopic emerged as the technique yielding the highest-quality topics. A deeper exploration of the topics produced labels that point distinctly to areas of research in computational economics.

Based on topic proportions over two decades, Market Analysis, Equilibrium Analysis, Financial Modeling, and Game Theory account for the bulk of the corpus, which we attribute to their generality as domains. To investigate the temporal trends further, the increase index was calculated for each topic, and the topics were categorized accordingly as Hot or Cold. Emerging trends include Dynamic Mechanism Design, Game Theory, and Allocation Mechanisms, which have shown steady growth over the past two decades, while topics like Asset Pricing and Fair Division have lost popularity over the years and no longer appeal to researchers as strongly.

This research presents a comprehensive and systematic approach to analyzing latent themes in computational economics research, one that can be applied to other research domains. It highlights the value of combining web scraping with advanced natural language processing techniques such as topic modeling to identify latent themes in large datasets, and it emphasizes the need to compare and evaluate different topic modeling algorithms to determine the most effective one.

The results provide significant insights into the temporal trends of the identified topics and shed light on the most prominent and popular research domains in computational economics over the past two decades. This can enable the stakeholders of this domain to predict the direction in which the field may evolve in the future. This information is essential for prioritizing research funding and directing research efforts toward areas of high impact and potential.

While this study provides valuable insights into research trends in computational economics, it is essential that we acknowledge the limitations inherent in the research methodology and data sources. By understanding these limitations, we hope that readers can better interpret the findings and identify potential avenues for future improvement and exploration.

This project examined two sources of research articles. However, to gain a comprehensive understanding of research trends, it is essential to expand the analysis to other prominent computational economics journals and conferences and compare the trends and patterns of research topics across diverse sources. This would provide a more robust representation of developments in the field.

While this study used paper abstracts, one can also conduct the analysis with the full text of available research publications in order to gain a better understanding of each paper. Analyzing the complete content of articles allows for a more detailed examination of the research findings, methodologies, and discussions within each paper. However, additional resources, access to publishers’ content, and other necessary permissions to publish the work may be required for this.

In this work, three topic modeling algorithms, namely LSA, LDA, and BERTopic, were implemented to analyze research topics in computational economics. Several other popular topic modeling algorithms could be incorporated to develop a more comprehensive understanding. Furthermore, complementary natural language processing techniques can enrich the analysis: sentiment analysis and entity recognition can shed light on researchers' opinions and the keywords being discussed, broadening the scope of the analysis and yielding a more nuanced understanding of the discipline.

Further investigation of the reasons for the decline of certain research domains, such as asset pricing and fair division, and exploration of avenues for reviving interest in these areas would benefit greatly from the assistance of subject experts. This research aimed to discover the latent themes of the corpus; a more detailed analysis of the individual papers within each topic could identify the specific research questions and methods used over time and expose potential gaps in the literature. Finally, it is important to explore the practical applications of the identified research areas, for instance in economic policy-making or financial decision-making. Exploring such applications can enhance the relevance and applicability of the identified research topics, encouraging their integration into decision-making processes and driving positive outcomes across different economic sectors.