1 Introduction

Social media platforms, such as Twitter, Facebook, Sina Weibo, and WeChat, have become part of many people’s daily lives. The rapid spread of smartphones and 5G networks enables every citizen to report what is happening around them at any time. This behaviour is not limited to daily events but is also observed in emergencies. People tend to publish posts on social media to express their concerns and perceptions, providing a large amount of crowdsourced information (Middleton et al. 2014; Schnebele et al. 2014; Wang et al. 2015). To extract operational information, studies have focused on developing analytic approaches that advance information extraction from social media in disasters. Text classification is the main task related to this extraction. Multiple machine learning (ML) classifiers, such as SVMs (Caragea et al. 2011), naïve Bayes classifiers (Imran et al. 2013), convolutional neural networks (Huang et al. 2019), and generative adversarial networks (Dao et al. 2018), have been deployed to classify social media data on disaster-related topics. For example, Li et al. (2012) proposed a Twitter-based event detection and analysis system and applied it to detect traffic accidents in Houston, and Sakaki et al. (2013) proposed an algorithm that monitors tweets to detect earthquakes. These techniques enable the detection of disaster events from posts shared on social media.

Although this progress is encouraging, challenges remain. Detecting and characterizing emergency-related events when the type of event of interest is not known in advance is still a problem (Atefeh and Khreich 2015). There are many kinds of emergency events, including natural disasters, man-made accidents, public health events, and social security events. For emergency managers, it is more practical to adopt one model that covers all possible event types than several specialized models for earthquakes or typhoons. A model built for one type of emergency can achieve high classification accuracy, but its results may be poor when tested on other types: Pekar et al.’s (2016) experiments on tweets from different emergencies show that when a classifier trained on one type of emergency is evaluated on other types, its performance can drop by 70%. Moreover, emergency response follows the principles of territorial management and separate departmental management, which means that, for first-time situation awareness, fine-grained 3 W attribute information (what, where, and when) is important.

This paper focuses on detecting “all hazards” events from social media and extracting their 3 W attribute information. In our earlier research (Huang et al. 2021), we developed a similarity-based emergency event detection framework consisting of three phases: the classification phase, the extraction phase, and the clustering phase. Here, the overall process of the original framework is retained, but the specific models are upgraded. The classification phase uses an integrated approach combining BERT and an attention-based bidirectional long short-term memory model (BERT-Att-BiLSTM) to detect emergency-related posts. The extraction phase extracts the what, where, and when information of each post. In the clustering phase, if all 3 W attributes of post x are extracted, our defined text similarity between post x and event e can be calculated, based on which an unsupervised dynamic text clustering algorithm clusters social media posts into different events; otherwise, a logistic regression model determines whether post x describes event e.

Our study contributions are twofold. First, it advances our capacity to classify different kinds of emergency events from massive social media data with a unified and extensible method. Based on accumulated data, we refine the seed words for different types of emergencies to crawl microblog posts and train the BERT-Att-BiLSTM model to discriminate emergency-related posts. These seed words are assigned different weights, based on which emergency-related posts can be classified into different event types. Second, we introduce a complete framework of social network data processing for early emergency event detection, which integrates text classification, attribute information extraction, and a new text clustering approach, and the framework is shown to be feasible in case studies and practical applications. Our study can help to form a rapid, transparent, and timely emergency reporting mechanism.

The remainder of this paper is structured as follows. Section 2 provides a full discussion of related work. Section 3 provides an overview of the early detection of emergency event (EDEE) framework we followed. Section 4 demonstrates the advantages of the EDEE framework by comparing its performance with baseline models and presenting two specific case studies. Section 5 discusses the significance of our approach in practical applications, and Sect. 6 proposes possible directions for future work.

2 Related work

The literature on analysing social media data for crisis response and disaster management is growing rapidly. This paper focuses more narrowly on text classification and 3 W attribute information extraction; hence, we discuss the related work on these aspects.

Current studies on emergency event identification from social media data make use of supervised and unsupervised ML methods, such as classifiers, clustering, and language models (Atefeh and Khreich 2015). Recently, deep learning has emerged as a promising technique for capturing high-level abstractions in data, providing significant improvements in text classification over traditional ML methods. Deep learning, in particular convolutional neural networks (CNNs) (Kim 2014), has been applied successfully to identify informative tweets during crises such as flooding disasters (Caragea et al. 2016), the Nepal Earthquake, Typhoon Hagupit, the California Earthquake, and Cyclone Pam (Nguyen et al. 2017). Burel et al. (2017a, b) proposed semantically enhanced CNN models to detect crisis information categories; their models were evaluated on the CrisisLexT26 data set (Olteanu et al. 2014), which consists of approximately 28,000 labelled tweets collected during 26 crisis events in 2012 and 2013. Burel et al.’s work is an exploration of “all hazards” detection; however, it addresses only one of the three tasks investigated in this paper.

The above approaches have mostly been pursued in academic contexts. Even though they show the potential of automatic approaches for dealing with large numbers of social media posts during emergency events, adoption by practitioners depends on their availability, efficiency, and stability. AIDR (Imran et al. 2014) and CREES (Burel and Alani 2018) are two of the very few tools that can automatically detect and classify multiple types of emergency-related content on social media; both were designed for handling English posts on the Twitter platform. Work to date has focused predominantly on Twitter as the social media source. Some researchers have proposed Chinese disaster detection approaches for Sina Weibo. Unfortunately, these studies either focused on detecting a single type of event, such as earthquakes (Robinson et al. 2014), or only identified damage-related information (Bai et al. 2015; Bai and Yu 2016; Liu et al. 2018). They all relied on traditional ML models that require considerable feature engineering and thus remain far from “all hazards” detection.

More recently, recurrent neural networks (RNNs), which specialize in sequential modelling, have been used increasingly in text classification. RNNs with gating mechanisms, such as long short-term memory (LSTM) and bidirectional LSTM (BiLSTM) (Nowak et al. 2017; Liu and Guo 2019), are widely used because they capture long-term dependencies. The attention mechanism, which highlights important contextual information by assigning different weights, has also been applied to improve accuracy (Zhang et al. 2018). More important still is the introduction of bidirectional encoder representations from transformers (BERT) (Devlin et al. 2018). BERT, built on transformers instead of the usual RNNs, set new state-of-the-art results on 11 NLP tasks, a breakthrough for pretraining models. This recent progress should be incorporated into the emergency detection task.

In addition, existing classifiers and event detection pipelines do not provide the capability to answer where and when emergency events occur. Fine-grained location and time information plays an important role in emergency response activities, for example, in coordinating territorial rescue forces according to the degree of urgency. Although location and time extraction techniques can estimate where and when a post came from based on geotags and post content (Guan and Chen 2014; Fan et al. 2020), the resolution of the locations/times extracted from different posts may differ: some may be at the city/hour level, while others may be at the county/minute level. Emergency event detection systems must avoid identifying posts that describe the same event but differ slightly in location and time as different events. Therefore, a post clustering approach is necessary.

Building on the work listed above, we develop an integrated framework using Chinese Weibo posts to detect “all hazards” events, which combines the latest deep learning text classification models, location and time extraction approaches, and an event clustering algorithm. Our framework is application-oriented and can be employed by Chinese authorities for early emergency event detection and for aiding emergency response.

3 Methodology

Here, we describe our EDEE framework. Its workflow is shown in Fig. 1. This framework has three phases. In Phase I, we collect microblog posts from the Sina platform with seed words, preprocess them, and then extract these emergency-related posts using the BERT-Att-BiLSTM model. In Phase II, we recognize the event type based on the weight scoring method of seed words and extract the location and time entities of the posts. In Phase III, if all three entities of post x are extracted, a similarity-based clustering algorithm is used to cluster this post into an event; otherwise, it is input into the logistic regression model to determine whether it describes a certain event.

Fig. 1
figure 1

The process flow of the EDEE framework

3.1 Phase I: text classification

We consider 30 types of emergencies, including 9 types of natural disasters, 13 types of man-made accidents, 6 types of public health events, and 2 types of social security events. To determine the seed words and their weights, we first manually collect microblog posts related to each type of emergency, perform word frequency statistics, and select words with a frequency greater than 1% as candidate seed words. The initial weights are approximately proportional to word frequency. Then, we use the candidate seed words to crawl data and adjust the seed words and their weights. The final seed words and their weights are shown in Table 1. After data collection, the raw posts are preprocessed by removing URLs, whitespace, and punctuation and then unified in UTF-8 encoding.
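For illustration, a minimal preprocessing sketch in Python follows; the regular expressions and the helper name are our own assumptions, not the exact rules of the original pipeline.

```python
import re

def preprocess(post: str) -> str:
    """Clean a raw Weibo post: drop URLs, whitespace, and punctuation."""
    post = re.sub(r"https?://\S+", "", post)        # remove URLs
    post = re.sub(r"\s+", "", post)                 # remove whitespace
    post = re.sub(r"[^\w\u4e00-\u9fff]", "", post)  # drop punctuation, keep CJK and alphanumerics
    return post                                     # stored downstream as UTF-8
```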

Table 1 Emergency events considered in this paper and their seed words

Emergencies are unconventional events that occur infrequently. Sometimes, although a post contains seed words, it does not describe an emergency. To solve this problem, the BERT-Att-BiLSTM model is applied in this phase. The architecture of this model is shown in Fig. 2. First, the semantic representation of each post is obtained by the pretrained BERT model; then, the semantic representation of each character in the post is input into the Att-BiLSTM model for further semantic analysis; finally, the softmax layer outputs the label 0 (false) or 1 (true).

Fig. 2
figure 2

The BERT-Att-BiLSTM model architecture

BERT is a word vector generation model that adopts a bidirectional transformer architecture analysing the context to the left and right of each word. This paper uses the pretrained BERT-Base-Chinese model with 12 layers, 768 hidden units, 12 attention heads, and 110 M parameters, available from the Google BERT model site.

The BiLSTM layer contains the forward LSTM (represented as \(\overrightarrow{LSTM}\)) and the backward LSTM (represented as \(\overleftarrow{LSTM}\)), and its outputs are stated as:

$${h}_{i}^{s}=\left[\overrightarrow{{h}_{i}^{s}},\overleftarrow{{h}_{i}^{s}}\right]$$
(1)

where \(\overrightarrow{{h}_{i}^{s}}\) represents the forward information of word i in sentence s, \(\overleftarrow{{h}_{i}^{s}}\) represents the backward information, and \({h}_{i}^{s}\) is the concatenated hidden vector. The attention weight of each word is expressed as follows:

$${e}_{i}^{s}={v}^{T}\tanh\left({\omega}^{s}{h}_{i}^{s}+{b}^{s}\right)$$
(2)
$${\alpha}_{i}^{s}=\frac{\exp\left({e}_{i}^{s}\right)}{\sum_{j=1}^{T}\exp\left({e}_{j}^{s}\right)}$$
(3)

where \({\omega }^{s}\) and \({b}^{s}\) represent the weight and bias in the attention mechanism, tanh(.) is the hyperbolic tangent function, T is the number of words, and \({\alpha }_{i}^{s}\) is the attention weight of each word in sentence s. The output of the context representations is:

$$F=\sum_{i=1}^{T}{\alpha}_{i}^{s}{h}_{i}^{s}$$
(4)

F is considered the feature for text classification. Then, the softmax layer is used to generate the conditional probabilities over the class space to achieve classification.
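To make Eqs. (1)–(4) concrete, the following is a minimal PyTorch sketch of the BERT-Att-BiLSTM classifier; the class name, hidden size, and other details are illustrative assumptions rather than the paper’s exact configuration.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertAttBiLSTM(nn.Module):
    def __init__(self, hidden: int = 128, n_classes: int = 2):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")  # 12 layers, 768 hidden, 12 heads
        self.bilstm = nn.LSTM(768, hidden, batch_first=True, bidirectional=True)
        self.w = nn.Linear(2 * hidden, 2 * hidden)     # omega^s and b^s in Eq. (2)
        self.v = nn.Linear(2 * hidden, 1, bias=False)  # v in Eq. (2)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, input_ids, attention_mask):
        # Character-level semantic representations from pretrained BERT
        emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.bilstm(emb)                         # h_i^s = [forward; backward], Eq. (1)
        e = self.v(torch.tanh(self.w(h)))               # e_i^s, Eq. (2)
        e = e.masked_fill(attention_mask.unsqueeze(-1) == 0, -1e9)  # ignore padding
        alpha = torch.softmax(e, dim=1)                 # alpha_i^s, Eq. (3)
        f = (alpha * h).sum(dim=1)                      # context feature F, Eq. (4)
        return torch.softmax(self.fc(f), dim=-1)        # softmax layer: class probabilities
```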

3.2 Phase II: entity extraction

For an emergency-related post, three steps are needed to obtain its entity information, namely type recognition, location extraction, and time extraction.

(1) Type recognition

For a given type of event, we count the occurrences of its seed words in the post and calculate the weighted sum:

$${\omega}_{e}=\sum_{j}{c}_{j}{\omega}_{j}$$
(5)

where \({c}_{j}\) is the number of occurrences of seed word j and \({\omega }_{j}\) is the corresponding weight, whose value is shown in Table 1. If \({\omega }_{e}>0.3\), the post is labelled as type e. (The threshold for \({\omega }_{e}\) is related to the weights of the seed words; we tested 0.09, 0.1, 0.19, 0.2, 0.29, 0.3, 0.39, 0.4, 0.49, 0.5, 0.59, 0.6, 0.69, and 0.7 and found that 0.3 performed best.) It should be noted that one post may be labelled as more than one type. For example, due to the chain effects of disasters, a typhoon is often accompanied by a rainstorm, and one post may be labelled as a typhoon and a rainstorm simultaneously. We directly recognize such a post as a typhoon event and a rainstorm event without further processing.
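As a sketch, the scoring rule of Eq. (5) can be implemented as follows; the two seed-word entries and their weights are placeholders, not the actual values of Table 1.

```python
# Hypothetical excerpt of Table 1: {event type: {seed word: weight}}
SEED_WORDS = {
    "earthquake": {"地震": 0.3, "震感": 0.2},  # placeholder words and weights
    "rainstorm": {"暴雨": 0.3, "内涝": 0.2},
}

def recognize_types(post: str, threshold: float = 0.3) -> list:
    """Label the post with every event type whose weighted score exceeds the threshold."""
    labels = []
    for event, seeds in SEED_WORDS.items():
        # Eq. (5): omega_e = sum_j c_j * omega_j
        score = sum(post.count(word) * weight for word, weight in seeds.items())
        if score > threshold:
            labels.append(event)  # a post may receive several labels (e.g., typhoon and rainstorm)
    return labels
```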

(2) Location extraction

There are two ways to obtain location information. One is from the geotag of the post; this information is accurate but sparse. If the post contains no geotag, we analyse the post content. The FoolNLTK package is used to extract the location entity (e.g., Beijing or Tiananmen Square) from the post content. If there is no location entity in the content, the location is set as empty. Otherwise, we call the Gaode APIs (https://www.amap.com/) to query the extracted location entity and extend it to four-level structured data: the province, the city, the county/district, and the village/town.
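A sketch of this two-step location extraction follows; it assumes FoolNLTK’s `analysis` interface and the Gaode geocoding endpoint, and the key, field names, and returned granularity may differ in practice.

```python
import fool      # FoolNLTK
import requests

AMAP_KEY = "YOUR_GAODE_KEY"  # placeholder; apply for a key at https://lbs.amap.com/

def extract_location(post: str, geotag: dict = None):
    """Prefer the geotag; otherwise run NER on the content and geocode the entity."""
    if geotag:
        return geotag                              # accurate but sparse
    _, ners = fool.analysis(post)                  # named-entity recognition
    locations = [ent[-1] for ent in ners[0] if ent[2] == "location"]
    if not locations:
        return None                                # location entity is empty
    resp = requests.get(
        "https://restapi.amap.com/v3/geocode/geo",
        params={"key": AMAP_KEY, "address": locations[0]},
    ).json()
    geo = resp["geocodes"][0]                      # structured administrative levels
    return {"province": geo.get("province"),
            "city": geo.get("city"),
            "district": geo.get("district")}
```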

(3) Time extraction

Regular expression matching is used for time extraction. If an absolute time is contained in the post, such as a certain day or a certain hour/minute/second, we extract it as the event time. Otherwise, if only a relative time is contained, such as “yesterday”, “last week”, or “early morning”, we convert it to an absolute time based on the posting time. If there is no time information, the time is set as empty.
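The sketch below illustrates the regular-expression matching; the patterns and the relative-time table are simplified examples, and the production rules cover many more Chinese time expressions.

```python
import re
from datetime import datetime, timedelta

ABS_TIME = re.compile(r"(\d{1,2})月(\d{1,2})日(?:(\d{1,2})[时点])?")  # e.g., "3月21日14时"
REL_TIME = {"昨天": timedelta(days=1), "前天": timedelta(days=2)}      # "yesterday", "day before"

def extract_time(post: str, posting_time: datetime):
    m = ABS_TIME.search(post)
    if m:  # absolute time found in the content
        month, day, hour = int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)
        return posting_time.replace(month=month, day=day, hour=hour, minute=0, second=0)
    for word, delta in REL_TIME.items():  # relative time, converted via the posting time
        if word in post:
            return posting_time - delta
    return None  # time entity is empty
```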

3.3 Phase III: event clustering

This phase clusters posts into different events. If all the 3 W entities of the post are extracted, a similarity-based clustering algorithm is used to cluster this post into an event. Otherwise, if the location entity is empty, the post is removed; if the time entity is empty, the post is input into the logistic regression model to determine whether it describes a certain event.

(1) Similarity-based clustering algorithm

We assume that if post i and post j describe the same event, their event types must be the same, while their locations and times may differ slightly. Therefore, the similarity between post i and post j is defined as follows, giving the event type sc a decisive role and the location sl and time st equal roles:

$$\mathrm{similarity}={s}_{c}\times\left(0.5{s}_{l}+0.5{s}_{t}\right)$$
(6)

where sc, sl, and st represent the similarity of the event type, the location, and the time, respectively. Table 2 shows the assignment rules for sc, sl, and st. The basic prior rule is that, for posts with the same event type, if their location difference is small and their time difference is not too large, or vice versa, they should be considered one event. We define that if the similarity between post i and post j is greater than 0.5, the two posts describe one event. First, sc is set to 1 or 0 depending on whether the event types of the two posts are the same. Second, as the location of most emergency-related posts can be accurate to the city level and the time to the hour, we set sl to 0.4 if the city is the same and st to 0.7 if the time difference is less than 1 h; in this case, the similarity is 0.55 (> 0.5). In addition, if the locations match at the county/district level, we set sl to 0.6, and if the time difference is less than 1 day, st is set to 0.5; in this case, the similarity is also 0.55 (> 0.5). The other values in Table 2 are defined similarly.

Table 2 Assignment rules of sc, sl, and st
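A sketch of Eq. (6) using the subset of Table 2’s rules quoted above (city match 0.4, county/district match 0.6; time difference under 1 h 0.7, under 1 day 0.5); the full table covers more cases, and the dictionary field names are assumptions.

```python
def similarity(post: dict, event: dict) -> float:
    """Eq. (6): the event type is decisive; location and time contribute equally."""
    s_c = 1.0 if post["type"] == event["type"] else 0.0
    # Location similarity: a finer administrative match scores higher
    if post.get("district") and post["district"] == event.get("district"):
        s_l = 0.6
    elif post.get("city") and post["city"] == event.get("city"):
        s_l = 0.4
    else:
        s_l = 0.0
    # Time similarity: a smaller time difference scores higher
    dt = abs((post["time"] - event["time"]).total_seconds())
    s_t = 0.7 if dt < 3600 else (0.5 if dt < 86400 else 0.0)
    return s_c * (0.5 * s_l + 0.5 * s_t)
```

With these values, a same-city match within 1 h and a same-district match within 1 day both give 0.55 > 0.5, matching the two cases discussed above.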

A set of vectors \(\chi =\left\{{x}^{(1)},{x}^{(2)},\dots ,{x}^{(l)}\right\}\) is defined, in which \({x}^{(i)}\epsilon {\mathbb{R}}^{3}\) is the vector consisting of the type, location, and time of post i. The purpose of the algorithm is to partition these posts into the event set \(\mathcal{E}=\left\{{e}_{1},{e}_{2},\dots ,{e}_{k}\right\}\). \({e}_{j}\epsilon {\mathbb{R}}^{3}\) is the vector consisting of event j’s type, location, and time. The process of the clustering algorithm is as follows:

(a) Calculate the similarity between post \({x}^{(i)}\) and event \({e}_{j}\) based on Eq. (6). If the similarity > 0.5, merge \({x}^{(i)}\) into \({e}_{j}\); otherwise, take \({x}^{(i)}\) as a new event \({e}_{k+1}\).

(b) If new posts merged into event \({e}_{m}\) in step (a), compare the location and time of these posts with those of \({e}_{m}\), and update \({e}_{m}\) with the most accurate description.

(c) If no events were updated in step (b), the algorithm terminates; otherwise, the updated events are treated as new posts and the procedure returns to step (a).
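A simplified sketch of steps (a)–(c), reusing the `similarity` helper above; `finer_than` is a hypothetical resolution-comparison helper, and the re-examination of updated events in step (c) is abbreviated.

```python
def cluster(posts: list, events: list, threshold: float = 0.5) -> list:
    """Assign each post to the most similar event or open a new one (steps (a)-(b));
    a full implementation would also re-examine updated events (step (c))."""
    for post in posts:
        best = max(events, key=lambda e: similarity(post, e), default=None)
        if best is not None and similarity(post, best) > threshold:
            best["posts"].append(post)               # step (a): merge into event e_j
            if finer_than(post, best):               # step (b): keep the most accurate description
                best.update(city=post["city"], district=post["district"],
                            time=post["time"])
        else:
            events.append(dict(post, posts=[post]))  # step (a): new event e_{k+1}
    return events
```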

(2) Logistic regression model

We compare a post whose time entity is empty with the events of the same type detected within the past week, using a logistic regression (LR) model. Three independent variables are considered: Nw, the number of words appearing in both post x and event e; Δt, the time difference between the posting time of x and the occurrence time of e; and Np, the number of posts in event e. The LR formula is as follows:

$$\mathrm{Logit}\left(P\right)={\beta}_{0}+{\beta}_{1}{N}_{w}+{\beta}_{2}\Delta t+{\beta}_{3}{N}_{p}$$
(7)

where P is 1 or 0, indicating whether post x describes event e. Nw is calculated as follows: the latest 60 posts in event e are selected, these posts and post x are segmented with the Jieba toolkit, stop words are removed, and Nw is counted from the remaining words.
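A sketch of the feature construction for Eq. (7) with scikit-learn; the stop-word list is a placeholder, and the dictionary field names are assumptions.

```python
import jieba
from sklearn.linear_model import LogisticRegression

STOP_WORDS = {"的", "了", "在"}  # placeholder; a full Chinese stop-word list is used in practice

def features(post_text: str, post_time, event: dict) -> list:
    """Build (N_w, delta_t, N_p) for one (post, event) pair."""
    recent = event["posts"][-60:]                     # the latest 60 posts of event e
    event_words = {w for p in recent for w in jieba.cut(p["text"])} - STOP_WORDS
    post_words = set(jieba.cut(post_text)) - STOP_WORDS
    n_w = len(post_words & event_words)               # shared-word count N_w
    dt = (post_time - event["time"]).total_seconds() / 3600.0  # delta_t in hours
    return [n_w, dt, len(event["posts"])]             # N_p

# Fit on labelled pairs, then predict whether post x describes event e:
# clf = LogisticRegression().fit(X_train, y_train)
# clf.predict([features(text, time, event)])
```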

4 Experiments and case studies

4.1 Experiments

(1) Text classification

For this task, we collected 890,938 Weibo posts using seed words, among which 70,927 posts were emergency-related and 820,011 were unrelated. These emergency-related posts are annotated with type labels, and the statistics are shown in Fig. 3.

Fig. 3
figure 3

The number of posts related to different types of emergencies

The data set is divided into a training set, a validation set, and a testing set at a ratio of 6:2:2. To verify the validity of the BERT-Att-BiLSTM model, we use word2vec-Att-BiLSTM and BERT as comparison baselines. The results are shown in Table 3 and Fig. 4. For almost all types of events, BERT greatly improves classification performance. In addition, the combination of BERT and Att-BiLSTM further improves recall, except for a few event categories, such as tornadoes and stampedes, for which the number of posts is small; the superiority of BERT-Att-BiLSTM may emerge as more such posts accumulate.

Table 3 Classification performance of different models
Fig. 4
figure 4

Classification performance of different models for different types of events

(2) Entity extraction

The 70,927 emergency-related posts are used to test the word-based event-type recognition method. The accuracy is 90.58%, which is better than all model-based methods in our previous study (Huang et al. 2021). To test the location and time extraction models, we randomly select 27,000 of these 70,927 posts and annotate them with location and time labels. The results show that the accuracy of the FoolNLTK package in extracting locations is 94.7% and that of regular expression matching in extracting times is 96.9%, which we consider acceptable.

(3) Event clustering

We select 20,000 posts that have been detected in a certain week to evaluate the event clustering method. Among these posts, 18,714 have all three entities and 1,286 have empty time entities. The accuracy of the similarity-based clustering algorithm is 94.15%. For the logistic regression model, we use 60% of the 1,286 posts for training and the rest for testing, and the testing accuracy is 84.91%. The overall accuracy of our event clustering approach is 93.56%.

4.2 Case studies

In this part, we apply the EDEE framework to real-time Weibo data and select two cases to show its effect. One case is an accident (the Xiangshui Explosion of 21 March 2019) and the other is a public health event (COVID-19).

(1) Xiangshui Explosion

The evolution of the detected events is visualized with an alluvial diagram (Rosvall and Bergstrom 2010), as shown in Fig. 6. Here, the blocks in the diagram represent high-frequency nouns and verbs, while the stream fields represent high-frequency adjectives. From 14:00 to 15:59, posts about the Xiangshui Explosion also mentioned the “earthquake”. In these posts, people speculated that there might have been an earthquake or an explosion, so the posts were judged as an explosion as well as an earthquake. Similarly, posts about the Guannan Earthquake and the Guannan Explosion both contain “earthquake” and “explosion”. This is because CSN published a message at approximately 15:00 reporting that an earthquake with a focal depth of 0 km (a suspected explosion) occurred in Guannan, and this message and its forwarded posts were detected by our algorithm. These results show that in the early stage of an emergency, when the event type cannot be accurately determined from scarce information, our word-based event-type recognition approach may classify posts into several event types rather than the single type with the highest probability. By doing so, it effectively avoids missed reports.

Fig. 6
figure 6

High-frequency word evolution and visualization of detected events from 14:00 to 23:59 on 21 March

(2) COVID-19

In December 2019, the first atypical pneumonia case, caused by a novel coronavirus (the disease is now named COVID-19), was identified and reported in Wuhan City, Hubei Province; COVID-19 later spread around the world and became a pandemic. Figure 7 shows the distribution of the related posts detected. On the evening of 30 December, the EDEE framework first detected an unexplained pneumonia event at the “South China Seafood City” market in Wuhan (Fig. 8a). On the afternoon of 31 December, the China National Health Commission and the China CDC dispatched experts to Wuhan to assist in the investigation, and an official account, CCTV News, explained the epidemic, causing much discussion and forwarding (Fig. 8b). The word clouds of early posts show that the frequency of “SARS” was extremely high (Fig. 9). This is because the public speculated that the unexplained pneumonia was related to SARS (severe acute respiratory syndrome), while officials refuted these rumours. From 19 January, the Wuhan municipal government began to hold regular press conferences and answer questions on COVID-19. People’s attention continued to grow, and the number of related posts remained high.

Fig. 7
figure 7

Publishing time distribution of posts related to COVID-19 in Wuhan

Fig. 8
figure 8

The post was first detected (a) and the post with maximum forwarding times (b) for COVID-19

Fig. 9
figure 9

Word clouds of posts about COVID-19 from 30 December 2019 to 6 January 2020

After 19 January, COVID-19-related posts were detected in provinces and cities outside Hubei Province. On 19 January, relevant posts were detected in Guangdong and Shanghai; on 20 January, they were detected in Beijing, followed by Hunan, Henan, Jiangxi, Sichuan, and Chongqing. In cities of Hubei Province besides Wuhan, such as Huanggang, Jingzhou, and Jingmen, COVID-19-related posts were also detected. Figure 10 shows the time distribution of COVID-19-related posts detected in other cities of Hubei Province (a) and other provinces (b). Comparing Fig. 10 with the daily confirmed cases of these cities or provinces, we find that the time when the first related post was detected basically coincides with the time when the first case was published. In addition, there is a certain correlation between the number of posts and the number of cases in the cities of Hubei Province, but the correlation is weak in provinces outside Hubei. This is because the cities of Hubei are relatively similar in economic development and population structure, so the proportion of people who use Weibo and their posting frequency are also similar. The situation is quite different on the national scale: people from economically developed regions such as Beijing and Shanghai use Weibo more frequently, so the number of related posts in these regions is significantly higher than in other regions.

Fig. 10
figure 10

Publishing time distribution of posts related to COVID-19 in other cities and provinces

5 Practical applications and discussion

Based on the EDEE framework, we developed a cloud service system for emergency event detection with social media data. The system includes a PC terminal and a mobile terminal, and the interface is shown in Fig. 11. The homepage shows a heatmap of emergencies detected in the last 5 days, a ranking of hot emergencies according to their hot degree (represented by the number of related posts), the time distribution of the four categories of emergencies in the last 30 days, sentiment analysis results for the posts, the regional public opinion hot degree, and a personalized push service setting module. When users click on an emergency, they go to its information page, where the emergency-related posts are shown in detail. The personalized push service setting module lets users set the locations and event types they are interested in, so that they receive matching alerts in the WeChat application.

Fig. 11
figure 11

The system interface

The system has been in operation since June 2020 and now has more than 400 users. On average, approximately 80 emergency events are detected every day. We counted 3,170 events with more than 100 related posts during the six months from June to November 2020, as shown in Fig. 12. Man-made accidents were the most numerous at 1,319 (42%), followed by natural disasters at 1,121 (35%) and social security events at 500 (16%); public health events were the fewest at 230 (7%). In terms of specific types of emergencies, traffic accidents and major criminal cases were the most common, followed by fire accidents, rainstorms, earthquakes, and typhoons. For public health events, due to the continuing COVID-19 pandemic, 195 related events were detected. In addition, it is worth noting that six other types of public health events were detected, including swine foot-and-mouth disease infection in Leizhou, Guangdong Province, on 11 July; dengue infection in Taipei on 2 October; concentrated tuberculosis infection in Xuzhou, Jiangsu Province, on 14 October; and norovirus infections in Bayan County of Harbin City on 24 October and Zigong City of Sichuan Province on 25 November. These events are not in our detection list, but they were still detected because we adopted some common seed words for public health events. If we need to strengthen the detection of such events, more detailed seed words can be included.

Fig. 12
figure 12

Statistical chart of detected emergencies from June to November 2020

Figure 13 shows the location distribution of the detected emergencies. The regional distribution of emergencies in China is very uneven: emergency events are frequent and diverse in the south-east, and their frequency gradually decreases towards the north-west. In particular, Sichuan Province has the largest number of emergency events, especially natural disasters. Sichuan lies in an earthquake zone with many mountains; in 2020, it not only suffered many earthquakes, large and small, but also experienced heavy rainstorms in August, with many secondary disasters such as landslides and floods.

Fig. 13
figure 13

Location distribution map of the detected emergencies

Figure 14 shows a streamgraph of the different types of emergencies. The narrowing or widening of each coloured stream maps the decrease or increase in the number of a certain type of emergency event over time. The figure shows that the frequency of natural disasters changes greatly over time, while the frequency of other events is relatively stable. In particular, affected by the rainy season, rainstorms, typhoons, floods, and similar events occurred more frequently in June and August, and their frequency declined in September.

Fig. 14
figure 14

Streamgraph of detected emergencies of different types

We also examine the interval between the time an emergency is detected and the time it occurred, as shown in Fig. 15. In general, most emergencies (approximately 55.6%) are detected within one day: 12.46% within 1 h, 9.46% within 1–4 h, 8.71% within 4–12 h, and 24.95% within 12–24 h. This demonstrates the effectiveness of our system in the early detection of emergencies. Considering the different types of events, most natural disasters are detected within one day, and a few within 1–3 days. For man-made accidents and social security events, the interval distribution is relatively scattered. Most of these events were detected between 12 and 24 h, while some, such as traffic accidents and major criminal cases, were detected after many days. This is because major traffic accidents and criminal cases may be adjudicated by a court after some time, at which point new relevant posts appear; since the event clustering approach only considers events from the last week, these posts are judged as new events. This problem can be solved by considering a longer period for event clustering. For public health events, the COVID-19 epidemic had become normalized between June and November 2020, and public attention was low. Many of the COVID-19-related posts we detected were officially released summaries of the previous day’s case information, so the intervals were mainly concentrated in 1–2 days.

Fig. 15
figure 15

The interval between the time of the emergency being detected and the time of the emergency occurring

The system has already been applied in the practical work of the Ministry of Emergency Management of China (MEM). Watch staff review and summarize the event information detected by this system and that directly submitted by local departments, and then push it to the relevant persons for disposal according to the urgency and severity of the events. The definitions of severity differ across event types. For example, an earthquake is handled if its magnitude exceeds 5 and the population density within 50 km of the epicentre reaches 200 people per km²; an explosion accident needs to be dealt with as long as it happens at an enterprise. The recognition of emergency severity is currently performed manually; it could be automated in the future by extracting more text information and defining relevant rules.

Practical application has shown that approximately 5 to 10 important emergency events are first discovered by our system every day, and MEM staff check with the local department as soon as they receive the alarm information. This effectively deters staff of local departments from delaying reports or concealing emergencies, improving the efficiency of emergency disposal. However, it should be noted that not all emergencies can be detected immediately, as sometimes no one uses Weibo right after an emergency. At 2 p.m. on 10 January 2021, an explosion occurred at a gold mine in a rural area of Qixia City, Shandong Province, leaving 22 miners trapped underground. The explosion was not reported to the local emergency department until 20:48 on 11 January, and the first Weibo post was detected at 23:51. We do not see this as a vital limitation because the types of social media are becoming increasingly varied and their prevalence is steadily increasing. We can improve the efficiency of emergency detection by using more data sources, such as WeChat Moments and TikTok.

6 Conclusion

In this paper, we proposed a new framework for the early detection of emergency events from social media. The framework integrates emergency-related text classification, 3 W attribute information (what, where, and when) extraction, and emergency event clustering, which together contribute to detecting emergencies and discovering valuable knowledge that is difficult for humans to find in large collections of texts. For text classification, massive numbers of Weibo posts were used to train different models, and the results show that the BERT-Att-BiLSTM model works well in discriminating different types of emergency-related posts. Based on the extracted 3 W attribute information, we created an unsupervised dynamic event clustering algorithm based on text similarity and combined it with a supervised logistic regression model, which brings the event clustering accuracy to 93.56%.

The research facilitates the formation of a fast and transparent emergency reporting mechanism that transmits information in a timely manner. The practical application verifies that emergency event detection through the proposed method is effective and of great significance for efficient emergency disposal.

We plan to further refine the emergency event detection framework in several directions. First, some emergency events that occur in the same location within a short period have chain relationships, such as rainstorms and landslides, or fires and explosions; these event chains should be identified and analysed as one topic. Second, more text information, such as earthquake magnitude, explosive material, and casualties, should be extracted to judge the severity of emergency events and push them to different people. Third, more social media data sources could be added to our framework.