Data Mining
20th Australasian Conference, AusDM 2022, Western Sydney, Australia, December 12–15, 2022, Proceedings
Chapter and Conference Paper
A manifold distributed dataset with limited labels makes it difficult to train a high-mean accuracy classifier. Transfer learning is beneficial in such circumstances. For transfer learning to succeed, the targ...
Chapter and Conference Paper
Blockmodelling is the process of determining community structure in a graph. Real graphs contain noise and so it is up to the blockmodelling method to allow for this noise and reconstruct the most likely role ...
Chapter and Conference Paper
Labelling unlabelled data is a time-consuming and expensive process. Labelling initiatives should select samples that are likely to enhance the classification accuracy of the classifier. Several methods can be ...
Chapter and Conference Paper
Before constructing a classifier, we should examine the data to gain an understanding of the relationships between the variables, to assist with the design of the classifier. Using multi-label data requires us...
Chapter and Conference Paper
In multi-label classification, a large number of evaluation metrics exist, for example Hamming loss, exact match, and Jaccard similarity – but there are many more. In fact, there remains an apparent uncertaint...
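As an illustration of how the three metrics named above differ, the following minimal Python sketch computes each one on a small set of binary label vectors. The function names and the toy data are assumptions made for this example and are not taken from the paper; the Jaccard score for an instance with no true and no predicted labels is set to 1.0 here by convention.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total


def exact_match(y_true, y_pred):
    """Fraction of instances whose whole label vector is predicted correctly."""
    hits = sum(true_row == pred_row for true_row, pred_row in zip(y_true, y_pred))
    return hits / len(y_true)


def jaccard_similarity(y_true, y_pred):
    """Mean over instances of |true AND pred| / |true OR pred|."""
    scores = []
    for true_row, pred_row in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(true_row, pred_row) if t and p)
        union = sum(1 for t, p in zip(true_row, pred_row) if t or p)
        # Assumed convention: two empty label sets count as a perfect match.
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 3 instances, 3 binary labels each.
Y_TRUE = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
Y_PRED = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]

if __name__ == "__main__":
    print("Hamming loss:", hamming_loss(Y_TRUE, Y_PRED))
    print("Exact match:", exact_match(Y_TRUE, Y_PRED))
    print("Jaccard similarity:", jaccard_similarity(Y_TRUE, Y_PRED))
```

On this toy data the three metrics disagree noticeably: exact match credits only the one fully correct instance, while Hamming loss and Jaccard similarity give partial credit in different ways.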
Chapter and Conference Paper
When a Document Retrieval system receives a query, a Relevance model is used to provide a score to each document based on its relevance to the query. Relevance models have parameters that should be tuned to op...
Chapter and Conference Paper
Automatic hashtag segmentation is used when analysing Twitter data, to associate hashtag terms with those used in common language. The most common form of hashtag segmentation uses a dictionary with a probabilit...
Chapter and Conference Paper
Multi-label classifiers allow us to predict the state of a set of responses using a single model. A multi-label model is able to make use of the correlation between the labels to potentially increase the accur...
Chapter and Conference Paper
When examining the robustness of systems that take ranked lists as input, we can induce noise, measured in terms of Kendall’s tau rank correlation, by applying a set number of random adjacent transpositions. T...
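The noise-induction step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the ranked list is assumed to contain no ties, and the swap position is drawn uniformly at random, which may differ from the procedure studied in the paper. For a tie-free list of n items, Kendall's tau is 1 - 4D / (n(n - 1)), where D is the number of discordant pairs, and each adjacent transposition changes D by exactly one.

```python
import random


def random_adjacent_transpositions(ranking, k, rng):
    """Return a copy of `ranking` after k random swaps of neighbouring items."""
    noisy = list(ranking)
    for _ in range(k):
        i = rng.randrange(len(noisy) - 1)  # position of the left item in the swap
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


def kendall_tau(a, b):
    """Kendall's tau between two tie-free rankings of the same items."""
    position_in_b = {item: rank for rank, item in enumerate(b)}
    seq = [position_in_b[item] for item in a]
    n = len(seq)
    # Count pairs that appear in opposite order in the two rankings.
    discordant = sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])
    return 1 - 4 * discordant / (n * (n - 1))


if __name__ == "__main__":
    original = list(range(10))
    noisy = random_adjacent_transpositions(original, 5, random.Random(0))
    print(noisy, kendall_tau(original, noisy))
```

Because swaps can undo one another, k transpositions yield at most k discordant pairs, so the resulting tau is bounded below by 1 - 4k / (n(n - 1)) but is not fixed by k alone.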
Chapter and Conference Paper
Outlier detection is an important process for text document collections, but as the collection grows, the detection process becomes a computationally expensive task. Random projection has been shown to provide a go...
Chapter and Conference Paper
In this paper, we present a supervised framework for extracting blood vessels from retinal images. The local standardisation of the green channel of the retinal image and the Gabor filter responses at four dif...
Chapter and Conference Paper
Document clustering involves repetitive scanning of a document set; therefore, as the size of the set increases, the time required for the clustering task increases, and the task may even become impossible due to computa...
Article
Search effectiveness metrics are used to evaluate the quality of the answer lists returned by search services, usually based on a set of relevance judgments. One plausible way of calculating an effectiveness s...
Chapter and Conference Paper
Spectral co-clustering is a generic method of computing co-clusters of relational data, such as sets of documents and their terms. Latent semantic analysis is a method of document and term smoothing that can a...
Chapter and Conference Paper
Data clustering is a challenging task, especially when the hidden clusters are of different shapes and non-linearly separable in the input space. This paper addresses this problem by proposing a ...
Chapter and Conference Paper
Probabilistic latent semantic analysis (PLSA) is a method of calculating term relationships within a document set using term frequencies. It is well known within the information retrieval community that raw te...
Chapter and Conference Paper
It has been shown that the use of topic models for information retrieval provides an increase in precision when used in the appropriate form. Latent Dirichlet Allocation (LDA) is a generative topic model that ...
Chapter and Conference Paper
Web page prefetching has been shown to reduce Web access latency, but is highly dependent on the accuracy of the Web page prediction method. Conditional Random Fields (CRFs) with Error Correcting Outp...
Chapter and Conference Paper
Reducing the Web access latency perceived by a Web user has become a problem of interest. Web prefetching and caching are two effective techniques that can be used together to reduce the access latency problem...