-
Article
An integrated system for building enterprise taxonomies
Although considerable research has been conducted in the field of hierarchical text categorization, little has been done on automatically collecting labeled corpus for building hierarchical taxonomies. In this...
-
Article
A general approximation framework for direct optimization of information retrieval measures
Recently direct optimization of information retrieval (IR) measures has become a new trend in learning to rank. In this paper, we propose a general framework for direct optimization of IR measures, which enjoy...
-
Article
LETOR: A benchmark collection for research on learning to rank for information retrieval
LETOR is a benchmark collection for the research on learning to rank for information retrieval, released by Microsoft Research Asia. In this paper, we describe the details of the LETOR collection and show how ...
-
Article
Guest editorial: special issue on data mining with matrices, graphs and tensors
-
Article
Open Access
Correcting evaluation bias of relational classifiers with network cross validation
Recently, a number of modeling techniques have been developed for data mining and machine learning in relational and network domains where the instances are not independent and identically distributed (i.i.d.)...
-
Article
Guest editorial: Integrated spatio-temporal analysis and data mining
-
Article
Comparison of trends in the quantity and variety of Science Citation Index (SCI) literature on human pathogens between China and the United States
The proportion of pathogenic microorganisms in the microbial world is relatively small, while their threat to human health, economic development and social stability is severe. The quantity and variation of Sc...
-
Article
Guest editorial: spatial and temporal databases
-
Article
Emerging infectious disease: trends in the literature on SARS and H7N9 influenza
Severe acute respiratory syndrome (SARS) and human infection H7N9 influenza are emerging infectious diseases having a relatively high mortality. Epidemics of each began in China. By searching through Science C...
-
Article
Decoding multi-click search behavior based on marginal utility
Query logs contain rich feedback information from users interacting with search engines. Therefore, various click models have been developed to interpret users’ search behavior and to extract useful knowledge ...
-
Article
Open Access
Toward multi-label sentiment analysis: a transfer learning based approach
Sentiment analysis is recognized as one of the most important sub-areas in Natural Language Processing (NLP) research, where understanding implicit or explicit sentiments expressed in social media contents is ...
-
Article
Open Access
A survey of community detection methods in multilayer networks
Community detection is one of the most popular research topics in a variety of complex systems, ranging from biology to sociology. In recent years, there’s an increasing focus on the rapid development of more compl...
-
Article
Open Access
Fairness in graph-based semi-supervised learning
Machine learning is widely deployed in society, unleashing its power in a wide range of applications owing to the advent of big data. One emerging problem faced by machine learning is the discrimination from d...
-
Article
An in-depth study on adversarial learning-to-rank
In light of recent advances in adversarial learning, there has been strong and continuing interest in exploring how to perform adversarial learning-to-rank. The previous adversarial ranking methods [e.g., IRGA...
-
Article
Open Access
Privacy-aware document retrieval with two-level inverted indexing
Previous work on privacy-aware ranking has addressed the minimization of information leakage when scoring top k documents, and has not studied how to retrieve these top documents and their features for ranking...
-
Article
Open Access
Effective interpretable learning for large-scale categorical data
Large-scale categorical datasets are ubiquitous in machine learning and the success of most deployed machine learning models relies on how effectively the features are engineered. For large-scale datasets, param...