  1. Chapter

    Introduction

    In recent years, with the development of the information age, the amount of data has grown dramatically. At the same time, dirty data exist in many types of databases. Due to the negative imp...

    Zhixin Qi, Hongzhi Wang, Zejiao Dong in Dirty Data Processing for Machine Learning (2024)

  2. Chapter

    Dirty Data Impacts on Regression Models

    Due to the negative influence of dirty data on the accuracy of regression models, the relation between data quality and model results can be used in the selection of proper regression models and dir...

    Zhixin Qi, Hongzhi Wang, Zejiao Dong in Dirty Data Processing for Machine Learning (2024)

  3. Book

  4. Chapter and Conference Paper

    ANSWER: Automatic Index Selector for Knowledge Graphs

    Efficient access to knowledge graphs is identified as the basic premise for making full use of them. Since query processing efficiency is mainly affected by the index configuration, it is necessary to...

    Zhixin Qi, Haoran Zhang, Hongzhi Wang, Zemin Chao in Web and Big Data (2024)

  5. Chapter

    Density-Based Clustering for Incomplete Data

    In the real world, missing values exist in many data sets and cause data incompleteness. However, traditional missing value imputation methods are not suitable for density-based clustering and affect the accur...

    Zhixin Qi, Hongzhi Wang, Zejiao Dong in Dirty Data Processing for Machine Learning (2024)

  6. Chapter

    Cost-Sensitive Decision Tree Induction on Dirty Data

    With the rapid growth of data in our society, dirty data are increasingly common. In the process of cost-sensitive decision tree induction, dirty data in training data sets have negative impacts on the selection...

    Zhixin Qi, Hongzhi Wang, Zejiao Dong in Dirty Data Processing for Machine Learning (2024)

  7. Chapter

    Impacts of Dirty Data on Classification and Clustering Models

    Since dirty data have a negative influence on the accuracy of machine learning models, the relation between data quality and model results could be used in the selection of the proper model and data cleaning str...

    Zhixin Qi, Hongzhi Wang, Zejiao Dong in Dirty Data Processing for Machine Learning (2024)

  8. Chapter

    Incomplete Data Classification with View-Based Decision Tree

    Missing values have a negative influence on data analyses and decrease the accuracy of machine learning models. Since traditional classification methods can only be applied to complete data sets, this c...

    Zhixin Qi, Hongzhi Wang, Zejiao Dong in Dirty Data Processing for Machine Learning (2024)

  9. Chapter

    Feature Selection on Inconsistent Data

    With the explosive growth of data size, inconsistent data appear more frequently. Because inconsistent data must be detected and repaired during data preprocessing, feature selection approaches lack efficiency. To...

    Zhixin Qi, Hongzhi Wang, Zejiao Dong in Dirty Data Processing for Machine Learning (2024)

  10. Chapter and Conference Paper

    Multi-SQL: An Automatic Multi-model Data Management System

    Nowadays, data in applications have become diverse and large in scale. To meet the increasing demand for multi-model data management, multi-model databases have evolved into huge systems with many knobs. H...

    Yu Yan, Hongzhi Wang, Yutong Wang, Zhixin Qi, Jian Ma, Chang Liu in Web and Big Data (2023)

  11. Chapter and Conference Paper

    Dirty-Data Impacts on Regression Models: An Experimental Evaluation

    Data quality issues have attracted widespread attention due to the negative impacts of dirty data on regression model results. The relationship between data quality and the accuracy of results could be applie...

    Zhixin Qi, Hongzhi Wang in Database Systems for Advanced Applications (2021)

  12. Article

    TAILOR: time-aware facility location recommendation based on massive trajectories

    In traditional facility location recommendation, the objective is to select the best locations that maximize the coverage or convenience of users. However, since users’ behavioral habits are often influenced...

    Zhixin Qi, Hongzhi Wang, Tao He, Chunnan Wang in Knowledge and Information Systems (2020)

  13. Article

    A survey of query result diversification

    Nowadays, in information systems such as web search engines and databases, diversity is becoming increasingly essential and is receiving growing attention as a way to improve user satisfaction. In this sense, que...

    Kaiping Zheng, Hongzhi Wang, Zhixin Qi, Jianzhong Li in Knowledge and Information Systems (2017)

  14. Chapter and Conference Paper

    Capture Missing Values with Inference on Knowledge Base

    Data imputation is a basic step in data cleaning. Traditional data imputation approaches lack accuracy in the absence of knowledge. Involving a knowledge base in imputation could overcome this shortcomin...

    Zhixin Qi, Hongzhi Wang, Fanshan Meng in Database Systems for Advanced Applications (2017)