Search Results
-
A Performance Comparison of Clustering Algorithms for Big Data on DataMPI
Clustering algorithms for big data have important applications in finance. DataMPI is a communication library based on key-value pairs that extends...
-
Investigating the performance of Hadoop and Spark platforms on machine learning algorithms
One of the most challenging issues in the big data research area is the inability to process a large volume of information in a reasonable time....
-
MapReduce scheduling algorithms in Hadoop: a systematic study
Hadoop is a framework for storing and processing huge volumes of data on clusters. It uses Hadoop Distributed File System (HDFS) for storing data and...
-
Interference-aware co-scheduling method based on classification of application characteristics from hardware performance counter using data mining
Computational scientists and engineers who are eager to obtain the best performance of scientific applications need efficient application...
-
MapReduce: an infrastructure review and research insights
In the current decade, doing the search on massive data to find “hidden” and valuable information within it is growing. This search can result in...
-
xCCL: A Survey of Industry-Led Collective Communication Libraries for Deep Learning
Machine learning techniques have become ubiquitous both in industry and academic applications. Increasing model sizes and training data volumes...
-
CirroData: Yet Another SQL-on-Hadoop Data Analytics Engine with High Performance
This paper presents CirroData, a high-performance SQL-on-Hadoop system designed for Big Data analytics workloads. As a home-grown enterprise-level...
-
Big Data and HPC Convergence: The Cutting Edge and Outlook
The data growth over the last couple of decades increases on a massive scale. As the volume of the data increases so are the challenges associated...
-
Combining Hadoop with MPI to Solve Metagenomics Problems that are both Data- and Compute-intensive
Metagenomics, the study of all microbial species cohabitants in an environment, often produces large amount of sequence data varying from several GBs...
-
Accelerating Iterative Big Data Computing Through MPI
Current popular systems, Hadoop and Spark, cannot achieve satisfied performance because of the inefficient overlapping of computation and...
-
Performance Benefits of DataMPI: A Case Study with BigDataBench
Apache Hadoop and Spark are gaining prominence in Big Data processing and analytics. Both of them are widely deployed in Internet companies. On the...
-
Feasibility analysis of AsterixDB and Spark streaming with Cassandra for stream-based processing
For getting up-to-date insight into online services, extracted data has to be processed in near real time. For example, major big data companies...