Herald of the Kazakh-British Technical University

COMPARING BIG DATA ANALYTIC TOOLS USING MUSIC DATASET

Abstract

Modern information systems and digital technologies, such as scientific data analysis, social media data mining, recommendation systems, and web service log analysis, generate petabytes of data every day. This data has great potential to lead directly to knowledge discovery. Big data, in turn, requires entirely new approaches and tools to handle it. Analysing such massive data takes considerable effort to extract knowledge for decision making. The huge volume of the data and its unstructured nature raise new challenges in its management and processing. This paper covers some of the most popular tools for analysing big data. Hadoop, Spark and Pig are major modern tools in big data analytics, which is why they were chosen for comparison. The results of this research show that different tasks require different tools and that there is no all-in-one solution. For any big data problem, developers need to choose the proper tool to get the job done better and faster.

About the Authors

R. I. Bektemirov
Suleyman Demirel University
Kazakhstan


U. T. Nurkey
Suleyman Demirel University
Kazakhstan


References

1. Agneeswaran V. S., Tonpay P., Tiwary J. (2013) Paradigms for realizing machine learning algorithms. Big Data 1 (4) : 207-214

2. https://www.kaggle.com/

3. Lee K.-H., Lee Y.-J., Choi H., Chung Y. D., Moon B. (2012) Parallel data processing with MapReduce: a survey. ACM SIGMOD Record 40 (4) : 11-20

4. Big Data Analysis: Comparison of Hadoop MapReduce, Pig and Hive. Available from: https://www.researchgate.net/publication/308074477_Big_Data_Analysis_Comparision_of_Hadoop_MapReduce_Pig_and_Hive

5. MapReduce vs. Pig vs. Hive - Comparison between the key tools of Hadoop. Available from: https://www.dezyre.com/article/mapreduce-vs-pig-vs-hive/163

6. Dilpreet Singh and Chandan K. Reddy, “A Survey on Platforms for Big Data Analytics”, Journal of Big Data, 1:1, 8, 2014.

7. https://www.scnsoft.com/blog/spark-vs-hadoop-mapreduce

8. https://dzone.com/articles/hadoop-vs-spark-a-head-to-head-comparison

9. https://www.todaysoftmag.com/article/1553/finding-similar-entities-in-bigdata-models

10. https://neo4j.com/docs/graph-algorithms/current/algorithms/similarity-jaccard/

11. Szmit R. (2013) Locality Sensitive Hashing for Similarity Search Using MapReduce on Large Scale Data. In: Klopotek M. A., Koronacki J., Marciniak M., Mykowiecka A., Wierzchon S. T. (eds) Language Processing and Intelligent Information Systems. IIS 2013. Lecture Notes in Computer Science, vol. 7912. Springer, Berlin, Heidelberg

12. C. Sadowski and G. Levin. Simhash: Hash-based Similarity Detection. Technical report, Google, 2007.

13. Tom Kenter, Maarten de Rijke, Short Text Similarity with Word Embeddings, Proceedings of the 24th ACM International Conference on Information and Knowledge Management, October 18-23, 2015, Melbourne, Australia


For citations:


Bektemirov R.I., Nurkey U.T. COMPARING BIG DATA ANALYTIC TOOLS USING MUSIC DATASET. Herald of the Kazakh-British Technical University. 2019;16(4):97-104.


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1998-6688 (Print)
ISSN 2959-8109 (Online)