COMPARING BIG DATA ANALYTIC TOOLS USING MUSIC DATASET
Abstract
Modern information systems and digital technologies, such as scientific data analysis, social media data mining, recommendation systems, and web service log analysis, generate petabytes of data each day. This data can directly guide knowledge discovery. Big data, in turn, requires entirely new approaches and tools to handle it. Analyzing such massive data to extract knowledge for decision making requires considerable effort. The huge volume of the data and its unstructured nature raise new challenges for its management and processing. This paper covers some of the most popular tools for analyzing big data. Hadoop, Spark and Pig are major modern tools in big data analytics, so they were chosen for comparison. The results of this research show that different tasks require different tools and that there is no all-in-one solution. Each big data problem requires developers to choose the proper tool to get the job done better and faster.
About the Authors
R. I. Bektemirov
Kazakhstan
U. T. Nurkey
Kazakhstan
For citations:
Bektemirov R.I., Nurkey U.T. COMPARING BIG DATA ANALYTIC TOOLS USING MUSIC DATASET. Herald of the Kazakh-British Technical University. 2019;16(4):97-104.