
Herald of the Kazakh-British Technical University


INTELLIGENT MODULE FOR «SMART» NEWS AGGREGATOR

https://doi.org/10.55452/1998-6688-2021-18-1-109-116

Abstract

Nowadays, more and more people get information from online resources such as news portals and blogs. With the development of Internet technologies, the volume of published information has grown so much that finding relevant and interesting information has become difficult and time-consuming. News aggregators address this problem by delivering only fresh and relevant news from various sources: a content aggregator platform collects information from across the web and publishes it in one place for visitors to access. This paper presents an intelligent news aggregator system that collects the latest news from different sources via RSS/Atom feeds and displays it on a single platform. The aggregator includes an intelligent module that recommends similar news based on the news items a user has saved. To find similar news, the module applies cosine similarity to news headlines. This measure quantifies the similarity of two vectors as the cosine of the angle between them, and the headlines with the highest cosine similarity are recommended to the user. Before comparison, each headline goes through the following natural language processing steps: tokenization, removal of unnecessary characters and punctuation, and conversion to a vector with the TF-IDF method. The paper also compares similarity measurements for the most popular metrics: cosine similarity, Euclidean distance, and Jaccard distance. Comparison results are presented for news received from RSS/Atom feeds in the programming and business/marketing categories.
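The headline-recommendation pipeline summarized in the abstract (cleaning and tokenization, TF-IDF vectors, ranking by cosine similarity, plus Euclidean and Jaccard distances for comparison) can be sketched in Python as below. This is a minimal, standard-library-only illustration under stated assumptions, not the author's implementation: the cleaning regular expression, the smoothed IDF formula ln((1 + N) / (1 + df)) + 1, computing Jaccard distance on token sets, the example headlines, and all function names (`preprocess`, `tfidf_vectors`, `cosine_similarity`, `euclidean_distance`, `jaccard_distance`, `recommend`) are hypothetical. Cosine similarity is computed as cos(a, b) = (a · b) / (‖a‖ ‖b‖).

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Assumption: "unnecessary characters" = anything that is not a word character or whitespace.
# \w is Unicode-aware in Python 3, so Cyrillic letters and digits are kept.
CLEAN_RE = re.compile(r"[^\w\s]+")


def preprocess(headline: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return CLEAN_RE.sub(" ", headline.lower()).split()


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors (term -> weight); TF = raw count, IDF smoothed (assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(doc).items()} for doc in docs]


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Straight-line distance between two sparse vectors."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


def jaccard_distance(a_tokens: list[str], b_tokens: list[str]) -> float:
    """1 - |A intersect B| / |A union B| over the two token sets."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def recommend(saved: str, candidates: list[str], top_k: int = 2) -> list[tuple[str, float]]:
    """Rank candidate headlines by cosine similarity to a saved headline."""
    vectors = tfidf_vectors([preprocess(h) for h in [saved, *candidates]])
    scored = [(c, cosine_similarity(vectors[0], v)) for c, v in zip(candidates, vectors[1:])]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical headlines for illustration only, not data from the paper.
    saved = "Python 3.9 released with new features"
    candidates = [
        "New features in Python 3.9",
        "Marketing trends for small business",
        "Rust 1.50 released",
    ]
    for headline, score in recommend(saved, candidates):
        print(f"cosine={score:.3f}  {headline}")

    # Side-by-side comparison of the three metrics for every candidate.
    tokens = [preprocess(h) for h in [saved, *candidates]]
    vectors = tfidf_vectors(tokens)
    for headline, toks, vec in zip(candidates, tokens[1:], vectors[1:]):
        print(
            f"{headline!r}: cosine={cosine_similarity(vectors[0], vec):.3f} "
            f"euclidean={euclidean_distance(vectors[0], vec):.3f} "
            f"jaccard={jaccard_distance(tokens[0], toks):.3f}"
        )
```

Because cosine similarity depends only on the angle between vectors, it is insensitive to differences in headline length, whereas Euclidean distance on unnormalized TF-IDF vectors also reflects vector magnitude; this is one common reason for preferring cosine similarity when ranking texts.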

About the Author

N. Ibragim
Al-Farabi Kazakh National University
Kazakhstan





For citation:


Ibragim N. INTELLIGENT MODULE FOR «SMART» NEWS AGGREGATOR. Herald of the Kazakh-British Technical University. 2021;18(1):109-116. (In Russ.) https://doi.org/10.55452/1998-6688-2021-18-1-109-116



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1998-6688 (Print)
ISSN 2959-8109 (Online)