A data-driven approach to analyzing athletics peak performance

My passion for running and track and field has accompanied me through the different phases of my life. I ran for fun as a child and competed in school races during my high-school years. Only during my college years did running take a pause, while my curiosity about data science progressively took …

MovieSearch: a smart movie search engine

MovieSearch is a content-specific search engine that retrieves movie information based on the contents of a user’s query. The search engine relies on the Okapi BM25 algorithm and takes into consideration the text present in the overview, the title, the names of the cast, and the production companies of each movie. The backend has been developed with the Django framework while the front-end …
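As a rough illustration of the Okapi BM25 ranking mentioned above, here is a minimal Python sketch. It is not the MovieSearch implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the smoothed IDF variant, and the toy token lists are all assumptions made for this example. It also assumes that the text fields of each movie (overview, title, cast, production companies) have already been concatenated and tokenized into one list of lowercase terms, and that the collection is not empty.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    `documents` is a non-empty list of token lists; `query` is a token list.
    The k1 and b defaults are common choices, not values taken from MovieSearch.
    """
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
            # Term-frequency saturation with document-length normalization.
            length_norm = k1 * (1.0 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1.0) / (tf + length_norm)
        scores.append(score)
    return scores


# Hypothetical toy collection: one token list per movie.
toy_movies = [
    ["space", "adventure", "robot"],
    ["romantic", "comedy", "paris"],
    ["space", "station", "thriller"],
]
toy_scores = bm25_scores(["space", "robot"], toy_movies)
```

In this toy collection the first movie matches both query terms, the third matches only one, and the second matches none, so the scores come out in that order.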

Analysis of parallel version of PageRank algorithm

In this post, we are going to analyze a simplified parallel version of the famous PageRank algorithm, used by Google Search to rank web pages in its search engine results. The code of the PageRank algorithm has been written in C++, using the OpenMP library to parallelize it. Finally, it has been tested with different numbers of threads as well as different scheduling policies in order …
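For readers unfamiliar with PageRank, the following is a minimal sequential sketch in Python of the standard power-iteration formulation. It is not the C++/OpenMP code analyzed in the post: the function name `pagerank`, the damping factor of 0.85, the convergence tolerance, the uniform redistribution of rank from nodes without out-links, and the toy graph are all assumptions made for this example. Each node's new rank is computed only from the previous iteration's values, so the per-node updates within one iteration do not depend on each other.

```python
def pagerank(links, damping=0.85, tol=1e-8, max_iter=100):
    """Compute PageRank by power iteration.

    `links` maps each node to the list of nodes it links to.
    Defaults are common illustrative choices, not values from the post.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    out_degree = {node: len(links.get(node, [])) for node in nodes}

    # Invert the graph so each node can pull rank from its in-neighbors.
    incoming = {node: [] for node in nodes}
    for source, targets in links.items():
        for target in targets:
            incoming[target].append(source)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_degree[node] == 0)
        base = (1.0 - damping) / n + damping * dangling / n

        # Each entry depends only on the previous iteration's ranks.
        new_rank = {
            node: base
            + damping * sum(rank[src] / out_degree[src] for src in incoming[node])
            for node in nodes
        }

        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: "c" is linked from every other page.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
toy_ranks = pagerank(toy_graph)
```

In the toy graph, page "c" ends up with the highest rank because every other page links to it, while "d" has no incoming links and keeps only the baseline share.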

Visualization and analysis of web engine data

Data preprocessing and visualization are important skills for managing Internet services such as search engines and online social networks. In this post, we are going to deal with two weeks of search logs from a large search engine and learn some practical techniques to analyze the data.

Background and data

Once you submit a query to a search engine, the search engine will log some related …