Here you can find an overview of the personal data-driven projects that I have worked on over the last few years. You can also find all of these projects on GitHub.

Open Source

Although all of my projects are open source, the ones listed here are actively maintained by me and regularly used by the community. They are meant as a way to give back to the community with, hopefully, helpful packages.

BERTopic

BERTopic is a novel topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters, allowing for easily interpretable topics.

PolyFuzz

PolyFuzz performs fuzzy string matching and string grouping, and contains extensive evaluation functions. It is meant to bring fuzzy string matching techniques together within a single framework.
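To give a feel for what fuzzy string matching means in practice, here is a minimal sketch using only Python's standard library. It is not PolyFuzz's API: the function `best_matches`, its parameters, and the example lists are hypothetical names chosen for illustration, and the similarity measure (`difflib.SequenceMatcher`) is just one of many techniques a framework like this could wrap.

```python
from difflib import SequenceMatcher


def best_matches(from_list, to_list, min_similarity=0.0):
    """Map each string in from_list to its most similar string in to_list.

    Hypothetical helper for illustration; not part of PolyFuzz.
    """
    matches = []
    for source in from_list:
        # Score the source string against every candidate target.
        scored = [
            (SequenceMatcher(None, source.lower(), target.lower()).ratio(), target)
            for target in to_list
        ]
        score, target = max(scored)
        # Drop matches that fall below the similarity threshold.
        if score < min_similarity:
            target = None
        matches.append((source, target, round(score, 3)))
    return matches


from_list = ["apple", "apples", "appl", "house"]
to_list = ["apple", "mouse", "banana"]
matches = best_matches(from_list, to_list, min_similarity=0.5)
for source, target, score in matches:
    print(f"{source!r} -> {target!r} (similarity {score})")
```

Each input string is paired with its closest candidate, and pairs below the threshold are reported without a match.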

KeyBERT

KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.

Concept

Concept introduces the concept of Concept Modeling. It takes inspiration from topic modeling techniques to cluster images, find commonalities (i.e. concepts) and create a multimodal representation.

SoAn

SoAn is a package for in-depth analyses (sentiment analysis, topic modeling, etc.) of WhatsApp conversations.

ReinLife

Using Reinforcement Learning, entities learn to survive, reproduce, and maximize the fitness of their kin.

c-TF-IDF

c-TF-IDF is a class-based TF-IDF procedure that can be used to generate features from textual documents based on the class they are in.
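The idea of a class-based TF-IDF can be sketched in a few lines of plain Python. This is a minimal illustration rather than the package's implementation: it assumes the weighting tf × log(1 + A / f), where tf is a term's frequency within a class (normalized by class size), f is its frequency across all classes, and A is the average number of words per class. The function names, the whitespace tokenization, and the example data are all assumptions made for the sketch.

```python
import math
from collections import Counter


def c_tf_idf(docs, labels):
    """Return {class: {term: weight}} using a class-based TF-IDF weighting."""
    # Step 1: merge all documents of a class into one bag of words.
    class_counts = {}
    for doc, label in zip(docs, labels):
        class_counts.setdefault(label, Counter()).update(doc.lower().split())

    # Step 2: term frequency across all classes and the average
    # number of words per class.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)
    avg_words = sum(total_counts.values()) / len(class_counts)

    # Step 3: weight = normalized in-class frequency
    #                  * log(1 + average words per class / overall frequency).
    weights = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        weights[label] = {
            term: (freq / n_words) * math.log(1 + avg_words / total_counts[term])
            for term, freq in counts.items()
        }
    return weights


def top_terms(weights, label, n=3):
    """Return the n highest-weighted terms for one class (ties sorted by term)."""
    ranked = sorted(weights[label].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


docs = [
    "cat chased the mouse",
    "cat slept on the sofa",
    "stock market fell",
    "the market rallied after the stock split",
]
labels = ["pets", "pets", "finance", "finance"]
weights = c_tf_idf(docs, labels)
print(top_terms(weights, "pets", 1))
print(top_terms(weights, "finance", 2))
```

Because "the" appears in both classes, it is weighted below terms such as "cat" or "market" that occur just as often within a class but only in that class.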

VLAC

VLAC leverages clusters of word embeddings to create features from a collection of documents, which can then be used for document classification.


A small selection of projects and analyses I have done in the past to further develop my Data Science skills.

Reviewer

A package for scraping user reviews from IMDb, generating c-TF-IDF-based word clouds, and extracting popular characters from reviews.

Disney

Tournament brackets are generated based on a seed score calculated from data scraped from IMDb and Rotten Tomatoes.

Hurdle Model

Used Apple Store data to analyze which business model aspects (entry timing and technological innovation) influence the performance of mobile games.

Boardgame Exploration

Created an application for exploring board game matches that I tracked over the last year. Streamlit and Heroku were used for deployment.