Projects
Here, you can find an overview of the personal data-driven projects that I have worked on
over the last few years. You can also find all projects on GitHub.
Open Source
Although all of my projects are open source, the ones listed here are actively maintained by me and regularly used by the community.
They are meant as a way to give back to the community with, hopefully, helpful packages.
PolyFuzz performs fuzzy string matching, string grouping, and contains extensive evaluation functions. PolyFuzz is meant to bring fuzzy string matching techniques together within a single framework.
KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.
Concept introduces the concept of Concept Modeling. It takes inspiration from topic modeling techniques to cluster images, find commonalities (i.e. concepts) and create a multimodal representation.
Created a package that allows in-depth analyses (sentiment analysis, topic modeling, etc.) of WhatsApp conversations.
c-TF-IDF is a class-based TF-IDF procedure that can be used to generate features from textual documents based on the class they are in.
Leverages clusters of word embeddings to create features from a collection of documents, which can then be used to classify those documents.
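To make the class-based TF-IDF idea mentioned above a bit more concrete, here is a minimal sketch in plain Python. It merges all documents of a class into one bag of words and then weighs each term by how frequent it is inside the class relative to how frequent it is overall. The function name `class_tfidf`, the whitespace tokenization, and the exact weighting `tf * log(1 + avg_words_per_class / term_frequency)` are assumptions chosen for illustration; the package itself may use a different formulation.

```python
import math
from collections import Counter, defaultdict


def class_tfidf(docs, labels):
    """Return {label: {term: score}} for a simple class-based TF-IDF.

    Hypothetical helper: the weighting used here is one possible
    formulation and is an assumption, not the package's implementation.
    """
    # Merge all documents that share a class into a single bag of words.
    class_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        class_counts[label].update(doc.lower().split())

    # Frequency of every term across all classes combined.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # Average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)

    scores = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        scores[label] = {
            # Term frequency within the class times an inverse
            # frequency term computed over all classes.
            term: (freq / n_words) * math.log(1 + avg_words / total_counts[term])
            for term, freq in counts.items()
        }
    return scores


docs = [
    "the cat chased the mouse",
    "the cat slept",
    "the stock market fell",
    "the market rallied",
]
labels = ["pets", "pets", "finance", "finance"]

scores = class_tfidf(docs, labels)
for label, term_scores in scores.items():
    # Show the highest-scoring term for each class.
    print(label, max(term_scores, key=term_scores.get))
```

The resulting per-class scores can be used directly as features, or to describe each class by its highest-scoring terms.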
Analyses
A small selection of projects and analyses I have done in the past to further develop my Data Science skills.
Tournament brackets are generated based on a seed score calculated from data scraped from IMDb and Rotten Tomatoes.
Used Apple Store data to analyze which business model aspects (entry timing and technological innovation) influence the performance of mobile games.
Created an application for exploring board game matches that I tracked over the last year. Streamlit and Heroku were used for deployment.