Hi!

I'm a data science practitioner with more than 5 years of industry experience in software development. My current interests are Bayesian statistics, natural language processing, data visualization, and unsupervised machine learning. I'm a curious, hands-on person, and I love creating and building things. Take a look around, and if you see anything you like or have constructive feedback to share, feel free to connect on Twitter or LinkedIn.

Project Repositories

Analysis of PyTorch and TensorFlow GitHub Projects using NLP

  • Linguistic analysis of open issues in the TensorFlow and PyTorch GitHub repositories, with a deep dive into "memory leaks" using word embeddings and dimensionality reduction.

Predicting Flight Cancellations

  • Ingestion, preprocessing, and analysis of the Bureau of Transportation flight performance dataset, which includes all US flights since 2016.

Data pipeline with Apache Beam

  • Ingested data is enriched with external location and timezone data, and data fixes are applied where necessary. It is then uploaded to a new BigQuery table using the Apache Beam framework. The job is deployed on the GCP Dataflow service.

Time Series Sales Forecasting

  • Univariate multi-step time series forecasting using ARIMA, and multivariate forecasting using machine learning.

NCHS Data Retriever

  • Java utility app that retrieves birth data in fixed-length file format and writes it to BigQuery in batches.

A learning experience is one of those things that says, "You know that thing you just did? Don't do that."
Douglas Adams, The Salmon of Doubt