Hi!
I'm a data science practitioner with more than 5 years of industry experience in software development. My current interests are Bayesian statistics, natural language processing, data visualization and unsupervised machine learning. I'm a curious, hands-on person who loves creating and building things. Take a look around, and if you see anything you like or have constructive feedback to share, feel free to connect on Twitter or LinkedIn.
Project Repositories
Analysis of PyTorch and TensorFlow GitHub Projects using NLP
- Linguistic analysis of open issues in the TensorFlow and PyTorch GitHub repositories. Deep dive into "memory leaks" using word embeddings and dimensionality reduction.
Predicting Flight Cancellations
- Ingestion, preprocessing and analysis of the Bureau of Transportation Statistics flight performance dataset. The dataset includes all US flights since 2016.
Data pipeline with Apache Beam
- Ingested data is enriched with external location and timezone data, and data fixes are applied where necessary. It is then loaded into a new BigQuery table using the Apache Beam framework. The job is deployed on the GCP Dataflow service.
Time Series Sales Forecasting
- Univariate multi-step time series forecasting using ARIMA, and multivariate forecasting using machine learning.
NCHS Data Retriever
- Java utility app that retrieves birth data from fixed-length format files and writes it to BigQuery in batches.
A learning experience is one of those things that says,
"You know that thing you just did? Don't do that."
― Douglas Adams, The Salmon of Doubt