Visualizing Twitter stream data near-realtime with BigQuery and Data Studio

Consuming the Twitter stream has been a playground for me to learn new technologies and get practice with them. I built the first version five years ago with WebSockets, Redis, and D3.js; it ran on AWS. Then I built this version. There may be another version in the future that makes use of the content of the tweets.

BigQuery and Data Studio make it painless and straightforward to visualize near-realtime Twitter stream data. The pipeline consists of four components:

  • GitLab CI/CD: Builds the Docker image, pushes it to container registries, then deploys it to Google Kubernetes Engine.
  • Java app: Consumes tweets from Twitter's public stream and inserts them into BigQuery. The app runs as a Kubernetes Deployment.
  • BigQuery: I created a table from a schema file with the bq command-line tool from the Google Cloud SDK (bq mk --table project-id:dataset.table schema.json). This table stores the raw data from Twitter. I then created a view to support the dashboard: it retrieves all tweets that have a CEO's name as a hashtag, then ranks each hashtag and tweet by the number of followers the tweeting user has. Data Studio uses this view to build the dashboard.
  • Data Studio: Populates the visualization objects with BigQuery data. I didn't spend much time building custom queries or improving the visualization, but even so, I can see the popular hashtags related to the people I'm filtering on. I can also drill down into each hashtag and see the tweets that use it. I used a treemap to group the hashtags related to each CEO, with color hue signifying the popularity of each hashtag.
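To make the idea behind the view more concrete, here is a minimal sketch of the same filter-and-rank logic in plain Java. It is not the SQL view used in the pipeline. The Tweet record, its field names, the reachByCeo and ranked helpers, and the sample hashtags are all hypothetical, and "popularity" is assumed here to mean the summed follower counts of the users who tweeted a hashtag alongside a CEO hashtag. It needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the view's logic; the real pipeline does this in a BigQuery view.
class HashtagRanking {
    // Minimal stand-in for a row of the raw tweets table (field names are assumptions).
    record Tweet(String text, List<String> hashtags, long followers) {}

    // For every CEO hashtag, sum the author's follower count per co-occurring hashtag.
    // ceoTags must already be lowercase; tweet hashtags are lowercased before matching.
    static Map<String, Map<String, Long>> reachByCeo(List<Tweet> tweets, Set<String> ceoTags) {
        Map<String, Map<String, Long>> result = new TreeMap<>();
        for (Tweet t : tweets) {
            // Deduplicate so a hashtag repeated in one tweet is counted once.
            Set<String> tags = new TreeSet<>();
            for (String h : t.hashtags()) {
                tags.add(h.toLowerCase(Locale.ROOT));
            }
            for (String ceo : tags) {
                if (!ceoTags.contains(ceo)) {
                    continue; // only tweets that carry a CEO hashtag are kept
                }
                Map<String, Long> perTag = result.computeIfAbsent(ceo, k -> new TreeMap<>());
                for (String tag : tags) {
                    if (!tag.equals(ceo)) {
                        perTag.merge(tag, t.followers(), Long::sum);
                    }
                }
            }
        }
        return result;
    }

    // Hashtags for one CEO, largest summed follower count first, ties in alphabetical order.
    static List<String> ranked(Map<String, Long> perTag) {
        List<String> tags = new ArrayList<>(perTag.keySet());
        tags.sort(Comparator.comparing((String tag) -> perTag.get(tag))
                .reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return tags;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Tweet> tweets = List.of(
                new Tweet("first", List.of("JaneDoe", "Earnings"), 100L),
                new Tweet("second", List.of("janedoe", "launch", "earnings"), 50L),
                new Tweet("third", List.of("cats"), 999L));
        Map<String, Map<String, Long>> reach = reachByCeo(tweets, Set.of("janedoe"));
        System.out.println(reach);
        System.out.println(ranked(reach.get("janedoe")));
    }
}
```

The nested map matches the shape the treemap needs: the outer key is the CEO group, the inner key is a hashtag, and the value is what drives the color. In the actual setup this aggregation stays in BigQuery, so Data Studio only reads the result.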

Dashboard Snapshot