I analyzed over 3 gigabytes of email (my own)

Posted on Fri 02 October 2020 in programming • Tagged with python, data

How often do you get excited when your phone vibrates and an email notification shows up? If you're like me, not very. Much like how I only seem to receive spam calls on my phone, almost all of the emails I receive on a daily basis are not important to …

Using a database | large data

Posted on Tue 05 November 2019 in programming • Tagged with python, data

This is part of a series of articles on how to work with large data.

The first method I want to explore for working with large data, since we can't hold it all in RAM, is to put our dataset in a database. Databases are a staple of data processing and analysis, and …
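The excerpt doesn't show which database the post uses, so here is a minimal sketch assuming SQLite through Python's built-in `sqlite3` module. It copies a CSV into a table in fixed-size batches, so only one batch sits in memory at a time. The aggregation then runs inside the database and only the small result comes back to Python. The table name `dataset`, the batch size, and the `city`/`amount` sample columns are illustrative assumptions.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def load_csv_into_sqlite(csv_path, db_path, table="dataset", batch_size=10_000):
    """Copy a CSV file into a SQLite table in fixed-size batches.

    Only `batch_size` rows are held in memory at once. Columns are declared
    without types, so SQLite stores the CSV values as TEXT. Assumes the first
    row is a header and that column names contain no double quotes.
    """
    conn = sqlite3.connect(db_path)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        insert = f'INSERT INTO "{table}" VALUES ({placeholders})'
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) >= batch_size:
                conn.executemany(insert, batch)
                batch.clear()
        if batch:  # flush the final partial batch
            conn.executemany(insert, batch)
    conn.commit()
    return conn


if __name__ == "__main__":
    tmp = Path(tempfile.mkdtemp())
    csv_path = tmp / "dataset.csv"
    csv_path.write_text("city,amount\nSeattle,10\nPortland,5\nSeattle,2.5\n")
    conn = load_csv_into_sqlite(csv_path, tmp / "dataset.db", batch_size=2)
    # The GROUP BY runs inside SQLite; only the small result set reaches Python.
    query = (
        'SELECT city, SUM(CAST(amount AS REAL)) FROM "dataset" '
        "GROUP BY city ORDER BY city"
    )
    print(conn.execute(query).fetchall())  # [('Portland', 5.0), ('Seattle', 12.5)]
    conn.close()
```

Because the values are stored as TEXT, the query casts `amount` to REAL before summing. A real pipeline would more likely declare column types up front.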

How to work with large data

Posted on Mon 04 November 2019 in programming • Tagged with python, data

Here's a question: how do you analyze data that is too big to fit into memory?

Those of you who have worked with pandas before are probably intimately familiar with the following syntax, which reads a CSV file into a DataFrame:

import pandas as pd
df = pd.read_csv('dataset.csv …
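`read_csv` loads the whole file into memory at once, which is exactly what breaks down when the file is bigger than RAM. One general workaround, sketched here with only the standard library and not necessarily the approach the full post takes, is to stream the file row by row and keep running aggregates. The function name `mean_by_key` and the sample `city`/`amount` columns are hypothetical.

```python
import csv
import os
import tempfile
from collections import defaultdict


def mean_by_key(csv_path, key_col, value_col):
    """Average `value_col` for each distinct `key_col`, one row at a time.

    Memory use grows with the number of distinct keys, not the number of rows.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):  # rows are read lazily, never all at once
            key = row[key_col]
            totals[key] += float(row[value_col])
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "dataset.csv")
    with open(path, "w", newline="") as f:
        f.write("city,amount\nSeattle,10\nPortland,5\nSeattle,2.5\n")
    print(mean_by_key(path, "city", "amount"))  # {'Seattle': 6.25, 'Portland': 5.0}
```

pandas supports a similar pattern: `pd.read_csv` accepts a `chunksize` argument and then returns an iterator of smaller DataFrames instead of one large one.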

Heroes of the Storm Game Analysis

Posted on Tue 20 November 2018 in programming • Tagged with python, data

The HOTS Logs website (https://hotslogs.com) has an API with a data dump of the past 30 days' worth of replays.

https://www.hotslogs.com/Info/API

It's an automated version of this Reddit post:

That being said, this data is incredibly awesome! What kinds of insights can be gleaned from it?
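One obvious question for replay data is how often each hero wins. The sketch below assumes each replay participant has already been parsed into a dict with hypothetical `hero` and `is_winner` keys. The real HOTS Logs dump's file layout and column names are not described here, so those names are placeholders. The optional `min_games` cutoff drops heroes with too few games to give a meaningful rate.

```python
from collections import Counter


def win_rates(rows, min_games=1):
    """Map each hero to its win rate across the given replay rows.

    `rows` are dicts with hypothetical "hero" and "is_winner" keys; heroes
    with fewer than `min_games` appearances are dropped as too noisy.
    """
    games = Counter()
    wins = Counter()
    for row in rows:
        games[row["hero"]] += 1
        if row["is_winner"]:
            wins[row["hero"]] += 1
    return {
        hero: wins[hero] / played
        for hero, played in games.items()
        if played >= min_games
    }


if __name__ == "__main__":
    sample = [  # toy rows for illustration, not real HOTS Logs data
        {"hero": "Hero A", "is_winner": True},
        {"hero": "Hero A", "is_winner": False},
        {"hero": "Hero B", "is_winner": True},
    ]
    print(win_rates(sample))  # {'Hero A': 0.5, 'Hero B': 1.0}
    print(win_rates(sample, min_games=2))  # {'Hero A': 0.5}
```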

Pets of Seattle

Posted on Thu 04 October 2018 in programming • Tagged with python, data

Seattle has a reputation for being a pet-friendly city. By some estimates, there are more dogs in the city than there are children, an impressive feat for a place as populous as Seattle. Seattle's open data portal contains, among other things, information on licensed pets.

Awesome!

Let's explore the kinds of insights that can be found by looking at this data.
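A natural first insight is simply which values are most common, for example the most licensed species. Here is a minimal sketch that assumes the licensed-pets data has been exported as CSV. The column name `species` is a hypothetical placeholder, not confirmed from the dataset. Blank values are skipped so they don't show up as a "most common" entry.

```python
import csv
import io
from collections import Counter


def top_values(rows, column, n=3):
    """Return the n most common non-blank values of `column` as (value, count) pairs."""
    counts = Counter(
        row[column].strip() for row in rows if (row.get(column) or "").strip()
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # A stand-in for a CSV export; "species" is a hypothetical column name.
    export = io.StringIO("species\nDog\nCat\nDog\n \n")
    rows = csv.DictReader(export)
    print(top_values(rows, "species", n=2))  # [('Dog', 2), ('Cat', 1)]
```

The same function works for any other column, such as a breed or ZIP code field, once the real column names are known.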