I analyzed over 3 gigabytes of email (my own)

Posted on Fri 02 October 2020 in programming • Tagged with python, data

How often do you get excited when your phone vibrates and an email notification shows up? If you're like me, not very. Much like how I only seem to receive spam calls on my phone, almost all of the emails I receive on a daily basis are not important to …


Continue reading

Project Euler Problem 22

Posted on Thu 12 December 2019 in programming • Tagged with python

This is a short post showcasing my solution to Problem 22 on Project Euler. I worked on this problem over lunch with a few co-workers, and they found the solution useful. If you find this article, I hope it is useful to you as well.

The prompt for this exercise …


Continue reading

Breaking my Advent of Code 2019, day 1 solution

Posted on Wed 04 December 2019 in programming • Tagged with python

The Advent of Code 2019 challenge is open. These are my solutions to the problems presented on day 1.

Part one

Part one asks us to figure out how much fuel is required to lift a given spacecraft module if we know its mass.

import math
import sys

def calculate_fuel_cost(mass):
    return …

Continue reading

Using a database | large data

Posted on Tue 05 November 2019 in programming • Tagged with python, data

This is part of a series of articles on how to work with large data.

The first method I want to explore when working with large data, since we cannot use RAM, is to use a database for our dataset. Databases are a staple of data processing and analysis, and …
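As a rough illustration of the idea (not necessarily the approach the full article takes), the sketch below streams CSV rows into SQLite using only Python's standard library, then lets the database do the aggregation so the full dataset never has to sit in RAM at once. The table name, column names, and sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample data; in practice this would be a large file on disk.
SAMPLE_CSV = "city,amount\nSeattle,10\nTacoma,5\nSeattle,7\n"


def load_csv(conn, handle, batch_size=1000):
    """Insert CSV rows into SQLite in batches instead of all at once."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (city TEXT, amount REAL)")
    batch = []
    for row in csv.DictReader(handle):
        batch.append((row["city"], float(row["amount"])))
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO sales VALUES (?, ?)", batch)
            batch.clear()
    if batch:  # flush whatever is left after the last full batch
        conn.executemany("INSERT INTO sales VALUES (?, ?)", batch)
    conn.commit()


def totals_by_city(conn):
    """Let the database aggregate; only the small result comes back."""
    query = "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
    return conn.execute(query).fetchall()


# ":memory:" keeps the demo self-contained; a file path would be used
# for data that is genuinely larger than RAM.
conn = sqlite3.connect(":memory:")
load_csv(conn, io.StringIO(SAMPLE_CSV), batch_size=2)
print(totals_by_city(conn))
```

The batch size is a tuning knob: larger batches mean fewer round trips to the database, at the cost of holding more rows in memory at a time.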


Continue reading

How to work with large data

Posted on Mon 04 November 2019 in programming • Tagged with python, data

Here's a question: how do you analyze data that is too big to fit into memory?

Those of you who have worked with pandas before are probably intimately familiar with the following syntax, which reads a CSV file into a DataFrame variable:

import pandas as pd
df = pd.read_csv('dataset.csv …
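One possible answer to the question above, sketched under assumptions: `pd.read_csv` accepts a `chunksize` argument and then yields the file piece by piece, so only one piece is in memory at a time. The sample data, the column name, and the `column_sum_in_chunks` helper are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large file such as 'dataset.csv'.
SAMPLE_CSV = "value\n1\n2\n3\n4\n5\n"


def column_sum_in_chunks(source, column, chunksize):
    """Sum one column while holding at most `chunksize` rows in memory."""
    total = 0
    # With chunksize set, read_csv returns an iterator of DataFrames.
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total += chunk[column].sum()
    return total


result = column_sum_in_chunks(io.StringIO(SAMPLE_CSV), "value", chunksize=2)
print(result)
```

This only works for computations that can be combined across chunks, such as sums and counts; anything that needs the whole dataset at once (a global sort, for example) calls for a different tool.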

Continue reading

Heroes of the Storm Game Analysis

Posted on Tue 20 November 2018 in programming • Tagged with python, data

The HOTS Logs website (https://hotslogs.com) has an API with a data dump of the past 30 days' worth of replays.

https://www.hotslogs.com/Info/API

It's an automated version of this Reddit post:

That being said, this information is incredibly awesome! What kind of information can be gleaned from it?

Pets of Seattle

Posted on Thu 04 October 2018 in programming • Tagged with python, data

Seattle has a reputation for being a pet-friendly city. By some estimates, there are more dogs in the city than there are children, an impressive feat for a place as populous as Seattle. Seattle's open data portal contains, among other things, information on licensed pets.

Awesome!

Let's explore the kinds of insights that can be found by looking at this data.


Continue reading

Pin your versions. Or don't.

Posted on Mon 25 June 2018 in programming • Tagged with python

Here's a hypothetical situation. Suppose you're starting a new project and you want to use the requests library. Installing requests also installs the libraries that it depends on. Should you be explicit and pin the versions of all installed libraries (equivalent to the output of pip freeze)

certifi==2018.4 …

Continue reading

Supercharge your jupyter startup with bash and tmux

Posted on Fri 08 June 2018 in programming • Tagged with python, shell

If you're like me, you dislike doing repetitive tasks when a script could have just as easily done the work for you. For example, starting a jupyter server on my local machine requires me to do the following:

  1. open a terminal
  2. source the virtual environment I use for jupyter work …

Continue reading

Making method chains readable

Posted on Sun 06 May 2018 in programming • Tagged with python

I do a fair amount of data manipulation work in pandas, and as such I find myself doing a lot of method chaining. In the past, I've struggled to find a good way of keeping my code concise while still maintaining readability.

What do I mean by that? Suppose we have census data on a group of people.
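As a sketch of what a readable chain can look like (one common convention, which may or may not be the one the full article settles on), the example below wraps the chain in parentheses so that each method call gets its own line. The census columns and values are made up for illustration.

```python
import pandas as pd

# Hypothetical census data; the columns are invented for this example.
census = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid", "Dee"],
        "age": [34, 17, 52, 41],
        "state": ["WA", "WA", "OR", "OR"],
        "income": [60000, 0, 85000, 72000],
    }
)

# Wrapping the chain in parentheses allows one method call per line,
# with no backslash continuations.
adult_income_by_state = (
    census
    .query("age >= 18")                                # keep adults only
    .assign(income_k=lambda df: df["income"] / 1000)   # income in thousands
    .groupby("state")["income_k"]
    .mean()
    .sort_values(ascending=False)
)

print(adult_income_by_state)
```

Each line reads as one step of the transformation, and a step can be commented out or reordered without touching the rest of the chain.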


Continue reading