For loop and list comprehension performance

Posted on Fri 27 April 2018 in programming

For loops are the de facto default looping mechanism for python and is a concept that almost all programmers are familiar with. However, if you've been around the python block a few times you've probably seen (and maybe dabbled in) list comprehensions from time to time.

The decision to use one or the other should be a case-by-case basis. In my own programming, I tend to use a list comprehension when building a result (list) from another result, and a for loop for anything else. Because of my curious nature, however, I began to wonder what the difference was, in terms of speed, between the two approaches.

Setting the stage

I'm going to use the following trivial snippets of code when running the comparisons:

for loop

mylist = []
for i in range(n):
    mylist.append(n)

list comprehension

mylist = [i for i in range(n)]

For varying values of n.

timeit

The easiest way to measure the runtime of code snippets is by using the timeit module. I'll be using ipython's built-in magic command for this.

To create a list of one-hundred thousand elements, it takes ~15 milliseconds in a for loop

In [1]: %%timeit
   ...: mylist = []
   ...: for i in range(100000):
   ...:     mylist.append(i)
   ...:
14.9 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

but only ~6 milliseconds to do the same thing with a list comprehension.

In [2]: %%timeit
   ...: mylist = [i for i in range(100000)]
   ...:
6.09 ms ± 53.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In this example, the list comprehension is over 2.4x faster than the for loop equivalent.

If I instead iterate over one-thousand elements, the results are similar.

In [10]: %%timeit
    ...: mylist = []
    ...: for i in range(1000):
    ...:     mylist.append(i)
    ...:
123 µs ± 568 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [12]: %%timeit
    ...: mylist = [i for i in range(1000)]
    ...:
53.5 µs ± 709 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

A 2.3x speed improvement when using a list comprehension.

For two pieces of code that produce the same result, it seems surprising that there is such a performance difference. There must be something going on under the hood that makes list comprehensions so much more performant.

dis

Python has a really cool built-in module named dis (short for disassemble). We can use it to disassemble pieces of code into their opcode equivalent to see what each line of code does.

Running this code

import dis

dis.dis("""\
mylist = []
for i in range(1000):
    mylist.append(i)""")

gives as output:

  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (mylist)

  2           6 SETUP_LOOP              33 (to 42)
              9 LOAD_NAME                1 (range)
             12 LOAD_CONST               0 (1000)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 GET_ITER
        >>   19 FOR_ITER                19 (to 41)
             22 STORE_NAME               2 (i)

  3          25 LOAD_NAME                0 (mylist)
             28 LOAD_ATTR                3 (append)
             31 LOAD_NAME                2 (i)
             34 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             37 POP_TOP
             38 JUMP_ABSOLUTE           19
        >>   41 POP_BLOCK
        >>   42 LOAD_CONST               1 (None)
             45 RETURN_VALUE

The left-most set of numbers (1, 2, and 3) represent the particular line being disassembled, while the second set of numbers (0, 3, 6, 9, 12, 15, etc...) represent the offset that the given opcode is from the start of the bytecode sequence.

Compare that to disassembling a list comprehension

import dis

dis.dis("mylist = [i for i in range(1000)]")

  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f4d22d959c0, file "<dis>", line 1>)
              3 LOAD_CONST               1 ('<listcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_NAME                0 (range)
             12 LOAD_CONST               2 (1000)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 GET_ITER
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 STORE_NAME               1 (mylist)
             25 LOAD_CONST               3 (None)
             28 RETURN_VALUE

To me, the significant difference is that the list comprehension doesn't make calls to append, like the for loop does - that means using a list comprehension uses less instructions and doesn't have the overhead of an extra function call on every iteration. Neat!