
Errata on Better Closures


This article is an erratum to the previous article I wrote. Why? It turns out I was dead wrong in stating that ordinary objects would be better than nested functions. How do I know? Benchmarks. Let's look at them now.

After I published the article, I got a gnawing feeling. At the end of the day, creating a nested function is the same as creating any other object by hand. Either I build the context manually, via the InnerCall object, or Python builds it for me, via the frame reference. Either way, a new object gets created. Who am I to say that the stack frame reference is lighter or heavier than an ordinary object allocation? Intuitively it feels lighter, and the code is cleaner, but is that correct?
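As a quick aside, the context Python creates for a nested function is itself inspectable. A small sketch, reusing the same some_function as in the benchmark below: CPython attaches each captured variable to the returned function as a cell object.

```python
def some_function(name):
    prefixed_name = "pre-" + name

    def inner_call(postfix):
        return prefixed_name + "-" + postfix

    return inner_call

f = some_function("abc")

# The names captured from the enclosing scope:
print(f.__code__.co_freevars)           # ('prefixed_name',)

# The captured value lives in a cell object on the function itself:
print(f.__closure__[0].cell_contents)   # pre-abc
```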

Furthermore, in my earlier comparison I had created extra state objects on top of the nested functions, which already carry a context of their own, plus some additional context produced by the state machine generator. Of course, generating fewer objects yields better performance.

So let's see what goes faster. I've used the same memory profiling as before, in Fighting the Invisible Memory Thief. So without further ado, here are the results of running ten million calls with both approaches. The code that calls them is the same:

for _ in range(10000000):
    f = some_function("abc")
    z = f("xyz")
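For reference, here is a minimal, self-contained sketch of how such a per-line memory report can be produced with tracemalloc (the exact harness from the earlier article may differ; I use fewer iterations here so it runs quickly):

```python
import tracemalloc

def some_function(name):
    prefixed_name = "pre-" + name

    def inner_call(postfix):
        return prefixed_name + "-" + postfix

    return inner_call

tracemalloc.start()

for _ in range(100_000):
    f = some_function("abc")
    z = f("xyz")

snapshot = tracemalloc.take_snapshot()

# Print the top allocation sites, grouped by line, like the reports below
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)
```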

For the simple nested functions:

def some_function(name):
    prefixed_name = "pre-" + name

    def inner_call(postfix):
        # this creates a hidden reference to the `some_function` stack frame
        return prefixed_name + "-" + postfix

    return inner_call

Results:

Top 10 lines
#1: benchmark.py:37: 0.1 KiB
    def inner_call(postfix):
#2: benchmark.py:34: 0.1 KiB
    def some_function(name):
#3: benchmark.py:39: 0.1 KiB
    return prefixed_name + "-" + postfix
#4: benchmark.py:35: 0.1 KiB
    prefixed_name = "pre-" + name
#5: benchmark.py:45: 0.0 KiB
    f = some_function("abc")
Total allocated size: 0.4 KiB

real    0m16,396s
user    0m16,387s
sys     0m0,008s

For the object context:

class InnerCall:
    def __init__(self, prefixed_name):
        self.prefixed_name = prefixed_name

    def __call__(self, postfix):
        return self.prefixed_name + "-" + postfix


def some_function(name):
    prefixed_name = "pre-" + name

    return InnerCall(prefixed_name)

Results:

Top 10 lines
#1: benchmark-class.py:34: 1.8 KiB
    class InnerCall:
#2: benchmark-class.py:42: 0.2 KiB
    def some_function(name):
#3: benchmark-class.py:38: 0.1 KiB
    def __call__(self, postfix):
#4: benchmark-class.py:35: 0.1 KiB
    def __init__(self, prefixed_name):
#5: benchmark-class.py:39: 0.1 KiB
    return self.prefixed_name + "-" + postfix
#6: benchmark-class.py:43: 0.1 KiB
    prefixed_name = "pre-" + name
#7: benchmark-class.py:45: 0.0 KiB
    return InnerCall(prefixed_name)
#8: benchmark-class.py:36: 0.0 KiB
    self.prefixed_name = prefixed_name
Total allocated size: 2.4 KiB

real    0m17,536s
user    0m17,526s
sys     0m0,004s
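The wall-clock numbers above came from time; the same comparison can also be reproduced in-process with timeit. A self-contained sketch (the absolute numbers will vary by machine and interpreter):

```python
import timeit

def closure_version(name):
    prefixed_name = "pre-" + name

    def inner_call(postfix):
        return prefixed_name + "-" + postfix

    return inner_call

class InnerCall:
    def __init__(self, prefixed_name):
        self.prefixed_name = prefixed_name

    def __call__(self, postfix):
        return self.prefixed_name + "-" + postfix

def object_version(name):
    return InnerCall("pre-" + name)

# Both approaches produce identical results
assert closure_version("abc")("xyz") == object_version("abc")("xyz")

n = 1_000_000
t_closure = timeit.timeit(lambda: closure_version("abc")("xyz"), number=n)
t_object = timeit.timeit(lambda: object_version("abc")("xyz"), number=n)
print(f"closure: {t_closure:.3f}s  object: {t_object:.3f}s")
```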

Wow! Roughly a 7% difference in speed (16.4s versus 17.5s). It's not much, but it's significant, and aside from code readability, you don't gain anything by keeping your object context separate.

Conclusions

  1. If performance is critical, use nested functions instead of manually passing the context into an external object.

  2. Stay away from passing callbacks that depend on the current context. They don't look like new variable allocations, but they are, and they hit hard. (We already covered this in Fighting the Invisible Memory Thief.)
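To make that second point concrete, here is a hypothetical event-handler sketch (the names make_handler and on_event are mine, not from any library): every call that defines a context-dependent callback allocates a fresh function object plus a closure cell for the captured variable.

```python
def make_handler(user_id):
    # Defining this callback allocates a new function object and a
    # closure cell for user_id on *every* call to make_handler
    def on_event(event):
        return user_id + ": " + event

    return on_event

h1 = make_handler("u42")
h2 = make_handler("u42")

# Two distinct allocations, even though they behave identically
assert h1 is not h2
assert h1("click") == h2("click") == "u42: click"
```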