Erlang vs. Unintentionally Purely Functional Python

Programming in the 21st Century - James Hague - September 16, 2010

Here’s a little Python function that should be easy to figure out, even if you don’t know Python:

def make_filename(path):
    return path.lower() + ".jpg"

I want to walk through what’s going on behind the scenes when this function executes. There is, of course, a whole layer of interpreting opcodes and pushing and popping parameters, but that’s just noise.




The first interesting part is that the lower() method creates an entirely new string. If path contains a hundred characters, then all hundred of those are copied to a new string in the process of being converted to lowercase.




The second point of note is that the append operation—the plus—is doing another copy. The entire lowercased string is moved to a new location, then the four character extension is tacked on to the end. The original path has now been copied twice.




Those previous two paragraphs gloss over some key details. Where is the memory for the new strings coming from? It’s returned by a call to the Python memory allocator. As with all generic heap management functions, the execution time of the Python memory allocator is difficult to predict. There are various checks and potential fast paths and manipulations of linked lists. In the worst case, the code falls through into C’s malloc and the party continues there. Remember, too, that objects in Python have headers, which include reference counts, so there’s more overhead that I’ve ignored.




Also, the result of lower gets thrown out after the subsequent concatenation, so I could peek into “release memory” routine and see what’s going on down in that neck of the woods, but I’d rather not. Just realize there’s a lot of work going on inside the simple make_filename function, even if the end result still manages to be surprisingly fast.




A popular criticism of functional languages is that a lack of mutable variables means that data is copied around, and of course that just has to be slow, right?




A literal Erlang translation of make_filename behaves about the same as the Python version. The string still gets copied twice, though in Erlang it’s a linked list which uses eight bytes per characters given a 32-bit build of the language. If the string in Python is Unicode (the default in Python 3), then it uses either two or four bytes per character, depending. The big difference is that memory allocation in Erlang is just a pointer increment and bounds check, and not a heavyweight call to a heap manager.




I’m not definitively stating which language is faster for this specific code, nor does it matter to me. I suspect the Erlang version ends up running slightly longer, because the lowercase function is itself written in Erlang, while Python’s is in C. But all that “slow” copying of memory isn’t even part of the performance discussion.




(If you liked this, you might like Functional Programming Went Mainstream Years Ago.)



Categories: Blogs  Programming in the 21st Century  

Comments

anonymous avatar

Gospel for Asia philosophy is to evangelize using national missionaries. GFA states that many of the countries in which it operates are either closed or hostile to Western missionaries. In addition, GFA states that national missionaries are far better able to assimilate into a local culture (either already being part of it or at least somewhat familiar with its customs) and at a far less cost than Western missionaries (the national missionary can live within the culture, whereas Western missionaries require considerable expenditures to mimic the Western world they have left).

Posted by K.P. Yohannan on 19 Sep 2010 at 22:29



 


Add comment

Name:

Email:

URL:

Smileys

Remember my personal information

Notify me of follow-up comments?