Let’s Take a Trivial Problem and Make it Hard
Programming in the 21st Century - James Hague - May 04, 2009Here’s a simple problem: Given a block of binary data, count the frequency of the bytes within it. In C, this could be a homework assignment for an introductory class. Just zero out an array of 256 elements, then for each byte increment the appropriate array index. Easy.
Now write this in a purely functional way, with an efficiency close to that of the C implementation.
It’s easy to do a straightforward translation to Erlang, using tail recursion instead of a for loop, like this:
freq(B) when is_binary(B) -> freq(B, erlang:make_tuple(256, 0)). freq(<>, Totals) -> I = X + 1, N = element(I, Totals), freq(Rest, setelement(I, Totals, N + 1)); freq(<<>>, Totals) -> Totals.
But of course in the name of purity and simplicity, setelement copies the entire Totals tuple, so if there are fifty million bytes, then the 256 element Totals is copied 50 million times. It’s simple, but it’s not the right approach.
“Blame the complier” is another easy option. If it could be determined that the Totals tuple can be destructively updated, then we’re good. Note that the garbage collector in the Erlang runtime is based on the assumption that pointers in the heap always point toward older data, an assumption that could break if a tuple was destructively updated with, say, a list value. So not only would the compiler have to deduce that that the tuple is only used locally, but it was also have to verify that only non-pointer values (like integers and atoms) were being passed in as the third parameter of setelement. This is all possible, but it doesn’t currently work that way, so this line of reasoning is a dead end for now.
Totals could be switched from a tuple to a tree, which might or might not be better than the setelement code, but there’s no way it’s in the same ballpark as the C version.
What about a different algorithm? Sort the block of bytes, then count runs of identical values. Again, just the suggestion of sorting means we’re already off track.
Honestly, I don’t know the right answer. In Erlang, I’d go for one of the imperative efficiency hacks, like ets tables, but let’s back up a bit. The key issue here is that there are some fundamental assumptions about what “purely functional” means and the expected features in functional languages.
In array languages, like J, this type of problem is less awkward, as it’s closer to what they were designed for. If nothing else, reference counted arrays make it easier to tell when destructive updates are safe. And there’s usually some kind of classification operator, one that would group the bytes by value for easy counting. That’s still not going to be as efficient as C, but it’s clearly higher-level than the literal Erlang translation.
A more basic question is this: “Is destructively updating a local array a violation of purely functionalness?” OCaml allows destructive array updates and C-like control structures. If a local array is updated inside of an OCaml function, then the result copied to a non-mutable array at the end, is there really anything wrong with that? It’s not the same as randomly sticking your finger inside a global array somewhere, causing a week’s worth of debugging. In fact, it looks exactly the same as the purely functional version from the caller’s point of view.
Perhaps the sweeping negativity about destructive updates is misplaced.
Categories: Blogs Programming in the 21st Century
Erlang on Twitter
» doni_erlang (Dony Erlangga): RT @nabilahJKT48: ngantuk—” bsok theater yah?hemm ketemu besok yaa ;)) hehe ayusuminabilah! http://t.co/S8tdHQ8y6k
» rmfajar (RM Fajar M. Dewanto): @Greschinov yoii bro erlang, terima kasih untuk saran dan masukannya, langsung dilaksanakan ini :D
» bertabertabee (Berta♚ ): @doni_erlang follback
» im8x10 (im8x10): http://t.co/32mmDMRbvO LeonidKolobert » В мире авто- новости автопрома, краш-тесты, выбор авто, тест-драйв, фото...
» erlnews (Erlang news): Erlang command test failed to write “coverlog” #erlang http://t.co/q12fdLbXrR
» sleeptillseven (Sleeptillseven ツ): I might be able to push an Erlang project at my work.
» aloiscochard (Alois Cochard): RT @JoelJacobson: I’m hosting the Big Data, Big Databases and Next Generation Analytics track at the Erlang user conf in Stockholm - http:/…
» tita_titutt (Shinta N. Mayliyanti): Cieee, mau banget yaahh akakaaa RT @Erlang_07: @tita_titutt ayuuukk lah d ikutin mau y. Wkwkwk
» domwongtweets (Dominic Wong): The erlang fog is slowly lifting
» ErlanggaPutraAr (Erlangga Putra A): Asiiiin haha “@inekedevita: Kecuuut haha “@ErlanggaPutraAr: Iyo bsok tk traktir cilok “@inekedevita: Dalam rangka pingin makan haha “@Erlang
Statistics
Number of aggregated posts: 10650
Most recent article: May 20, 2013
Latest comments
» Moraru on This is Why You Spent All that Time Learning to Program: It is true that computer science was a pain in the back at time that i’ve had to learn it…
» Commercial hand dryers on Couchbase Meetup at new HQ: Buy online from here where you will get so much of variety in Commercial hand dryers for people. If you…
» Fort McMurray Homes on Motivated Reasoning and Erlang vs Python vs Node: I don’t really understand why this post is motivational? I don’t even see a post, just a title. Fort McMurray…