Let’s Take a Trivial Problem and Make it Hard
Programming in the 21st Century - James Hague - May 04, 2009Here’s a simple problem: Given a block of binary data, count the frequency of the bytes within it. In C, this could be a homework assignment for an introductory class. Just zero out an array of 256 elements, then for each byte increment the appropriate array index. Easy.
Now write this in a purely functional way, with an efficiency close to that of the C implementation.
It’s easy to do a straightforward translation to Erlang, using tail recursion instead of a for loop, like this:
freq(B) when is_binary(B) -> freq(B, erlang:make_tuple(256, 0)). freq(<>, Totals) -> I = X + 1, N = element(I, Totals), freq(Rest, setelement(I, Totals, N + 1)); freq(<<>>, Totals) -> Totals.
But of course in the name of purity and simplicity, setelement copies the entire Totals tuple, so if there are fifty million bytes, then the 256 element Totals is copied 50 million times. It’s simple, but it’s not the right approach.
“Blame the complier” is another easy option. If it could be determined that the Totals tuple can be destructively updated, then we’re good. Note that the garbage collector in the Erlang runtime is based on the assumption that pointers in the heap always point toward older data, an assumption that could break if a tuple was destructively updated with, say, a list value. So not only would the compiler have to deduce that that the tuple is only used locally, but it was also have to verify that only non-pointer values (like integers and atoms) were being passed in as the third parameter of setelement. This is all possible, but it doesn’t currently work that way, so this line of reasoning is a dead end for now.
Totals could be switched from a tuple to a tree, which might or might not be better than the setelement code, but there’s no way it’s in the same ballpark as the C version.
What about a different algorithm? Sort the block of bytes, then count runs of identical values. Again, just the suggestion of sorting means we’re already off track.
Honestly, I don’t know the right answer. In Erlang, I’d go for one of the imperative efficiency hacks, like ets tables, but let’s back up a bit. The key issue here is that there are some fundamental assumptions about what “purely functional” means and the expected features in functional languages.
In array languages, like J, this type of problem is less awkward, as it’s closer to what they were designed for. If nothing else, reference counted arrays make it easier to tell when destructive updates are safe. And there’s usually some kind of classification operator, one that would group the bytes by value for easy counting. That’s still not going to be as efficient as C, but it’s clearly higher-level than the literal Erlang translation.
A more basic question is this: “Is destructively updating a local array a violation of purely functionalness?” OCaml allows destructive array updates and C-like control structures. If a local array is updated inside of an OCaml function, then the result copied to a non-mutable array at the end, is there really anything wrong with that? It’s not the same as randomly sticking your finger inside a global array somewhere, causing a week’s worth of debugging. In fact, it looks exactly the same as the purely functional version from the caller’s point of view.
Perhaps the sweeping negativity about destructive updates is misplaced.
Categories: Blogs Programming in the 21st Century
Comments
wouldn’t functional way be to store each frequency individually? it would scale much better too.
Posted by randuev on 05 May 2009 at 08:07Use a function that takes 1 binary and 256 integers :p
It would be very ugly (you dont want to write it yourself; generate the code instead) but it would be fast and properly functional.
I actually had to resort to this technique in a similar problem where the clean equivalent (lists:map, tuple_to_list…) was killing performance. What I really wanted was programatically-generated Funs, but code-gen does the job.
Posted by moltonel on 05 May 2009 at 11:56Your solution is not a “functional” solution but an “object orientend” one! Infact functional and object oriented are two ortogonal way to reach the same goal: reduce bugs due to states proliferation.
Posted by Mario on 12 May 2009 at 15:59
Add comment
Erlang on Twitter
» ingojaeckel (ingo jaeckel): Even more awesome, free Erlang resources http://t.co/blGINLJd
» DiTeam (Тимурка): @multybuq @ukhin руби хороший вариант :) можно даже без rails..попробуй erlang еще :)
» michelir5 (Micheli Gelatinous): @pharkmillups Still seeing it. I might just have to manually install it. The version of Erlang required by Riak is not current version in HB
» Angry_Lawyer (Tony Aldridge): @rvirding @saghul If Erlang kills you, does a supervisor automatically create a replacement of you?
» rvirding (Robert Virding): Softly I hope. RT @saghul: Slowly making progress… erlang is killing me.
» jsvd (João Duarte): RT @FrancescoC: Woot! RT @valdo404: Practical Erlang Programming at #QConLondon I want to go there
» saghul (Saúl Ibarra Corretgé): Slowly making progress… erlang is killing me.
» dlsspy (Dustin Sallings): @IbnFirnas heh. The erlang parts are still solid. The currently active alerting box is arm5, failed over from a pc that died one day.
» quercialwji2 (Quercia Quinn): @MikeSmooth_ABCs http://t.co/pPiIpTCx
» levicole (Levi Kennedy): @pharkmillups the homebrew version of erlang is the most recent version, and riak requires R14B I think.
Statistics
Number of aggregated posts: 10456
Number of comments: 1445
Most recent article: February 06, 2012
Latest comments
» simple smile on Scale means Skills: Very informative article. Pretty sure people would love to go to that place for shopping. Specially to those who are…
» simplesmile on 27 January 2012: Erlang Solutions embarks on an Erlang Embedded KTP: Your article will make the world better. Thanks again and good luck to you in your life. See you next time.simplesmile
» tandblekning easewhite on 08 February 2012: Erlang Express 3-day Course in San Francisco on 8 February: ncomprehensible to me now, but in general, the usefulness and significance is overwhelmingtandblekning easewhite