A Peek Inside the Erlang Compiler
Programming in the 21st Century - James Hague - February 06, 2012
Erlang is a complex system, and I can’t do its inner workings justice in a short article, but I wanted to give some insight into what goes on when a module is compiled and loaded. As with most compilers, the first step is to convert the textual source to an abstract syntax tree, but that’s unremarkable. What is interesting is that the code goes through three major representations, and you can look at each of them.
Erlang is unique among functional languages in its casual scope rules. You introduce variables as you go, without fanfare, and there’s no creeping indentation caused by explicit scopes. Behind the scenes that’s too quirky, so the syntax tree is converted into Core Erlang. Core Erlang looks a lot like Haskell or ML with all variables carefully referenced in “let” statements. You can see the Core Erlang representation of a module with this command from the shell:
c(example, to_core).
The human-readable Core Erlang for the example module is written to example.core.
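If you want something small to try this on, a module along these lines will do. The module name matches the commands in this article, but the describe/1 function is made up for illustration. Note how Area and Perimeter are simply introduced as they are needed.

```erlang
-module(example).
-export([describe/1]).

%% Hypothetical function: variables appear as you go, with no explicit scopes.
describe({rect, W, H}) ->
    Area = W * H,
    Perimeter = 2 * (W + H),
    {Area, Perimeter};
describe({square, S}) ->
    describe({rect, S, S}).
```

After compiling with the to_core option, the bindings for Area and Perimeter should show up in example.core as explicit let expressions, though the exact output depends on the compiler version and on which optimizations have run.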
The next big transformation is from Core Erlang to code for the register-based BEAM virtual machine. BEAM is poorly documented, but it’s a lot like the Warren Abstract Machine developed for Prolog (but without the need for backtracking). BEAM isn’t terribly hard to figure out if you write short modules and examine them with:
c(example, 'S').
The disassembled BEAM code for the example module is written to example.S. The key to understanding BEAM is that there are two sets of registers: one for passing parameters (“x” registers) and one for use as locals within functions (“y” registers).
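A short function where a value has to survive a call is a good way to see both kinds of registers at work. The module below is a sketch with made-up names (regs_sketch, sum_both, twice). Arguments arrive in x registers, and because x registers are not preserved across calls, anything needed afterward has to be parked in a y register first.

```erlang
-module(regs_sketch).
-export([sum_both/2]).

%% A and B arrive in x registers. B is still needed after the first
%% call to twice/1, so the compiler has to keep it somewhere safe.
sum_both(A, B) ->
    X = twice(A),
    X + twice(B).

twice(N) -> N * 2.
```

Compile it with c(regs_sketch, 'S') and look at sum_both/2 in regs_sketch.S. You should find B being moved into a y register before the first call, and the result of that call being saved the same way before the second. The precise instructions vary between releases.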
Virtual BEAM code is the final output of the compiler, but it’s still not what gets executed by the system. If you look at the source for the Erlang runtime, you’ll see that beam_load.c is over six thousand lines of code. Six thousand lines to load a module? That’s because the BEAM loader is doing more than its name lets on.
There’s an optimization pass on the virtual machine instructions, specializing some for certain situations and combining others into superinstructions. Checking whether a value is a tuple of three elements takes a pair of BEAM operations: is_tuple and test_arity. The BEAM loader turns these into one superinstruction: is_tuple_of_arity. You can see this condensed representation of BEAM code with:
erts_debug:df(example).
The disassembled code is written to example.dis. (Note that the module must be loaded, so compile it before giving the above command.)
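The idea behind combining instructions can be sketched as a peephole pass over a list of instruction tuples. This is not the loader's actual code, which is written in C and driven by transformation rules. The input tuples are modeled loosely on how the two tests appear in a .S listing (where the arity check shows up as test_arity), and the shape of the fused tuple is invented for illustration.

```erlang
-module(fuse_sketch).
-export([fuse/1]).

%% When is_tuple and test_arity check the same register and share a
%% failure label, replace the pair with one combined instruction.
%% Repeating Fail and Reg in the pattern makes Erlang require equality.
fuse([{test, is_tuple, Fail, [Reg]},
      {test, test_arity, Fail, [Reg, Arity]} | Rest]) ->
    [{is_tuple_of_arity, Fail, Reg, Arity} | fuse(Rest)];
%% Anything else passes through unchanged.
fuse([Instr | Rest]) ->
    [Instr | fuse(Rest)];
fuse([]) ->
    [].
```

For example, fuse_sketch:fuse([{test,is_tuple,{f,3},[{x,0}]}, {test,test_arity,{f,3},[{x,0},3]}]) collapses the two tests into a single is_tuple_of_arity tuple, while a pair with different failure labels is left alone.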
The loader also turns the BEAM bytecode into threaded code: a list of addresses that get jumped to in sequence. There’s no “Now what do I do with this opcode?” step, just fetch and jump, fetch and jump. If you want to know more about threaded code, look to the Forth world.
Threaded code takes advantage of the labels as values extension of gcc. If you build the BEAM emulator with another compiler like Visual C++, it falls back on using a giant switch statement for instruction dispatch and there’s a significant performance hit.
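The difference between the two dispatch styles can be shown with a toy analogy in Erlang itself. This is only an analogy: the real emulator is written in C and jumps to label addresses, whereas here funs stand in for addresses, and the three opcodes (inc, double, negate) are invented for the example.

```erlang
-module(dispatch_sketch).
-export([run_switch/2, thread/1, run_threaded/2]).

%% Switch-style dispatch: every step inspects the opcode to decide what to do.
run_switch([], Acc) -> Acc;
run_switch([Op | Rest], Acc) ->
    case Op of
        inc    -> run_switch(Rest, Acc + 1);
        double -> run_switch(Rest, Acc * 2);
        negate -> run_switch(Rest, -Acc)
    end.

%% "Load" step: translate each opcode once into the code that implements it.
thread(Ops) -> [handler(Op) || Op <- Ops].

handler(inc)    -> fun(Acc) -> Acc + 1 end;
handler(double) -> fun(Acc) -> Acc * 2 end;
handler(negate) -> fun(Acc) -> -Acc end.

%% Threaded execution: no opcode decoding, just fetch the next fun and call it.
run_threaded([], Acc) -> Acc;
run_threaded([F | Rest], Acc) -> run_threaded(Rest, F(Acc)).
```

Both dispatch_sketch:run_switch([inc, double, negate], 3) and dispatch_sketch:run_threaded(dispatch_sketch:thread([inc, double, negate]), 3) give -8. The decoding work in the second version happens once, at load time, instead of on every instruction executed.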
(If you liked this, you might enjoy A Ramble Through Erlang IO Lists.)