Optimization on a Galactic Scale
Programming in the 21st Century - James Hague - October 08, 2011
The code to generate this site has gotten bloated. When I first wrote about it, the Perl script was 6838 bytes. Now it’s grown to a horrific 7672 bytes. Part of the increase is because the HTML template is right there in the code, so when I tweak or redesign the layout, it directly affects the size of the file.
The rest is because of a personal quirk I’ve picked up: when I write tools, I don’t like to overwrite output files with exactly the same data. That is, if the tool generates data that’s byte-for-byte identical to the last time the tool was run, then leave that file alone. This makes it easy to see which files have truly changed, plus it often triggers fewer automatic rebuilds down the line (imagine if one of the output files is a C header that’s included throughout a project).
How do you avoid overwriting a file with exactly the same data? In the write_file function, first check whether the file exists and, if so, whether it is the same size as the data to be written. If both are true, then load the entire file and compare it with the new data. If they’re the same, return immediately; otherwise overwrite the existing file with the new data.
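The check described above can be sketched in a few lines. This is an illustrative Python version, not the Perl from the actual site generator; only the name write_file comes from the text, while the bytes argument and the boolean return value are assumptions made for the example.

```python
import os

def write_file(path, data):
    """Write data (bytes) to path unless the file already holds those exact bytes.

    Returns True if the file was written, False if it was left alone.
    (The return value is an assumption added for illustration.)
    """
    # Cheap checks first: does the file exist, and is it the same size?
    if os.path.isfile(path) and os.path.getsize(path) == len(data):
        # Only then pay for loading the whole file and comparing it.
        with open(path, "rb") as f:
            if f.read() == data:
                return False  # identical: leave the file and its timestamp alone
    # Missing, different size, or different contents: (over)write it.
    with open(path, "wb") as f:
        f.write(data)
    return True
```

Because an unchanged file is never reopened for writing, its modification time stays put, which is what keeps timestamp-based build tools from rebuilding anything that depends on it.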
At one time I would have thought this was crazy talk, but it’s simple to implement, works well, and I’ve yet to run into any perceptible hit from such a mad scheme. This site currently has 112 pages plus the archive page and the atom feed. In the worst case, where I force a change by modifying the last byte of the template and regenerate the whole site, well, the timings don’t matter. The whole thing is over in a tenth of a second on a five-year-old MacBook.
That’s even though the read-before-write method has got to be costing tens or hundreds of millions of cycles. A hundred million cycles is a mind-bogglingly huge number, yet in this case it’s irrelevant.
As it turns out, fully half of the execution time is going into one line that has nothing to do with the above code. I have a folder of images that gets copied into another folder if they’ve changed. To do that I’m passing the buck to the external rsync command using Perl’s backticks.
It’s oh so innocuous in the Perl source, but behind the scenes it’s a study in excess. The shell executable is loaded and decoded, dependent libraries get brought in as needed, external references are fixed up, then finally the shell itself starts running. The first thing it does is start looking for and parsing configuration files. When the time comes to process the rsync command, here we go again with all the executable loading and configuration reading, and eventually the syncing actually starts.
It must be a great disappointment after all that work to discover that the two files in the image folder are up to date and nothing needs to be done. Yet that whole process is as expensive as the rest of the site generation, much more costly than the frivolous reading of 114 files which are immediately tromped over with new data.
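For comparison, the copy-if-changed step can also be done in-process, with no shell or rsync startup at all. The sketch below is a hypothetical Python illustration, not something the site's script does: the function name sync_changed and its arguments are invented here, it handles only a flat folder, and it compares file contents byte for byte.

```python
import filecmp
import os
import shutil

def sync_changed(src_dir, dst_dir):
    """Copy files from src_dir to dst_dir only when missing or different.

    Returns the names of the files that were actually copied.
    (Hypothetical helper for illustration; subdirectories are ignored.)
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(src):
            continue  # this sketch skips anything that is not a regular file
        # shallow=False asks filecmp to compare contents, not just stat info
        if os.path.isfile(dst) and filecmp.cmp(src, dst, shallow=False):
            continue  # already up to date, nothing to do
        shutil.copyfile(src, dst)
        copied.append(name)
    return copied
```

With two up-to-date images, a call like this amounts to a directory listing and two file comparisons, which is the kind of work the surrounding process startup dwarfs.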
This is all a far cry from Michael Abrash cycle-counting on the 8086, or from an Apple II graphics programmer trimming precious instructions from a drawing routine. Today performance optimization doesn’t matter unless you’re saving tens of millions, hundreds of millions, or billions of cycles.
(If you liked this, you might enjoy How Did Things Ever Get This Good?)