Sunday, 17 February 2008

Unix makes Computer Science easy

As I've learned more Computer Science over the years, I've grown to realize that Unix packs a whole lot of CS power into a bunch of seemingly small commands with simple interfaces.

Consider the diff utility. It figures out the smallest number of changes required to transform one file into another. It uses some pretty heavy computer science, but the end user just tells it which files to compare and gets back the differences.
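
For example (with a couple of made-up file names), comparing two revisions of a source file is just:

    $ diff old/main.c new/main.c

diff prints only the lines that changed, in a form that patch can apply to turn one file into the other.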

Or how about bc and dc? Arbitrary precision arithmetic was a bit of a trick in those days, but with Unix, you just run "bc" and type in your expressions, including exponential or trigonometric functions, and out pop your answers.
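
For instance, bc's -l flag loads the math library, where a(x) is arctangent, so getting pi to twenty decimal places is a one-liner:

    $ echo "scale=20; 4*a(1)" | bc -l

The scale variable controls how many digits bc carries after the decimal point; crank it up and the answers just get longer.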

External sorting takes some skill and work to pull off. To sort a file that's ten times the size of available RAM, you need to sort lots of little chunks of the file and merge them together over multiple passes. Unix's sort utility just does it for you, creating temporary files as needed and hiding all of the complexity from you. It even does variable length records without batting an eye.
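
So sorting a file far bigger than memory looks exactly like sorting a small one (the file name here is invented):

    $ sort huge-access.log > huge-access.sorted

Behind the scenes, sort spills sorted runs into temporary files and merges them back together; none of that ever shows up on the command line.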

Once you've sorted your text file and want to find something in it quickly, Unix lets you do a binary search in a file with variable length records with just a single command: look. And you don't even have to know what log(n) means.
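
For example, to binary-search the system word list (usually /usr/share/dict/words) for everything starting with a prefix:

    $ look tele /usr/share/dict/words

Out come telegram, telephone, and the rest, and look only ever touches the handful of spots in the file that the binary search visits.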

Shell scripting provides something similar to functional programming, with files treated as lists of lines. The map, reduce, and filter operations from functional programming are made available in things like sed, wc, and grep.
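
A little pipeline shows the correspondence (the file name is just for illustration): grep filters, sed maps, and wc reduces.

    $ grep -v '^#' hosts.txt | sed 's/ .*//' | wc -l

grep -v '^#' filters out the comment lines, sed maps each remaining line to its first field, and wc -l reduces the whole stream to a single count.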

Awk can do all three. Patterns without actions are a filter. Actions that do output perform mappings. Actions that accumulate information to be spat out by an END block do reductions.
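
For example, given a whitespace-separated file (invented for illustration) with a size in the third column:

    $ awk '$3 > 1000' files.txt                              # pattern only: filter
    $ awk '{ print $1, $3 }' files.txt                       # action only: map
    $ awk '{ total += $3 } END { print total }' files.txt    # END block: reduce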

For programmers, things get even better. Topological sorting is where you take a bunch of things, and a bunch of rules saying that one thing must come before another thing, and come up with an ordering of the things that satisfies all of the rules. It's kind of useful when you need to figure out the order in which to run all of the commands that build your program. make handles it all in a way that completely hides the complexity of the topological sort from the programmer.
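
For example, a tiny (invented) Makefile, where each command line has to start with a tab:

    prog: main.o util.o
            cc -o prog main.o util.o
    main.o: main.c util.h
            cc -c main.c
    util.o: util.c util.h
            cc -c util.c

Type make and it works out from the dependencies alone that the .o files must exist before prog can be linked, and it rebuilds only what's out of date.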

Compilers, and parsing in particular, used to be deep magic, understood fully by very few people. lex and yacc made it easy for anyone to write a few regular expressions and a grammar and get out a parser (especially if they owned a copy of The Unix Programming Environment), without any need to worry about the ins and outs of finite state machines or pushdown automata or a whole bunch of stuff in the Dragon Book.
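
To give a flavour, here is a minimal sketch of a yacc grammar (a hypothetical calc.y) that just adds up the numbers on a line. The hand-rolled yylex stands in for what lex would normally generate, and a real grammar would carry much more error handling:

    %{
    #include <stdio.h>
    #include <ctype.h>
    int yylex(void);
    int yyparse(void);
    void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
    %}
    %token NUMBER
    %%
    input : /* empty */
          | input line
          ;
    line  : expr '\n'        { printf("%d\n", $1); }
          ;
    expr  : expr '+' NUMBER  { $$ = $1 + $3; }
          | NUMBER
          ;
    %%
    /* a toy lexer; lex would usually generate this part from regular expressions */
    int yylex(void) {
        int c;
        while ((c = getchar()) == ' ' || c == '\t')
            ;                                /* skip blanks */
        if (c == EOF) return 0;              /* tell the parser we hit end of input */
        if (isdigit(c)) {
            ungetc(c, stdin);
            scanf("%d", &yylval);            /* yylval defaults to int */
            return NUMBER;
        }
        return c;                            /* '+', '\n' and friends are their own tokens */
    }
    int main(void) { return yyparse(); }

Build it with yacc calc.y && cc -o calc y.tab.c, feed it a line like 1 + 2 + 3, and it prints 6; the grammar drives the pushdown machinery so you never have to think about it.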

And all of this was available in Unix Seventh Edition. In 1979. On a computer that only gave each program 64k of memory in which to run.

[edit: Changed to reflect the fact that I was talking more about parsing than compiling in general. From what I have been able to gather from old books on compilers, parsing used to dominate the compiler writing effort much more than it does nowadays, but the distinction is still a valid one. Thanks to simen and Blackheart for pointing this out.]