This post will explore the famous winner of the first IOCCC contest, dive into the story behind the code, and even display the original source programs!

A few years ago I wrote twoposts
on medium.com explaining the workings of some obfuscated programs. After these posts other stuff took priority and I forgot about the world of obfuscation for a while. Now though, I find myself returning to it.
For reasons I’ll explain in a different post I find studying obfuscated code to be quite valuable. So, I thought to myself ‘Well, those last programs I examined were honestly pretty simple. What would it be like to research a truly obfuscated program?’
This post is the story of that journey. I will discuss the code, how I got such old and obscure code to run, as well as include snippets from my conversations with
one of the original authors (who was very helpful in figuring some of this out). If all that wasn’t enough I managed to obtain the original PDP and VAX source code, it will be hosted here with permission. I want to give a huge thank you to Sjoerd Mullender and Don Libes for their assistance and permission in reproducing some of this material.
Because older obfuscated programs tend to be easier to dissect I decided to stay in the early years of the IOCCC (International Obfuscated C Code Contest).
Among those early programs one sticks out as truly interesting – and that is the first IOCCC winner mullender.c, which I’ve also replicated below because it’s quite short.
short main[] = {
277, 04735, -4129, 25, 0, 477, 1019, 0xbef, 0, 12800,
-113, 21119, 0x52d7, -1006, -7151, 0, 0x4bc, 020004,
14880, 10541, 2056, 04010, 4548, 3044, -6716, 0x9,
4407, 6, 5568, 1, -30460, 0, 0x9, 5570, 512, -30419,
0x7e82, 0760, 6, 0, 4, 02400, 15, 0, 4, 1280, 4, 0,
4, 0, 0, 0, 0x8, 0, 4, 0, ‘,’, 0, 12, 0, 4, 0, ‘#’,
0, 020, 0, 4, 0, 30, 0, 026, 0, 0x6176, 120, 25712,
‘p’, 072163, ‘r’, 29303, 29801, ‘e’
Where do you start with a program like this? Well, the hint provided by the IOCCC is always a good first step.
From this hint, we can learn a few things necessary to understanding how the program works:

  • Perhaps most importantly, this program will only run on a PDP-11 or VAX-11.
  • The C startup routine – provided by crt0.o – transfers control to a location named ‘main’ – usually intended to be a function, but in this case it’s an array of shorts
  • These shorts (represented in different formats to enhance obfuscation) form a meaningful set of PDP-11 and VAX-11 instructions
  • The first instruction is a branch to the PDP code, and due to the way VAX calls the main ‘function’ it skips the first word of the program (I’ll discuss this more later).
  • The program prints a string

Essentially, the program is a bunch of machine code instructions encoded in various formats. It runs because this code predates ANSI C by a half decade and old compilers seem to have had significantly less qualms about
‘main’ being a function or not (I tried to define ‘main’ as an array of shorts with a modern compiler and it did not like that one bit).
Running the thing on a PDP-11⌗
I admit it, the PDP-11 was before my time. I grew up in the Windows 95/98/2000 era, and until recently I didn’t even know what a PDP-11 was.
This ignorance left me kind of stuck, and a brief look on eBay confirmed that I was not going to get a working PDP-11 any time soon. Fortunately though,
an emulator exists. Enter SimH, it’s more of a suite of emulators for a wide range of historical computers – of which the
PDP-11 is one, and lucky for us the VAX-11 is another.
After a few false starts and misadventures with 2.11BSD, I realized I had no idea what I was doing so I decided I needed some help.
The two original authors of this program are Robbert van Renesse and Sjoerd Mullender, why not try to contact them? So I did.
Robbert didn’t answer my e-mail, but luckily Sjoerd got back to me and was happy to discuss the program. He told me that the
PDP-11 OS they used was Unix v7, so I proceeded on that route.
I managed to find a guide on installing Unix v7 onto this emulator that worked;
and now with my emulator running, and OS installed, I could slap it into a file, compile, and run the thing. Below is what I got.
Running the program on the PDP-11 using Unix v7
The animation is a little choppy, but even still you can hopefully see that it is printing “:-)” across the screen until it is forced to exit (there is no clean way to stop it).
In my discussions with Sjoerd, it was revealed that the string it prints is not exactly “:-)” but instead ” :-)\b\b\b\b”
(NOTE: there should be 2 space characters at the start but my templating engine is removing one for some reason).
It is exactly 9 bytes long. The first 2 characters are whitespace, followed by the :-), finally followed by four ‘\b’
characters. The ‘\b’ characters indicate a backspace. What does the backspace accomplish? Well, if you count 4 characters back from when they begin you’ll see we’re left with a single space – it is at this point
that the program loops and prints the string again. This happens quickly enough that to our eyes it looks like the “:-)” is scrolling across the screen – a clever effect.
Examining the PDP-11 code⌗
As it turns out, the C code is not very useful to examine directly. As it also turns out, the assembly isn’t much better. Below is a small sample of what the assembly looks like (from the PDP-11’s perspective):
.globl _main
.word 0425
.word 04735
.word -010041
.word 031
.word 0
.word 0735
.word 01773
.word 05757
So, what to do? Well, you can actually set up a cross compiler for the PDP-11 and if you go through that effort you can use the objdump program targeted at PDP-11 binaries.
Below is the relevant portion of the objdump – the whole thing is here, but not all of it is useful. Sjoerd confirmed that the PDP code is
from _main+0x2c to _main+0x4a (inclusive). Of course, the first word is also a PDP-11 instruction. Everything I’ve ommitted is related to the VAX portion of
the program which we will get back to. (It’s also not proper VAX code since the disassembler thinks everything is PDP-11).
0: 0115 br 0x2c
2c: 11c4 mov pc, r4
2e: 0be4 tst -(r4)
30: e5c4 0009 sub $11, r4
34: 1137 0006 mov r4, $0x3e
38: 15c0 0001 mov $1, r0
3c: 8904 sys 4
3e: 0000 halt
40: 0009 .word 11
42: 15c2 0200 mov $1000, r2
46: 892d sys 55
48: 7e82 sob r2, 0x46
4a: 01f0 br 0x2c
Before going further there are 2 important things to know about PDP-11 assembly code:

  • The PDP-11 works primarily with octal. All the addresses and instructions are encoded in octal.
  • In an instruction like mov pc, r4, you are moving pc INTO r4, not r4 into pc. Personally, I got used to this pretty quickly.

The first number in the original program is 277, put that into a hex converter and you’ll get 115 like we see in the objdump. More interesting though is what you get translating it into octal – 425.
How does 0425 get you br 0x2c though? Well for that you need to grab some manuals. I got my hands on a lot of old manuals for these computers but the most useful one was the PDP-11/45 Handbook from 1973.
It would helpfully describe the instruction format, and exactly what it did. Below is an example of the page for the unconditional branch instruction.
PDP-11/45 Handbook Describing the Branch instruction
As you can see, it sets the program counter to pc + 2x the offset – which is 25 (octal) in our case. Since the program counter at the start of our program is 2 (because it points to the next word to execute),
we get 2 + (25 x 2), or 2 + 52 = 54, which translating back into hex gives us 2c. Cool!
The meat of the code⌗
Now that we know how the branch works we can examine the main stuff – I won’t go into huge depth with the manuals for each little instruction but doing so is fun if you’re bored and enjoy that sort of thing.
The next relevant portion of code is this:
mov pc, r4
tst -(r4)
sub $11, r4
mov r4, $0x3e
mov $1, r0
sys 4
.word 11
We move the program counter into the r4 register and then use the ‘test’ (tst) instruction. This has the effect of clearing the some flags in the PSW (processor status word), and because
the register is using ‘Autodecrement mode’ it will subtract 2 from the value stored at r4. When using autoincrement / decrement, it will go by 1 in byte instructions or 2 in word instructions.
After this we subtract 11 from r4, but 11 is actually 9 (used to octal yet?). Then we store r4 into…0x3e? Well, if you look back at the objdump you’ll see that 0x3e is ‘halt’ or 0000. This
mov instruction overwrites that value with the contents of r4. We move 1 into r0 (1 indicating STDOUT), and call ‘sys 4’. The next line will be the value of r4 as just mentioned, and .word 11
simply inserts octal 11 directly into that spot. If you’re wondering what exactly is in r4 and why we’re doing all this stuff, be patient – we’re getting there.
If you’re familiar with system calls, you probably know what’s going on. If not, all of this is to setup the write system call. I
actually made a post on the retro-computing stackexchange
asking about the workings of this call internally – I recommend checking it out if you want further clarity on how this works.
After this we have the final 3 lines of PDP code which are:
mov $1000, r2
sys 55
sob r2, 0x46
We put 1000 into r2, and call system call 55. After this, we use the ‘sob’ instruction – which is ‘subtract one and branch’ and not an invitation to cry – to see if r2 is 0 yet. If it isn’t, we branch
to the previous line and call system call 55 again. loop until r2 is 0. Syscall 55 eh? Which one is that?
If you look up system call numbers you might be tempted to believe it’s the reboot system call – but alas, it is not. This is an undefined system call,
and Sjoerd states that these 3 lines were only included to introduce a slight delay to the program (presumably for more legible printing on the screen).
Finally, after the delay has delayed things, we branch back up to 0x2c and the fun begins all over again.
Where is the string?⌗
So that’s all well and good. We setup a system call to print the string, we delay a bit, then we repeat. Where is the string though? If you guessed that r4 is a pointer
to the string you’d be correct – but the string is not in the PDP-11 code. So you might think it’s in the VAX code, but it is not there either. It is at this point that
it’s useful to understand the ‘segments’ of this program – i.e, how it is laid out. It is very deliberate, and part of the design.
I hope you find my ASCII art explanation useful, it doesn’t look good on mobile though so if you’re reading on mobile
here is the same diagram as a picture
PDP-11 Entry
| VAX Entry
| +
v |
+——-+——+ |
+—–|Initial Branch|————> Address _main+0x0
| +————–+ |
| +————–+ |
+——–| VAX Code |<—+——-> Address _main+0x2 – _main+0x22
| | +————–+
| | +————–+
| | | String Bytes |————> Address _main+0x23 – _main+0x2B
| | +————–+
| | +————–+
| +—->| PDP Code |————> Address _main+0x2C – _main+0x4A
| +————–+
| +————–+
+——->| VAX Code |————> Address _main+0x4C – _main+0xA0
As you can see above, the printed string is not in either portions ‘code’ – rather it exists between the two sections. For the PDP, it is right before it.
This is why we move the program counter (pc) into r4 and then decrement it by 2 and subtract 9 – the string is 9 bytes long. This leaves us with the address
of the beginning of the string in r4.
This also demonstrates why the PDP-11 objdump file is only partly useful, it interprets these bytes as instructions when they are not. This initially misled me
into believing that they have some purpose in terms of the codes execution but nothing that complex is going on. That is, the bytes do not do ‘double duty’ as code
and data – it is simply data carefully placed between two code sections.
We can verify this in adb (‘Advanced Debugger’ – not Android Debug Bridge, also; if this is advanced I’d hate to see
the regular ‘db’ program). Below is a picture of an adb session showing the subtraction, a dump of the registers, as well as the printed string and the octal representation
of the bytes.
ADB Session showing how to locate the string
You’ll probably notice the numbers are all octal (even the bytes of the string). They also do not match up with the objdump numbers – this is because of the C startup code.
Looking at the VAX code⌗
I know what you’re thinking – all this old stuff is neato but it sure would be better if we added docker to the mix. Well, I couldn’t agree more. I didn’t want to go
through the hassle of installing another OS on one of these old machines and luckily for me someone has distributed a docker image with VAX-11 running 4.3BSD on SimH.
I booted that all up and compiled the program and it worked, as expected. No gif of this one running – it looks exactly the same as last time.
The assembly code for the VAX program is unhelpful, much like the PDP assembly code. Below is a sample:
.long 0x9dd0115
.long 0x19efdf
.long 0x1dd0000
.long 0xbef03fb
.long 0x32000000
.long 0x527fff8f
.long 0xfc1252d7
To complicate matters, there is no objdump program (that I could find) available for the VAX architecture. I suspect this was largely because it was used in proprietary spaces
and never really had a market outside of large businesses / universities. So basically, the only way to look at this is with the dbx debugger –
a debugger originally shipped with BSD.
So, let’s see what dbx says…
00000408 bleq 40b
0000040a pushl $9
0000040c pushal 42b
00000412 pushl $1
00000414 calls $3,426
0000041b cvtwl $7fff,r2
00000420 decl r2
00000422 bneq 420
00000424 brb 40a
00000426 halt
00000427 halt
00000428 chmk $4
0000042a ret
0000042b addp4 $20,$3a,$2d,$29
00000430 cvtps $8,$8,$8,-7151(r4)
00000437 crc 2533(r4),$0,$37,$11
0000043e ldpctx
0000043f halt
00000440 addl2 $15,$1
00000443 halt
00000444 ret
00000445 bisb3 $0,$0,$9
00000449 halt
0000044a subl2 $15,$0
0000044d rei
0000044e cmpc5 (r9)+,(r2)+,-(sp),*1537(r0),$0
00000458 ret
00000459 halt
0000045a halt
0000045b rsb
0000045c remque $0,$0
Double Yikes
Yes, it is significantly more complicated than the PDP stuff. Trying to figure out what is even relevant is difficult, and perhaps that is because this code was never written by a person.
Here is what Sjoerd said:
I have never known a lot about the VAX assembly, so we used the C
compiler to create the VAX code. We didn’t write it ourselves from
scratch as we did with the PDP code. This is the reason why the VAX
code is more complex, including the extra data after the PDP code.
I tried to get up to speed on some VAX assembly but most of it is horribly complex – especially compared to the PDP. The amount of times the manual told me that the behaviour of something was
‘UNPREDICTABLE’ (it always used all caps) was…a lot.
So to save my sanity I gave up on understanding all of it (if you’re a VAX expert though I would be curious to hear your thoughts – get in touch). That being said some things do deserve explanation.

  • The ‘calls’ instruction. This invokes a procedure with a number of arguments on the stack. This instruction also uses the first word at the destination as a mask of registers to be saved,
    this is how we’re able to skip over the first branch to PDP code when running on VAX. (In that case, it is the C startup routine invoking ‘calls’)
  • ‘chmk $4’. This instruction switches to kernel mode, and I can only assume $4 is the write system call. I did not find great explanations about how this really works

Procedures being invoked with calls explains a bit of the above code. Specifically this bit:
0000040a pushl $9
0000040c pushal 42b
00000412 pushl $1
00000414 calls $3,426
This is putting on the stack the length of the string, the pointer to the string, and the file descriptor we’re writing to – STDOUT.
That’s all to really say about the VAX stuff. I wish I could explain what the bit after the PDP code does but it seems like no one has a good idea.
How the program happened in Sjoerd’s own words⌗
To close this out, here is the story of the program. I was curious how it came about, and how Sjoerd and Robbert heard of the IOCCC – afterall this was the first year it existed.
Robbert and I were students at the VU (Free University in Amsterdam) at
the time (mathematics with CS as our major since there was no CS
curriculum when we started). We had an assignment to create a pair of
programs for the computer networks course. The programs were supposed
to send data reliably from one program to the other over an unreliable
channel. This channel was simulated with a pair of pipes.
We decided for fun to create an obfuscated set of programs, only for the
PDP, to do this, but cirucumventing the channel. (I.e. cheating, hence
the needed obfuscation.) Our programs worked and we handed them in.
Of course, the teacher had a good laugh and then rejected our
submission. (We knew him well, so we could get away with this.)
Then the IOCCC came along. I don’t remember how we heard about it, but
at the time there was a world-wide messaging network Usenet
where we read a bunch of newsgroups. I’m sure it was announced there and we saw it.
Since we had just recently created these obfuscated programs we decided
we could use the same technique for an obfuscated C program. We upped
the ante a bit by making it “portable”.
To add to the obfuscation, we used different formats for the integers in
the array, some in decimal, some in octal, some in hexadecimal, and when
the value would fit, some as an ASCII character.
The rest is history.
Since this was the first contest, we hadn’t seen any old entries, nor
had any of the other contestants. Of course we knew about #define and
tricks you could do with that, but we didn’t need that for this program.
In fact, we made it as “standard” as possible. At the time there was
this program called “cb” for C beautifier which would reindent your
program to make the layout look better. Our program is idempotent under cb.
The Original Source⌗
Throughout the course of this research Sjoerd mentioned to me a book from the early 90s called “Obfuscated C and Other Mysteries” by Don Libes.
It contained its own analysis of mullender.c and he thought it was akin to what I was doing. Naturally, I had to have it. So I found a used copy
for a reasonable price and patiently waited for it to arrive (Note: if you want this book keep checking eBay – not amazon!!).
It did arrive, and it’s quite wonderful. Below is the cover, table of contents, and the 5 and 1/4 floppy disk it came with (the only floppy of that size I’ve ever owned).
Obfuscated C and Other Mysteries
The interesting thing is not that I have this book though. No, this book is interesting because it contains original source code for the PDP and VAX programs
as well as the C program that generated the mullender.c submission. Without these there is no avenue to explore this program except through debugging tools.
Graciously, Sjoerd Mullender and Don Libes are allowing me to Host the excerpt regarding mullender.c here for all to read (PDF Warning).
All of the code credit goes to Sjoerd Mullender and Robbert van Renesse.
Copyright (1993) for the book belongs to Don Libes and has been reprinted with permission.
Extra bits⌗
This is just a collection of random things that happened that couldn’t really find an appropriate spot in the main post.
Documents and Websites I used⌗