At the beginning of May I dug up an old toy of mine from around 2002-2004: a "test computer" based on the 80C535 microcontroller. The computer consists almost exclusively of the microcontroller, the memory (EPROM + RAM), a power connector, a serial port (RS-232) and some I/O ports. The serial port is used to communicate with a PC, which makes it possible to upload programs written in a simple assembly language.
Two problems appeared, though. The first was that modern computers rarely have an RS-232 port, and laptops almost never do. This one was easy to solve by ordering a USB adapter online. The second one was more serious.
In 2003 I was 15 years old, so as you can probably guess, I didn't have much influence on the design of the computer. It was designed by my teacher, who also provided us (me and the other students in the electronics club) with some software for writing and uploading programs. The problem is that in the 14 years since then, I lost the software and lost contact with the teacher. Well, I said to myself, I'm an adult now and I'm quite good at programming, so I can probably figure this out ;)
And so began my adventure with reverse-engineering a toy from the electronics club.
Day 1
The USB-RS232 adapter had been ordered, but the postal service would take some time to deliver it, so I started doing what I could, that is, analysing the connections on the board.
I found out that the computer contains an 80C535 processor, a 27C256 EPROM chip (32 kB) and a D43256AC-12L RAM chip (also 32 kB). Both memory chips are connected in parallel to the processor pins responsible for external memory addressing and data transfer. There is also one chip with NAND gates, at least one of which is wired as an inverter (a NOT gate), which suggested to me that the two memory chips might occupy separate parts of the address space. Let me explain.
The processor uses 16 bits to address the external memory, which gives 2^16 = 65536 possible addresses, that is, 64 kB of address space. Let's now assume that we want to put the EPROM chip at addresses 0-32767 and the RAM chip at 32768-65535.
The addresses 0-32767 have the highest, 16th bit (A15) equal to 0, and the addresses 32768-65535 have it equal to 1. The remaining 15 bits encode values from 0 to 32767 in both the lower and the upper range. This means that we can connect those 15 address lines to both the EPROM and the RAM and everything will be fine; we just have to make sure that only one of the chips is active at a time, based on the 16th bit. So it's enough to enable the RAM chip through an AND gate whose inputs are the 16th bit and the processor's memory-enable signal, and the EPROM analogously, but with the 16th bit inverted. This hypothesis is supported by the NAND gate wired as an inverter, but I couldn't check it properly, because some traces on the board were hidden under the chips. Oh, well.
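To make the hypothesis concrete, here is a minimal Python sketch of that decoding logic. It assumes, as above, that A15 alone selects the chip, which I couldn't confirm on the board; decode and CHIP_SIZE are names made up for illustration.

```python
# Hypothetical model of the decoding described above: A15 selects the chip,
# A0-A14 go to both chips unchanged. Not verified against the real board.

CHIP_SIZE = 0x8000  # 27C256 and 43256: 32 kB each, 15 address lines


def decode(address: int) -> tuple[str, int]:
    """Map a 16-bit address to (selected chip, address seen by that chip)."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("the external address bus is 16 bits wide")
    a15 = (address >> 15) & 1           # the 16th (highest) address bit
    offset = address & (CHIP_SIZE - 1)  # A0-A14, shared by both chips
    # RAM is enabled when A15 = 1, the EPROM when (NOT A15) = 1
    chip = "RAM" if a15 else "EPROM"
    return chip, offset


if __name__ == "__main__":
    for addr in (0x0000, 0x0203, 0x7FFF, 0x8000, 0xFFFF):
        print(f"{addr:04X}h -> {decode(addr)}")
```

Under this hypothesis, address 8000h would be the first byte of RAM.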
The next chip contains 8 so-called D-latches and is connected to the processor's port 0. This port serves as both the data bus and the lower half of the address bus, and the D-latches make that possible. When the processor wants to access the memory, it first outputs the lower 8 bits of the address on port 0 and pulses the latch enable signal, which makes the latches "remember" those 8 bits. Then it uses the same port for the data. The lower address pins of the memory chips are connected to the D-latches, and their data pins directly to port 0 (on 8051-family processors the upper 8 address bits come from port 2, which holds them for the whole access). This way both the address and the data reach the memory chip. Simple and effective.
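The two-phase access can be modelled in a few lines of Python. This is a toy sketch, not a timing-accurate simulation: it assumes the usual 8051-family arrangement, where the latch is clocked by the processor's ALE (address latch enable) signal and port 2 supplies A8-A15. AddressLatch and external_read are hypothetical names.

```python
# Toy model of one external read on an 8051-family bus: port 0 carries
# A0-A7 and then the data byte, port 2 carries A8-A15, and the 8-bit latch
# (clocked by ALE) keeps the low address byte once port 0 moves on to data.


class AddressLatch:
    """Eight D-latches that remember whatever was on port 0 when clocked."""

    def __init__(self) -> None:
        self.q = 0

    def clock(self, port0: int) -> None:
        self.q = port0 & 0xFF


def external_read(memory: bytes, address: int) -> int:
    latch = AddressLatch()
    port2 = (address >> 8) & 0xFF  # high byte stays on port 2 all cycle

    # Phase 1: low address byte on port 0, the ALE pulse stores it
    latch.clock(address & 0xFF)

    # Phase 2: port 0 is released for data; the memory chip sees the
    # latched low byte plus port 2 and drives the addressed byte onto port 0
    seen_by_memory = (port2 << 8) | latch.q
    port0 = memory[seen_by_memory]
    return port0
```

The point of the model is that the memory never sees port 0's address and data at the same time; the latch bridges the gap between the two phases.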
The last chip worth mentioning is the one that enables serial communication. RS-232 uses different voltage levels than the processor (roughly ±3 to ±15 V instead of 0 to 5 V), so a level-converter chip is necessary. The two processor pins that transmit and receive serial data are connected to this chip, and the chip is connected to the port connector. This way both sides see the voltages they expect.
This analysis taught me more about the inner workings of the computer, but it still didn't answer the main question: how do I actually communicate with it? That had to wait for the adapter.
Day 2
The adapter arrived, so it was time to try talking to the computer.
I use Linux on my laptop. Some people think of it as a system that has trouble with hardware, but my experience is quite different: I don't think I have ever come across a piece of hardware that didn't work on Linux out of the box. This time was no different: as soon as I connected the adapter, /dev/ttyUSB0 appeared in the filesystem.
A small problem appeared: I had no idea what the baud rate should be (for RS-232 this is the number of bits sent per second on the line, start and stop bits included). Well, there aren't many standard rates, so I could find the correct one by trial and error.
In the meantime I was also looking for candidate software that might be stored in the EPROM chip (some software had to be there: the processor wouldn't be able to interpret incoming data by itself, something had to be controlling it). I found something called ASEM-51 and decided to try it.
The ASEM-51 manual said to set the baud rate to 9600, connect a terminal to the serial port and reset the circuit. I did that and... nothing. No reaction from the computer. I tried resetting it a few more times, but to no avail. The test computer was silent.
I tried another method: cat /dev/ttyUSB0 | hexdump -C. This command reads raw data directly from the serial port and passes it to a program that prints it on the screen as hexadecimal numbers. Reset, and... there is something! The data didn't tell me much and was very repetitive; something was not right. I tried other baud rates, and at 4800 bps it looked a bit better. Importantly, 0D 0A sequences appeared, which a trained eye recognises as the ASCII carriage return and line feed, i.e. a line break. This looked promising. Unfortunately, all 51 received bytes fell within the 01h-16h range (the 'h' suffix means a hexadecimal number), while printable ASCII characters begin at 20h. With ASEM-51 I should have expected some readable prompt, so that wasn't it. I spent a while looking for a pattern in the message that would tell me how to proceed, but nothing came to mind. It looked like the fun would be over quite quickly.
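In hindsight, the "does this look like text?" check I was doing by eye is easy to automate. A rough Python sketch, assuming you already have one raw capture per tried baud rate; the captures dictionary and both function names are hypothetical, and capturing itself is not shown.

```python
# Heuristic for the baud-rate hunt: a capture at the right speed should be
# mostly printable ASCII (20h-7Eh) plus CR (0Dh) and LF (0Ah).


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII, CR or LF."""
    if not data:
        return 0.0
    good = sum(1 for b in data if 0x20 <= b <= 0x7E or b in (0x0D, 0x0A))
    return good / len(data)


def best_baud_rate(captures: dict[int, bytes]) -> int:
    """Pick the baud rate whose capture looks most like text."""
    return max(captures, key=lambda baud: printable_ratio(captures[baud]))
```

This only ranks captures: a wrong speed can still produce a few printable bytes by accident, so a high score is a hint rather than proof.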
Day 3
That day I had an epiphany. I need to know how the program in the EPROM works, right? There is a "simple" way to find out. Reading the memory is very easy: you apply the right voltages to the address pins to select an address and trigger a read, and the voltages on the data pins tell you the memory content at that address. I can connect LEDs to the data pins and they will show me the value of the byte, and I can arrange the wires so that it's easy to set different addresses... I gathered my electronics kit, and that is how the device in the picture below came to be.
The computer board served just as a power source here: I knew it supplied the right voltage and had some pins connected directly to the power rails. The second board is a prototyping board, which I used to connect the EPROM chip, the LEDs and the wires setting the address. The yellow LEDs showed the upper 4 bits of the byte, and the green ones the lower 4 bits. The button triggered the memory reads.
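The arithmetic behind the reader is simple, but it is easy to slip when setting fifteen address lines by hand. A small Python helper, assuming that a lit LED means a 1 bit and that both the address lines and the LEDs are read with the most significant bit on the left (my assumptions about the wiring, not something the board dictates); the function names are made up.

```python
ADDRESS_LINES = 15  # the 27C256 has address inputs A0-A14


def address_levels(address: int) -> str:
    """Levels to set on A14..A0, left to right ('1' = high)."""
    if not 0 <= address < (1 << ADDRESS_LINES):
        raise ValueError("27C256 addresses run from 0000h to 7FFFh")
    return format(address, f"0{ADDRESS_LINES}b")


def leds_to_byte(yellow: str, green: str) -> int:
    """Yellow LEDs hold the upper nibble, green the lower; '1' = lit, MSB first."""
    if len(yellow) != 4 or len(green) != 4:
        raise ValueError("expected four LEDs per colour")
    return int(yellow + green, 2)


if __name__ == "__main__":
    print(address_levels(0x0203))
    print(f"{leds_to_byte('0000', '0010'):02X}h")
```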
Armed with this circuit, I started reading. The first three bytes, in hexadecimal, were 02 02 03. A quick look at the processor manual told me that this is a jump to address 0203h, or 515 in decimal. Finally, a readable result!
I read a few bytes at the following addresses (3, 4, 5), but after something that looked like another jump (02 40 03, to address 4003h? 16387? I hope I won't have to read that many bytes...), a run of repeated FF bytes started (FFh is what an erased EPROM location reads as), so I switched to the addresses starting at 0200h.
This was more interesting, since it was without doubt executable code. One of the instructions worried me a bit: 12 07 45. This is a call to a function at 0745h, or 1861 in decimal. That meant I had at least 1300 more bytes to go... Not a very pleasant prospect when reading a single byte takes about 30 seconds. After 128 bytes I called it a day and went to sleep.
Day 4
That day I decided that the task would be much simpler if I had switches for setting the address instead of wiggling the wires (which were stranded, too, and tended to fray when moved). I bought some and built Reader v2.
Flipping the switches was much quicker and I started reading the bytes blazingly fast ;) I read over 700 bytes, which contained some interesting things.
The first one was the instruction 12 0B D9: another function call, this time to address 0BD9h, decimal 3033. That's great, the code turns out to be even bigger, so even more reading. At that moment I had about 800 bytes read, so about a quarter of the code. Not that bad.
The second one was a few byte sequences that were clearly text: first the distinctive 0D 0A sequence, then a few 20h bytes (spaces) and some bytes from the range 41h-5Ah, which are capital letters. Here is what the text said:
LJMP TO 4100H...
--EMON52-- version 0.1 (2.7.1992) RAMTOP=
INTERRUPT, IE0
INTERRUPT, IE1
INTERRUPT, TF0
- ...
The only reasonable use for such strings in a setup without a screen, but with a serial port, is sending them through that port. However, none of those strings appeared in the bytes I had been receiving. Either the program started by sending something else, or something was wrong. At that moment I had too little data to be sure.
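Spotting such text by eye works, but the search is easy to automate, much like the Unix strings utility does it. A minimal Python sketch, assuming the dump is a bytes object read from address base onwards; it treats printable ASCII plus CR and LF as text and ignores whatever terminator EMON52 actually uses. find_strings is a hypothetical helper.

```python
def find_strings(dump: bytes, base: int = 0, min_len: int = 4) -> list[tuple[int, str]]:
    """Return (address, text) for runs of printable ASCII, CR or LF bytes."""
    found = []
    start = None
    for i, b in enumerate(dump + b"\x00"):  # sentinel closes a final run
        if 0x20 <= b <= 0x7E or b in (0x0D, 0x0A):
            if start is None:
                start = i  # a new text run begins here
        elif start is not None:
            if i - start >= min_len:  # ignore short accidental runs
                found.append((base + start, dump[start:i].decode("ascii")))
            start = None
    return found
```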
On a side note: I found some mentions of EMON52 on the internet, which confirmed that something like that existed, but I couldn't find much more, apart from the fact that it was probably a German program ;)
Day 5
More reading.
After a few INTERRUPT... strings, the code began again. It had a very repetitive structure, though:
75 A8 00 90 XX YY 12 0B D9 02 04 AE
The same sequence repeated many times with different XX YY. XXYY turned out to be the addresses of the INTERRUPT... strings, and the fragments themselves were located at the addresses passed earlier to the 0745h function.
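With a full dump, this kind of repeated fragment can be located mechanically. A Python sketch using a byte-level regular expression; the pattern is exactly the sequence above, with XX YY captured and read high byte first, as 8051 instructions store 16-bit values. find_interrupt_stubs is a hypothetical helper name.

```python
import re

# 75 A8 00 | 90 XX YY | 12 0B D9 | 02 04 AE, with XX YY captured
STUB = re.compile(rb"\x75\xA8\x00\x90(..)\x12\x0B\xD9\x02\x04\xAE", re.DOTALL)


def find_interrupt_stubs(dump: bytes) -> list[tuple[int, int]]:
    """Return (stub offset, string address XXYY) for every match."""
    return [
        (m.start(), int.from_bytes(m.group(1), "big"))
        for m in STUB.finditer(dump)
    ]
```

The re.DOTALL flag matters here: without it, "." would not match the byte 0Ah, which can perfectly well occur inside an address.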
It doesn't look like much, but it was really a big hint. Let me explain from the beginning.
The 80C535 processor supports "interrupts". An interrupt (as the name suggests) interrupts the processor's work and temporarily makes it execute code at a predefined address, a so-called interrupt handling routine. There is more than one interrupt; they are usually numbered, and in the case of the 80C535 they also have names: IE0, IE1, TF0, ...
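For reference, here are the fixed vector addresses of the five interrupt sources common to all 8051-family chips, as a Python sketch. I'm assuming the 80C535 keeps them (it is an 8051 derivative with additional sources at higher addresses, omitted here). Address 0003h is the IE0 vector, which is where the 02 40 03 jump appears to have been read on day 3; vector_targets (a hypothetical helper) shows how a dump could be checked for such jumps.

```python
# Vector addresses of the five interrupt sources found on every 8051.
# The 80C535 adds more sources at higher addresses, not listed here.
VECTORS = {
    0x0003: "IE0",    # external interrupt 0
    0x000B: "TF0",    # timer 0 overflow
    0x0013: "IE1",    # external interrupt 1
    0x001B: "TF1",    # timer 1 overflow
    0x0023: "RI/TI",  # serial port
}


def vector_targets(rom: bytes) -> dict[str, int]:
    """For each vector holding an LJMP (02 hi lo), report where it jumps."""
    targets = {}
    for addr, name in VECTORS.items():
        if rom[addr:addr + 1] == b"\x02" and len(rom) >= addr + 3:
            targets[name] = int.from_bytes(rom[addr + 1:addr + 3], "big")
    return targets
```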
Let's go back to the repeating fragments of code. 75 A8 00 writes 0 to the internal address A8h, which, as it happens, is the interrupt enable register: most of its bits enable individual interrupt sources, and its highest bit enables or disables all interrupts at once. So this instruction turns the interrupts off. 90 XX YY loads the address of a string into the DPTR register, a register used for pointing at data. 12 0B D9 is a function call. 02 04 AE is a jump to 04AEh, and at that address we have... 02 02 03, which is a jump to the beginning of the program.
Also, recall that the addresses of those fragments were loaded into DPTR before the calls to 0745h.
So, do you see now? No?
The first conclusion is quite obvious. The 0BD9h function is always called after a string address has been loaded into DPTR, and what can you do with a string? Probably write it to some output, here: the serial port. So the suspicion is that the function at 0BD9h "prints" a string.
The repetitive fragments turn the interrupts off, print a string "INTERRUPT, xxx" and restart the program. So they are probably interrupt handling routines. It's quite common for such routines to turn the interrupts off at the beginning so that nothing else interrupts their work, and this is what we have here.
Let's go back to the 0745h function for a while. What can it do? It is always called right after the address of an interrupt handling routine has been loaded into DPTR. So it seems reasonable that this function sets up interrupt handling. And so we know quite a bit from just a few small fragments of code ;)
Fortunately, I already had the switches. I started reading the code from 0BD9h, and my suspicions were confirmed. I discovered several functions that print single characters, strings and hexadecimal numbers, as well as functions that read numbers and strings, all of course through the serial port.
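I haven't reproduced EMON52's routines here, but the usual way to print a byte as two hex digits, and to parse typed hex digits back, looks like this in Python. It's a sketch of the general technique, so the actual code at 0BD9h and its neighbours may well work differently; both function names are made up.

```python
def byte_to_hex_ascii(value: int) -> bytes:
    """Two ASCII hex digits for a byte, high nibble first."""
    def digit(n: int) -> int:
        # 0-9 -> '0'-'9' (30h-39h), 10-15 -> 'A'-'F' (41h-46h)
        return 0x30 + n if n < 10 else 0x41 + (n - 10)

    value &= 0xFF
    return bytes([digit(value >> 4), digit(value & 0x0F)])


def hex_ascii_to_int(digits: bytes) -> int:
    """Parse ASCII hex digits (either case) into a number, one nibble at a time."""
    if not digits:
        raise ValueError("no digits")
    value = 0
    for ch in digits.upper():
        if 0x30 <= ch <= 0x39:
            n = ch - 0x30
        elif 0x41 <= ch <= 0x46:
            n = ch - 0x41 + 10
        else:
            raise ValueError(f"not a hex digit: {ch:#04x}")
        value = (value << 4) | n
    return value
```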
This concluded my day, but at this point I was convinced that I was getting somewhere.
Day 6
So what next? Reading, what else.
This time, though, I also started disassembling the code. Strictly speaking, I had started the previous day, but now I was disassembling more or less on the fly.
What is disassembling? Well, people realised quite early that code like 75 A8 00 90 02 F3 12 0B D9... is a bit hard to read, and so they invented "assembly". Assembly is a language that assigns a "mnemonic" to each machine code instruction, that is, a short sequence of characters that reminds people of a word. For example, the code above could be written like this:
```asm
mov 0A8h, #0
mov DPTR, #02F3h
call 0BD9h
```
mov comes from "move" ("copy" describes the action better, but mov stuck in people's minds). The first instruction writes 0 to address A8h: the # sign marks a literal value, while a number without it is an address (the leading 0 in 0A8h is required for the assembler to know that this is a number, not a label; more about labels in a moment). The second one writes 02F3h to DPTR, and the third one calls the function at 0BD9h. Isn't that more readable?
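Disassembling by hand boils down to a lookup table: opcode in, mnemonic and instruction length out. A Python sketch covering only the opcodes that appear in this post (everything else is emitted as a raw db byte). It uses the standard mnemonics ljmp and lcall for 02h and 12h, whereas the listing above writes the generic call; the function names are hypothetical.

```python
# Decoder for only the opcodes seen so far: opcode -> (template, length)
OPCODES = {
    0x02: ("ljmp {a16}", 3),        # long jump, address high byte first
    0x12: ("lcall {a16}", 3),       # long call
    0x22: ("ret", 1),               # return from subroutine
    0x75: ("mov {d}, #{i}", 3),     # direct address <- immediate byte
    0x90: ("mov DPTR, #{a16}", 3),  # DPTR <- 16-bit immediate
}


def hexnum(value: int, digits: int) -> str:
    """Hex literal in assembler style, with a leading 0 if it starts with A-F."""
    text = f"{value:0{digits}X}h"
    return "0" + text if text[0] in "ABCDEF" else text


def disassemble(code: bytes, origin: int = 0) -> list[str]:
    lines = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        template, size = OPCODES.get(op, (None, 1))
        if template is None or pc + size > len(code):
            # unknown or truncated instruction: show the raw byte
            lines.append(f"{hexnum(origin + pc, 4)}: db {hexnum(op, 2)}")
            pc += 1
            continue
        operands = code[pc + 1:pc + size]
        fields = {}
        if size == 3:
            fields = {
                "a16": hexnum(int.from_bytes(operands, "big"), 4),
                "d": hexnum(operands[0], 2),
                "i": hexnum(operands[1], 2),
            }
        # str.format ignores the fields a template doesn't use
        lines.append(f"{hexnum(origin + pc, 4)}: " + template.format(**fields))
        pc += size
    return lines


if __name__ == "__main__":
    print("\n".join(disassemble(bytes.fromhex("75A8009002F3120BD9"))))
```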
Readability can be improved further using labels. Usually, when you load the address of a string or call a function, you mean a particular piece of the program, not some arbitrary byte number. When the code changes, the addresses of functions and data can change as well. Adjusting every reference by hand would be a nightmare, and so labels were invented.
Labels allow the programmer to name places in the code. For example, the code above could look like this:
```asm
interrupt:
    mov 0A8h, #0
    mov DPTR, #int_txt
    call print_string
```
And somewhere earlier we could have:
```asm
mov DPTR, #interrupt
call set_interrupt_handler
```
Now it is obvious at once what each fragment of code does. The first one handles an interrupt by printing some text, and the second one registers the first one as an interrupt handler. There are no mysterious 0BD9h or 0745h, just names.
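Labels are what make a two-pass assembler necessary: a label may be used before it is defined, so the first pass only records addresses and the second emits bytes. A toy Python sketch handling just the instructions in these listings (call is always encoded as LCALL, 12h). The syntax handling is my own simplification, not that of ASEM-51 or any other real assembler, and assemble is a hypothetical name.

```python
# Toy two-pass assembler for the handful of instructions used above.
SIZES = {"ret": 1, "call": 3, "mov": 3}


def value_of(token: str, labels: dict[str, int]) -> int:
    """A label, a hex number like 0A8h, or a decimal number."""
    token = token.strip().lstrip("#")
    if token in labels:
        return labels[token]
    if token.lower().endswith("h"):
        return int(token[:-1], 16)
    return int(token)


def encode(mnemonic: str, ops: list[str], labels: dict[str, int]) -> bytes:
    if mnemonic == "ret":
        return b"\x22"
    if mnemonic == "call":  # always LCALL with a 16-bit address
        return b"\x12" + value_of(ops[0], labels).to_bytes(2, "big")
    if mnemonic == "mov" and ops[0].upper() == "DPTR":
        return b"\x90" + value_of(ops[1], labels).to_bytes(2, "big")
    if mnemonic == "mov" and ops[1].startswith("#"):
        return bytes([0x75, value_of(ops[0], labels), value_of(ops[1], labels)])
    raise ValueError(f"unsupported: {mnemonic} {', '.join(ops)}")


def assemble(source: str, origin: int = 0) -> bytes:
    labels: dict[str, int] = {}
    program = []
    pc = origin
    # Pass 1: note the address of every label and the size of every line.
    for raw in source.splitlines():
        text = raw.split(";")[0].strip()  # drop comments
        if ":" in text:
            label, text = text.split(":", 1)
            labels[label.strip()] = pc
            text = text.strip()
        if not text:
            continue
        mnemonic, _, rest = text.partition(" ")
        mnemonic = mnemonic.lower()
        ops = [op.strip() for op in rest.split(",")] if rest.strip() else []
        program.append((mnemonic, ops))
        pc += SIZES[mnemonic]
    # Pass 2: every label now has a value, so forward references work.
    return b"".join(encode(m, ops, labels) for m, ops in program)


DEMO = """
interrupt:    mov 0A8h, #0
              mov DPTR, #int_txt
              call print_string
              ret
print_string: ret      ; stand-in for the real routine
int_txt:      ret      ; stand-in for the string data
"""

if __name__ == "__main__":
    print(assemble(DEMO).hex())  # prints 75a80090000b12000a222222
```

The ret lines standing in for print_string and int_txt are placeholders, since the sketch has no directive for string data; the point is that int_txt and print_string are used before they are defined and still resolve correctly.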
Obviously, the processor still needs machine code, so assembly has to be translated first. This translation is called "assembling" (sometimes loosely "compiling", although that term usually refers to translating a higher-level language into assembly or machine code), and the reverse process is "disassembling". So my goal was to turn as much of the code as I could into assembly.
That's exactly what I did and I discovered even more interesting things. The results can be seen here.
Firstly, I discovered that the computer should print the text found earlier ("--EMON52--...") right after a reset. Why didn't it do that? That's a mystery (which I'll solve in a moment). Secondly, the program implements a kind of interactive terminal. It handles a few simple commands:
- Lines beginning with ":" are interpreted as Intel HEX, a text format that encodes binary data as hexadecimal digits together with a load address and a checksum; this allows loading programs into memory via the serial port
- After entering "P", the program prints 128 bytes of memory, starting at a given address - and so my reader became obsolete :) (provided that I could fix the serial port communication)
- After entering "X" it starts executing code at a given address
- Some other commands, which are less important
So, it now made sense to reassemble the computer and try to talk to it through the serial port. Why didn't it work, though...?
I decided to connect the thing to a computer running Windows. I installed the drivers provided by the seller and a small program for communicating through the serial port, ran it... --EMON52-- version 0.1 (2.7.1992). Huh, so it's just that the adapter doesn't work with Linux. Awesome.
Well, let's look for a solution. The adapter is based on a CH340 chip, so let's look for drivers. Linux has drivers for this chip built into the kernel, but they evidently didn't work. I found the vendor's drivers on some website, downloaded them, compiled... Nope. Required files were missing. Eh. I installed the files needed to compile kernel modules and tried again... Error. The drivers were too old.
Ok, another approach. I checked the vendor's website; everything was in Chinese, but there was a search box. I entered "CH340" and found newer driver code. Compiling, success, loading the driver... Nope, still nothing.
I wanted to give up at that point, but then I thought: no, that's impossible, someone must have had a similar problem. I looked into the driver's code, but understanding it would have taken at least a few days, and I just didn't feel like it. Well, back to Google... There was a result: someone on the kernel mailing list had noticed the problem and suggested a fix. The post was a year old, but I decided to try it.
I downloaded the Linux source code from GitHub, applied the patch, compiled and loaded the module, restarted the test computer... --EMON52-- version 0.1 (2.7.1992). YES! IT WORKS!
I tested the commands, and indeed I was able to read the code from my laptop. The reader was set aside, and I filled some of the remaining gaps in the code using the "P" command (though not all of them; maybe I'll come back to that).
I decided to write and execute a small program as a test. Since I had to do it manually, I settled for something simple:
75 B0 AA 22
This program writes AAh (binary 10101010) to one of the output ports (B0h is the address of port 3), where I can connect the "displays" visible in the first picture; the final 22 is a return instruction. After execution, a nice pattern of alternating lit and unlit LEDs should appear.
To load the program, I had to choose a starting address for it and write it as Intel HEX. Initially I chose 8000h: if the ROM and RAM were adjacent in the address space, as I had assumed at the beginning, this should have loaded my program at the start of RAM, but it didn't work that way. I needed a lower address, so I chose 4100h, which is mentioned in one of the strings.
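A complete Intel HEX record also requires a checksum, which gives:
:0441000075B0AA22CA
(":" starts a record, 04 is the number of data bytes, 4100 is the address, 00 is the record type (data), then come the 4 bytes of code and finally the checksum CA, the two's complement of the low byte of the sum of all the preceding bytes).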
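Computing the checksum by hand is error-prone, so this is one of the first things worth automating. A Python sketch that builds data records and, for longer programs, splits the code into 16-byte records followed by the standard end-of-file record :00000001FF. Whether EMON52 needs the end-of-file record I don't know (I only sent the single data line); the function names are hypothetical.

```python
def intel_hex_record(address: int, data: bytes, record_type: int = 0x00) -> str:
    """One record: ':' + count + address + type + data + checksum, in hex."""
    if len(data) > 0xFF or not 0 <= address <= 0xFFFF:
        raise ValueError("too much data for one record, or address out of range")
    body = bytes([len(data)]) + address.to_bytes(2, "big") + bytes([record_type]) + data
    checksum = -sum(body) & 0xFF  # two's complement of the low byte of the sum
    return ":" + (body + bytes([checksum])).hex().upper()


def to_intel_hex(address: int, code: bytes, chunk: int = 16) -> list[str]:
    """Split code into data records, then append the end-of-file record."""
    records = [
        intel_hex_record(address + offset, code[offset:offset + chunk])
        for offset in range(0, len(code), chunk)
    ]
    records.append(intel_hex_record(0x0000, b"", record_type=0x01))
    return records


if __name__ == "__main__":
    for line in to_intel_hex(0x4100, bytes.fromhex("75B0AA22")):
        print(line)
```

For the LED test program this reproduces the record above, :0441000075B0AA22CA, followed by :00000001FF.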
Ok, uploaded. Sending the command X4100... The LEDs lit up just as they were supposed to :)
Summary
This is where I'm currently at. I'm also planning to write a simple assembler (in progress), a program for automatically uploading code to the computer (maybe I'll get some practice with Rust...), and some simple programs. When I create something, I'll definitely write about it here. Until next time :)