PC Engine Compatible Hardware Development Note


I have started developing PC Engine compatible hardware, and I'll upload my notes on its development here. Please excuse my sloppy English, as I don't use it regularly in my daily life.


[10.16.2005] Motivation

  1. I want to keep "Ys book I & II" playable even after my PC Engine breaks.
  2. PC Engine hardware is getting very old --> it won't last much longer --> only the games will remain, with no hardware to run them --> working hardware is already hard to find --> nobody seems to be making alternative PC Engine hardware --> I have to make it
  3. There is no PC Engine + CD-ROM^2 system compatible hardware for FREE.
  4. I want to make something on weekends.
  5. I want to improve my technical skills.

[Nov.02 2005] Current Status

The progress on my feeling:

[12.27.2003] Building a ROM emulator for PC Engine

Since I'll probably run into situations where I need to test things on the real machine, I decided to build a ROM emulator. Here are some images of what I've got so far.

I bought a USB interface. It says "FIFO" rather than "parallel port",
so I'm not sure whether it will work as I expect.

Deciding the layout of the ICs. I have cut the board in half.
It's going to be smaller than the one I built for the Famicom.
With the ICs this close to each other, there would be a lot of wires
on the solder side (like spaghetti). I'm going to try some kind of wire
that keeps the solder side from going crazy.
I'll upload an image of the solder side after I finish the wiring.



I'm debugging the ROM emulator hardware. Most of the circuit is taken from ChaN's ROM emulator. I expanded the address lines so it can support a 4 Mbit SRAM. I was going to control the hardware through USB, but it's currently controlled through the parallel port, since I realized that the USB interface needs a CPU to control it...

Here is a snippet of the control program, written in C. It uses the parallel port. Although I'm still debugging the hardware, I think this code will work.

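#define CTRLPORT        0x378   /* parallel port data register */

/* outportb(port, value) writes one byte to an I/O port. It comes from the
   compiler's DOS port-I/O header (for example <dos.h> in Borland C). */

static unsigned char    s_PrevData = 0x00;

/*
    Change the ROM emulator to write-mode,
    and zero-clear the counter and the shift register.
*/
void
rome_reset(void)
{
    outportb(CTRLPORT, 0x00);
    outportb(CTRLPORT, 0x08);
    s_PrevData = 0xff;
}

/*
    Write 1 byte to the ROM emulator.
    The byte is shifted out MSB first. The shift is skipped
    when the byte is the same as the previous one.
*/
void
rome_sendbyte(
    unsigned char   v)
{
    if (v != s_PrevData)
    {
        int             i;
        unsigned char   bit0;

        s_PrevData = v;
        for (i = 0; i < 8; i++)
        {
            bit0 = ((v << i) & 0x80) ? 1 : 0;

            outportb(CTRLPORT, 0x08|bit0);      // clock=L
            outportb(CTRLPORT, 0x08|bit0|0x02); // clock=H
        }
    }

    outportb(CTRLPORT, 0x0c);   // strobe=H
    outportb(CTRLPORT, 0x08);   // strobe=L
}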

First call rome_reset(), then call rome_sendbyte() repeatedly until every byte of the SRAM has been written. After the last write, call outportb(CTRLPORT, 0x00); the target then sees the hardware as a ROM.

[01.04.2004] PICTURES OF THE ROM EMULATOR


Some notes on the ROM emulator...

I used a general-purpose 2SC1815 transistor for the RESET signal output. Any "general purpose" transistor should work. But in the ROM emulator for the PCE, I had to add a 1.5[kohm] base resistor to ChaN's ROM emulator to make it work. Without this resistor on the base of the 2SC1815, the G2 pin of the 74HC541 (pin #19, which opens/closes the address lines from the PCE) sits somewhere around 2.5[V], which I think is a troublesome voltage for CMOS ICs. It seemed that too much current flowed through the ICs when such a voltage was applied to their input pins. One of the ICs actually became so hot that I thought I had blown it up.
Correction: an earlier version of this note said "the G1 and G2 pins of 74HC541 (pin #1 and #19 ..."; the correct text is "the G2 pin of 74HC541 (pin #19 ...", as written above.

Another thing to note is that you should connect any unused input pins of CMOS ICs to either +V or GND, to keep latch-up from occurring and destroying them. I would connect the inputs of inverting ICs (such as the 74HC14) to +V, and the inputs of non-inverting ICs (such as the 74HC541) to GND, so that their outputs become 0V.

I used an HM628512BLP-5 for the 4 Megabit SRAM. Its access time is 55[ns]. Supposing the gate latency is 30[ns], the overall latency of the ROM emulator would be 55 + 30 = 85[ns]. Since the access time of the main memory of my PCE seems to be 100[ns], I think the ROM emulator responds quickly enough.

I currently power the ROM emulator from a USB port, not from the PCE. It didn't work when I took power from the PCE's cartridge connector. The PCE's power supply circuit uses a 78M05, if I remember correctly. So my guess is that there is not enough capacity left in the PCE's power supply, but I don't know.

I used a 4 Megabit SRAM, which is larger than needed for just running simple test programs. It also takes longer to program (about 15 seconds for the whole 512KB SRAM). For simple tests on the PCE, I think a 512 Kbit (64KB) SRAM should be enough. Since you don't have to add the extra 74HC541 and 74HC590 to ChaN's circuit if you use a 64KB SRAM, building the hardware should be relatively easy.

[01.17.2004] Circuit Diagram of the ROM Emulator
NOTE: The names of the control signal lines differ from ChaN's circuit because of the circuit editor I used (CEAT V2.4). Go by the pin numbers.



The emulator is no longer open to the public.

[Oct. 16, 2005] Changed the Main Point of the Project

Since there are some people who just don't use my emulator properly, I've changed the main point of the project to "Developing a compatible hardware."

I'm currently using BSch for circuit editing. I'll build my own circuit simulator, or use a freeware one. The actual circuit will probably be built with a CPLD, but for the sake of study I'll design it as if it were going to be built from general logic ICs.

I'm very new to digital hardware, so I'm not sure whether I can do it or not. So this time I would like to invite you to collaborate. If you are interested, please drop me a line:
or you can just write here: PCE Compatible Hardware Project BBS

Please don't use any information found hereafter for commercial purpose.


[Oct. 16, 2005] The Timer Circuit

For no particular reason, I started with the timer circuit. It looks a little bit messy. I haven't debugged it, so it probably doesn't work yet.

/RD and /WR are read/write request signals already address-decoded for $FF:0C00.

The PC Engine Timer consists of

  1. 1/1024 prescaler
  2. 7-bit down counter
  3. 7-bit reload register
  4. 1-bit control register
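
The four blocks listed above can be sketched as a small behavioural model in C. This is only a sketch under assumptions that this note does not state: the names (PceTimer, timer_clock and so on) are made up for illustration, the prescaler is assumed to restart when the timer is enabled, and the counter is assumed to reload and raise the interrupt when it is clocked while at zero.

```c
#include <stdint.h>

/* Behavioural model of the timer blocks. All names are hypothetical. */
typedef struct {
    uint16_t prescaler;  /* 1/1024 prescaler, counts 0..1023        */
    uint8_t  counter;    /* 7-bit down counter                      */
    uint8_t  reload;     /* 7-bit reload register                   */
    uint8_t  enable;     /* 1-bit control register                  */
    uint8_t  irq;        /* latched timer interrupt request         */
} PceTimer;

/* Store a new 7-bit reload value. */
void timer_write_reload(PceTimer *t, uint8_t v)
{
    t->reload = v & 0x7F;
}

/* Writing 1 while stopped loads the counter and starts the count-down.
   Restarting the prescaler here is an assumption. */
void timer_write_control(PceTimer *t, uint8_t v)
{
    uint8_t en = v & 1;
    if (en && !t->enable) {
        t->counter   = t->reload;
        t->prescaler = 0;
    }
    t->enable = en;
}

/* Advance the model by one input clock. */
void timer_clock(PceTimer *t)
{
    if (!t->enable)
        return;
    if (++t->prescaler < 1024)
        return;
    t->prescaler = 0;
    if (t->counter == 0) {       /* assumed underflow behaviour */
        t->counter = t->reload;
        t->irq = 1;
    } else {
        t->counter--;
    }
}
```

With a reload value of 0x1f, the counter reaches zero after 0x1f prescaler periods and the interrupt is latched one period later.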

I'll list some important characteristics of the PC Engine Timer.

[Oct. 23, 2005]

I simulated the 4040 using D-FF with my hand-made simulator.

Simulator source code(C++), executable(main.exe), and the result(result.txt)


[Oct.28 2005] The Address Decoder Circuit

I'll start from easy ones. I have drawn the address-decoding circuit. I still haven't debugged it, so it probably won't work yet.

Below is a memory map compatible with the PC Engine. Although I think this one has above 95% compatibility with the real machine, its compatibility still hasn't been fully verified.

  Logical Address  |  Physical Address  |  Activated Signal
  00:0000-7F:1FFF  |   000000-0FFFFF    |  /ROM
  68:0000-87:1FFF  |   0D0000-10FFFF    |  /CDRAM (64kB + 192kB)
  F7:0000-F7:1FFF  |   1EE000-1EFFFF    |  /BRAM
  F8:0000-FB:1FFF  |   1F0000-1F1FFF    |  /MAINRAM
  FF:0000-FF:03FF  |   1FE000-1FE3FF    |  /VDC
  FF:0400-FF:07FF  |   1FE400-1FE7FF    |  /VCE
  FF:0800-FF:0BFF  |   1FE800-1FEBFF    |  /PSG
  FF:0C00-FF:0FFF  |   1FEC00-1FEFFF    |  /TIMER
  FF:1000-FF:13FF  |   1FF000-1FF3FF    |  /PAD
  FF:1400-FF:17FF  |   1FF400-1FF7FF    |  /INTCTRL
  FF:1800-FF:19FF  |   1FF800-1FF9FF    |  /CDROM
  FF:1A00-FF:1AFF  |   1FFA00-1FFAFF    |  /AC
NOTE: addresses are in hexadecimal notation

You may have noticed that the ranges of ROM and CDRAM overlap. They will be bank-switched using the signal that goes to the "LO" level when a cartridge is inserted.
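
As a rough software model of the memory map above, the decoder can be written as a C function that takes a 21-bit physical address and returns which select signal would be active. The names are hypothetical. Two assumptions are mine: only the overlapping banks 68-7F are switched by the cartridge-detect signal, and /MAINRAM follows the logical bank range F8-FB.

```c
#include <stdint.h>

typedef enum {
    SEL_NONE, SEL_ROM, SEL_CDRAM, SEL_BRAM, SEL_MAINRAM, SEL_VDC, SEL_VCE,
    SEL_PSG, SEL_TIMER, SEL_PAD, SEL_INTCTRL, SEL_CDROM, SEL_AC
} ChipSelect;

/* phys: 21-bit physical address.
   cart_inserted: non-zero when the cartridge-detect line is at LO level. */
ChipSelect decode_address(uint32_t phys, int cart_inserted)
{
    /* FF:0000-FF:17FF is split into six 1kB I/O windows. */
    static const ChipSelect io[6] = {
        SEL_VDC, SEL_VCE, SEL_PSG, SEL_TIMER, SEL_PAD, SEL_INTCTRL
    };
    uint32_t bank = (phys >> 13) & 0xFF;  /* equals the MPR value   */
    uint32_t off  = phys & 0x1FFF;        /* offset inside the bank */

    /* Banks 68-7F belong to ROM only while a cartridge is inserted. */
    if (bank <= 0x7F && (cart_inserted || bank < 0x68)) return SEL_ROM;
    if (bank >= 0x68 && bank <= 0x87) return SEL_CDRAM;
    if (bank == 0xF7) return SEL_BRAM;
    if (bank >= 0xF8 && bank <= 0xFB) return SEL_MAINRAM;
    if (bank == 0xFF) {
        if (off < 0x1800) return io[off >> 10];
        if (off < 0x1A00) return SEL_CDROM;
        if (off < 0x1B00) return SEL_AC;
    }
    return SEL_NONE;
}
```

For example, physical address 0D0000 selects /ROM with a cartridge inserted and /CDRAM without one.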

The upper two digits "XX:" of the Logical Address are the values of the MPR register within the CPU. You probably won't understand Logical Address section of the above table if you are not familiar with this MPR register. So I'll briefly explain about the MPR register.

The basic idea of the MPR register is the same as the "segment register" in the i8086. In the i8086, +1 of a segment register corresponds to +16 of the physical address. But in the PC Engine hardware, +1 of a MPR register corresponds to +8192 of the physical address.

There are 8 MPR registers in the CPU, and each one holds the segment number (0-255) of an 8kB ($2000 bytes) segment block. This segment number directly becomes the upper 8 bits of the physical address.

The reason why there are 8 MPR registers is that the CPU's 16-bit effective address space holds 8 segments of 8kB ($2000 bytes) each. The table below shows how the 8 MPR registers MPR0-MPR7 correspond to these 8 segments. A value 0-255 is written to each MPR, and the conversion from logical address to physical address uses the values in the MPR registers. Suppose x is the upper 3 bits of the effective address, and Offset Address is the lower 13 bits of the effective address; then the address conversion follows the equation:

Physical Address = ([MPRx] << 13) + Offset Address

where [MPRx] is the value held by MPRx, and << 13 is a 13-bit logical shift left.

  Effective Address  |      Valid MPR     |   Offset Address
    0000-1FFF        |        MPR0        |      0000-1FFF
    2000-3FFF        |        MPR1        |      0000-1FFF
    4000-5FFF        |        MPR2        |      0000-1FFF
    6000-7FFF        |        MPR3        |      0000-1FFF
    8000-9FFF        |        MPR4        |      0000-1FFF
    A000-BFFF        |        MPR5        |      0000-1FFF
    C000-DFFF        |        MPR6        |      0000-1FFF
    E000-FFFF        |        MPR7        |      0000-1FFF
NOTE: addresses are in hexadecimal notation

When the CPU is reset, MPR7 is loaded with zero. The CPU reads the reset vector from the effective addresses $FFFE and $FFFF. These are decoded to [MPR7]:1FFE and [MPR7]:1FFF respectively, so the physical addresses actually accessed are $1FFE and $1FFF. This means that the reset vector needs to be stored at offsets $1FFE and $1FFF from the start of the ROM.
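
The address conversion can be written directly in C. This is a minimal sketch. The function name mpr_translate and the array representation of MPR0-MPR7 are my own choices for illustration.

```c
#include <stdint.h>

/* Convert a 16-bit effective address into a 21-bit physical address.
   mpr[0..7] hold the values of MPR0..MPR7. */
uint32_t mpr_translate(const uint8_t mpr[8], uint16_t effective)
{
    unsigned x      = effective >> 13;      /* upper 3 bits select the MPR */
    uint32_t offset = effective & 0x1FFF;   /* lower 13 bits               */
    return ((uint32_t)mpr[x] << 13) + offset;
}
```

With MPR7 = 0, effective address $FFFE gives physical address $1FFE, which is the reset vector case. With MPR0 = $FF, effective address $0C00 gives $1FEC00, the timer.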

I'm going to verify the address-decoding circuit some time soon.


[Oct.29 2005] The Timer Circuit Revised

I have redrawn the timer circuit.

[Oct.30 2005]

I have redrawn the timer circuit again.

I have simulated the main part of it. It is too inefficient to simulate the circuit with my own simulator (since I need to debug the simulator before I can debug the circuit), so I'll try something different.

Simulator source(C++), executable(main.exe) and the simulated result(result.txt)

The simulator writes '1' to the control register on the 1023rd clock. This loads the value stored in the reload register (fixed to 0x1f) into the down counter and enables the count-down. Before the simulation finishes, it writes '0' to the control register and checks that the count-down is disabled.

Below is a truth table of the /LOAD signal. +EDGE is the output Q of U10, which becomes '1' when detecting a rising edge of COUNT signal.

    0    |    0    |   0   |   1
    0    |    0    |   1   |   1
    0    |    1    |   0   |   1
    0    |    1    |   1   |   0
    1    |    0    |   0   |   0
    1    |    0    |   1   |   0
    1    |    1    |   0   |   0
    1    |    1    |   1   |   0
From the table, the following equation is obtained.


Since we have a lot of unused NAND gates, we change the equation so we can use the unused NAND gates.
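
The column headings of the truth table are not reproduced here, so the sketch below simply calls the three input columns a, b and c from left to right (placeholder names, not the real signal names). Reading the table, the output is '1' only for the rows 000, 001 and 010, which is NOT a AND NOT (b AND c). The NAND-only version is one possible form and not necessarily the one used in the circuit.

```c
/* a, b, c: the three input columns of the truth table, left to right.
   These are placeholder names. */

static int nand2(int x, int y)
{
    return !(x && y);
}

/* Direct reading of the table: /LOAD = NOT a AND NOT (b AND c). */
int load_n_direct(int a, int b, int c)
{
    return !a && !(b && c);
}

/* The same function built only from 2-input NAND gates. */
int load_n_nand(int a, int b, int c)
{
    int na  = nand2(a, a);      /* NOT a                      */
    int nbc = nand2(b, c);      /* NOT (b AND c)              */
    int t   = nand2(na, nbc);   /* NOT (na AND nbc)           */
    return nand2(t, t);         /* na AND nbc                 */
}
```

Both functions reproduce the eight rows of the table, at a cost of four NAND gates for the second one.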


The /LOAD is generated using the CPU's write strobe signal (CTRL.WR), so I may need to modify the circuit if its timing is different from what I expected.


[Nov.02 2005] The Interrupt Controller Circuit

I just drew it. It still hasn't been tested.


[Nov.03-04 2005] The 6502 Timing

Here I translated the article Nuts and Volts (COMPUTE II ISSUE 1 / APRIL/MAY 1980) into Japanese. But I don't think you need the translation. :) I just leave the figures which are a bit hard to see in the original page.

The 6502 Write Timing

The 6502 Read Timing


[Nov.19 2005] Implementing the Timer Circuit in Verilog-HDL

I have written the timer circuit in Verilog-HDL. ==> view source

The Verilog-HDL code is verified with the learning board which came with the book "FPGA ボードで学ぶ論理回路設計". It ran too fast with the 33[MHz] system clock, so I added a divide-by-1024 counter before the prescaler.

Debugging the down counter. The single LED is the /TIRQ output.
Two of the four buttons are allocated for /RESET and /TIRQACK.

Here are some more notes on implementing the Timer.


[Nov.20 2005] Implementing the Clock Divider in Verilog-HDL

I have written clock dividers which generate the clock frequencies used in the PC Engine hardware. With the 1/2 and 1/3 dividers, 10.73863[MHz] and 7.159090[MHz] clocks are generated from the 21.47727[MHz] system clock. Each output is then fed to further 1/2 dividers, so that 5.369317[MHz], 3.579545[MHz] and 1.789772[MHz] are obtained. ==> view source
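
The arithmetic of the divider tree can be checked with a few lines of C. This is just a sketch with made-up names. It assumes that the 1.789772[MHz] clock comes from one more 1/2 stage after the 3.579545[MHz] output.

```c
/* Output frequencies of the divider tree, in Hz. */
typedef struct {
    double div2;    /* system / 2          */
    double div3;    /* system / 3          */
    double div4;    /* (system / 2) / 2    */
    double div6;    /* (system / 3) / 2    */
    double div12;   /* (system / 6) / 2    */
} PceClocks;

PceClocks derive_clocks(double system_hz)
{
    PceClocks c;
    c.div2  = system_hz / 2.0;
    c.div3  = system_hz / 3.0;
    c.div4  = c.div2 / 2.0;
    c.div6  = c.div3 / 2.0;
    c.div12 = c.div6 / 2.0;   /* assumed extra 1/2 stage */
    return c;
}
```

Calling derive_clocks(21477270.0) gives 10738635, 7159090, 5369317.5, 3579545 and 1789772.5 Hz, which match the values quoted above to the digits shown.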

This time, I tried the iverilog + IVI for its verification.

Simulated result with 10[MHz] system clock

I have also verified that it works on the FPGA chip.


[Dec.04 2005] Implementing an ALU in Verilog-HDL

The ALU (Arithmetic Logic Unit) is a combinational circuit which is responsible for executing the operations of a CPU (addition, subtraction, AND, OR, EXOR, bit shift, etc.). Since it is a combinational circuit, it needs neither FFs nor a clock input.

In HuC6280, when D flag is set (D=1), one extra clock cycle is required to do the decimal mode ADC/SBC. I guess that it does the normal ADC/SBC first, and then it converts the result into BCD on the next clock. So I have implemented a "BCD adjustment" function to the ALU.
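
The two-step idea (binary add first, BCD adjustment afterwards) can be sketched in C as follows. This is not the Verilog-HDL from this project. It is a textbook-style adjustment, covers only ADC, assumes both operands are valid BCD, and does not model the Z, V and N flags. The function name is made up.

```c
#include <stdint.h>

/* Decimal-mode ADC as "binary add, then adjust".
   a, b: valid BCD bytes (0x00..0x99). carry_out receives the new C flag. */
uint8_t adc_decimal(uint8_t a, uint8_t b, int carry_in, int *carry_out)
{
    unsigned cin = carry_in ? 1u : 0u;

    /* Step 1: the ordinary binary ADC. */
    unsigned sum   = (unsigned)a + b + cin;
    int      half  = ((a & 0x0Fu) + (b & 0x0Fu) + cin) > 0x0Fu;
    int      carry = sum > 0xFFu;
    unsigned r     = sum & 0xFFu;

    /* Step 2: the BCD adjustment on the binary result. */
    if (half || (r & 0x0Fu) > 9u)
        r += 0x06u;                 /* fix the low digit  */
    if (carry || r > 0x9Fu) {
        r += 0x60u;                 /* fix the high digit */
        carry = 1;
    }

    *carry_out = carry;
    return (uint8_t)r;
}
```

For example, $19 + $28 gives $47 with carry clear, and $99 + $01 gives $00 with carry set.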

[Dec.11 2005]

I have rewritten the code. I think it's fairly close to completion. I have checked the code to work on the FPGA chip, but there may be some bugs left. ==> ALU source
==> general decoders (used by the ALU)
==> Adder source (used by the ALU)


[Dec.10 2005] Implementing an Adder in Verilog-HDL

Having written an ALU, I decided to look more closely at the adder.

An adder can be constructed without using any FFs. The basic circuit is a 1-bit full adder. Below is the truth table of the 1-bit full adder.

  A  B  Ci | Co S
  0  0  0  | 0  0
  0  0  1  | 0  1
  0  1  0  | 0  1
  0  1  1  | 1  0
  1  0  0  | 0  1
  1  0  1  | 1  0
  1  1  0  | 1  0
  1  1  1  | 1  1

nA  = not A
nB  = not B
nCi = not Ci

Co = (A*B) + (A*Ci) + (B*Ci)
S  = (nA*nB*Ci) + (nA*B*nCi) + (A*nB*nCi) + (A*B*Ci)
If you use EXOR gates, the equations become much simpler. But I heard that the delay time of EXOR gates is larger than that of other gates, so I don't use them for now. CPLDs and FPGAs have EXOR primitives within the chip, so I don't think this is really true for them, but I decided not to use EXOR anyway.

If you make a circuit directly from the above equation, it will look like this.

1-bit full adder

The easiest way to expand this to an 8-bit adder is to "cascade" eight 1-bit full adders through their Ci and Co. But the problem is that when a carry propagates through all the bits (i.e. $FF->$00), the Co signal of the MSB will be heavily delayed. The carry look-ahead (CLA) circuit avoids this by generating every carry signal directly from the inputs. CLA is realized by expanding the Ci term in the two equations for Co and S.

Suppose we have 8-bit inputs A0-A7, B0-B7, and a 1-bit input CF. The carry signals C0-C7 which occur on each bit of the adder are calculated as follows. The expansion of the right hand side is omitted for C2 and thereafter.

C0 = A0*B0 + A0*CF + B0*CF
C1 = A1*B1 + A1*C0 + B1*C0 = A1*B1 + A1*(A0*B0 + B0*CF + CF*A0) + B1*(A0*B0 + B0*CF + CF*A0)
C2 = A2*B2 + A2*C1 + B2*C1 = ...
C3 = A3*B3 + A3*C2 + B3*C2 = ...
   :       :       :
C7 = A7*B7 + A7*C6 + B7*C6 = ...

Signals zero(Z), overflow(V), and negative(N) are implemented as well as the carry(C) signal within the adder. I have seen that overflow can be detected using the logic (C6 EXOR C7), so I just used it.
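
The equations for S, Co and the flags can be checked with a small C model. Note that this sketch evaluates the carry recurrence bit by bit, so it models the logic values of C0-C7 but not the flattened look-ahead structure or its timing. The type and function names are made up for illustration.

```c
#include <stdint.h>

typedef struct {
    uint8_t sum;
    int c;   /* carry out of bit 7 (C7)        */
    int z;   /* result is zero                 */
    int v;   /* signed overflow, C6 EXOR C7    */
    int n;   /* bit 7 of the result            */
} AddResult;

AddResult add8(uint8_t a, uint8_t b, int cf)
{
    AddResult r = {0, 0, 0, 0, 0};
    int carry = cf ? 1 : 0;   /* carry into bit 0 */
    int c6 = 0;

    for (int i = 0; i < 8; i++) {
        int ai = (a >> i) & 1;
        int bi = (b >> i) & 1;
        int ci = carry;
        /* S and Co without EXOR, as in the equations above. */
        int s = (!ai & !bi & ci) | (!ai & bi & !ci) |
                (ai & !bi & !ci) | (ai & bi & ci);
        carry = (ai & bi) | (ai & ci) | (bi & ci);
        r.sum |= (uint8_t)(s << i);
        if (i == 6)
            c6 = carry;       /* C6, the carry into bit 7 */
    }

    r.c = carry;
    r.v = c6 ^ carry;
    r.z = (r.sum == 0);
    r.n = (r.sum >> 7) & 1;
    return r;
}
```

For example, $7F + $01 sets V and N, while $FF + $01 sets C and Z but not V.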

Here I uploaded the adder in Verilog-HDL. I have tested the code on the real FPGA.


[Dec.29 2005] OT: Building a XILINX CPLD Programmer

This is rather Off-Topic, but I suddenly wanted to build one, so I quickly made it.

Programming the XILINX XC9572-PC84-15 CPLD chip

This is a tool for "programming" a circuit into XILINX CPLDs. If you buy one, it costs about 6500yen ($55US). But if you build one yourself, it costs about 2000yen ($17US). It doesn't feel like I saved money, though, because it takes about half a day to complete. And if you think about how much money you could make by working for half a day...

CPLD chips are only for fitting small-scale circuits, so it's definitely not for fitting a whole PC Engine in it.

I referred to the circuit diagram found on the XILINX official web page. I used very old parts that I have had for many years, so I rather worried if those parts were still alive, but it worked on the first try.

I didn't draw any circuit diagrams. All you need is the circuit diagram of the download cable (found on the XILINX web page), pin assignment chart of XC9572-PC84 or XC95108-PC84, and pin assignment figure of 84-pin PLCC package.

I said I suddenly wanted to build one, but actual motivation is like this...

You probably never need to worry about RAM capacity once you are able to use PC's memory modules for hobby.


[Jan.01 2006] OT: Building a ROM emulator using XILINX CPLD

A happy new year. (^_^; I couldn't do anything impressive last year after all. I hope I'll be able to make a bit more progress this year.

This is rather OT too, but I designed a ROM emulator using a CPLD.

Below is the block diagram.

Block diagram of a ROM emulator using XILINX XC95108PC84-15

HCmd, HClk and HData are command, clock, data output and data input from PC, respectively. TAddr, TData, /TOE and /TWE are the address, data, read strobe and write strobe of the target system. RAddr, RData, /ROE and /RWE are the address, data, read strobe and write strobe of the S-RAM.

First you output a command to HCmd, then change HClk from '0' to '1' to submit the command to the control logic. Below is a list of commands.

Since the data lines are connected to the RAM and CPU data buses, they need to be bi-directional (inout). This is what makes it complicated. From the CPLD's point of view, RData outputs data only when /RWE is active(0), while TData outputs data only when /TOE is active(0).

==> Verilog-HDL source of the control logic

Below is the simulated result of the control logic.

Simulated result of the ROM emulator control logic

I'm going to actually build one some time soon.

[Jan.04 2006]

I've finished the wiring, but it didn't work. As always, it really gets me down when I realize that a circuit isn't working...

I've taken a photo anyway.

ROM emulator using CPLD

My impression so far is that using a CPLD did reduce the number of wires, but not by so much that I would feel like building two or more.

I'm going to debug it from now... Not sure whether I'm doing this right.

[Jan.05 2006]

The bug was in the software. I spent hours looking for a mistake in the hardware. Ouch! Or is it a kind of bug in itself, to think you have a bug where there really isn't any?

I quickly drew the circuit. You'll see that the only thing I'm doing is wiring the parts together. I haven't verified the pin numbers, so fix them as you like if you find them wrong.

Circuit diagram of the ROM emulator using CPLD

Also, I put my "probably-complete" control program here. For controlling the PC's parallel port under Windows XP, I used the PortTalk by Craig Peacock.

Source code of the control program

I tried to make it easy to read, but I seem to have ended up with something weird. As you can see from the circuit diagram, all the signals from/to the parallel port go through the HC14, so the logic is inverted. Therefore all data is inverted by the software before it is sent to the hardware.


[Jan.09 2006] PSG Volume Attenuator

I'm currently implementing the PSG in Verilog-HDL. It doesn't seem very hard in the waveform output part, but the volume control part seems very difficult.

There are three attenuators implemented in the PSG for audio volume adjustment. One adjusts the left and right volumes together within a channel (CH-ATT), another adjusts the left and right volumes independently within a channel (L-ATT, R-ATT), and the last one is the master attenuator for the whole output volume (M-ATT). In the real hardware they might be implemented as amplifiers rather than attenuators, but both are just a matter of amplitude ratio, so I think either way is fine. It looks like I've implemented the CH-ATT as an amplifier this time.

Since these attenuators change the amplitude on a logarithmic scale, you basically need to calculate logarithms in hardware. But that would probably use too many gates, so I decided to implement only the CH-ATT digitally and implement the rest as analog circuits.

CH-ATT is a 32-step attenuator with 1.5[dB] attenuation per step. The weight on each bit of the 5-bit step input is as follows.

	D0: 1.5 [dB] ~= x1.1885 ~= x(1+1/8+1/16) = x1.1875 ~= 1.4927[dB]
	D1: 3.0 [dB] ~= x1.4125 ~= x(1+1/4+1/8+1/16) = x1.4375 ~= 3.1522[dB]
	D2: 6.0 [dB] ~= x2.0 = << 1 ~= 6.0206[dB]
	D3: 12.0[dB] ~= x4.0 = << 2 ~= 12.041[dB]
	D4: 24.0[dB] ~= x16.0 = << 4 ~= 24.082[dB]

	D0|D1: 4.5[dB] ~= x1.6788 ~= x(1+1/2+1/8+1/16) = x1.6875 ~= 4.5449[dB]

~= means "roughly equal". Noting that 6.0[dB] is approximately x2, 12[dB] is approximately x4, and 24[dB] is approximately x16, the costly logarithm calculations are approximated with logical bit shifts.

I have implemented the CH-ATT using this approximation.
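
The original source is no longer available, so here is only a sketch in C of how such a shift-and-add gain stage could look. It treats the CH-ATT as an amplifier, uses x1.1875 (1+1/8+1/16), x1.4375 (1+1/4+1/8+1/16) and x1.6875 (1+1/2+1/8+1/16) for the D1:D0 combinations, and left shifts for D2-D4. The function name and the 32-bit types are my own assumptions.

```c
#include <stdint.h>

/* Apply the approximated gain for a 5-bit step value (D4..D0).
   x is an unsigned sample level. The result grows by about 1.5 dB per step. */
uint32_t ch_att_gain(uint32_t x, unsigned step)
{
    uint32_t y;

    switch (step & 0x03) {                                    /* D1:D0 */
    case 1:  y = x + (x >> 3) + (x >> 4);            break;   /* ~1.5 dB */
    case 2:  y = x + (x >> 2) + (x >> 3) + (x >> 4); break;   /* ~3.0 dB */
    case 3:  y = x + (x >> 1) + (x >> 3) + (x >> 4); break;   /* ~4.5 dB */
    default: y = x;                                  break;   /*  0 dB   */
    }

    if (step & 0x04) y <<= 1;   /* D2: ~6 dB  = x2  */
    if (step & 0x08) y <<= 2;   /* D3: ~12 dB = x4  */
    if (step & 0x10) y <<= 4;   /* D4: ~24 dB = x16 */
    return y;
}
```

With an input of 16, steps 0 to 4 give 16, 19, 23, 27 and 32. The right shifts discard low bits, so small inputs lose some precision.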

Source ([Jan.15 2006] deleted due to abolishment)

The output of the CH-ATT is fed to the attenuators L-ATT/R-ATT. With these, the left and right volumes of each channel can be changed independently, so panpot settings are possible. These are 16-step attenuators with 3.0[dB] attenuation per step. I really want to implement these attenuators digitally too, but they would need even more gates than the CH-ATT because of the fixed-point calculation (if you keep "multiplying" data, more bits are required to represent the results). So I'll just wait until I see that there are plenty of unused gates. In the meantime I'll try to use this analog circuit.

Attenuator circuit

Since this is the first time I've drawn a circuit like this, I have no idea whether it will work or not. It seems to me that the audio signal would be affected a lot by the switching noise generated by the digital gates. The one on the left is the L-ATT. The same circuit is used for the R-ATT as well. Each PSG channel has one L-ATT and one R-ATT. Since there are 6 PSG channels, you need 6 sets of L-ATT and R-ATT, 12 circuits in total. Wow, that would be a lot... A buffer may be required right after the ladder resistors.

The one on the right is the M-ATT. Two of them are required, for left and right. These are the last attenuators in the chain, fed from the L-ATT and R-ATT outputs. The output of the M-ATT becomes the final audio output of the console.

I'll debug the circuit sooner or later.


[Jan.15 2006] Implementing the PSG in Verilog-HDL

I implemented the PSG in Verilog-HDL. It seems to output waveform, but I haven't actually listened to it. So I don't know whether it is correctly working yet.

I'm still struggling with how to implement the attenuators (ATTs). I changed my mind and decided not to build all those analog circuits I designed above. I figured it would be A LOT easier to do it digitally, even if I had to add another FPGA solely for the ATTs. So this time I accepted using more gates for the ATTs and implemented the whole thing digitally. I tried hard to use shift operations before, but this time I simply used multiplication.

The three ATTs can be summed up into one ATT by adding their attenuation levels. This can be done because all of them are in logarithmic scale, and also the following equation is true in logarithmic scale:

log10(A) + log10(B) = log10(A*B)

You can see from this equation that multiplication on a linear scale becomes addition on a logarithmic scale. I will use the same names for the three ATTs as before: CH-ATT, L-ATT/R-ATT, and M-ATT. We add all the attenuation levels together, and we get the overall attenuation level.

L-ATT/R-ATT and M-ATT have a different attenuation level per step than CH-ATT. The CH-ATT has the finest granularity: a 5-bit step input, 32 steps, and -1.5[dB] per step, whereas the others have a 4-bit step input, 16 steps, and -3.0[dB] per step. This means that one step of L-ATT/R-ATT or M-ATT equals 2 steps of CH-ATT.

We need to unify the attenuation level per step in order to sum up all the ATTs into one. Here, we just assume that L-ATT/R-ATT and M-ATT have a 5-bit step input and -1.5[dB] attenuation per step, but use only every other step. In other words, we multiply the step input by 2 and divide the attenuation level per step by 2.

Sorry if my explanation was too stupid. Anyway, once we have done this, then we can just add the attenuation level of three ATTs together and attenuate waveform by that value. Let me just call this unified ATT "composite attenuator (CATT)".

The ranges of each ATT's step input are:

0 <= CH-ATT <= 31

0 <= L-ATT/R-ATT*2 <= 30

0 <= M-ATT*2 <= 30

So the maximum step input of the CATT is 31+30+30 = 91. Therefore the CATT is an attenuator with a step input from 0 to 91 and -1.5[dB] of attenuation per step.
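
The step arithmetic can be sketched in C as below. This is an illustration and not the Verilog-HDL source: the function names are made up, the gain is computed in floating point, and the per-step factor 10^(-1.5/20) is written as a rounded constant so that no math library is needed.

```c
/* Combine the three step inputs into one CATT step (0..91),
   in units of -1.5 dB. lr is the L-ATT or R-ATT step. */
unsigned catt_step(unsigned ch, unsigned lr, unsigned master)
{
    return (ch & 0x1Fu) + 2u * (lr & 0x0Fu) + 2u * (master & 0x0Fu);
}

/* Linear amplitude ratio for a CATT step: 10^(-1.5 * step / 20). */
double catt_gain(unsigned step)
{
    const double per_step = 0.8413951416;   /* about 10^(-1.5/20) */
    double g = 1.0;

    while (step-- > 0)
        g *= per_step;
    return g;
}
```

For example, catt_step(31, 15, 15) is 91, and catt_gain(4), which is -6[dB], is roughly 0.5.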

→ Verilog-HDL source for the CATT

Although we unified the ATTs into one, we need two of them since the PSG's audio output is stereo.

The output is designed to be 16 bits wide. The upper 5 bits are the integer part, and the lower 11 bits are the fractional part. The waveform level is calculated internally with 16+5 bits, but output as 16 bits.

Now for the PSG hardware. I used up most of my energy writing the ATT section, so I'll just explain the points.

Six of "PsgChannel" circuits belong to "PSG". PSG is a mediator, and each PsgChannel outputs waveforms.

The PSG basically works at 3.579545[MHz]. But I used the 21.47727[MHz] system clock = 3.579545*6[MHz] (!) for the signal attenuation part, since that lets me attenuate all 6 channels sequentially, one on each rising edge. In this way, I don't need one CATT per channel. The side effect is that the output of some channels is delayed by 1/3579545 second. But I think this is OK, because the PSG is not structured to be able to start the output of multiple channels on the same clock edge in the first place.

I said the PSG basically works at 3.579545[MHz], but I used 7.159090[MHz]. This is because I thought a PSG running slower than the CPU might cause timing faults. But I'm not really sure about this at the moment, because the PC Engine CPU can run at 1.789772[MHz] when the CSL instruction is executed...

Noise channel and LFO are not implemented yet.

I compiled the whole PSG hardware targeted for ALTERA Cyclone, and it used 1330 LEs - much more than I expected...

→ Verilog-HDL source for PSG

→ Verilog-HDL source for PsgChannel


[Jan.21 2006] 65C02 Test Circuit

I asked one of my foreign friends if he could send me some W65C02s, and he kindly sent me five of them (there is no way to buy 6502/65C02 chips now in Japan).

I quickly drew a circuit mainly to test this interesting chip.

65C02 TEST CIRCUIT (doesn't work)

I'm going to use a CPLD to implement the devices other than the CPU, ROM, RAM and crystal oscillator. Well, they are actually just clock dividers and address decoders, so they should fit within an XC9536PC-44. I'm not going to use general logic gates for these, since the total propagation delay would probably be larger than with a CPLD, and so I would have to use faster memory chips.

The circuit made of three FFs is a divide-by-three frequency divider with a 50% duty cycle. I managed to get a 21.47727[MHz] crystal oscillator, so I'm going to divide this frequency by three and run the 65C02 at 7.159090[MHz]. Here is the timing chart of the frequency divider (NOTE: this is actually the timing chart of the fixed one shown later):

         __    __    __    __    __    __    __    __    __
CLK   __|  |__|  |__|  |__|  |__|  |__|  |__|  |__|  |__|  
            _____             _____             _____
FF1Q  _____|     |___________|     |___________|     |_____
                  _____             _____             _____
FF2Q  ___________|     |___________|     |___________|
                     _____             _____             __
FF3Q  ______________|     |___________|     |___________|
                  ________          ________          _____
PHI2  ___________|        |________|        |________|
But this is for using general logic ICs. I have already implemented 1/3 frequency divider in the [Nov.20 2005] Implementing Clock Divider in Verilog-HDL section, so I'm going to use this one for CPLDs.
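
The timing chart can be reproduced with a tiny C simulation. This is my reading of the chart and not a transcription of the actual circuit: FF1 and FF2 are assumed to change on the rising edge of CLK and cycle through three states, FF3 is assumed to copy FF2 on the falling edge, and PHI2 is assumed to be FF2Q OR FF3Q. The names are made up.

```c
/* State of the three flip-flops (Q outputs). */
typedef struct {
    int ff1;
    int ff2;
    int ff3;
} Div3;

/* Rising CLK edge: FF1/FF2 step through 00 -> 10 -> 01 -> 00. */
void div3_rising(Div3 *d)
{
    int next1 = !(d->ff1 || d->ff2);
    int next2 = d->ff1;
    d->ff1 = next1;
    d->ff2 = next2;
}

/* Falling CLK edge: FF3 follows FF2 half a CLK period later. */
void div3_falling(Div3 *d)
{
    d->ff3 = d->ff2;
}

/* PHI2 is high for 1.5 CLK periods out of every 3. */
int div3_phi2(const Div3 *d)
{
    return d->ff2 || d->ff3;
}
```

Stepping this model with alternating rising and falling edges gives PHI2 high for three half-periods and low for three half-periods in every three CLK cycles, which is the 50% duty cycle shown in the chart.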

I'll just connect /IRQ1 line to /IRQ, and /TIRQ to /NMI.

154, 138, and 00 are used for the address decoding. This is quite different from the PC Engine hardware, but it's OK since this is only for testing.

I'm going to use a 32kB ROM, and a 32kB RAM. When the A15 line is 'L', RAM is selected, otherwise ROM is selected. I/O is mapped from $8000 to $FFFF... ooops! This is going to be read-only... Ouch! It looks like I need a radical reconsideration. (^_^;

For now I used a 27256 for the 32kB ROM. But it can be an EEPROM or even a ROM emulator, which is what I'm going to use.

By the way, what are the good values for R and C in the crystal oscillator circuit? I searched a little, and it seems that R is between 1[MOhm] and 10[MOhm], and C is between 10[pF] and 20[pF]. I put 1/2 divider so I can observe the waveform with my old oscilloscope I recently got.

21.47727 [MHz] oscillator circuit

I quickly made an experimental circuit on a breadboard. The circuit is pretty simple, but it looks rather complicated when built on a breadboard (mainly because unused input pins are connected to Vcc or GND). It is said that you should use a dipped-mica or film capacitor for C so that it is not affected by the surrounding temperature. But I didn't have such good ones, so I just used ceramic types.

Experimental circuit of 21.47727 [MHz] oscillator

I observed the waveform. Wow, I see something!

Observing 10.738635 [MHz] clock frequency

You can't see the waveform properly in the above picture since the range is not properly adjusted. But the actual waveform looks very much distorted. I then tested clock signals in other pieces of hardware, but they also looked very dirty. So I don't think the breadboard is causing this. It is probably because my oscilloscope probe is improperly made. Well, it's not even a "probe". I just hooked it up...

I was too lazy to fix the 65C02 test circuit at the top of this section, and I wrote it in Verilog-HDL instead. I'll just leave the circuit as it is. But I did fix the 1/3 divider circuit, so I just show it here.

Fixed 1/3 clock divider (not tested)

Perhaps it's getting unclear what I'm doing, but I'm going to make even more stops on the way to making PC Engine compatible hardware.

[Jan.22 2006]

I debugged the address decoder + clock divider in Verilog-HDL.

==> address decoder + clock divider in Verilog-HDL

I decided on the following memory map:

I/O mode:
$0000-$3FFF  RAM (16kB)
$4000-$47FF  VDC
$4800-$4FFF  VCE
$5000-$57FF  PSG
$5800-$5FFF  Timer
$6000-$67FF  Pad
$6800-$6FFF  IntCtrl
$7000-$77FF  CdRom
$8000-$FFFF  ROM (32kB)

RAM mode:
$0000-$3FFF  RAM (16kB)
$4000-$7FFF  RAM (16kB)
$8000-$FFFF  ROM (32kB)

A 32kB RAM is still used, and is bank-switched with 16kB on each bank. The 6502 CPUs don't distinguish I/O space and memory space. So if you place a 32kB ROM + 32kB RAM like a Z80 system, there is no space left for I/O. Also 6502s need interrupt vector table located from $FFFA to $FFFF, and zero-page and stack from $0000 to $01FF. Therefore you normally place RAM from $0000, place ROM up to $FFFF, and place I/O somewhere between RAM and ROM. Here, I/O is placed from $4000 to $7FFF (in I/O mode).

Writing 0 to anywhere from $8000 to $FFFF selects "I/O mode", and writing 1 selects "RAM mode". 16kB of RAM is enough since it is already twice as much as the PC Engine has, but since we have another 16kB and there is no reason not to use it, I just made it possible to use it.

And for the clock divider. I had many CPLD pins left unused, so I made it output 10.73863[MHz], 7.159090[MHz], 5.369317[MHz], 3.579545[MHz] and 1.789772[MHz] clock signals from the 21.47727[MHz] system clock input.

I recently got an ALTERA EPM7160ELC84-10 very cheaply. But when I tried to use it, I had to install the MAX+PLUS II design software, which didn't support Verilog-HDL by default, and moreover, it required special hardware to program. So I gave up using this CPLD. Too bad!

Hence I ended up waiting for a CPLD to arrive.

[Feb.05 2006]

I finally got a CPLD that I can use, and I quickly wired it, but it didn't work! Why doesn't it work!? ...Too bad... This weekend is about to end... I have to wait until next weekend... I don't know how I can debug this...


I implemented the interrupt controller and the timer as well as the address decoder and the clock dividers. But since it didn't work, I don't have much to say... At least I can observe the clock signals, so the clock dividers seem to be working. I haven't connected the ROM emulator yet. Instead, I'm using an EEPROM (HN58C256-20) at the moment. This one has a 200[ns] access time, so I fed the 3.579545[MHz] clock to the CPU. But is it still too fast...?

[Feb.06 2006] - Debugging days 1 -

I made a manual clock circuit (generates a single clock cycle by pressing and releasing a push button) and checked each bit of the address bus with a tester. The circuit is from the book "CPU no Tsukurikata", with modified time constant.

Manual clock generating circuit (the inverter is 74HC14)

1 ffff    -- all the bits are set to '1' on the first clock
2 0724    -- a value I don't know
3 01fc    -- stack access; store PCH ?
4 01fb    -- stack access; store PCL ?
5 01fa    -- stack access; store flag register?
6 fffc 04 -- read low byte of the reset vector = 04
7 fffd e0 -- read high byte of the reset vector = e0
8 ????    -- unknown value from hereafter

I read somewhere that the first and second clocks are "internal work", so I just skip them.

It is thought that reset is a kind of interrupt, and it seems that the address bus is actually accessing stack to save program counter and flag register. But after a little bit more of investigation, I found that the R/W signal doesn't become 'L', so these stack accesses must be invalid.

I first thought that ROM wasn't working, but it actually outputted the reset vector $E004. The reset vector is stored at $FFFC and $FFFD in the ROM. This means ROM is working... If the ROM is working, the address bus should output $E004 on the 8th clock. But it doesn't... The data bus connection is wrong...? ...Hmmm....

[Feb.07 2006] - Debugging days 2 -

The data bus connection was reversed.

The address bus value on the 8th clock was $0720. This is 0000 0111 0010 0000 in binary notation. If we group it as two 8-bit values, and reverse the bit order respectively, we get 1110 0000 0000 0100. This is $E004 in hexadecimal notation.

Anyway, the CPU seems to be working... I don't have time to fix it today, so I'll do it tomorrow. It's only to change 8 connections of wires, but the solder side of the board is pretty messy, and solder wouldn't easily reach the fixing point.

[Feb.08 2006] - Debugging days 3 -

It worked!

I connected the timer interrupt request signal to the /NMI input of the 65C02. In the interrupt handler, I made it write to the interrupt controller to acknowledge the timer interrupt. I observed the /NMI input pin of the 65C02, and it was periodically alternating between 'H' and 'L'.

The CPLD chip on the bottom-right of the photo is the heart of the CPU board. It contains timer, interrupt controller, address decoder and clock divider. If I had to make this board without a CPLD, it might have become three or four times larger than the current size.

[Feb.10 2006]

Here are the circuit and the CPLD source.


The timer is modified as follows:

--> address decoder + clock divider + alpha in Verilog-HDL [Feb.11 2006 bug-fix2]

--> 65C02 test program source code [Feb.11 2006: added RAM test code]

Use PCEAS to assemble the 65C02 test program:

PCEAS -raw 65c02.asm

This will make 32kB ROM data(binary format).

Now it seems I finally can do some interesting tests with this board.

[Feb.11 2006]

I found another bug. The "locking edge" of the 65C02 is the negative edge of PHI2, but the timer and the interrupt controller were looking at the positive edge. This prevented the board from working at 7.159090[MHz]. I updated the source above.

I also slightly changed the ROM emulator. Here are the updated versions. The host PC program doesn't need any change.

Circuit diagram of ROM emulator using CPLD (fixed)
Fixed CPLD source (Verilog-HDL)

The ROM emulator and the 65C02 test board working together

I hope there is no more bug...

I found yet another bug. (^_^; I couldn't access the banked RAM. I updated the above source as BUG-FIX2. I also added RAM test code to the 65C02 test program.

This is the part that contained the bug:

    always @(posedge i_RW)
        if (w_ExRAM)
            r_ExRAM <= i_D0;

If you think this is "a matter of course", yes it is... I changed it to this.

    always @(negedge o_Clk7M159)
        if (w_ExRAM & ~i_RW)
            r_ExRAM <= i_D0;

What I learned from this is that you mustn't do what you mustn't do. Sure...



[Feb.04 2006] Obtaining XILINX XC3S200PQ208 (SPARTAN 3)

I bought a couple of SPARTAN 3 chips. My cheap digital camera can't clearly photo the gap between pins...

XILINX SPARTAN 3 FPGA Chips (XC3S200PQ208) and configuration chips (small ones on the left)

Since the pin pitch is too small to use directly (0.5mm), I haven't logically obtained these chips yet.

I can't say I obtained these chips until I actually become able to use them, by mounting them on a pitch converter board:

QFP 208-pin --> 2.54mm pitch converter board (Sunhayato QFP-51)

Hence I'm going to solder these chips onto the pitch converter boards. It seems pretty tough...

It was easier than I thought. Here is a photo which can be enlarged by clicking it. My cheap camera doesn't show it in detail though...

Implemented a XC3S200PQ208 on a QFP-51

The most important thing is placing the IC package precisely in position. Since the IC pins are very small, you won't need much solder. First you coat the pads on the board with well-fluxed solder, then solder the IC pins by gently pressing them from the top with the soldering iron. This resulted in pretty good quality.

I did it like this:

  1. Coat the pads on the board where the IC pins will be placed with well-fluxed solder.
  2. Place the IC package on the board as precisely as possible.
  3. Solder one pin at a corner of the IC package (gently touch the pin from the top with the tip of the soldering iron, then quickly remove it by sweeping outward).
  4. Doing 3 probably displaces the IC package a little. So adjust its position by rotating the IC package around the soldered pin.
  5. Solder one pin at the diagonally opposite corner of the package.
  6. Make sure the IC package is placed precisely in position. If not, you can still slightly move the package to fix it.
  7. Solder one pin at each of the remaining corners.
  8. Now the position must be perfect. Solder all the pins by repeating what you did in 3.
  9. Using a tester, check that all 208 pins are connected and not short-circuited.

Now you have the Spartan-3 chip ready for use. Let's have some coffee and take a break. (^_^;

Obtaining SPARTAN 3 -FIN-


[Mar.04 2006] Building PROJECT PC2E Hardware

I started building hardware which can actually work as a PC Engine. But it seems that the PC Engine hardware won't fit in a single XC3S200. At least I don't think I can make it fit, so I'm thinking of switching to the XC3S400 later. Fortunately, the XC3S200 and XC3S400 have the same pin assignment for the QFP208 package, so switching from XC3S200 to XC3S400 can probably be done by simply replacing the chip.

PROJECT PC2E Hardware under development

Hopefully, this will be able to run Famicom/NES without any change.

Although they are not wired yet, there are three 32kB SRAMs to be mounted. One of them is used for MAIN RAM (8kx8 bits) + BRAM (8kx8 bits), and the other two are used for VRAM (32kx16 bits). I forgot that the XC3S can't use 5[V] for I/O, and these were the only SRAMs I had that supported 3.3[V] operation. So I ended up using three separate SRAMs.

VREF = 1.25[V] of LM317T is used for VCCINT. Normally, 1.20[V] is supplied for VCCINT. The rated value of VCCINT is from -0.5[V] to 1.32[V]. The fluctuation of VREF of LM317T seems to be +/-0.05[V], so 1.20 <= VREF <= 1.30 [V], therefore I think this satisfies the rated value of VCCINT.

Solder side of PROJECT PC2E board

Three power supplies drive me crazy (I would never do this again...). I have wired only power supplies, but it already looks pretty ugly. There are probably too many by-pass capacitors, since I was afraid of malfunction caused by noise from power supplies. Surface-mount linear voltage regulators are used to generate VCCIO 3.3[V] and VCCAUX 2.5[V]. I was going to generate 1.25[V] with the one at the top, but it didn't generate 1.25[V] with the circuit same as LM317T (These SMD regulators are designed to output 2.5[V] when Adj is connected to GND??). Hence the top one is unused.

Now that it seems I finally finished wiring power supplies, I'll test if it is recognised by iMPACT.


[Mar.05 2006] Recognized by iMPACT

At last, it is recognized by iMPACT.

The JTAG logic of the XC3S uses VCCAUX for its power supply. Hence the communication works at 2.5[V] (LVCMOS25). It seems that the CPLD programmer I made a while back is what is called "Parallel Cable III", and it works at 5.0[V].

Then the question is: can 5.0[V] be input to logic operating at 2.5[V]? In the case of the XC3S, even a 3.3[V] input will destroy the JTAG logic. But there is documentation on the Xilinx webpage on how to deal with this. They say that a resistor RSER must be connected in series with the input. The value of RSER is 56[Ohm] for 3.3[V], and 300[Ohm] for 5.0[V].

Looking at the circuit diagram of Parallel Cable III, there are 100[Ohm] resistors implemented in series with input already. So it seems OK to operate it at 3.3[V].

But there was another problem. When it comes to using Parallel Cable III for XC3S configuration, the propagation delay of 74HC125 becomes a major problem, hence communication will fail. This information is written in detail in Nahitafu's page(in Japanese).

As written in Nahitafu's page, I changed 74HC125 to 74AC125. Then XC3S is recognized by iMPACT.

Changed 74HC125 of Parallel Cable III to 74AC125, and detected XC3S200 at 3.3[V] with Rser=56[Ohm]

Below is the photo of XC3S being detected. Sloppy wiring since I wasn't sure if it was going to succeed. I'll probably destroy the chip by accident, if I keep doing things like this...

The first success in XC3S200 boundary-scan

It was pretty tough... I'll draw the circuit diagram.


[Mar.10 2006] - Base Circuit -

I drew the circuit diagram. XCF02S is added & recognized.

FPGA Base Circuit

The VCCINT 1.2[V] generated by LM317T satisfies the absolute rating, but it doesn't fully satisfy recommended operating range. But since this is a prototype, I think it's OK.


[Mar.12 2006] A Measure for Signal Reflection (?), and DAC

Since I changed 74HC125 of Parallel Cable III to 74AC125, XC3S has indeed become configurable, but CPLDs became unconfigurable instead.

It seems that signal reflection is causing the problem, and the TCK line seems to be mostly affected.

It took me a while, but then I added the following circuit, and now it's working OK.

Parallel Cable III (measure for signal reflection (?))

The input is the signals from 74AC125 of Parallel Cable III. The output should be connected to TCK/TMS/TDI of CPLD. This will use all of six inverters in the 74HC14. I tested without 74HC14 (with only resistors), but it didn't work (74HC14 with no resistors didn't work either).

Below is my understanding of the circuit above, but I don't know whether I'm right or not.

Signal reflection occurs at the end of wires. The input impedance of ICs (74HC14) is close to infinite (to signals, it should be like a thick rigid wall). So only a small amount of the signal which has flooded into the input can actually go through. The majority of it has to go elsewhere. But if this is the only input for the signal, then this is the terminal point for the signal. Since there is no way to get around, the signal reflects and starts going back to the 74AC125. What's happening at this point is a reflection, so the phase of the signal is reversed. The reversed signal is likely to interfere with either the current or the next incoming signal.

It is said that you should connect terminating resistors to prevent signal reflections from occurring. What these resistors do is sink the signal current at the end of the line to GND. With the terminating resistors, signals which would cause reflection are absorbed into GND.

The problem is what values these resistors should have. I don't know the perfect answer. The idea is to first soften the rigid wall of the 74HC14 input with the resistor connected in series with the input, and dull the reflection. Whatever signal still reflects will be absorbed by the terminating resistor connected to GND. The terminating resistor should be small enough that the input impedance of the 74HC14 (~=infinite) can be ignored, and large enough that the output impedance of the 74AC125 (several ohms to several tens of ohms??) can be ignored. This time I decided to use 1[kOhm]. I also used 1[kOhm] for the resistor connected to GND.

Again, I don't know if this is correct. But at least it's working now, so I guess it's OK.

... but this wasn't the main topic of the day. (^_^)

I made a D/A converter for audio output.

R-2R ladder-type D/A converter

Since I wanted to save I/O pins of FPGA, I made a serial audio output on the FPGA, made a CPLD to receive it, and then the CPLD output parallel data. Due to the limitation of the CPLD I used (XC9572PC44), the output format ended up in 15-bit stereo. After that, the parallel data is converted to analog audio signal by R-2R ladder D/A converter.

R-2R ladder D/A converter working

The FPGA outputs 15-bit L and R samples (30 bits in total) alternately at 21.47727MHz. The output sample rate is 21.47727 / 30 = 715.909 [kHz].

I would like the FPGA to be a PSG for a while, and play around with some "beep" sound with the 65C02 CPU board.


[Apr.15 2006] Information on DAC

Here are the DAC circuit diagram and Verilog-HDL source code for CPLD.

R-2R DAC Circuit Diagram

R-2R DAC CPLD source code (dual 15-bit serial-in-parallel-out shift registers)

The circuit looks weird, because I was kind of playing with it. (^_^;

In the circuit diagram, two resistors, 1.5[kOhm] and 3.0[kOhm], are used. But actual implementation uses only 3.0[kOhm] resistors, and 1.5[kOhm] are made by connecting two 3.0[kOhm] resistors in parallel. There are 90 resistors used for the above circuit in total.

The precision of the resistors is vital to an R-2R ladder type DAC. It's highly likely that resistors from the same production lot have the smallest relative errors. Therefore we should buy a large quantity (say, 100) of resistors of the same resistance at once, hoping that they were made in the same production lot. I bought one hundred 3.0[kOhm] 5%-error resistors and measured the relative errors, and they were actually less than 1%. The relative errors I got here are probably smaller than if I had bought and used 1.5[kOhm] and 3.0[kOhm] resistors separately. Using 1%-error resistors should reduce the relative errors even more.

There are buffer-amplifiers added to the outputs, because the output impedance is pretty high due to the R-2R ladder resistors. Actually, I haven't implemented the buffer-amplifiers yet, so the effect of them hasn't been verified. However, by actually listening to the sound, I noticed that the quality is worse than my software-emulated PSG. I thought this was caused by the mismatch of impedance between the DAC output and my PC audio input, but I haven't been able to find the real cause.

R-2R DAC output waveform (visualized by Banno's "spwave")
Distortion at high frequency...?

The weak point of this circuit is that you need DC decoupling capacitors connected to the output, which adds some distortion to the original waveform. It may get better if you insert small capacitors at the high-impedance side (input) rather than using large capacitors at the output side, and operate the buffer-amplifiers with 2 power supplies, in which case we can omit the output capacitors.


[Mar.19 2006] RS232C Receiver Module

I quickly implemented an RS232C receiver module for FPGA debugging purposes. I made it so quickly that it may still have some bugs left. It's set for 115200bps, 8 data bits, one start bit and one stop bit. When it receives data with no stop bit, its behaviour is undefined. I tested it on an FPGA chip and succeeded in receiving the values 0-255, so here it is...

--> source code

The input clock frequency is fixed at 7.159090[MHz], and you need to slightly modify the source if you want different frequencies. If you use 7.159090[MHz] and get 115200bps speed, you need a divide-by-62 counter since 7159090/115200 ~= 62. But here I just used a divide-by-64 counter instead. In this case, the baud rate becomes 7159090/64 ~= 111861. So the baud rate error is (115200-111861)/115200 * 100 ~= 2.90[%]. Since we transfer 10 bits as a unit (start-bit + 8-bit data + stop-bit = 10 bits), it should be safe if baud rate error is less than 100[%] / 10 = 10 [%]. Therefore, 2.9[%] baud rate error should be OK.

The 1/64 counter is a 6-bit binary counter. When a 1 --> 0 transition occurs on the MSB (bit5), that's 64 counts. Every time we detect a start-bit, we reset the counter to prevent the baud rate error from accumulating.

The input signal i_Rxd is assumed to be the TXD signal from a PC's COM port which is already converted to the CMOS level by using MAX232 or compatible level converting device.


[Mar.25 2006] Playing PSG by RS232C Data Transfer

Using the RS232C receiver, I transferred music data to my FPGA-PSG and made it play music.

I actually tried with a commercial PC Engine game music data, and it sounded like the real PC Engine. It's too bad that these commercial game music data are copyrighted, and I can't just let you listen to it here... I wonder if there are any public-domain music data for PC Engine PSG? I should look for them.

Since the transfer speed is only 115200bps, the number of bytes it can receive in one frame is at most 192 bytes (= 115200 / 10 bits per byte / 60 frames per second). If there is more to be sent, the playback will slow down. (^_^; (now I think I should have made a USB receiver)

I also implemented noise channels. But when I connect the "PsgChannelN", the PSG channel with noise capability, everything refuses to work. I guess there must be some problems like clock skew, delay, etc., but I don't know.

In a previous section, I made a 15-bit ladder type DAC, but somehow I feel that 1-bit DAC may sound better. Well, I'll just compare when I make a 1-bit DAC.

I want to upload the latest source code here, but I'm concerned about infringing NEC-HE patent, which is probably still valid. So I decided to wait and see. Personally, I think there is no problem because I'm not trying to make any money from it, but just in case... If you really want the source code, just email me, and I'll personally give it to you (I'll take this style hereafter).

Debugging the PSG using everything I can use

Hmm... I noticed that there isn't really anything this time. Well, I guess that's life.


[Apr.23 2006] Comparing PSG Playback

Since there was nothing to update, I quickly made a small music data, and compared the playback result among the three compatible PSGs: FPGA-PSG, PC2E (software emulation), and the real PC Engine. It's not really a good comparison, since my music data is not awesome, and FPGA-PSG is still a bit buggy.

playback by FPGA-PSG

playback by PC2E

playback by the real PC Engine

I finally found the name of the music: "Arabesques" by Debussy

I listened to the "Piano Practice 7" in Final Fantasy V, and made the data. It is written in MML, and compiled for HuSIC driver using HuSIC-Watch(HESw0073).

MML source code

HES data (LZH archive)

In comparison, I think the real PC Engine has a pretty effective low-pass filter. The beginning of the FPGA-PSG playback slows down a little, due to the slow transfer rate of RS232C.

Although FPGA-PSG and PC2E sound pretty much alike, I think PC2E is much better. FPGA-PSG has some kind of pop-noise (signal overshooting??), and it's very annoying. On the other hand, PC2E sounds smooth, with no annoying pop-noise. I don't know whether the pop-noise is due to a bug or the load being too heavy (probably a bug though).

By the way, is there anyone who would like to provide better HES data?


[May 20 2006] Unsuccessful CPU

It seems that the way I implemented it was wrong. It now uses more than 70% of the XC3S200 gates, and has become so large that I don't think it's practically useful anymore. I'll just leave the source code here as it is. The reason more than 70% is no good is that the PSG takes about 30%. I think the CPU and PSG need to fit within a single XC3S200, or I will end up using three or more XC3S200s for the whole PC Engine. But that probably won't satisfy my self-complacency. So I think it's a good time to rethink what I've done. (It's pretty amazing that I didn't realize it until I had written such a large amount of code...)

Unsuccessful CPU

Comments at the beginning are pretty much false, so don't believe it. (^_^; Most of them are correct, but some are incorrect. I didn't do any sort of designing. It's rather a stupid work. But at least it's a stupid sample which tells you how you would end up if you keep writing this kind of code...

If you add BRK, RTI and some interrupt features, it may work as a CPU. But anyway, it already uses about 1300 slices with the area-optimization option, so I don't think it's of any use.

The main cause is I didn't manually decode instruction code. I thought compiler does this for me, but I guess I was excessively easy-going.

As for the block-transfer instructions, they do single byte transfer in 6 clocks, except for the initial and final stack push / pull overhead. If they do it in 8 bits per clock basis:

  1. Read 1 byte from source address, and update source address (low byte).
  2. Write 1 byte to destination, and update source address (high byte).
  3. Update destination address (low byte).
  4. Update destination address (high byte).
  5. Update length counter (low byte).
  6. Update length counter (high byte).

Although I think they do something like the above, the unsuccessful CPU can't do this because of the way it is structured. Block-transfer instructions were the last set of instructions to be implemented. It's pretty bad that I found a structural error after implementing most instructions...

Block-transfer instructions are the first to be implemented on my next CPU.


[May 23 2006] PCE Development Board

I'm getting tired of setting up my ROM emulator every time I run my code on my PC Engine, so I made a PCE development board. I learned that the DEVELO BOX used a 74HC157 and I thought it was a good idea, so I used the IC in my dev board as well. The main difference is that the DEVELO BOX used RS232C, but mine only uses LPT for PC <--> PCE communication.

Circuitry and Soldering Pattern ([May 28 2006] Rejected due to problem in communication)

Quickly soldered the board, no idea whether it works or not...
[May 28 2006] didn't work.

Soldering side. Looks pretty nice since I made soldering pattern before actually soldered.

A 74HC14 and 1[kOhm] resistors are added to avoid what I assume was the signal reflection I experienced with the JTAG communication a while back. While 1[kOhm] is suggested in the circuit diagram, I actually used 1.5[kOhm] since I had many of them. I don't think this will make any difference, but I haven't tested, so I can't be sure.

This time I also drew the soldering pattern in the circuit editor along with the circuit diagram. If you do this in advance, you no longer need to think about the wiring direction being mirrored on the two sides of the board, so the soldering work takes less time. I think I finished about 2x faster than when working out the wiring direction on the fly. You can also plan in advance to have as few jumper wires on the solder side (which take time) as possible, so the completed board looks much better. The component side rather looks uglier this time.

I soldered the circuit board between 1st line and 2nd line of D-SUB female connector pins. By doing so, you don't need to use a D-SUB 25-pin connector <--> universal circuit board converting board which is pretty expensive.

I noticed that control pad input pins of 74HC157 (A input pins) don't have pull-up resistors after I completed my board. So this requires control pad to be always connected when using the board. Otherwise, you need to pull-up A input pins of 74HC157 (2, 5, 11, 14).

It looks like the hardware is complete, so now I need software. I'm going to write a small program which receives and executes programs from LPT using HuC or PCEAS, and make it bootable from the PCE CD-ROM system. I don't know whether it's going to work or not...


[May 28 2006] Unable to Complete

The quality of communication between PC and PCE has been bad, and there seems to be no way I can get it to work. It seems the PCE --> PC direction is failing. The quality was so bad that I made it read the port value 4 times and proceed only if all of them were the same, but still one or two bytes out of 8kB became $10 or $20 when they should have been $00. I then thought about sending a checksum byte and resending the data if the sum didn't match. But how long would it take to send an 8kB block of data when there are always one or two byte errors in it...?

I observed the signal on my oscilloscope, but the signal was actually pretty clean and there seemed to be no signal reflection. I don't know why it doesn't work...

The PC watches the LPT port by polling while communicating, so the CPU load becomes 100%. Then the CPU cooling fan becomes so fast and loud that it greatly decreases my willingness to complete it.

The PCE doesn't seem to boot from CD-Rs without retrying for several tens of seconds to several minutes. This decreases my willingness to complete it even more.

I think there is not much motivation left, so I will stop here (just imagine that you spent all weekend trying to make a PC Engine development board and failed). I will come back to this one when I do something similar in the future.

You might have noticed that I forgot soldering one part in the picture above, but it is fixed now. So this is not the cause.

Programs are left here.
And below they are, for those who want to take a quick look now.

PCE-side program

PC-side program

pcedev_pce.c is the PCE-side program, and pcedev_pc.c is the PC-side. It was the first time I wrote a program which does hand-shaking, and I thought it was pretty confusing (the way I wrote it was pretty bad too). It still contains some bugs, but oh well, who cares.

PCE's program is compiled by typing

huc -scd pcedev_pce.c

and a file named pcedev_pce.iso will be generated. Then you can use something like cdrecord to burn it into a CD-R.

cdrecord -v speed=4 dev=0,1,0 -data pcedev_pce.iso

This way, for the time being, you get a CD-ROM which can somehow be booted on the PCE. Note that dev=0,1,0 is for my PC's configuration; it may be 0,0,0 or 1,0,0 or something else, depending on how your CD drive is installed in your PC. This is something you need to find out.

PC's program uses Craig Peacock's PortTalk. Put files related to the PortTalk in the same folder as the PC's program, and type

gcc pcedev_pc.c -o pcedev_pc.exe

to compile under the MinGW environment. I don't think you really need any compiling options for speed, since I think more than 99% of CPU time is consumed by polling the LPT port. Probably adding -Os -s options is all you can effectively do.

I'm not doing well lately...

PCE Development Board - DISCONTINUED -


[Jun.03 2006] Working-OK-Hardware

Fixed the board a little and made it work (revived). It turned out that it was too difficult to communicate accurately with the previous circuit. I will just upload the improved circuit diagram.

Fixed PCE Development Board

A D-FF (74HC74) was added. The reason you need a D-FF is that, for PCE --> PC communication, you can think of these 2 ways:

  1. Use SEL as the clock signal, and CLR as the data signal
  2. Use CLR as the clock signal, and SEL as the data signal

but hand-shaking communication would fail in both cases.

In case 1, when CLR = 0 (i.e. the data bit is zero), the PCE pad is selected (by the 74HC157) for the data input D0-D4, and LPT data can't be read. The hand-shaking communication fails at this point.

In case 2, there would be no problem if CLR = 1 only when the PCE reads data bits from LPT. But since the PCE needs to read LPT before setting CLR from 0 to 1, hand-shaking fails.

Because I wasn't aware of these when I wrote the communication program, the communication of PCE --> PC direction was completely messed up.

Hence I decided to use a D-FF. Connected as in the circuit diagram above, the D-FF saves the CLR value on a positive transition of SEL, and holds it until the next positive transition of SEL. So in case 1, the D-FF makes it possible to save the CLR value (holding a data bit which may be zero) in the D-FF and then set CLR = 1 to read LPT (the explanation for case 2 is omitted).

In this update, I also added the pull-up resistors for the PCE pad side. The 1.5[kOhm] resistors might be too small, since each signal line at 'L' level makes several [mA] of current flow through the resistor from the 5[V] power supply. Use a higher resistance if you are not comfortable with them.


[Jun.09 2006] Working-OK-Software

I was thinking of improving them before putting them here, but I guess I had no energy for that.

PC software (source + exe)

PCE software (source only)

An executable file to be transferred from PC --> PCE "startup.pce" (source + PCE)

Below is a brief explanation of the above files. Read the source for detail.

  1. The PC software is simply executed on your PC. It will wait for a request from the PCE. When it receives a request that it can handle, it will respond to it.
  2. The PCE software is to be compiled using HuC, burned to a CD-R, and booted on the PCE. It will request an executable file "startup.pce".
  3. startup.pce is to be placed in the same folder as the PC software. The PC software will receive the request from the PCE, and send the file to the PCE.
  4. The PCE software will load startup.pce at bank4($8000-$9FFF), and then do JSR $8000. After that, it's a matter of what is written in startup.pce.
  5. For instance, the above startup.pce is programmed to run all the decimal mode ADC instructions from #$00+#$00 to #$FF+#$FF, and send the result files adc_0.bin - adc_f.bin (8kB each) to the PC.
  6. After transferring the 16 files to the PC in 5, startup.pce will execute RTS and return to the PCE software. Then the PCE software will send bank4($8000-$9FFF) back to the PC as "file.bin". The code sending "file.bin" to the PC is just a leftover part of my debugging code and should be deleted.

The PC software will exit if any key is pressed while it waits for a request from the PCE. Requests from the PCE will not be answered until the PC software is started again. The PCE software is programmed to repeat the same request again and again until it is accepted, so as soon as the PC software is started again, it should continue from where it left off. I have seen this fail once, so there might be some bugs left.

Once step 6 is completed, the PCE side repeats steps 2-6 forever, so the transfer in step 6 is a good time to press a key. This is something you probably want to change.

The only bank you can use without destroying existing code at the time startup.pce is invoked is bank4. If you want to use any other bank, you should be ready to initialize everything yourself (including the PCE software). HuC-related code exists in bank3($6000-$7fff) and bank6($c000-$dfff). It's up to you whether to write code that cooperates with the existing code or code that initializes everything and takes over the machine, although you need to cooperate with the BIOS in bank7($e000-$ffff) at least, since it is a ROM.
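
The bank numbers above line up with the 8kB slots of the 64kB logical address space: the top 3 bits of a 16-bit address give the slot. A minimal C sketch of that arithmetic (the function names are made up for illustration and are not part of HuC or the PCE software):

```c
#include <stdint.h>

/* 8 kB slot index (0-7) of a 16-bit logical address: its top 3 bits. */
unsigned slot_of(uint16_t addr)
{
    return addr >> 13;
}

/* First logical address of a slot, e.g. slot 4 -> $8000. */
uint16_t slot_first(unsigned slot)
{
    return (uint16_t)(slot << 13);
}

/* Last logical address of a slot, e.g. slot 4 -> $9FFF. */
uint16_t slot_last(unsigned slot)
{
    return (uint16_t)((slot << 13) | 0x1FFF);
}
```

So the JSR $8000 entry point falls in slot 4, and any address from $E000 upward falls in slot 7, where the BIOS sits.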

startup.pce is assembled by entering the following command.

pceas -raw startup.asm

pcedev.asm and pcedev.inc are included from startup.asm.

The receive_byte routine in pcedev.asm is not tested.

The data transfer in step 6 is rather slow since that PCE --> PC transfer program is written in HuC. On the other hand, the PCE --> PC transfer in step 5 is written in assembly. The transfer is probably 2-3 times faster in assembly. The HuC6280 may be considered one of the good old CPUs for which writing programs in assembly was really worth it. (?)

Interrupts need to be disabled before PC <--> PCE data transfer.

The LPT read/write speed using PortTalk was 120k accesses per second on my PC. Since a Pentium machine managed 1M accesses per second under MS-DOS mode, I think a recent PC should be able to do more than 120k reads/writes per second, and the transfer could become much faster. It currently takes several seconds to transfer 8kB from PCE --> PC, and about 3 times longer for the opposite direction.

As I mentioned before, it seems that CD-Rs are fairly tough media for the PCE to read, as it fails so many times. The PCE software is made so that you don't need to restart it once it successfully starts. But taking minutes for the initial start is quite a pain.

The improved PCE dev board, which has now become two-level due to the addition of a 74HC74.

Now I can run a test which requires a lot of memory.

Decimal Mode ADC (CF=0/CF=1, $00+$00 - $ff+$ff) test results

Decimal Mode SBC (CF=0/CF=1, $00-$00 - $ff-$ff) test results

PCE development board - THE END -


[Jun.10 2006] ALU Revised

Since I have even less time than before, I will make it more like a note.


[Jul.23 2006] Building a Logic Analyzer(?)

I'm trying to make a sort of logic analyzer using DRAM (72-pin SIMM), USB (USBN9604+PIC16F873), and CPLD (XC95108PC84 x 2). I put (?) because I'm not going to implement any kind of "analyzing" features. I will write in a note style to save my time.

Finally recognized as a USB device

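Motivation

  1. To build the CPU, I want to design state machines as properly as possible --> I need practice
  2. The work RAM of the PCE compatible hardware plus the buffer RAM of the SCD add up to a fairly large capacity
    --> I want to use DRAM
    --> How do I use DRAM again?
  3. If USB communication works, PC <--> PCE compatible hardware communication will become much easier (RS-232C is somewhat underpowered)
  4. Having something like a logic analyzer should make building hardware much easier from now on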


For now, I'm using the program (usbdvc1) published on Mr. Gokan's (後閑氏) page, modified as follows.


; Comment out the first 2 lines of usbsym.h
; USBN9604 ←→ PIC16F873 connection
; Added macros (usbmac.h)
	movlw	0ffh
	movwf	TRISC

	movlw	0
	movwf	TRISC
	movlf	0x6,ADCON1		;RA0-5, RE0-2 = Digital
	movlf	0x00,TRISA		;RA0-5 = OUT
	movlf	0x01,TRISB		;RB0=INT
	movlf	0xff,TRISC		;RC0-7 = IN

	BSF	USBCS			;USBN9602 CS = 1 USBN9602 OFF
	BCF	USBA0			; A0 = L
	BSF	USBRD			; /RD = H
	BSF	USBWR			; /WR = H

	CLRF	STALLD			;initialize USB variables

; Descriptor transmission (SENDDESC) part

; PIC16F873 ← USBN9604 read
	; write address

	; read data from the address
; PIC16F873 ← USBN9604 consecutive read
	; write address

	; read data from the address
; PIC16F873 → USBN9604 write
	; write address
	IORLW	0x80

	; write data to the address
; PIC16F873 → USBN9604 consecutive write
	; write address

	; write data to the address
PIC16F873  ←→  USBN9604 connection (I will draw a circuit diagram later)
 RC0-RC7   ←→ D0-D7
 RA0        →  A0
 RA1        →  /CS
 RA2        →  /RD
 RA3        →  /WR
 RB0        ←  INTR
/MCLR = H       /RESET= H
VDD = +5V       MODE0 = MODE1 = DRQ = AGND = GND
VSS = GND       V3.3 → 1.5kOhm → D+
Others are NC   VCC = +5V


[Aug.20 2006] Logic Analyzer(?) Circuit (Ver 0.8)

I drew a circuit diagram, though it is not final yet.

Circuit V0.8


The CPLD source looks like this at the moment. The EDO Page Mode Early Write Cycle and the CBR Refresh Cycle seem to be working in simulation. The CPLD --> DMA transfer feature is not written yet. I might have to use two CPLDs when I implement DMA.

The behavior of /OE is described in the CPLD code above, but I just looked at the circuit diagram of the SIMM module and all the /OE pins are connected to GND (darn).
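
For readers who have not used DRAM before, the idea of the CBR (CAS-before-RAS) Refresh Cycle mentioned in this entry can be modeled as a tiny state machine. The following is only a C model for illustration, not the CPLD source: the state names and the one-step-per-state timing are assumptions, and a real controller has to hold each state long enough to meet the timing in the DRAM datasheet.

```c
/* Hypothetical model of a CAS-before-RAS (CBR) refresh sequence.
 * Signals are active low: 0 = asserted, 1 = deasserted. */
enum cbr_state { CBR_IDLE, CBR_CAS_LOW, CBR_RAS_LOW };

struct dram_pins {
    int ras_n;
    int cas_n;
};

/* Next state: leave idle on a refresh request, then run one full cycle. */
enum cbr_state cbr_next(enum cbr_state s, int refresh_req)
{
    switch (s) {
    case CBR_IDLE:    return refresh_req ? CBR_CAS_LOW : CBR_IDLE;
    case CBR_CAS_LOW: return CBR_RAS_LOW;
    default:          return CBR_IDLE;   /* release both lines */
    }
}

/* Outputs depend only on the current state (Moore machine). */
struct dram_pins cbr_outputs(enum cbr_state s)
{
    struct dram_pins p = { 1, 1 };       /* idle: /RAS and /CAS high */

    if (s == CBR_CAS_LOW) {              /* /CAS falls first ...        */
        p.cas_n = 0;
    } else if (s == CBR_RAS_LOW) {       /* ... then /RAS falls as well */
        p.cas_n = 0;
        p.ras_n = 0;
    }
    return p;
}
```

The point of the sketch is the ordering: /CAS goes low while /RAS is still high, which is what tells the DRAM to refresh a row using its internal counter instead of treating the cycle as a normal access.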


[Aug.27 2006] Logic Analyzer(?) Wiring Completed (V0.9)

I think I finished wiring.

The right CPLD is a decoration :p

I learned a technique to mount a 74VHC244 (SOP20-P-300-1.27) on the solder side


I will update the circuit diagram after verifying correct operations.

The PIC, the USBN9604, and part of the CPLD are working at least, so I will debug the hardware by transferring data to the PC.

It's now a bit doubtful whether this is really going to work as a logic analyzer. But I think it's OK as long as I become able to use USB and to make a working DRAM controller, which is a state machine.


[Sep.03 2006]

It seems to be working to a certain extent. The address behavior is still a bit strange. I wrote test code which does the SIMM --> PC transfer by the PIC alone, and it was so slow that it was just useless (it took several seconds to transfer 16kB). Maybe I need to implement DMA. I wonder if it can all fit in a single CPLD...?

[Sep.10 2006]

I think I'm learning a lot from this sub-project.

[Sep.16 2006]

Finally, it worked as I expected.

Gathered the stuff and tested

I will update the circuit diagram and other things soon.


[Nov.08 2005] PC Engine Compatible Hardware Project BBS

I have set up a BBS. Take a look if you are interested.

PCE Compatible Hardware Project BBS


I respectfully thank the people who made the following technical pages.




(C) Ki 2003-2006