[10.16.2005]
I started developing PC Engine compatible hardware.
I'll upload information about its development here.
Please excuse my sloppy English, as I don't regularly use it in my daily life.
[10.16.2005] Motivation
The progress on my feeling:
Since I'll probably run into situations where I need to test things on the real machine, I decided to build a ROM emulator. Here are some images of what I've got so far.


I'm debugging the ROM emulator hardware. Most of the circuit is taken from ChaN's ROM emulator. I expanded the address lines so that it can support a 4 Mbit SRAM. I was going to control the hardware through USB, but it is currently controlled through the parallel port, since I realized that the USB interface needs a CPU to control it...
Here is a snippet of the control program, written in C. It uses the parallel port. Although I'm still debugging the hardware, I think this code will work.
#define CTRLPORT 0x378

/* outportb() comes from the compiler's port I/O header
   (for example <dos.h> on Borland C or <pc.h> on DJGPP). */

static unsigned char s_PrevData = 0x00;

/* rome_reset() calls rome_sendbyte() before its definition. */
static void rome_sendbyte(unsigned char v);

/*-----------------------------------------------------------------------------
    [rome_reset]
        Change the ROM emulator to write-mode,
        and zero-clear the counter and the shift register.
-----------------------------------------------------------------------------*/
static
void
rome_reset(void)
{
    outportb(CTRLPORT, 0x00);
    outportb(CTRLPORT, 0x08);
    s_PrevData = 0xff;      /* differs from 0x00, so the next byte is shifted out */
    rome_sendbyte(0x00);
}

/*-----------------------------------------------------------------------------
    [rome_sendbyte]
        Write 1 byte to the ROM emulator.
-----------------------------------------------------------------------------*/
static
void
rome_sendbyte(
    unsigned char   v)
{
    /* shift the byte out only when it differs from the previous one */
    if (v != s_PrevData)
    {
        int             i;
        unsigned char   bit0;

        s_PrevData = v;

        /* MSB first, one data bit per clock pulse */
        for (i = 0; i < 8; i++)
        {
            bit0 = ((v << i) & 0x80) ? 1 : 0;
            outportb(CTRLPORT, 0x08|bit0);          // clock=L
            outportb(CTRLPORT, 0x08|bit0|0x02);     // clock=H
        }
    }

    outportb(CTRLPORT, 0x0c);   // strobe=H
    outportb(CTRLPORT, 0x08);   // strobe=L
}
First call rome_reset(), then call
rome_sendbyte() repeatedly until every byte of the SRAM has been written.
After the last write, call outportb(CTRLPORT, 0x00); the target
then sees the hardware as a ROM.
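The upload sequence described above can be sketched roughly as follows. This is only a dry-run sketch: upload_image(), the three callback types, the stub_* functions and the choice to pad the rest of the SRAM with 0xFF are all assumptions made for illustration. On the real hardware the callbacks would be rome_reset(), rome_sendbyte() and a small function doing outportb(CTRLPORT, 0x00).

```c
#include <stddef.h>

/* Hypothetical callback types, so the sequence can be tried without hardware. */
typedef void (*rome_reset_fn)(void);
typedef void (*rome_send_fn)(unsigned char);
typedef void (*rome_release_fn)(void);

/* Counters updated by the stand-in callbacks below. */
unsigned long g_resets = 0, g_sent = 0, g_released = 0;
unsigned char g_last = 0;

void stub_reset(void)           { g_resets++; }
void stub_send(unsigned char v) { g_last = v; g_sent++; }
void stub_release(void)         { g_released++; }

/* Reset once, write every SRAM byte in order, then leave write-mode.
   Bytes past the end of the image are padded with 0xFF (an assumption). */
void upload_image(const unsigned char *image, size_t len, size_t sram_size,
                  rome_reset_fn reset, rome_send_fn send, rome_release_fn release)
{
    size_t i;

    reset();
    for (i = 0; i < sram_size; i++)
        send(i < len ? image[i] : 0xFF);
    release();
}
```

Passing the stubs instead of the real functions lets you check the call order and byte count before connecting anything to the parallel port.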
[01.04.2004] PICTURES OF THE ROM EMULATOR
[01.11.2004]
Some notes on the ROM emulator...
I used a general purpose transistor 2SC1815 for the RESET signal output.
Any "general purpose" transistor should work.
But in the case of the ROM emulator for PCE, I had to add a 1.5[kohm] base resistor
to ChaN's ROM emulator
in order to make it work. Without this resistor connected to the base of the 2SC1815,
the G1 and G2 pins of 74HC541 (pin #1 and #19 which opens/closes address lines from PCE)
become somewhere around 2.5[V], which I think is a troublesome voltage for
CMOS ICs. It seemed that too much current flowed through the ICs when such a voltage
was applied to their input pins. One of the ICs actually became so hot that
I thought I had blown it up.
[01.17.2004]
incorrect: "the G1 and G2 pins of 74HC541 (pin #1 and #19 ..."
correct: "the G2 pin of 74HC541 (pin #19 ..."
Another thing to note is that you should connect any unused input pins of CMOS ICs to either +V or GND, to keep a latch-up condition from occurring and destroying them. I would connect the input pins of inverting ICs (such as the 74HC14) to +V, and the input pins of non-inverting ICs (such as the 74HC541) to GND, so that their outputs become 0V.
I used an HM628512BLP-5 for the 4 Megabit SRAM. Its access time is 55[ns]. Supposing the gate latency is 30[ns], the overall latency of the ROM emulator would be 55 + 30 = 85[ns]. Since the access time of the main memory of my PCE seems to be 100[ns], I think the ROM emulator responds quickly enough.
I currently power the ROM emulator from a USB port, not from the PCE. It didn't work when I took power from the PCE's cartridge connector. The PCE's power supply circuit uses a 78M05, if I remember correctly. So my guess is that there is not enough capacity left in the PCE's power supply, but I don't know.
I used 4 Megabit SRAM, which is too large for the purpose of just
executing simple test programs. It takes more time to program as well
(it takes about 15 seconds to program a whole 512KB SRAM).
For the case of simple tests on the PCE, I think 512 Kbit (64KB) SRAM
should be enough. Since you don't have to add an extra 74HC541 and a 74HC590
to ChaN's circuit if you use 64KB SRAM, building the hardware should be
relatively easy.
[01.17.2004] Circuit Diagram of the ROM Emulator
NOTE: The names of the control signal lines differ from ChaN's circuit because of the
circuit editor I used (CEAT V2.4).
Refer to the pin numbers instead.
The emulator is no longer open to the public.
Since there are some people who just don't use my emulator properly, I've changed the main focus of the project to "Developing compatible hardware."
I'm currently using BSch for circuit editing. I'll either build my own circuit simulator or use a freeware one. The actual circuit will probably be built with a CPLD, but for the sake of study I'll design it as if it were going to be built from general logic ICs.
I'm very new to digital hardware, so I'm not sure whether I can do it or not.
So this time I would like to invite you for collaboration.
If you are interested, please drop me a line:

or you can just write here:
PCE Compatible Hardware Project BBS
Please don't use any information found hereafter for commercial purposes.
For no particular reason, I started with the timer circuit. It looks a little bit messy. I haven't debugged it, so it probably doesn't work yet.

The PC Engine Timer is constructed with
I'll list some important characteristics of the PC Engine Timer.
I simulated the 4040 using D-FFs with my hand-made simulator.
I'll start with the easy ones. I have drawn the address-decoding circuit. I haven't debugged it yet, so it probably won't work.

Below is a memory map compatible with the PC Engine. Although I think this one is more than 95% compatible with the real machine, its compatibility still hasn't been fully verified.
-------------------+--------------------+------------------------
  Logical Address  |  Physical Address  |  Activated Signal
===================+====================+========================
  00:0000-7F:1FFF  |   000000-0FFFFF    |  /ROM
  68:0000-87:1FFF  |   0D0000-10FFFF    |  /CDRAM (64kB + 192kB)
  F7:0000-F7:1FFF  |   1EE000-1EFFFF    |  /BRAM
  F8:0000-FB:1FFF  |   1F0000-1F1FFF    |  /MAINRAM
  FF:0000-FF:03FF  |   1FE000-1FE3FF    |  /VDC
  FF:0400-FF:07FF  |   1FE400-1FE7FF    |  /VCE
  FF:0800-FF:0BFF  |   1FE800-1FEBFF    |  /PSG
  FF:0C00-FF:0FFF  |   1FEC00-1FEFFF    |  /TIMER
  FF:1000-FF:13FF  |   1FF000-1FF3FF    |  /PAD
  FF:1400-FF:17FF  |   1FF400-1FF7FF    |  /INTCTRL
  FF:1800-FF:19FF  |   1FF800-1FF9FF    |  /CDROM
  FF:1A00-FF:1AFF  |   1FFA00-1FFAFF    |  /AC
-------------------+--------------------+------------------------
NOTE: addresses are in hexadecimal notation
You may have noticed that the ranges of ROM and CDRAM overlap. They will be bank-switched using the signal that goes to the "LO" level when a cartridge is inserted.
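As a rough software model of this memory map, the decoding could look like the sketch below. The names Select and decode_physical() are made up for illustration. How the overlapping part is switched is my own reading of the note above: the overlapping range 0D0000-0FFFFF selects /ROM when a cartridge is inserted and /CDRAM otherwise, and everything else follows the table as written.

```c
/* Hypothetical names for the chip-select signals in the memory map. */
typedef enum {
    SEL_NONE, SEL_ROM, SEL_CDRAM, SEL_BRAM, SEL_MAINRAM, SEL_VDC, SEL_VCE,
    SEL_PSG, SEL_TIMER, SEL_PAD, SEL_INTCTRL, SEL_CDROM, SEL_AC
} Select;

/* pa: 21-bit physical address, cart_inserted: nonzero when a cartridge is present. */
Select decode_physical(unsigned long pa, int cart_inserted)
{
    if (pa <= 0x0CFFFFUL) return SEL_ROM;
    /* overlapping range: assumed to be switched by the cartridge signal */
    if (pa <= 0x0FFFFFUL) return cart_inserted ? SEL_ROM : SEL_CDRAM;
    if (pa <= 0x10FFFFUL) return SEL_CDRAM;
    if (pa >= 0x1EE000UL && pa <= 0x1EFFFFUL) return SEL_BRAM;
    if (pa >= 0x1F0000UL && pa <= 0x1F1FFFUL) return SEL_MAINRAM;
    if (pa >= 0x1FE000UL && pa <= 0x1FE3FFUL) return SEL_VDC;
    if (pa >= 0x1FE400UL && pa <= 0x1FE7FFUL) return SEL_VCE;
    if (pa >= 0x1FE800UL && pa <= 0x1FEBFFUL) return SEL_PSG;
    if (pa >= 0x1FEC00UL && pa <= 0x1FEFFFUL) return SEL_TIMER;
    if (pa >= 0x1FF000UL && pa <= 0x1FF3FFUL) return SEL_PAD;
    if (pa >= 0x1FF400UL && pa <= 0x1FF7FFUL) return SEL_INTCTRL;
    if (pa >= 0x1FF800UL && pa <= 0x1FF9FFUL) return SEL_CDROM;
    if (pa >= 0x1FFA00UL && pa <= 0x1FFAFFUL) return SEL_AC;
    return SEL_NONE;    /* not listed in the table */
}
```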
The upper two digits "XX:" of the Logical Address are the values of the MPR registers within the CPU. You probably won't understand the Logical Address column of the above table if you are not familiar with these MPR registers, so I'll briefly explain them.
The basic idea of the MPR register is the same as that of the "segment register" in the i8086. In the i8086, +1 in a segment register corresponds to +16 in the physical address. In the PC Engine hardware, +1 in an MPR register corresponds to +8192 in the physical address.
There are 8 MPR registers in the CPU, and each one holds the segment number (0-255) of an 8kB ($2000 bytes) segment block. This segment number directly becomes the upper 8 bits of the physical address.
The reason why there are 8 MPR registers is that the 16-bit effective address space of the CPU consists of 8 blocks of 8kB ($2000 bytes) segments. The table below shows how the 8 MPR registers MPR0-MPR7 correspond to these 8 blocks. A value of 0-255 is written to each MPR, and the conversion from logical address to physical address is done using the values in the MPR registers. Suppose x is the upper 3 bits of the effective address, and Offset Address is the lower 13 bits of the effective address; then the address conversion follows the equation:
Physical Address = [MPRx] * $2000 + Offset Address
---------------------+--------------------+---------------------
  Effective Address  |     Valid MPR      |   Offset Address
=====================+====================+=====================
      0000-1FFF      |        MPR0        |      0000-1FFF
      2000-3FFF      |        MPR1        |      0000-1FFF
      4000-5FFF      |        MPR2        |      0000-1FFF
      6000-7FFF      |        MPR3        |      0000-1FFF
      8000-9FFF      |        MPR4        |      0000-1FFF
      A000-BFFF      |        MPR5        |      0000-1FFF
      C000-DFFF      |        MPR6        |      0000-1FFF
      E000-FFFF      |        MPR7        |      0000-1FFF
---------------------+--------------------+---------------------
NOTE: addresses are in hexadecimal notation
When the CPU is reset, MPR7 is loaded with zero. The CPU accesses the effective addresses $FFFE and $FFFF to read the reset vector. These effective addresses are decoded to [MPR7]:1FFE and [MPR7]:1FFF respectively, so the physical addresses actually accessed are $1FFE and $1FFF. This means that the reset vector needs to be stored at offsets $1FFE and $1FFF of the ROM, that is, at the end of its first 8kB block.
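The address conversion can be written down in a few lines of C. This is just a sketch of the rule described above; the function name logical_to_physical() and the use of an 8-entry array for MPR0-MPR7 are my own choices for illustration.

```c
#include <stdint.h>

/* Convert a 16-bit effective address into a 21-bit physical address.
   mpr[0..7] hold the values of MPR0-MPR7. */
uint32_t logical_to_physical(const uint8_t mpr[8], uint16_t ea)
{
    unsigned x      = ea >> 13;        /* upper 3 bits select the MPR     */
    uint32_t offset = ea & 0x1FFFu;    /* lower 13 bits are the offset    */

    return ((uint32_t)mpr[x] << 13) | offset;   /* MPR value * $2000 + offset */
}
```

With MPR7 = $00, the effective address $FFFE becomes the physical address $1FFE, which is the reset vector case. With MPR7 = $FF, the effective address $EC00 becomes $1FEC00, which falls in the /TIMER range of the memory map.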
I'm going to verify the address-decoding circuit some time soon.
I have redrawn the timer circuit.

I have redrawn the timer circuit again.
I have simulated the main part of it. Simulating the circuit with my own simulator is too inefficient (since I need to debug the simulator before I can debug the circuit), so I'll try something different.
The simulator writes '1' to the control register on the 1023rd clock. This causes the value stored in the reload register (fixed to 0x1f) to be loaded into the down counter, and enables the count-down. Before the simulation finishes, it writes '0' to the control register and checks that the count-down is disabled.
Below is the truth table of the /LOAD signal. +EDGE is the Q output of U10, which becomes '1' when a rising edge of the COUNT signal is detected.
---------+---------+-------+--------
RELOAD | CTRL.WR | +EDGE | /LOAD
=========+=========+=======+========
0 | 0 | 0 | 1
0 | 0 | 1 | 1
0 | 1 | 0 | 1
0 | 1 | 1 | 0
1 | 0 | 0 | 0
1 | 0 | 1 | 0
1 | 1 | 0 | 0
1 | 1 | 1 | 0
---------+---------+-------+--------
From the table, the following equation is obtained.
/LOAD = not (RELOAD + (CTRL.WR * (+EDGE)))      (* = AND, + = OR)
The /LOAD signal is generated using the CPU's write strobe (CTRL.WR), so I may need to modify the circuit if its timing differs from what I expected.
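The truth table can be checked against a one-line logic expression. The sketch below is not taken from the circuit itself; load_n() is simply the expression that reproduces all eight rows of the table, and load_n_matches_table() compares it with the table row by row. Both names are made up for this example.

```c
/* /LOAD is active-low: 0 means "load the down counter". */
int load_n(int reload, int ctrl_wr, int edge)
{
    return !(reload || (ctrl_wr && edge));
}

/* Returns 1 if load_n() agrees with all 8 rows of the truth table. */
int load_n_matches_table(void)
{
    /* /LOAD column, rows ordered as RELOAD, CTRL.WR, +EDGE = 000 ... 111 */
    static const int expected[8] = { 1, 1, 1, 0, 0, 0, 0, 0 };
    int i;

    for (i = 0; i < 8; i++) {
        int reload  = (i >> 2) & 1;
        int ctrl_wr = (i >> 1) & 1;
        int edge    = i & 1;

        if (load_n(reload, ctrl_wr, edge) != expected[i])
            return 0;
    }
    return 1;
}
```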
I just drew it. It still hasn't been tested.

Here I translated the article Nuts and Volts (COMPUTE II ISSUE 1 / APRIL/MAY 1980) into Japanese. But I don't think you need the translation. :) I'll just leave the figures, which are a bit hard to see on the original page.


I have written the timer circuit in Verilog-HDL. ==> view source
The Verilog-HDL code was verified on the learning board that came with the book "FPGA ボードで学ぶ論理回路設計". It ran too fast with the 33[MHz] system clock, so I added a divide-by-1024 counter in front of the prescaler.
Here are some more notes on implementing the Timer.
I have written clock dividers which generate the clock frequencies used in the PC Engine hardware. With the 1/2 and 1/3 dividers, 10.73863[MHz] and 7.159090[MHz] clocks are generated from the 21.47727[MHz] system clock. Their outputs are then fed to further 1/2 dividers, so that 5.369317[MHz], 3.579545[MHz] and 1.789772[MHz] are obtained. ==> view source
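The frequencies above can be reproduced with integer arithmetic. The sketch below only models the division ratios, not the dividers themselves. ClockTree and derive_clocks() are made-up names, and I assume here that 1.789772[MHz] is obtained by halving the 3.579545[MHz] output.

```c
/* Frequencies in Hz derived from the master clock; fractions are truncated. */
typedef struct {
    unsigned long div2;     /* master / 2  */
    unsigned long div3;     /* master / 3  */
    unsigned long div4;     /* div2 / 2    */
    unsigned long div6;     /* div3 / 2    */
    unsigned long div12;    /* div6 / 2 (assumed wiring) */
} ClockTree;

ClockTree derive_clocks(unsigned long master_hz)
{
    ClockTree t;

    t.div2  = master_hz / 2;
    t.div3  = master_hz / 3;
    t.div4  = t.div2 / 2;
    t.div6  = t.div3 / 2;
    t.div12 = t.div6 / 2;
    return t;
}
```

Calling derive_clocks(21477270UL) gives the five frequencies listed above, expressed in Hz.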
This time, I tried iverilog + IVI for the verification.
I have also verified that it works on the FPGA chip.
The ALU (=Arithmetic Logic Unit) is a combinational circuit which is responsible for executing the operations of a CPU (addition, subtraction, AND, OR, EXOR, bit shift, etc.). Since it is a combinational circuit, it needs neither FFs nor a clock input.
In the HuC6280, when the D flag is set (D=1), one extra clock cycle is required for a decimal-mode ADC/SBC. My guess is that it does the normal ADC/SBC first, and then converts the result into BCD on the next clock. So I have implemented a "BCD adjustment" function in the ALU.
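To illustrate the "binary add first, adjust afterwards" idea, here is a small C model of a decimal-mode addition. It is only a sketch: it uses the classic decimal-adjust rule (add $06 or $60 when a digit overflows), which is not necessarily what the HuC6280 does internally, it assumes both inputs are valid BCD, and the names BcdResult and adc_decimal() are made up for this example.

```c
typedef struct {
    unsigned char result;   /* adjusted BCD result      */
    int           carry;    /* decimal carry out (0/1)  */
} BcdResult;

BcdResult adc_decimal(unsigned char a, unsigned char b, int carry_in)
{
    BcdResult r;
    unsigned  cin  = carry_in ? 1u : 0u;

    /* step 1: ordinary binary ADC */
    unsigned  sum  = (unsigned)a + (unsigned)b + cin;
    int       half = ((a & 0x0Fu) + (b & 0x0Fu) + cin) > 0x0Fu;
    int       carry = sum > 0xFFu;
    unsigned  low8 = sum & 0xFFu;
    unsigned  adj  = low8;

    /* step 2: BCD adjustment of the binary result */
    if (half || (low8 & 0x0Fu) > 9u)
        adj += 0x06u;                   /* fix the lower digit */
    if (carry || low8 > 0x99u) {
        adj += 0x60u;                   /* fix the upper digit */
        carry = 1;
    }

    r.result = (unsigned char)(adj & 0xFFu);
    r.carry  = carry;
    return r;
}
```

For example, $58 + $46 with carry in gives the binary result $9F, which the adjustment step turns into $05 with carry set (58 + 46 + 1 = 105).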
I have rewritten the code. I think it's fairly close to completion.
I have checked that the code works on the FPGA chip, but there may be some bugs left.
==> ALU source
==> general decoders (used by the ALU)
==> Adder source (used by the ALU)
I have written an ALU, and I decided to look more closely at the adder.
An adder can be constructed without using any FFs. The basic circuit is a 1-bit full adder. Below is the truth table of the 1-bit full adder.
-----------+-------
 A  B  Ci  | Co  S
===========+=======
 0  0  0   | 0   0
 0  0  1   | 0   1
 0  1  0   | 0   1
 0  1  1   | 1   0
 1  0  0   | 0   1
 1  0  1   | 1   0
 1  1  0   | 1   0
 1  1  1   | 1   1
-----------+-------
nA = not A
nB = not B
nCi = not Ci
Co = (A*B) + (A*Ci) + (B*Ci)
S = (nA*nB*Ci) + (nA*B*nCi) + (A*nB*nCi) + (A*B*Ci)
If you use EXOR gates, the equations become much simpler. But I heard that the delay of EXOR gates is larger than that of other gates, so I don't use them for now. Since CPLDs and FPGAs have EXOR primitives on the chip, I don't think this is really true for CPLDs and FPGAs, but I decided not to use them anyway.
If you make a circuit directly from the above equation, it will look like this.

Suppose we have 8-bit inputs A0-A7, B0-B7, and a 1-bit input CF. The carry signals C0-C7 which occur at each bit of the adder are calculated as follows. The expansion of the right-hand side is omitted for C2 and thereafter.
C0 = A0*B0 + A0*CF + B0*CF
C1 = A1*B1 + A1*C0 + B1*C0
   = A1*B1 + A1*(A0*B0 + A0*CF + B0*CF) + B1*(A0*B0 + A0*CF + B0*CF)
C2 = A2*B2 + A2*C1 + B2*C1 = ...
C3 = A3*B3 + A3*C2 + B3*C2 = ...
 :
C7 = A7*B7 + A7*C6 + B7*C6 = ...
The zero (Z), overflow (V) and negative (N) signals are implemented within the adder, as well as the carry (C) signal. I have seen that overflow can be detected using the logic (C6 EXOR C7), so I just used it.
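The same structure can be modelled in C for checking the flags. This is a software sketch of the equations above, not the Verilog-HDL source; the names AddResult, fa_carry(), fa_sum() and add8() are made up for this example. The carry is rippled bit by bit, and V is taken as C6 EXOR C7.

```c
#include <stdint.h>

typedef struct {
    uint8_t sum;
    int c, v, z, n;     /* carry, overflow, zero, negative */
} AddResult;

/* 1-bit full adder, written as the sum-of-products equations. */
int fa_carry(int a, int b, int ci)
{
    return (a & b) | (a & ci) | (b & ci);
}

int fa_sum(int a, int b, int ci)
{
    int na = !a, nb = !b, nci = !ci;
    return (na & nb & ci) | (na & b & nci) | (a & nb & nci) | (a & b & ci);
}

/* 8-bit adder built from eight full adders. */
AddResult add8(uint8_t a, uint8_t b, int cf)
{
    AddResult r;
    int carry = cf ? 1 : 0;
    int c6 = 0;
    int i;

    r.sum = 0;
    for (i = 0; i < 8; i++) {
        int ai = (a >> i) & 1;
        int bi = (b >> i) & 1;

        r.sum |= (uint8_t)(fa_sum(ai, bi, carry) << i);
        carry = fa_carry(ai, bi, carry);    /* this is Ci of the next bit */
        if (i == 6)
            c6 = carry;
    }
    r.c = carry;                /* C7 */
    r.v = c6 ^ carry;           /* C6 EXOR C7 */
    r.z = (r.sum == 0);
    r.n = (r.sum >> 7) & 1;
    return r;
}
```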
Here I uploaded the adder in Verilog-HDL. I have tested the code on the real FPGA.
This is rather Off-Topic, but I suddenly wanted to build one, so I quickly made it.

This is a tool for "programming" a circuit into XILINX CPLDs. If you buy one, it costs about 6500yen ($55US). But if you build one yourself, it costs about 2000yen ($17US). It doesn't make me feel that I saved money, because it takes about half a day to complete. And if you think about how much money you could make by working for half a day...
CPLD chips are only suited to small-scale circuits, so they are definitely not for fitting a whole PC Engine in.
I referred to the circuit diagram found on the XILINX official web page. I used very old parts that I have had for many years, so I was rather worried whether those parts were still alive, but it worked on the first try.
I didn't draw any circuit diagrams. All you need is the circuit diagram of the download cable (found on the XILINX web page), pin assignment chart of XC9572-PC84 or XC95108-PC84, and pin assignment figure of 84-pin PLCC package.
I said I suddenly wanted to build one, but my actual motivation is like this...
A happy new year. (^_^; I couldn't do anything impressive last year after all. I hope I'll be able to make a bit more progress this year.
This is rather off-topic too, but I designed a ROM emulator using a CPLD.
Below is the block diagram.

HCmd, HClk and HData are the command, clock and data (output and input) lines from the PC, respectively. TAddr, TData, /TOE and /TWE are the address, data, read strobe and write strobe of the target system. RAddr, RData, /ROE and /RWE are the address, data, read strobe and write strobe of the SRAM.
First you output a command on HCmd, then change HClk from '0' to '1' to submit the command to the control logic. Below is a list of commands.
Since the data lines are connected to the RAM and CPU data buses, they need to be bi-directional (inout). This is what makes it complicated. From the CPLD's point of view, RData outputs data only when /RWE is active (0), while TData outputs data only when /TOE is active (0).
==> Verilog-HDL source of the control logic
Below is the simulated result of the control logic.

I'm going to actually build one some time soon.
I've done the wiring, but it didn't work. As always, it really gets me down when I realize that the circuit isn't working...
I've taken a photo anyway.

My impression so far is that using a CPLD does reduce the number of wires, but not by so much that I would feel like building two or more.
I'm going to debug it now... I'm not sure whether I'm doing this right.
The bug was in the software. I spent hours looking for a mistake in the hardware. Ouch! Or is this the kind of bug where you think you have a bug in a place where there really isn't any?
I quickly drew the circuit. You'll see that the only thing I'm doing is wiring the parts together. I haven't verified the pin numbers. Feel free to fix them if you find anything wrong.

Also, I put my "probably-complete" control program here. For controlling the PC's parallel port under Windows XP, I used PortTalk by Craig Peacock.
I'm currently implementing the PSG in Verilog-HDL. The waveform output part doesn't seem very hard, but the volume control part seems very difficult.
There are three attenuators implemented in the PSG for audio volume adjustment. One adjusts the left and right volumes of a channel simultaneously (CH-ATT), another adjusts the left and right volumes of a channel independently (L-ATT, R-ATT), and the last one is the master attenuator which adjusts the whole output volume (M-ATT). In the real hardware, they might be implemented as amplifiers rather than attenuators, but either way it is just a matter of amplitude ratio, so I think either is fine. It looks like I've implemented the CH-ATT as an amplifier this time.
Since these attenuators change the amplitude on a logarithmic scale, you basically need to calculate logarithms in hardware. But that would probably use too many gates, so I decided to implement only the CH-ATT digitally, and implement the rest as analog circuits.
CH-ATT is a 32-step attenuator with 1.5[dB] attenuation per step. The weight on each bit of the 5-bit step input is as follows.
D0:    1.5 [dB] ~= x1.1885 ~= x(1+1/8+1/16)     = x1.1875 ~= 1.4927[dB]
D1:    3.0 [dB] ~= x1.4125 ~= x(1+1/4+1/8+1/16) = x1.4375 ~= 3.1522[dB]
D2:    6.0 [dB] ~= x2.0     = << 1                        ~= 6.0206[dB]
D3:    12.0[dB] ~= x4.0     = << 2                        ~= 12.041[dB]
D4:    24.0[dB] ~= x16.0    = << 4                        ~= 24.082[dB]
D0|D1: 4.5 [dB] ~= x1.6788 ~= x(1+1/2+1/8+1/16) = x1.6875 ~= 4.5449[dB]
~= means "roughly equal". Noting that 6.0[dB] is approximately x2, 12.0[dB] is approximately x4, and 24.0[dB] is approximately x16, these costly logarithm calculations are approximated using logical bit-shift operations.
I have implemented the CH-ATT using this approximation.
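The approximation can be tried out in C before putting it into hardware. The sketch below is not the Verilog-HDL implementation; ch_att_gain() is a made-up name, it treats the CH-ATT as an amplifier, and it takes the x1.4375 factor for D1 as 1+1/4+1/8+1/16. The lower two bits of the step select one of the shift-and-add factors, and the upper three bits are plain left shifts.

```c
#include <stdint.h>

/* Scale x by roughly 1.5[dB] per step (step = 0..31), using only shifts and adds. */
uint32_t ch_att_gain(uint32_t x, unsigned step)
{
    uint32_t y;

    switch (step & 3u) {
    case 0:  y = x;                                    break;  /* x1.0    */
    case 1:  y = x + (x >> 3) + (x >> 4);              break;  /* x1.1875 */
    case 2:  y = x + (x >> 2) + (x >> 3) + (x >> 4);   break;  /* x1.4375 */
    default: y = x + (x >> 1) + (x >> 3) + (x >> 4);   break;  /* x1.6875 */
    }
    if (step & 4u)  y <<= 1;    /* D2:  6[dB] ~= x2  */
    if (step & 8u)  y <<= 2;    /* D3: 12[dB] ~= x4  */
    if (step & 16u) y <<= 4;    /* D4: 24[dB] ~= x16 */
    return y;
}
```

With x = 1024, step 1 gives 1216 and step 3 gives 1728, which correspond to the x1.1875 and x1.6875 factors in the list above.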
The output of the CH-ATT is fed to the attenuators L-ATT/R-ATT. With these, the left and right volumes of each channel can be changed independently, so panpot settings are possible. These are 16-step attenuators with 3.0[dB] attenuation per step. I really want to implement these attenuators digitally, but they would need even more gates than the CH-ATT because of the fixed-point calculation (if you keep "multiplying" data, more bits are required to represent the results). So I'll just wait until I see that there are plenty of unused gates. In the meantime I'll try to use this analog circuit.

Since this is the first time I've drawn a circuit like this, I have no idea whether it would work or not. It seems to me that the audio signal would be affected a lot by the switching noise generated by the digital gates. The one on the left is the L-ATT. The same circuit is used for the R-ATT as well. An L-ATT and an R-ATT exist within each PSG channel. Since there are 6 PSG channels, you need to make 6 sets of L-ATT and R-ATT, 12 circuits in total. Wow, that would be a lot... A buffer may be required right after the ladder resistors.
The one on the right is the M-ATT. Two of them are required, for left and right. These are the last attenuators, fed from the L-ATT and R-ATT outputs. The output of the M-ATT becomes the final audio output of the console.
I'll debug the circuit sooner or later.
I implemented the PSG in Verilog-HDL. It seems to output a waveform, but I haven't actually listened to it. So I don't know yet whether it is working correctly.
I'm still struggling with how to implement the attenuators (ATTs). I changed my mind, and decided not to build all those analog circuits I designed above. I now think it would be a lot easier even if I had to add another FPGA solely for the ATTs. So this time, I accepted using more gates for the ATTs, and implemented the whole thing digitally. I tried hard to use shift operations before, but I simply used multiplication this time.
The three ATTs can be combined into one ATT by adding their attenuation levels. This can be done because all of them are on a logarithmic scale, and the following equation holds on a logarithmic scale:
log(A * B * C) = log(A) + log(B) + log(C)
L-ATT/R-ATT and M-ATT have a different attenuation level per step than CH-ATT. The CH-ATT has the finest granularity. It has a 5-bit step input, 32 steps, and -1.5[dB] per step, whereas the others have a 4-bit step input, 16 steps, and -3.0[dB] per step. This means that one step of L-ATT/R-ATT and M-ATT is equal to 2 steps of CH-ATT.
We need to unify the attenuation level per step in order to combine all the ATTs into one. Here, we just assume that L-ATT/R-ATT and M-ATT have a 5-bit step input and -1.5[dB] attenuation per step, but that only every other step is used. In other words, we multiply the step input by 2, and divide the attenuation level per step by 2.
Sorry if my explanation was too stupid. Anyway, once we have done this, we can just add the attenuation levels of the three ATTs together and attenuate the waveform by that value. Let me call this unified ATT the "composite attenuator (CATT)".
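In C, combining the three step inputs could look like the sketch below. It only illustrates the unit conversion described above; catt_steps() and catt_centi_db() are made-up names, and the step inputs are treated here as attenuation amounts (0 = no attenuation), which may not match the polarity of the real registers.

```c
/* Combine the three step inputs into one count of 1.5[dB] steps.
   ch_att: 5-bit (0-31, 1.5[dB]/step), lr_att and m_att: 4-bit (0-15, 3.0[dB]/step). */
unsigned catt_steps(unsigned ch_att, unsigned lr_att, unsigned m_att)
{
    return (ch_att & 0x1Fu)
         + 2u * (lr_att & 0x0Fu)    /* one 3.0[dB] step = two 1.5[dB] steps */
         + 2u * (m_att  & 0x0Fu);
}

/* Total attenuation in units of 0.01[dB], to stay in integer arithmetic. */
unsigned catt_centi_db(unsigned steps)
{
    return steps * 150u;
}
```

For example, CH-ATT = 3, L-ATT = 2 and M-ATT = 1 give 3 + 4 + 2 = 9 steps, that is 13.5[dB] in total.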
The ranges of each ATT's step input are:
Although we unified the ATTs into one, we need two of them since the PSG's audio output is stereo.
The output is designed to be 16 bits wide. The upper 5 bits are the integer part, and the lower 11 bits are the fractional part. The waveform level is calculated internally with 16+5 bits, but output as 16 bits.
Now for the PSG hardware. I used up most of my energy writing the ATT section, so I'll just explain the points.
Six "PsgChannel" circuits belong to the "PSG". The PSG is a mediator, and each PsgChannel outputs a waveform.
The PSG basically works at 3.579545[MHz]. But I used the system clock 21.47727[MHz] = 3.579545*6[MHz] (!) for the signal attenuation part, since this lets me attenuate the 6 channels one after another, one on each rising edge. In this way, I don't need one CATT per channel. The side effect of doing this is that the outputs of some channels are delayed by up to 1/3579545 second. But I think this is OK, because the PSG is not structured in a way that lets multiple channels start their output on the same clock edge in the first place.
I said the PSG basically works at 3.579545[MHz], but I used 7.159090[MHz]. This is because I thought a PSG running slower than the CPU might cause timing faults. But I'm not really sure about this at the moment, because the PC Engine CPU can run at 1.789772[MHz] when the CSL instruction is executed...
Noise channel and LFO are not implemented yet.
I compiled the whole PSG hardware targeted for ALTERA Cyclone, and it used 1330 LEs - much more than I expected...
I asked one of my foreign friends if he could send me some W65C02s, and he kindly sent me five of them (there is no way to buy 6502/65C02 chips in Japan now).
I quickly drew a circuit, mainly to test this interesting chip.

I'm going to use a CPLD to implement everything other than the CPU, ROM, RAM and crystal oscillator. Well, these are really just clock dividers and address decoders, so they should fit within an XC9536PC-44. I'm not going to use general logic gates for these, since the total propagation delay would probably be larger than with a CPLD, and so I would have to use faster memory chips.
The circuit made of three FFs is a divide-by-three frequency divider with a 50% duty cycle. I managed to get a 21.47727[MHz] crystal oscillator, so I'm going to divide this frequency by three and run the 65C02 at 7.159090[MHz]. Here is the timing chart of the frequency divider (NOTE: this is actually the timing chart of the fixed one shown later):
         __    __    __    __    __    __    __    __    __
CLK   __|  |__|  |__|  |__|  |__|  |__|  |__|  |__|  |__|
            _____             _____             _____
FF1Q  _____|     |___________|     |___________|     |_____
                  _____             _____             _____
FF2Q  ___________|     |___________|     |___________|
                     _____             _____             __
FF3Q  ______________|     |___________|     |___________|
                  ________          ________          _____
PHI2  ___________|        |________|        |________|
But this is for the case of using general logic ICs.
I have already implemented a 1/3 frequency divider in the
[Nov.20 2005] Implementing Clock Divider in Verilog-HDL section,
so I'm going to use that one for CPLDs.
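The behaviour in the timing chart can be reproduced with a small simulation in C. This is only a sketch under my own assumptions about the structure: FF1 is fed with NOR(FF1Q, FF2Q), FF2 follows FF1 one clock later, FF3 samples FF2 on the opposite clock edge, and PHI2 = FF2Q OR FF3Q. Which physical edge each FF uses is also an assumption. The names Phi2Stats and simulate_div3() are made up for this example.

```c
typedef struct {
    int high_halfsteps;     /* number of half clock periods with PHI2 = 1 */
    int rising_edges;       /* number of 0 -> 1 transitions of PHI2       */
} Phi2Stats;

/* Simulate the three-FF divider for clk_cycles periods of CLK,
   advancing half a clock period per loop iteration. */
Phi2Stats simulate_div3(int clk_cycles)
{
    Phi2Stats s = { 0, 0 };
    int q1 = 0, q2 = 0, q3 = 0;
    int prev = 0;
    int h;

    for (h = 0; h < clk_cycles * 2; h++) {
        int phi2;

        if ((h & 1) == 0) {
            /* first clock edge: FF1 and FF2 update together */
            int d1 = !(q1 || q2);
            q2 = q1;
            q1 = d1;
        } else {
            /* opposite clock edge: FF3 samples FF2 (half a clock later) */
            q3 = q2;
        }
        phi2 = q2 || q3;
        if (phi2 && !prev)
            s.rising_edges++;
        if (phi2)
            s.high_halfsteps++;
        prev = phi2;
    }
    return s;
}
```

Over 30 CLK periods (60 half periods), PHI2 is high for 30 half periods and rises 10 times, which is a divide-by-three output with a 50% duty cycle.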
I'll just connect the /IRQ1 line to /IRQ, and /TIRQ to /NMI.
A 154, a 138, and a 00 are used for the address decoding. This is quite different from the PC Engine hardware, but it's OK since this is only for testing.
I'm going to use a 32kB ROM and a 32kB RAM. When the A15 line is 'L', RAM is selected, otherwise ROM is selected. I/O is mapped from $8000 to $FFFF... ooops! This is going to be read-only... Ouch! It looks like I need a radical reconsideration. (^_^;
For now, I used a 27256 for the 32kB ROM. But it can be an EEPROM or even a ROM emulator, which is what I'm going to use.
By the way, what are good values for R and C in the crystal oscillator circuit? I searched a little, and it seems that R is between 1[MOhm] and 10[MOhm], and C is between 10[pF] and 20[pF]. I added a 1/2 divider so that I can observe the waveform with the old oscilloscope I recently got.

I quickly made an experimental circuit on a breadboard. The circuit is pretty simple, but it looks rather complicated when built on a breadboard (mainly because the unused input pins are connected to Vcc or GND). It is said that you should use a dipped-mica or film capacitor for C so that it is not affected by the surrounding temperature. But I didn't have such good ones, so I just used ceramic types.

I observed the waveform. Wow, I see something!

I was too lazy to fix the 65C02 test circuit at the top of this section, and I wrote it in Verilog-HDL instead. I'll just leave the circuit as it is. But I did fix the 1/3 divider circuit, so I'll show it here.

It may be getting unclear what I'm doing, but I'm going to take yet another detour on the way to making PC Engine compatible hardware.
I debugged the address decoder + clock divider in Verilog-HDL.
I decided the memory map as follows:
I/O MODE:
  $0000-$3FFF  RAM (16kB)
  $4000-$47FF  VDC
  $4800-$4FFF  VCE
  $5000-$57FF  PSG
  $5800-$5FFF  Timer
  $6000-$67FF  Pad
  $6800-$6FFF  IntCtrl
  $7000-$77FF  CdRom
  $8000-$FFFF  ROM (32kB)
RAM MODE:
  $0000-$3FFF  RAM (16kB)
  $4000-$7FFF  RAM (16kB)
  $8000-$FFFF  ROM (32kB)
A 32kB RAM is still used, and it is bank-switched in 16kB banks. The 6502 CPUs don't distinguish between I/O space and memory space. So if you place a 32kB ROM + 32kB RAM as in a Z80 system, there is no space left for I/O. Also, 6502s need the interrupt vector table located from $FFFA to $FFFF, and the zero-page and stack from $0000 to $01FF. Therefore you normally place RAM from $0000, place ROM up to $FFFF, and place I/O somewhere between RAM and ROM. Here, I/O is placed from $4000 to $7FFF (in I/O mode).
Writing zero anywhere from $8000 to $FFFF selects "I/O MODE", and writing 1 selects "RAM MODE". 16kB of RAM is enough, since it is already twice as much as the PC Engine has, but since we have another 16kB and there is no reason not to use it, I made it possible to use it.
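The memory map of the test board can be modelled with a small decode function. This is a C sketch for checking the map, not the CPLD source; Region and decode_board() are made-up names. The range $7800-$7FFF is not assigned in the I/O mode map above, so the sketch reports it as unassigned.

```c
typedef enum {
    REG_RAM, REG_VDC, REG_VCE, REG_PSG, REG_TIMER, REG_PAD,
    REG_INTCTRL, REG_CDROM, REG_NONE, REG_ROM
} Region;

/* addr: 16-bit CPU address, ram_mode: 0 = I/O MODE, nonzero = RAM MODE. */
Region decode_board(unsigned addr, int ram_mode)
{
    /* one entry per 2kB block from $4000 to $7FFF (I/O MODE) */
    static const Region io[8] = {
        REG_VDC, REG_VCE, REG_PSG, REG_TIMER,
        REG_PAD, REG_INTCTRL, REG_CDROM, REG_NONE
    };

    addr &= 0xFFFFu;
    if (addr >= 0x8000u)
        return REG_ROM;                 /* $8000-$FFFF in both modes      */
    if (addr < 0x4000u || ram_mode)
        return REG_RAM;                 /* lower bank, or banked-in RAM   */
    return io[(addr - 0x4000u) >> 11];  /* 2kB I/O blocks                 */
}
```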
As for the clock divider, I had many CPLD pins left unused, so I made it output 10.73863[MHz], 7.159090[MHz], 5.369317[MHz], 3.579545[MHz] and 1.789772[MHz] clock signals from the 21.47727[MHz] system clock input.
I recently got an ALTERA EPM7160ELC84-10 very cheaply. But when I tried to use it, I had to install the MAX+PLUS II design software, which didn't support Verilog-HDL by default, and moreover it required special hardware for programming. So I gave up using this CPLD. Too bad!
Hence I ended up waiting for a CPLD to arrive.
I finally got a CPLD that I can use, and I quickly wired it up, but it didn't work! Why doesn't it work!? ...Too bad... This weekend is about to end... I have to wait until next weekend... I don't know how I can debug this...

I implemented the interrupt controller and the timer as well as the address decoder and the clock dividers. But since it didn't work, I don't have much to say... At least I can observe the clock signals, so the clock dividers seem to be working. I haven't connected the ROM emulator yet. Instead, I'm using an EEPROM (HN58C256-20) at the moment. This one has a 200[ns] access time, so I fed a 3.579545[MHz] clock to the CPU. But is it still too fast...?
I made a manual clock circuit (it generates a single clock cycle when a push button is pressed and released) and checked each bit of the address bus with a tester. The circuit is from the book "CPU no Tsukurikata", with a modified time constant.

1  ffff     -- all the bits are set to '1' on the first clock
2  0724     -- a value I don't know
3  01fc     -- stack access; store PCH ?
4  01fb     -- stack access; store PCL ?
5  01fa     -- stack access; store flag register?
6  fffc 04  -- read low byte of the reset vector = 04
7  fffd e0  -- read high byte of the reset vector = e0
8  ????     -- unknown value from hereafter
I read somewhere that the first and second clocks are "internal work", so I'll just skip them.
Reset is considered to be a kind of interrupt, and it seems that the address bus is actually accessing the stack to save the program counter and the flag register. But after a little more investigation, I found that the R/W signal doesn't become 'L', so these stack accesses must be invalid.
I first thought that the ROM wasn't working, but it actually output the reset vector $E004. The reset vector is stored at $FFFC and $FFFD in the ROM. This means the ROM is working... If the ROM is working, the address bus should output $E004 on the 8th clock. But it doesn't... Is the data bus connection wrong...? ...Hmmm....
The data bus connection was reversed.
The address bus value on the 8th clock was $0720. This is 0000 0111 0010 0000 in binary notation. If we group it as two 8-bit values, and reverse the bit order respectively, we get 1110 0000 0000 0100. This is $E004 in hexadecimal notation.
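The bit reversal can be checked with a few lines of C. This is just a sketch for verifying the arithmetic above; reverse8() and reverse_each_byte() are made-up names.

```c
/* Reverse the bit order of one byte (D0 <-> D7, D1 <-> D6, ...). */
unsigned char reverse8(unsigned char v)
{
    unsigned char r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        r = (unsigned char)((r << 1) | (v & 1));
        v >>= 1;
    }
    return r;
}

/* Reverse the bit order within each byte of a 16-bit value. */
unsigned reverse_each_byte(unsigned w)
{
    unsigned hi = reverse8((unsigned char)((w >> 8) & 0xFFu));
    unsigned lo = reverse8((unsigned char)(w & 0xFFu));

    return (hi << 8) | lo;
}
```

reverse_each_byte(0x0720) returns 0xE004, which is the reset vector that was stored in the ROM.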
Anyway, the CPU seems to be working... I don't have time to fix it today, so I'll do it tomorrow. It only means changing 8 wire connections, but the solder side of the board is pretty messy, and it is hard to get solder to the points that need fixing.
It worked!
I connected the timer interrupt request signal to the /NMI input of the 65C02. The interrupt handler writes to the interrupt controller and acknowledges the timer interrupt. I observed the /NMI input pin of the 65C02, and it was periodically alternating between 'H' and 'L'.
The CPLD chip at the bottom-right of the photo is the heart of the CPU board. It contains the timer, interrupt controller, address decoder and clock divider. If I had to make this board without a CPLD, it might have become three or four times larger than its current size.
Here are the circuit and the CPLD source.
Use PCEAS to assemble the 65C02 test program:
Now it seems I finally can do some interesting tests with this board.
I found another bug. The "locking edge" of the 65C02 is the negative edge of PHI2, but the timer and the interrupt controller were looking at the positive edge. This kept the board from working at 7.159090[MHz]. I updated the source above.
I also slightly changed the ROM emulator. Here are the updated versions. The host PC program doesn't need any changes.

I hope there are no more bugs...
I found yet another bug. (^_^; I couldn't access the banked RAM. I updated the above source as BUG-FIX2. I also added RAM test code to the 65C02 test program.
This is the part that contained the bug:
always @(posedge i_RW)
begin
    if (w_ExRAM)
        r_ExRAM <= i_D0;
end
If you think this is "a matter of course", yes it is... I changed it to this.
always @(negedge o_Clk7M159)
begin
    if (w_ExRAM & ~i_RW)
        r_ExRAM <= i_D0;
end
What I learned from this is that you mustn't do what you mustn't do. Sure...
I bought a couple of SPARTAN 3 chips. My cheap digital camera can't clearly photograph the gaps between the pins...

Since the pin pitch is too small to use directly (0.5mm), I haven't really "obtained" these chips yet.
I can't say I have obtained these chips until I am actually able to use them, by mounting them on the pitch converter board:

It was easier than I thought. Here is a photo which can be enlarged by clicking it. My cheap camera doesn't show it in detail though...

The most important thing is placing the IC package precisely in position. Since the pins of the IC are very small, you won't need much solder. First you apply well-fluxed solder to the pads on the board, then solder the IC pins by gently pressing them from the top with the soldering iron. This resulted in pretty good quality.
I did it like this:
Now you have the Spartan-3 chip ready for use. Let's have some coffee and take a break. (^_^;
I started building hardware that can actually work as a PC Engine. But it seems that the PC Engine hardware doesn't fit in a single XC3S200. At least I don't think I can make it fit, so I'm thinking of switching to the XC3S400 later. Fortunately, the XC3S200 and XC3S400 have the same pin assignment for the QFP208 package, so switching from the XC3S200 to the XC3S400 can probably be done by simply replacing the chip.

Hopefully, this will be able to run Famicom/NES without any change.
Although they are not wired yet, there are three 32kB SRAMs to be mounted. One of them is used for MAIN RAM (8kx8 bits) + BRAM (8kx8 bits), and the other two are used for VRAM (32kx16 bits). I forgot that the XC3S can't use 5[V] for I/O, and the only SRAMs I had that supported 3.3[V] operation were these. So I ended up using three separate SRAMs.
The VREF = 1.25[V] output of an LM317T is used for VCCINT. Normally, 1.20[V] is supplied to VCCINT. The absolute rating of VCCINT is from -0.5[V] to 1.32[V]. The variation of VREF of the LM317T seems to be +/-0.05[V], so 1.20 <= VREF <= 1.30 [V]; therefore I think this stays within the rating of VCCINT.

Three power supplies drive me crazy (I would never do this again...). I have wired only the power supplies, but it already looks pretty ugly. There are probably too many bypass capacitors, since I was afraid of malfunctions caused by noise from the power supplies. Surface-mount linear voltage regulators are used to generate VCCIO 3.3[V] and VCCAUX 2.5[V]. I was going to generate 1.25[V] with the one at the top, but it didn't produce 1.25[V] with the same circuit as the LM317T (are these SMD regulators designed to output 2.5[V] when Adj is connected to GND??). Hence the top one is unused.
Now that it seems I finally finished wiring power supplies, I'll test if it is recognised by iMPACT.
At last, it is recognized by iMPACT.
The JTAG logic of the XC3S uses VCCAUX for its power supply. Hence the communication works at 2.5[V] (LVCMOS25). It seems that the CPLD programmer I made a while back is called "Parallel Cable III", and it works at 5.0[V].
Then the question is whether 5.0[V] can be applied to logic operating at 2.5[V]. In the case of the XC3S, even a 3.3[V] input will destroy the JTAG logic. But there is a document on the Xilinx website on how to deal with it. They say that a resistor RSER must be connected in series with the input. The value of RSER is 56[Ohm] for 3.3[V], and 300[Ohm] for 5.0[V].
Looking at the circuit diagram of Parallel Cable III, 100[Ohm] resistors are already placed in series with the inputs. That is more than the 56[Ohm] required for 3.3[V] but less than the 300[Ohm] required for 5.0[V], so it seems OK to operate the cable at 3.3[V].
But there was another problem. When Parallel Cable III is used for XC3S configuration, the propagation delay of the 74HC125 becomes a major problem, and communication fails. This is explained in detail on Nahitafu's page (in Japanese).
As written in Nahitafu's page, I changed 74HC125 to 74AC125. Then XC3S is recognized by iMPACT.

Below is the photo of XC3S being detected. Sloppy wiring since I wasn't sure if it was going to succeed. I'll probably destroy the chip by accident, if I keep doing things like this...

It was pretty tough... I'll draw the circuit diagram.
I drew the circuit diagram. XCF02S is added & recognized.

The VCCINT 1.2[V] generated by the LM317T satisfies the absolute rating, but it doesn't fully satisfy the recommended operating range. But since this is a prototype, I think it's OK.
Since I changed 74HC125 of Parallel Cable III to 74AC125, XC3S has indeed become configurable, but CPLDs became unconfigurable instead.
It seems that signal reflection is causing the problem, and the TCK line seems to be mostly affected.
It took me a while, but then I added the following circuit, and now it's working OK.

The input is the signals from 74AC125 of Parallel Cable III. The output should be connected to TCK/TMS/TDI of CPLD. This will use all of six inverters in the 74HC14. I tested without 74HC14 (with only resistors), but it didn't work (74HC14 with no resistors didn't work either).
Below is my understanding of the circuit above, but I don't know whether I'm right or not.
Signal reflection occurs at the end of a wire. The input impedance of an IC (74HC14) is close to infinite (to the signal, it is like a thick rigid wall). So only a small part of the signal that arrives at the input can actually go in. The rest has to go elsewhere. But if this is the only input on the line, it is also the end point for the signal. Since there is no way around, the signal reflects and starts going back to the 74AC125. What's happening at this point is a reflection; at a high-impedance end the reflected wave comes back with the same polarity as the incoming one (it would be inverted at a low-impedance end). The reflected signal is likely to interfere with either the current signal or the next one coming.
It is said that you should connect terminating resistors to prevent signal reflections from occurring. What these resistors do is sink the signal current at the end of the line to GND. With the terminating resistors, signals which would cause reflection are absorbed into GND.
The problem is what values these resistors should have. I don't know the perfect answer. The idea is to first soften the rigid wall of the 74HC14 input with the resistor connected in series with the input, dulling the reflection. Whatever signal still reflects will be absorbed by the terminating resistor connected to GND. The terminating resistor should be small enough that the input impedance of the 74HC14 (~= infinite) can be ignored, and large enough that the output impedance of the 74AC125 (several ohms to several tens of ohms??) can be ignored too. This time I decided to use 1[kOhm] for the series resistor. I also used 1[kOhm] for the resistor connected to GND.
Again, I don't know if this is correct. But at least it's working now, so I guess it's OK.
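For reference, the usual textbook way to put a number on a reflection is the voltage reflection coefficient, gamma = (Zload - Z0) / (Zload + Z0). The C sketch below only evaluates that formula. The characteristic impedance Z0 of the JTAG cable was never measured here, so any value passed in (for example 100[Ohm]) is an assumption, and the function name is made up for this illustration.

```c
/* Voltage reflection coefficient at the end of a line.
   z_load : impedance seen at the end of the line [Ohm]
   z0     : characteristic impedance of the line [Ohm] (assumed, not measured)
   Result is +1 for an open end, 0 for a matched end, -1 for a shorted end. */
double reflection_coefficient(double z_load, double z0)
{
    return (z_load - z0) / (z_load + z0);
}
```

With an assumed Z0 of 100[Ohm], a bare CMOS input (modelled as a very large load) gives a value close to +1, a 100[Ohm] load gives 0, and a short to GND gives -1. How well this simple model describes the actual 74HC14 circuit is not verified.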
... but this wasn't the main topic of the day. (^_^)
I made a D/A converter for audio output.

Since I wanted to save I/O pins of the FPGA, I made the FPGA output the audio serially, and made a CPLD receive it and output it as parallel data. Due to the limitation of the CPLD I used (XC9572PC44), the output format ended up as 15-bit stereo. After that, the parallel data is converted to an analog audio signal by an R-2R ladder D/A converter.

The FPGA outputs 15-bit L and R samples (30 bits in total) alternately, one bit per clock at 21.47727[MHz]. The output sample rate is 21.47727[MHz] / 30 = 715.909[kHz].
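The sample rate figure can be reproduced with a few lines of C. This is only the arithmetic from the paragraph above; the function name is made up for this illustration.

```c
/* Sample rate in Hz of a serial audio stream that sends one bit per clock,
   with every channel's sample sent once per sample period. */
double serial_sample_rate_hz(double bit_clock_hz, int bits_per_channel, int channels)
{
    return bit_clock_hz / (double)(bits_per_channel * channels);
}
```

For example, serial_sample_rate_hz(21477270.0, 15, 2) evaluates to 715909 Hz.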
I would like the FPGA to be a PSG for a while, and play around with some "beep" sound with the 65C02 CPU board.
Here are the DAC circuit diagram and Verilog-HDL source code for CPLD.

The circuit looks weird, because I was kind of playing with it. (^_^;
In the circuit diagram, two resistors, 1.5[kOhm] and 3.0[kOhm], are used. But actual implementation uses only 3.0[kOhm] resistors, and 1.5[kOhm] are made by connecting two 3.0[kOhm] resistors in parallel. There are 90 resistors used for the above circuit in total.
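As a small worked example of the two points above, the C sketch below computes the parallel combination of two resistors and the ideal, unloaded output of an N-bit R-2R ladder. It assumes perfect resistors and logic outputs that swing exactly between 0 and vref; the real CPLD outputs and 5% resistors will deviate from this. The function names are made up for this illustration.

```c
/* Resistance of two resistors connected in parallel [Ohm]. */
double parallel_ohms(double r1, double r2)
{
    return (r1 * r2) / (r1 + r2);
}

/* Ideal unloaded output voltage of an N-bit R-2R ladder.
   vref : logic high voltage driving the ladder [V] (assumed ideal)
   bits : number of ladder bits (15 in this DAC)
   code : digital input, 0 .. 2^bits - 1 */
double r2r_ideal_output(double vref, int bits, unsigned code)
{
    return vref * (double)code / (double)(1u << bits);
}
```

Two 3.0[kOhm] resistors in parallel give parallel_ohms(3000.0, 3000.0) = 1500[Ohm], and a half-scale code of 16384 on a 15-bit ladder gives half of vref.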
The precision of the resistors is vital to an R-2R ladder type DAC. It's highly likely that resistors from the same production lot have the smallest relative errors. Therefore we should buy a large quantity (say, 100) of resistors of the same resistance at once, hoping that they come from the same production lot. I bought one hundred 3.0[kOhm] 5%-error resistors and measured the relative errors, and they were actually less than 1%. The relative errors I got here are probably smaller than if I had bought and used 1.5[kOhm] and 3.0[kOhm] resistors separately. Using 1%-error resistors should reduce the relative errors even more.
There are buffer-amplifiers added to the outputs, because the output impedance is pretty high due to the R-2R ladder resistors. Actually, I haven't implemented the buffer-amplifiers yet, so the effect of them hasn't been verified. However, by actually listening to the sound, I noticed that the quality is worse than my software-emulated PSG. I thought this was caused by the mismatch of impedance between the DAC output and my PC audio input, but I haven't been able to find the real cause.

The weak point of this circuit is that you need DC decoupling capacitors connected to the output, which adds some distortion to the original waveform. It may get better if you insert small capacitors at the high-impedance side (input) rather than using large capacitors at the output side, or operate the buffer-amplifiers with 2 power supplies, in which case we can omit the output capacitors.
I quickly implemented an RS232C receiver module for FPGA debugging purposes. I made it so quickly that it may still have some bugs left. It's set for 115200bps, 8-bit, one start-bit and one stop-bit. When it receives data with no stop-bit, its behaviour is undefined. I tested it on an FPGA chip and succeeded in receiving values 0-255, so here it is...
The input clock frequency is fixed at 7.159090[MHz], and you need to modify the source slightly if you want a different frequency. To get 115200bps from 7.159090[MHz], you would need a divide-by-62 counter since 7159090/115200 ~= 62. But here I just used a divide-by-64 counter instead. In this case, the baud rate becomes 7159090/64 ~= 111861. So the baud rate error is (115200-111861)/115200 * 100 ~= 2.90[%]. Since we transfer 10 bits as a unit (start-bit + 8-bit data + stop-bit = 10 bits), it should be safe if the baud rate error is less than 100[%] / 10 = 10[%]. Therefore, a 2.9[%] baud rate error should be OK.
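The baud rate arithmetic above can be written as a small C sketch. The last function additionally estimates how far the sampling point has drifted by a given bit of the frame; it assumes the receiver samples in the middle of each bit, which is a common design but is not confirmed for this module. All function names are made up for this illustration.

```c
/* Baud rate obtained by dividing the input clock by a fixed divider. */
double actual_baud(double clock_hz, int divider)
{
    return clock_hz / (double)divider;
}

/* Baud rate error in percent; positive when the receiver is slower
   than the target rate. */
double baud_error_percent(double target_baud, double actual)
{
    return (target_baud - actual) / target_baud * 100.0;
}

/* Drift of the sampling point, in units of one transmitted bit,
   at bit number bit_index (0 = start bit, 9 = stop bit for 8N1).
   Assumes mid-bit sampling, counted from the start-bit edge. */
double sample_drift_bits(double target_baud, double actual, int bit_index)
{
    return ((double)bit_index + 0.5) * (target_baud / actual - 1.0);
}
```

With a 7159090 Hz clock and a divider of 64 the error comes out at about 2.9[%], and under the mid-bit sampling assumption the drift at the stop bit is a little under 0.3 bit, which is still below half a bit.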
The divide-by-64 counter is made of a 6-bit binary counter. When a 1 --> 0 transition occurs on the MSB (bit5), that's 64 counts. Every time we detect a start-bit, we reset the counter to prevent the baud rate error from accumulating.
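The counter behaviour described above can be modelled in C as follows. This is a software model for illustration only, not the Verilog-HDL source of the receiver; the type and function names are made up.

```c
#include <stdint.h>

/* Software model of a 6-bit binary counter used as a divide-by-64 counter. */
typedef struct {
    uint8_t count;  /* counter value, 0..63 */
} Div64;

/* Reset the counter; the receiver does this when it detects a start-bit. */
void div64_reset(Div64 *d)
{
    d->count = 0;
}

/* Advance the counter by one input clock.
   Returns 1 when bit5 (the MSB) makes a 1 --> 0 transition,
   which happens once every 64 clocks, and 0 otherwise. */
int div64_tick(Div64 *d)
{
    uint8_t prev_msb = (uint8_t)((d->count >> 5) & 1);
    uint8_t msb;
    d->count = (uint8_t)((d->count + 1) & 0x3f);
    msb = (uint8_t)((d->count >> 5) & 1);
    return prev_msb == 1 && msb == 0;
}
```

After a reset, the first 63 calls to div64_tick return 0 and the 64th returns 1.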
The input signal i_Rxd is assumed to be the TXD signal from a PC's COM port which is already converted to the CMOS level by using MAX232 or compatible level converting device.
Using the RS232C receiver, I transferred some music data to my FPGA-PSG and made it play music.
I actually tried with a commercial PC Engine game music data, and it sounded like the real PC Engine. It's too bad that these commercial game music data are copyrighted, and I can't just let you listen to it here... I wonder if there are any public-domain music data for PC Engine PSG? I should look for them.
Since the transfer speed is only 115200bps, it can receive at most 192 bytes in one frame (= 115200 / 10 bits per byte / 60 frames per second). If there is more to send, the playback will slow down. (^_^; (Now I think I should have made a USB receiver.)
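The per-frame budget can be computed like this in C. The 10 bits per byte come from the 8-bit, one start-bit, one stop-bit format used by the receiver; the function name is made up for this illustration.

```c
/* Number of whole bytes a serial link can deliver during one video frame.
   bits_per_byte is 10 for start-bit + 8 data bits + stop-bit. */
int bytes_per_frame(int baud, int bits_per_byte, int frames_per_second)
{
    return baud / bits_per_byte / frames_per_second;
}
```

bytes_per_frame(115200, 10, 60) gives 192.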
I also implemented noise channels. But when I connect the "PsgChannelN", the PSG channel with noise capability, everything refuses to work. I guess there must be some problems like clock skew, delay, etc., but I don't know.
In a previous section, I made a 15-bit ladder type DAC, but somehow I feel that 1-bit DAC may sound better. Well, I'll just compare when I make a 1-bit DAC.
I want to upload the latest source code here, but I'm concerned about infringing an NEC-HE patent, which is probably still valid. So I decided to wait and see. Personally, I think there is no problem because I'm not trying to make any money from it, but just in case... If you really want the source code, just email me, and I'll give it to you personally (I'll handle it this way from now on).

Hmm... I noticed that there isn't really anything this time. Well, I guess that's life.
Since there was nothing to update, I quickly made a small music data, and compared the playback result among the three compatible PSGs: FPGA-PSG, PC2E (software emulation), and the real PC Engine. It's not really a good comparison, since my music data is not awesome, and FPGA-PSG is still a bit buggy.
I listened to the "Piano Practice 7" in Final Fantasy V, and made the data. It is written in MML, and compiled for HuSIC driver using HuSIC-Watch(HESw0073).
In comparison, I think the real PC Engine has a pretty effective low-pass filter. The beginning of the FPGA-PSG playback slows down a little, due to the slow transfer rate of RS232C.
Although FPGA-PSG and PC2E sound pretty much alike, I think PC2E is much better. FPGA-PSG has some kind of pop-noise (signal overshooting??), and it's very annoying. On the other hand, PC2E sounds smooth, with no annoying pop-noise. I don't know whether the pop-noise is due to a bug or the load being too heavy (probably a bug though).
By the way, is there anyone who would like to provide better HES data?
It seems that the way I implemented it was wrong. It now uses more than 70% of the XC3S200 gates, and has become so large that I don't think it's practically useful anymore. I'll just leave the source code here as it is. The reason more than 70% is no good is that the PSG takes about 30%. I think the CPU and PSG need to fit within a single XC3S200, or I will end up using three or more XC3S200s for the whole PC Engine. But that probably won't satisfy my self-complacency. So I think it's a good time to re-think what I've done. (It's pretty amazing that I didn't realize it until I had written such a large amount of code...)
Some of the comments at the beginning of the source are wrong, so don't trust them blindly. (^_^; Most of them are correct, but some are not. I didn't do any sort of designing. It's rather stupid work. But at least it's a stupid sample which tells you where you would end up if you keep writing this kind of code...
If you add BRK, RTI and some interrupt features, it may work as a CPU. But anyway, it already uses about 1300 slices with the area-optimization option, so I don't think it's of any use.
The main cause is that I didn't decode the instruction codes manually. I thought the compiler would do this for me, but I guess I was too easy-going.
As for the block-transfer instructions, they do single byte transfer in 6 clocks, except for the initial and final stack push / pull overhead. If they do it in 8 bits per clock basis:
Block-transfer instructions are the first to be implemented on my next CPU.
I'm getting tired of setting up my ROM emulator each time I run my code on my PC Engine, so I made a PCE development board. I learned that DEVELO BOX used a 74HC157 and I thought it was a good idea, so I used the IC in my dev board as well. The main difference is that DEVELO BOX used RS232C, but mine only uses LPT for PC <--> PCE communication.



A 74HC14 and 1[kOhm] resistors are added to avoid what I assume to be the signal reflection I experienced with the JTAG communication a while back. While 1[kOhm] is suggested in the circuit diagram, I actually used 1.5[kOhm] since I had many of them. I don't think this will make any difference, but I haven't tested it, so I can't be sure.
This time I also drew the soldering pattern in the circuit editor along with the circuit diagram. If you do this in advance, you no longer need to keep in mind that the wiring direction is mirrored between the two sides of the board, so the soldering work takes less time. I think I completed it 2x faster than when working out the wiring direction on the fly. You can also plan in advance to have as few wire jumps on the soldering side (which take time) as possible, so the completed board can look much better. The device side looks rather uglier this time.
I soldered the circuit board between 1st line and 2nd line of D-SUB female connector pins. By doing so, you don't need to use a D-SUB 25-pin connector <--> universal circuit board converting board which is pretty expensive.
I noticed that control pad input pins of 74HC157 (A input pins) don't have pull-up resistors after I completed my board. So this requires control pad to be always connected when using the board. Otherwise, you need to pull-up A input pins of 74HC157 (2, 5, 11, 14).
It looks like the hardware is complete, so now I need software. I'm going to write a small program which receives and executes programs from LPT using HuC or PCEAS, and make it bootable from the PCE CD-ROM system. I don't know whether it's going to work or not...
The quality of communication between the PC and the PCE has been bad, and there seems to be no way I can get it to work. It seems the PCE --> PC direction is failing. The quality was so bad that I made the program read the port value 4 times and proceed only if all of them are the same, but still one or two bytes out of 8kB became $10 or $20 when they had to be $00. I then thought about sending a checksum byte, and resending the data if the sum didn't match. But how long would it take to send an 8kB block of data when there are always one or two bytes of errors in it...?
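The two checks mentioned above (accepting a port value only when repeated reads agree, and verifying a block with a checksum byte) can be sketched in C as below. This is not the code in pcedev_pc.c. The samples are passed in as an array so the sketch does not depend on any port access function, and the simple 8-bit additive checksum is an assumption, since the text does not say which kind of checksum was intended. The function names are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if all n samples of the port are identical, 0 otherwise.
   With n = 4 this is the "read 4 times" check. An empty set counts as agreeing. */
int samples_agree(const uint8_t *samples, size_t n)
{
    size_t i;
    for (i = 1; i < n; i++) {
        if (samples[i] != samples[0])
            return 0;
    }
    return 1;
}

/* 8-bit additive checksum over a block (sum of all bytes modulo 256). */
uint8_t checksum8(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    size_t i;
    for (i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);
    return sum;
}
```

A single byte that arrives as $10 instead of $00 changes the sum, so the receiver would notice and could ask for a resend. If nearly every 8kB block contains an error, sending smaller blocks, each with its own checksum, would keep the resends short; this is a suggestion and has not been tried on this board.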
I observed the signal on my oscilloscope, but the signal was actually pretty clean and there seemed to be no signal reflection. I don't know why it doesn't work...
The PC watches the LPT port by polling while communicating, so the CPU load becomes 100%. Then the CPU cooling fan becomes so fast and loud that it greatly decreases my willingness to complete it.
PCE doesn't seem to boot from CD-Rs without trying for several ten seconds to several minutes. This decreases my willingness to complete it even more.
I think there is not much motivation left, so I will stop here (just imagine that you spent all weekend trying to make a PC Engine development board and failed). I will recall about this one when I do something similar in the future.
You might have noticed that I forgot soldering one part in the picture above, but it is fixed now. So this is not the cause.
Programs are left here.
And below are for those who want to see them a little bit now.
pcedev_pce.c is the PCE-side program, and pcedev_pc.c is the PC-side. It was the first time I wrote a program which does hand-shaking, and I thought it was pretty confusing (the way I wrote it was pretty bad too). It still contains some bugs, but oh well, who cares.
PCE's program is compiled by typing
PC's program uses Craig Peacock's PortTalk. Put files related to the PortTalk in the same folder as the PC's program, and type
I'm not doing well lately...
Fixed the board a little and made it work (revived). It turned out that it was too difficult to communicate accurately with the previous circuit. I will just upload the improved circuit diagram.

A D-FF (74HC74) was added. The reason why you need a D-FF is that... if you think of PCE --> PC communication, you either think of these 2 ways:
For the case of 1, when CLR = 0 (i.e. the data bit is zero), the PCE pad is selected (by the 74HC157) for the data input D0-D4, and the LPT data can't be read. The handshake fails at this point.
For the case of 2, there would be no problem if CLR = 1 only when the PCE reads data bits from LPT. But the PCE needs to read LPT before setting CLR from 0 to 1, so the handshake fails.
Because I wasn't aware of these when I wrote the communication program, the communication of PCE --> PC direction was completely messed up.
Hence I decided to use a D-FF. With the D-FF connected as in the circuit diagram above, it saves the CLR value on a positive transition of SEL. It holds that value until another positive transition of SEL occurs. So for the case of 1, this D-FF makes it possible to save the CLR value (holding a data bit which may be zero) in the D-FF and then set CLR = 1 to read LPT (the explanation for the case of 2 is omitted).
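The latching behaviour of the D-FF can be illustrated with a small software model in C, where the SEL signal is the clock and the CLR signal is the D input. This only models an ideal positive-edge-triggered flip-flop; it ignores the preset/clear pins and timing of the real 74HC74, and the names are made up for this illustration.

```c
/* Software model of an ideal positive-edge-triggered D flip-flop. */
typedef struct {
    int q;         /* latched output */
    int prev_clk;  /* clock level seen at the previous update */
} DFlipFlop;

/* Feed the current clock and D levels (0 or 1) and get the output Q.
   Q takes the value of D only on a 0 --> 1 transition of the clock;
   otherwise it keeps its previous value. */
int dff_update(DFlipFlop *ff, int clk, int d)
{
    if (!ff->prev_clk && clk)
        ff->q = d;
    ff->prev_clk = clk;
    return ff->q;
}
```

If CLR is 0 at the moment SEL goes from 0 to 1, the model latches 0 and keeps returning 0 even after CLR is changed to 1, until the next positive transition of SEL.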
In this update, I also added the pull-up resistors for the PCE pad side. The 1.5[kOhm] resistors might be too small, since each signal line at 'L' level draws about 3.3[mA] (5[V] / 1.5[kOhm]) through its resistor from the 5[V] power supply. Use a higher resistance if you are not comfortable with that.
I was thinking of improving them before putting them here, but I guess I had no energy for that.
Below is a brief explanation of the above files. Read the source for detail.
The PC software will exit if any key is pressed while it waits for a request from the PCE. No request from the PCE will be answered until the PC software is started again. The PCE software is programmed so that it repeats the same request again and again until it is accepted, so as soon as the PC software is started again, it should continue from where it left off. I have seen this fail once, so there might be some bugs left.
Once 6 is completed, PCE side will permanently repeat 2-6. So it is a good time to press a key during transfer of 6. This is something you probably want to change.
The only bank you can use without destroying existing code at the time startup.pce is invoked is bank4. If you want to use any other banks, you should be ready to initialize everything (including the PCE software). Code related to HuC exists in bank3($6000-$7fff) and bank6($c000-$dfff). It's up to you whether to write code which cooperates with the existing code or code which initializes everything to make the banks your own, although you need to cooperate with the BIOS in bank7($e000-$ffff) at least, since it is a ROM.
startup.pce is assembled by entering the following command.
The receive_byte routine in pcedev.asm is not tested.
The data transfer on 6 is rather slow since the PCE --> PC transfer program is written in HuC. On the other hand, the PCE --> PC on 5 is written in assembly. The transfer speed is probably 2-3 times faster in assembly. HuC6280 may be considered as a good old CPU which gave meaning to writing program in assembly. (?)
Interrupt needs to be disabled before PC <--> PCE data transfer.
The LPT read/write speed using PortTalk was 120k times per second on my PC. Since a Pentium machine performed 1M times per second under MS-DOS mode, I think recent PC can perform more than 120k reads/writes per second and transfer speed can become much faster. It currently takes several seconds for 8kB PCE --> PC data transfer and about 3-times slower than that for the opposite direction.
As I mentioned before, it seems that CD-Rs are fairly tough media for the PCE to read, as it fails so many times. The PCE software is made so that you don't need to restart it once it has started successfully. But taking minutes for the initial start is quite a pain.

Now I can run a test which requires a lot of memory.
Since I have even less time than before, I will make it more like a note.
I'm trying to make a sort of logic analyzer using DRAM (72-pin SIMM), USB (USBN9604+PIC16F873), and CPLD (XC95108PC84 x 2). I put (?) because I'm not going to implement any kind of "analyzing" features. I will write in a note style to save my time.

For now, I'm using the program (usbdvc1) published on Mr. Gokan's page, modified as follows.
; Comment out the first two lines of usbsym.h
;#DEFINE USBINT PORTB,0 ;RB0
;#DEFINE USBCS PORTB,1 ;RB1
; USBN9604 <--> PIC16F873 connection
#DEFINE USBINT PORTB,0 ;RB0
#DEFINE USBA0 PORTA,0 ;A0
#DEFINE USBCS PORTA,1 ;/CS
#DEFINE USBRD PORTA,2 ;/RD
#DEFINE USBWR PORTA,3 ;/WR
#DEFINE USBDATA PORTC ;DATA
; Added macros (usbmac.h)
USBDATA_IN MACRO
    BANK1
    movlw 0ffh
    movwf TRISC
    BANK0
    ENDM
USBDATA_OUT MACRO
    BANK1
    movlw 0
    movwf TRISC
    BANK0
    ENDM
init_PIC
    BANK1
    movlf 0x6,ADCON1 ;RA0-5, RE0-2 = Digital
    movlf 0x00,TRISA ;RA0-5 = OUT
    movlf 0x01,TRISB ;RB0=INT
    movlf 0xff,TRISC ;RC0-7 = IN
    BANK0
    BSF USBCS ;USBN9602 CS = 1 USBN9602 OFF
    BCF USBA0 ; A0 = L
    BSF USBRD ; /RD = H
    BSF USBWR ; /WR = H
    CLRF USTATUS ;initialize USB variables
    CLRF STALLD ;initialize USB variables
    CLRF DATAPID ;initialize USB variables
    RETURN
; Descriptor transmission (SENDDESC) part
NXT630
    USBDATA_OUT
    MOVLW TXD0
    IORLW 0xC0
    BSF USBA0
    MOVWF USBDATA
    BCF USBCS
    BCF USBWR
    BSF USBWR
    BCF USBA0
LP67
    MOVF DESC_INDEX,W
    CALL DESC
    MOVWF USBDATA
    BCF USBWR
    BSF USBWR
    INCF DESC_INDEX,F
    DECFSZ G_CNT,F
    GOTO LP67
    BSF USBCS
    USBDATA_IN
; PIC16F873 <-- USBN9604 read
RD_USB
    ; write address
    MOVWF UADR
    USBDATA_OUT
    MOVFF UADR, USBDATA
    BSF USBA0
    BCF USBCS
    BCF USBWR
    BSF USBWR
    BCF USBA0
    ; read data from the address
    USBDATA_IN
    BCF USBRD
    MOVF USBDATA, W
    BSF USBRD
    BSF USBCS
    RETURN
; PIC16F873 <-- USBN9604 burst read
RD_USB_BURST
    ; write address
    USBDATA_OUT
    BSF USBA0
    MOVFF UADR, USBDATA
    BCF USBCS
    BCF USBWR
    BSF USBWR
    BCF USBA0
    ; read data from the address
    USBDATA_IN
LP20
    BCF USBRD
    MOVFF USBDATA, INDF
    BSF USBRD
    INCF FSR,F
    DECFSZ G_CNT,F
    GOTO LP20
    BSF USBCS
    RETURN
; PIC16F873 --> USBN9604 write
WR_USB
    ; write address
    USBDATA_OUT
    BSF USBA0
    MOVF UADR, W
    IORLW 0x80
    MOVWF USBDATA
    BCF USBCS
    BCF USBWR
    BSF USBWR
    BCF USBA0
    ; write data to the address
    MOVFF UDAT, USBDATA
    BCF USBWR
    BSF USBWR
    BSF USBCS
    USBDATA_IN
    RETURN
; PIC16F873 --> USBN9604 burst write
WR_USB_BURST
    ; write address
    USBDATA_OUT
    BSF USBA0
    MOVF UADR, W
    IORLW 0xC0
    MOVWF USBDATA
    BCF USBCS
    BCF USBWR
    BSF USBWR
    BCF USBA0
    ; write data to the address
LP17
    MOVFF INDF, USBDATA
    BCF USBWR
    BSF USBWR
    INCF FSR, F
    DECFSZ G_CNT,F
    GOTO LP17
    BSF USBCS
    USBDATA_IN
    RETURN
PIC16F873 <--> USBN9604 connection (I will draw the circuit diagram later)
------------------------------------------------------
RC0-RC7 <--> D0-D7
RA0 --> A0
RA1 --> /CS
RA2 --> /RD
RA3 --> /WR
RB0 <-- INTR
CLKIN <-- CLKOUT
/MCLR = H
/RESET= H
VDD = +5V
MODE0 = MODE1 = DRQ = AGND = GND
VSS = GND
V3.3 --> 1.5kOhm --> D+
All other pins are NC
VCC = +5V
------------------------------------------------------

I drew a circuit diagram, which is not finished yet.

The CPLD source is like this at the moment. It seems that the EDO Page Mode Early Write Cycle and the CBR Refresh Cycle are working in simulation. The CPLD --> DMA transfer feature is not written yet. I might have to use two CPLDs when I implement DMA.
The behavior of /OE is described in the CPLD code above, but I just looked at the circuit diagram of the SIMM module and all of the /OE pins are connected to GND (darn).
I think I finished wiring.


I will update the circuit diagram after verifying correct operations.
The PIC, the USBN9604, and part of the CPLD are working at least, so I will debug the hardware by transferring data to the PC.
Now it's a bit doubtful whether this is really going to work as a logic analyzer. But I think it's OK as long as I become able to use USB, and also able to make a working DRAM controller, which is a state machine.
It seems to be working to a certain extent. The address behavior is still a bit strange. I wrote test code which does the SIMM --> PC transfer solely with the PIC, and it was so slow that it was just useless (it took several seconds to transfer 16kB). Maybe I need to implement DMA. I wonder if it can all fit in a single CPLD...?
I think I'm learning a lot from this sub-project.
Finally, it worked as I expected.

I will update circuit and other things soon.
I have set up a BBS. Take a look if you are interested.
I respectfully thank the people who made the following technical pages.