-----Original Message-----
From: Martin Schoeberl [mailto:martin.schoeberl@chello.at]
Sent: Thursday, November 24, 2005 12:19 PM
To: fpga-cpu@yahoogroups.com
Subject: Re: [fpga-cpu] Wishbone comments

> You are probably right for high clock rate interconnects or high latency
> accesses (DRAM, etc).
> However, WB works very well for single cycle accesses as you usually
> get in very simple SoCs
> with only primitive peripherals. Especially the early ACKs can get in the
> way of single cycle accesses.
> Holding the last output valid is only easy for the slave if it registers
> the addresses.

The idea is that the address and data register should reside inside
the slave and not the master.

> Anyway, I am a big fan of pipelined busses (ever seen the SCI link
> controller interface?) so I would like

No, have not seen it. Do you have a link to it handy?

At the moment I'm also collecting different interconnect
standards to avoid reinventing the wheel.

> to get a draft of your spec.

The idea for (some) pipeline support is twofold:

1.) The slave will provide more information than a single ack
or wait states. It will (if it is capable of doing so) signal
to the master the number of clock cycles remaining until the
read data is available (or the write has finished). This feature
allows the pipelined master to prepare for the upcoming read.

2.) If the slave can provide pipelining, the master can use
overlapped wr or rd requests. The slave has a static output
port that tells how many pipeline stages are available.
I call this 'pipeline level':
0 means no overlapping,
1 means a new rd/wr request can be issued in the same cycle
in which the former data is read,
2 means one cycle earlier, and
3 is the maximum level, where you get full pipelining
on the basic read cycle with one wait state
(command - read - read - result).
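
A minimal VHDL sketch of the two ideas above, assuming a slave with a fixed access time. All names (sc_slave_sketch, rd, rdy_cnt, pipe_level, the generics and the 18-bit address width) are hypothetical and not taken from the draft spec. The slave registers the address itself, counts down the cycles remaining until the read data is valid, and advertises a static pipeline level. It does not check that the master respects that level.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical slave front end; names and widths are illustrative only.
entity sc_slave_sketch is
  generic (
    WAIT_CYCLES : natural range 0 to 3 := 2;  -- assumed fixed access time
    PIPE_LEVEL  : natural range 0 to 3 := 0   -- 0 = no overlapping
  );
  port (
    clk        : in  std_logic;
    reset      : in  std_logic;
    rd         : in  std_logic;                      -- one-cycle read request
    address    : in  std_logic_vector(17 downto 0);  -- valid only while rd = '1'
    mem_addr   : out std_logic_vector(17 downto 0);  -- registered address to the memory
    rdy_cnt    : out unsigned(1 downto 0);           -- cycles remaining, 0 = data valid
    pipe_level : out unsigned(1 downto 0)            -- static, never changes at run time
  );
end entity sc_slave_sketch;

architecture rtl of sc_slave_sketch is
  signal addr_reg : std_logic_vector(17 downto 0);
  signal cnt      : unsigned(1 downto 0);
begin
  mem_addr   <= addr_reg;
  rdy_cnt    <= cnt;
  pipe_level <= to_unsigned(PIPE_LEVEL, 2);

  process (clk, reset)
  begin
    if reset = '1' then
      addr_reg <= (others => '0');
      cnt      <= (others => '0');
    elsif rising_edge(clk) then
      if rd = '1' then
        -- The address register lives in the slave, so the master
        -- needs to drive the address for a single cycle only.
        addr_reg <= address;
        cnt      <= to_unsigned(WAIT_CYCLES, 2);
      elsif cnt /= 0 then
        -- Count down so the master knows in advance when data arrives.
        cnt <= cnt - 1;
      end if;
    end if;
  end process;
end architecture rtl;
```

With this kind of countdown a pipelined master can react when rdy_cnt reaches 1 instead of waiting for an ack in the cycle the data arrives.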

At the moment the draft of the spec is a few sketches on real
paper; it takes some time to draw all the diagrams for a document
(BTW, does anybody know a tool for quickly drawing timing
diagrams?).

I have a first implementation of SimpCon on JOP to test the
ideas: A master in JOP and a slave for SRAM access.

If you are interested in early access I can upload the
VHDL files to the opencores CVS server.

Martin

> Martin Schoeberl wrote:
>
>>After implementing the Wishbone interface for main memory access
>>from JOP I see several issues with the Wishbone specification that
>>make it not the best choice for SoC interconnect.
>>
>>The Wishbone interface specification is still in the tradition of
>>microcomputer or backplane busses. However, for a SoC interconnect,
>>which is usually point-to-point, this is not the best approach.
>>
>>The master is requested to hold the address and data valid through
>>the whole read or write cycle. This complicates the connection to a
>>master that has the data valid only for one cycle. In this case the
>>address and data have to be registered *before* the Wishbone connect
>>or an expensive (time and resources) MUX has to be used. A register
>>results in one additional cycle latency. A better approach would be
>>to register the address and data in the slave. Then there is also
>>time to perform address decoding in the slave (before the address
>>register).
>>
>>There is a similar issue for the output data from the slave: As it
>>is only valid for a single cycle it has to be registered by the
>>master when the processor is not reading it immediately. Therefore,
>>the slave should keep the last valid data at its output even when
>>wb.stb is not asserted anymore (which is no issue from the hardware
>>complexity).
>>
>>The Wishbone connection for JOP resulted in an unregistered Wishbone
>>memory interface and registers for the address and data in the
>>Wishbone master. However, for fast address and control output (tco)
>>and short setup time (tsu) we want the registers in the IO-pads of
>>the FPGA. With the registers buried in the WB master it takes some
>>effort to set the right constraints for the Synthesizer to implement
>>such IO-registers.
>>
>>The same issue is true for the control signals. The translation from
>>the wb.cyc, wb.stb and wb.we signals to ncs, noe and nwe for the
>>SRAM is on the critical path.
>>
>>The ack signal is too late for a pipelined master. We would need to
>>know it *earlier* when the next data will be available --- and this
>>is possible, as we know in the slave when the data from the SRAM
>>will arrive. A workaround is a non-WB-conforming early ack
>>signal.
>>
>>Because the data registers are not inside the WB interface,
>>we need an extra WB interface for the Flash/NAND interface (on the
>>Cyclone board). We cannot afford the address decoding and a MUX in
>>the data read path without registers. This would result in an extra
>>cycle for the memory read due to the combinational delay.
>>
>>In the WB specification (AFAIK) there is no way to perform pipelined
>>read or write. However, for blocked memory transfers (e.g. cache
>>load) this is the usual way to get a good performance.
>>
>>Conclusion -- I would prefer:
>>
>> * Address and data (in/out) register in the slave
>> * A way to know earlier when data will be available (or
>> a write has finished)
>> * Pipelining in the slave
>>
>>As a result of this experience I'm working on a new SoC
>>interconnect definition (working name SimpCon) that should avoid the
>>issues mentioned above while keeping the master and slave easy to
>>implement.
>>
>>As there are so many projects available that implement the WB
>>interface I will provide bridges between SimpCon and WB. For IO
>>devices the former arguments do not apply to that extent as the
>>pressure for low latency access and pipelining is not high.
>>Therefore, a bridge to WB IO devices can be a practical solution for
>>design reuse.
>>
>>Martin


-----Original Message-----
From: fpga-cpu@yahoogroups.com [mailto:fpga-cpu@yahoogroups.com] On Behalf Of Tommy Thorn
Sent: Monday, March 10, 2008 7:02 PM
To: fpga-cpu@yahoogroups.com
Subject: Re: [fpga-cpu] interconnection between FPGA and PC

--- RANJITH KUMAR REDDY <er_ranjith_edula@yahoo.co.in> wrote:
> I am actually trying to do a project on using an FPGA to implement a
> floating point multiplier, where the FPGA does the required
> multiplication for a microprocessor to speed it up, for which I am
> writing Verilog code and dumping it into the FPGA. Now I want to
> simulate it and see the performance. I just can't buy a microprocessor
> to interact with the FPGA, so I was thinking of simulating it using
> some software simulator like an 8085 simulator.

Assuming that you're trying to design an embedded system, what you're
trying to do might not be a good idea.

Communication latency will dominate the time to compute in the
coprocessor. Besides, FPGAs are much faster than nearly all
microcontrollers. You'd be much better off just integrating the
floating point core with a softcore of your choice. There are oodles of
options available, but I'd obviously recommend my own MIPS
implementation, YARI (http://repo.or.cz/w/yari.git).

As for interfacing with the outside world (e.g. a PC), it really all
depends on what hardware you have available:
PCI-E is the fastest,
Ethernet is the most flexible and versatile,
RS232 serial is the slowest and simplest (fewer pins),
USB is ubiquitous, etc.

Use what you got.

Tommy


-----Original Message-----
From: fpga-cpu@yahoogroups.com [mailto:fpga-cpu@yahoogroups.com] On Behalf Of :: aH[sIM] ::
Sent: Friday, March 21, 2008 6:26 PM
To: fpga-cpu@yahoogroups.com
Subject: [fpga-cpu] Some Looping error i think

Here's the scenario: a typical camera has vsync, href, pclk and 8-bit data.
vsync represents new frame
href represents new line
pclk represents new pixel

I would like to capture the first 50x50 pixels of one frame, and then of another frame after the 60th frame.

All I need is two frames.

I'm getting these errors:
Quote:

Error (10519): VHDL Type or Variable Declaration error at
camera.vhd(44): bounds of type or variable range must have same type
Quote:

Error (10515): VHDL type mismatch error at camera.vhd(44): integer type does not match string literal

Here is my code:
Quote:
LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.numeric_std.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

ENTITY camera IS

PORT
(
start : IN STD_LOGIC;
pixel : IN STD_LOGIC_VECTOR(7 DOWNTO 0);
clk : IN STD_LOGIC;
vsync : IN STD_LOGIC;
href : IN STD_LOGIC;
pclk : IN STD_LOGIC;

pixel_out : OUT STD_LOGIC_VECTOR(7 DOWNTO 0);
reference : OUT STD_LOGIC
);

END camera;

ARCHITECTURE behave OF camera IS

SIGNAL frame_count : STD_LOGIC_VECTOR(6 DOWNTO 0);
SIGNAL line_count : STD_LOGIC_VECTOR(6 DOWNTO 0);
SIGNAL pixel_count : STD_LOGIC_VECTOR(6 DOWNTO 0);

BEGIN

PROCESS (vsync, href, pclk)
BEGIN

IF start='1' THEN--------------------------------------------STARTS THE CAPTURING SEQUENCE

frame_count <= (OTHERS => '0');
line_count <= (OTHERS => '0');
pixel_count <= (OTHERS => '0');

IF(vsync'EVENT AND vsync = '1' ) THEN
FOR frame_count IN "0000000" TO "111100" LOOP
frame_count <= frame_count + "0000001"; -------------COUNTS FRAME UNTIL 60

IF(href'EVENT AND href = '1' ) THEN
FOR line_count IN "0000000" TO "110010" LOOP
line_count <= line_count + "0000001";---------COUNTS LINE UNTIL 50

IF(pclk'EVENT AND pclk = '1' ) THEN
FOR pixel_count IN "0000000" TO "110010" LOOP
pixel_count <= pixel_count + "0000001"; --COUNTS PIXEL UNTIL 50

pixel_out<=pixel;-----------------------------OUTPUTS THE PIXEL DATA
reference<='1';------------------------------FOR SAVING THE DATA INTO RAM PURPOSES
END LOOP;

END IF;
END LOOP;
END IF;
END LOOP;
END IF;
ELSE

null;
END IF;

END PROCESS;

END behave;
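
One possible way around these errors, sketched under assumptions rather than offered as a drop-in fix. The errors most likely come from the FOR loops: a loop range needs integer (discrete) bounds, whereas "0000000" is a string literal, and the loop parameter also hides the signal of the same name. Beyond that, a FOR loop does not wait for successive clock edges, so it cannot count frames, lines or pixels over time. The sketch below uses counters in a single process clocked by pclk instead. It assumes that vsync and href are synchronous to pclk, that href stays high while the pixels of a line are valid, and that the 1st and the 61st frame after start are the two frames wanted. The entity name camera_capture and the constants are hypothetical, and the unused clk port is dropped. Only ieee.numeric_std is used, because it and std_logic_arith both declare unsigned and signed types, which conflict when both packages are made visible.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- numeric_std only, no std_logic_arith/std_logic_unsigned

entity camera_capture is  -- hypothetical name
  port (
    start     : in  std_logic;  -- held high while capturing is enabled
    pclk      : in  std_logic;
    vsync     : in  std_logic;
    href      : in  std_logic;
    pixel     : in  std_logic_vector(7 downto 0);
    pixel_out : out std_logic_vector(7 downto 0);
    reference : out std_logic   -- high for one pclk cycle per captured pixel
  );
end entity camera_capture;

architecture rtl of camera_capture is
  constant WINDOW       : natural := 50;   -- capture a 50x50 window
  constant FIRST_FRAME  : natural := 1;    -- first frame after start
  constant SECOND_FRAME : natural := 61;   -- assumed: the frame after the 60th
  constant COUNT_MAX    : natural := 127;  -- 7-bit counters saturate here

  signal vsync_d, href_d : std_logic := '0';  -- previous values for edge detection
  signal frame_count : unsigned(6 downto 0) := (others => '0');
  signal line_count  : unsigned(6 downto 0) := (others => '0');
  signal pixel_count : unsigned(6 downto 0) := (others => '0');
begin
  process (pclk)
  begin
    if rising_edge(pclk) then
      vsync_d   <= vsync;
      href_d    <= href;
      reference <= '0';  -- default, overridden below when a pixel is captured

      if start = '0' then
        frame_count <= (others => '0');
        line_count  <= (others => '0');
        pixel_count <= (others => '0');
      else
        if vsync = '1' and vsync_d = '0' then      -- rising edge: new frame
          if frame_count < COUNT_MAX then
            frame_count <= frame_count + 1;
          end if;
          line_count <= (others => '0');
        elsif href = '0' and href_d = '1' then     -- falling edge: line finished
          if line_count < COUNT_MAX then
            line_count <= line_count + 1;
          end if;
        end if;

        if href = '1' then                         -- a valid pixel on this edge
          if pixel_count < COUNT_MAX then
            pixel_count <= pixel_count + 1;
          end if;
          if (frame_count = FIRST_FRAME or frame_count = SECOND_FRAME)
             and line_count < WINDOW and pixel_count < WINDOW then
            pixel_out <= pixel;
            reference <= '1';
          end if;
        else
          pixel_count <= (others => '0');
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

The counters saturate rather than wrap, so lines longer than 127 pixels or frames beyond the 127th do not re-enter the capture window by accident.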


-----Original Message-----
From: fpga-cpu@yahoogroups.com [mailto:fpga-cpu@yahoogroups.com] On Behalf Of John Kent
Sent: Thursday, May 01, 2008 7:17 PM
To: fpga-cpu@yahoogroups.com
Subject: [fpga-cpu] Micro16

Hi Richard,

Yes, it was originally intended as a replacement for a state machine to
read a compact flash card. In the latest version I'm currently working
on, I've combined the RTS (Return from Subroutine) and RTI (Return from
Interrupt) instructions, reordered the opcodes, and added a multiply
instruction. The CPU alone now uses about 9% of a XC3S200. With the
UART, Timer and I/O port it's about 18%. I haven't posted this version
yet.

It may slow the CPU down, but I'm wondering if I should turn the JSR
instruction into a vectored Software Interrupt, i.e. push the condition
codes and the accumulator as well as the return PC. The RTS would
simply skip over the accumulator and condition codes, while the RTI
would pop them. I can't remember why I wanted to do that, but I think it
might have had something to do with the ability to set break points in
the code. There are complications in implementing break points because
you don't have access to the stack pointer, so you have to do tricks
like scanning for a particular return address to find the bottom of the
stack.

I'm trying to design a cross point data switch for it so I can build an
array of these cores. I spent a day or two trying to work out a
parallel data switch, but got into a confused mess. I need to tackle it
in sections rather than trying to do the whole lot at once. I need an
I/O port with handshaking, which I might make serial rather than
parallel, a multi-port arbiter, and a cross point switch.

There are more important things I should be working on at the moment,
so I might have to take a break from it.

John.

>
> John,
>
> I really like your Micro16 project. I don't have an application at
> the moment (summer is coming and it is time to play with boats) but I
> am thinking that it would be a nice embedded controller for those
> times where it seems like too much effort to create a state machine.
>
> Thanks for the link!
>
> Richard
>
>

--
http://www.johnkent.com.au
http://members.optushome.com.au/jekent