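Preface
The previous article discussed how to read and write DDR3 SDRAM with a Xilinx 7-series FPGA (for details see "DDR3 SDRAM Read/Write with FPGA + MIG + AXI4 (with code)" on the 春风细雨无声 CSDN blog). Many readers left comments or sent messages asking for a follow-up that walks through the design in simulation. This article uses the third-party tool ModelSim (readers who do not have it can contact me to get it) to simulate each module in the project and shows the simulation waveforms, to help everyone better understand the whole process of reading and writing DDR3 SDRAM from an FPGA.
I. Simulation project structure
First create the simulation project. It needs the test bench file, the top-level module of the design sources, the DDR3 model, and the wire-delay module (WireDelay). The project structure is shown in the figure.
II. Test bench code
The test bench is the first step in simulating the whole project, and how to write it is probably what most readers care about, so the full source is given below.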
`timescale 1ps/100fs
module tb_top_design;
//***************************************************************************
// Traffic Gen related parameters
//***************************************************************************
parameter SIMULATION = "TRUE";
parameter BEGIN_ADDRESS = 32'h00000000;
parameter END_ADDRESS = 32'h00000fff;
parameter PRBS_EADDR_MASK_POS = 32'hff000000;
//***************************************************************************
// The following parameters refer to width of various ports
//***************************************************************************
parameter COL_WIDTH = 10;
// # of memory Column Address bits.
parameter CS_WIDTH = 1;
// # of unique CS outputs to memory.
parameter DM_WIDTH = 2;
// # of DM (data mask)
parameter DQ_WIDTH = 16;
// # of DQ (data)
parameter DQS_WIDTH = 2;
parameter DQS_CNT_WIDTH = 1;
// = ceil(log2(DQS_WIDTH))
parameter DRAM_WIDTH = 8;
// # of DQ per DQS
parameter ECC = "OFF";
parameter RANKS = 1;
// # of Ranks.
parameter ODT_WIDTH = 1;
// # of ODT outputs to memory.
parameter ROW_WIDTH = 15;
// # of memory Row Address bits.
parameter ADDR_WIDTH = 29;
// # = RANK_WIDTH + BANK_WIDTH
// + ROW_WIDTH + COL_WIDTH;
// Chip Select is always tied to low for
// single rank devices
//***************************************************************************
// The following parameters are mode register settings
//***************************************************************************
parameter BURST_MODE = "8";
// DDR3 SDRAM:
// Burst Length (Mode Register 0).
// # = "8", "4", "OTF".
// DDR2 SDRAM:
// Burst Length (Mode Register).
// # = "8", "4".
parameter CA_MIRROR = "OFF";
// C/A mirror opt for DDR3 dual rank
//***************************************************************************
// The following parameters are multiplier and divisor factors for PLLE2.
// Based on the selected design frequency these parameters vary.
//***************************************************************************
parameter CLKIN_PERIOD = 5000;
// Input Clock Period
//***************************************************************************
// Simulation parameters
//***************************************************************************
parameter SIM_BYPASS_INIT_CAL = "FAST";
// # = "SIM_INIT_CAL_FULL" - Complete
// memory init &
// calibration sequence
// # = "SKIP" - Not supported
// # = "FAST" - Complete memory init & use
// abbreviated calib sequence
//***************************************************************************
// IODELAY and PHY related parameters
//***************************************************************************
parameter TCQ = 100;
//***************************************************************************
// IODELAY and PHY related parameters
//***************************************************************************
parameter RST_ACT_LOW = 1;
// =1 for active low reset,
// =0 for active high.
//***************************************************************************
// Reference clock frequency parameters
//***************************************************************************
parameter REFCLK_FREQ = 200.0;
// IODELAYCTRL reference clock frequency
//***************************************************************************
// System clock frequency parameters
//***************************************************************************
parameter tCK = 2500;
// memory tCK parameter.
// # = Clock Period in pS.
parameter nCK_PER_CLK = 4;
// # of memory CKs per fabric CLK
//***************************************************************************
// AXI4 Shim parameters
//***************************************************************************
parameter C_S_AXI_ID_WIDTH = 4;
// Width of all master and slave ID signals.
// # = >= 1.
parameter C_S_AXI_ADDR_WIDTH = 29;
// Width of S_AXI_AWADDR, S_AXI_ARADDR, M_AXI_AWADDR and
// M_AXI_ARADDR for all SI/MI slots.
// # = 32.
parameter C_S_AXI_DATA_WIDTH = 64;
// Width of WDATA and RDATA on SI slot.
// Must be <= APP_DATA_WIDTH.
// # = 32, 64, 128, 256.
parameter C_S_AXI_SUPPORTS_NARROW_BURST = 0;
// Indicates whether to instantiate upsizer
// Range: 0, 1
//***************************************************************************
// Debug and Internal parameters
//***************************************************************************
parameter DEBUG_PORT = "OFF";
// # = "ON" Enable debug signals/controls.
// = "OFF" Disable debug signals/controls.
//***************************************************************************
// Debug and Internal parameters
//***************************************************************************
parameter DRAM_TYPE = "DDR3";
//**************************************************************************//
// Local parameters Declarations
//**************************************************************************//
localparam real TPROP_DQS = 0.00;
// Delay for DQS signal during Write Operation
localparam real TPROP_DQS_RD = 0.00;
// Delay for DQS signal during Read Operation
localparam real TPROP_PCB_CTRL = 0.00;
// Delay for Address and Ctrl signals
localparam real TPROP_PCB_DATA = 0.00;
// Delay for data signal during Write operation
localparam real TPROP_PCB_DATA_RD = 0.00;
// Delay for data signal during Read operation
localparam MEMORY_WIDTH = 16;
localparam NUM_COMP = DQ_WIDTH/MEMORY_WIDTH;
localparam ECC_TEST = "OFF" ;
localparam ERR_INSERT = (ECC_TEST == "ON") ? "OFF" : ECC ;
localparam real REFCLK_PERIOD = (1000000.0/(2*REFCLK_FREQ));
localparam RESET_PERIOD = 200000; //in pSec
localparam real SYSCLK_PERIOD = tCK;
//**************************************************************************//
// Wire Declarations
//**************************************************************************//
reg sys_rst_n;
wire sys_rst;
reg sys_clk_i;
reg clk_ref_i;
wire ddr3_reset_n;
wire [DQ_WIDTH-1:0] ddr3_dq_fpga;
wire [DQS_WIDTH-1:0] ddr3_dqs_p_fpga;
wire [DQS_WIDTH-1:0] ddr3_dqs_n_fpga;
wire [ROW_WIDTH-1:0] ddr3_addr_fpga;
wire [3-1:0] ddr3_ba_fpga;
wire ddr3_ras_n_fpga;
wire ddr3_cas_n_fpga;
wire ddr3_we_n_fpga;
wire [1-1:0] ddr3_cke_fpga;
wire [1-1:0] ddr3_ck_p_fpga;
wire [1-1:0] ddr3_ck_n_fpga;
wire init_calib_complete;
wire tg_compare_error;
wire [(CS_WIDTH*1)-1:0] ddr3_cs_n_fpga;
wire [DM_WIDTH-1:0] ddr3_dm_fpga;
wire [ODT_WIDTH-1:0] ddr3_odt_fpga;
reg [(CS_WIDTH*1)-1:0] ddr3_cs_n_sdram_tmp;
reg [DM_WIDTH-1:0] ddr3_dm_sdram_tmp;
reg [ODT_WIDTH-1:0] ddr3_odt_sdram_tmp;
wire [DQ_WIDTH-1:0] ddr3_dq_sdram;
reg [ROW_WIDTH-1:0] ddr3_addr_sdram [0:1];
reg [3-1:0] ddr3_ba_sdram [0:1];
reg ddr3_ras_n_sdram;
reg ddr3_cas_n_sdram;
reg ddr3_we_n_sdram;
wire [(CS_WIDTH*1)-1:0] ddr3_cs_n_sdram;
wire [ODT_WIDTH-1:0] ddr3_odt_sdram;
reg [1-1:0] ddr3_cke_sdram;
wire [DM_WIDTH-1:0] ddr3_dm_sdram;
wire [DQS_WIDTH-1:0] ddr3_dqs_p_sdram;
wire [DQS_WIDTH-1:0] ddr3_dqs_n_sdram;
reg [1-1:0] ddr3_ck_p_sdram;
reg [1-1:0] ddr3_ck_n_sdram;
reg sys_clk_pix_i ;
reg sys_clk_200m_i ;
reg sys_clk_50m_i ;
reg sys_clk_14p75m_i ;
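// Clock periods in ns. They are referenced only by the commented-out clock
// statements below; the active statements use hard-coded half-periods in ps.
parameter CLKIN_PERIOD2 = 40;   // 25 MHz
parameter CLKIN_PERIOD3 = 25;   // 40 MHz
parameter CLKIN_PERIOD4 = 5;    // 200 MHz
parameter CLKIN_PERIOD5 = 20;   // 50 MHz
parameter CLKIN_PERIOD6 = 67.8; // 14.75 MHz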
//**************************************************************************//
//**************************************************************************//
// Reset Generation
//**************************************************************************//
initial begin
sys_rst_n = 1'b0;
#RESET_PERIOD
sys_rst_n = 1'b1;
end
assign sys_rst = RST_ACT_LOW ? sys_rst_n : ~sys_rst_n;
//**************************************************************************//
// Clock Generation
//**************************************************************************//
initial
sys_clk_i = 1'b0;
always
// sys_clk_i = #(CLKIN_PERIOD2/2) ~sys_clk_i;
sys_clk_i = #20000 ~sys_clk_i;
initial
sys_clk_pix_i = 1'b0;
always
// sys_clk_pix_i = #(CLKIN_PERIOD3/2) ~sys_clk_pix_i;
sys_clk_pix_i = #12500 ~sys_clk_pix_i;
initial
sys_clk_200m_i = 1'b0;
always
// sys_clk_200m_i = #(CLKIN_PERIOD4/2) ~sys_clk_200m_i;
sys_clk_200m_i = #2500 ~sys_clk_200m_i;
initial
sys_clk_50m_i = 1'b0;
always
// sys_clk_50m_i = #(CLKIN_PERIOD5/2) ~sys_clk_50m_i;
sys_clk_50m_i = #10000 ~sys_clk_50m_i;
initial
sys_clk_14p75m_i = 1'b0;
always
// sys_clk_14p75m_i = #(CLKIN_PERIOD6/2) ~sys_clk_14p75m_i;
sys_clk_14p75m_i = #33900 ~sys_clk_14p75m_i;
//clk_pix
//clk_200m
//clk_50m
//clk_14p75m
initial
clk_ref_i = 1'b0;
always
clk_ref_i = #REFCLK_PERIOD ~clk_ref_i;
always @( * ) begin
ddr3_ck_p_sdram <= #(TPROP_PCB_CTRL) ddr3_ck_p_fpga;
ddr3_ck_n_sdram <= #(TPROP_PCB_CTRL) ddr3_ck_n_fpga;
ddr3_addr_sdram[0] <= #(TPROP_PCB_CTRL) ddr3_addr_fpga;
ddr3_addr_sdram[1] <= #(TPROP_PCB_CTRL) (CA_MIRROR == "ON") ?
{ddr3_addr_fpga[ROW_WIDTH-1:9],
ddr3_addr_fpga[7], ddr3_addr_fpga[8],
ddr3_addr_fpga[5], ddr3_addr_fpga[6],
ddr3_addr_fpga[3], ddr3_addr_fpga[4],
ddr3_addr_fpga[2:0]} :
ddr3_addr_fpga;
ddr3_ba_sdram[0] <= #(TPROP_PCB_CTRL) ddr3_ba_fpga;
ddr3_ba_sdram[1] <= #(TPROP_PCB_CTRL) (CA_MIRROR == "ON") ?
{ddr3_ba_fpga[3-1:2],
ddr3_ba_fpga[0],
ddr3_ba_fpga[1]} :
ddr3_ba_fpga;
ddr3_ras_n_sdram <= #(TPROP_PCB_CTRL) ddr3_ras_n_fpga;
ddr3_cas_n_sdram <= #(TPROP_PCB_CTRL) ddr3_cas_n_fpga;
ddr3_we_n_sdram <= #(TPROP_PCB_CTRL) ddr3_we_n_fpga;
ddr3_cke_sdram <= #(TPROP_PCB_CTRL) ddr3_cke_fpga;
end
always @( * )
ddr3_cs_n_sdram_tmp <= #(TPROP_PCB_CTRL) ddr3_cs_n_fpga;
assign ddr3_cs_n_sdram = ddr3_cs_n_sdram_tmp;
always @( * )
ddr3_dm_sdram_tmp <= #(TPROP_PCB_DATA) ddr3_dm_fpga;//DM signal generation
assign ddr3_dm_sdram = ddr3_dm_sdram_tmp;
always @( * )
ddr3_odt_sdram_tmp <= #(TPROP_PCB_CTRL) ddr3_odt_fpga;
assign ddr3_odt_sdram = ddr3_odt_sdram_tmp;
// Controlling the bi-directional BUS
genvar dqwd;
generate
for (dqwd = 1;dqwd < DQ_WIDTH;dqwd = dqwd+1) begin : dq_delay
WireDelay #
(
.Delay_g (TPROP_PCB_DATA),
.Delay_rd (TPROP_PCB_DATA_RD),
.ERR_INSERT ("OFF")
)
u_delay_dq
(
.A (ddr3_dq_fpga[dqwd]),
.B (ddr3_dq_sdram[dqwd]),
.reset (sys_rst_n),
.phy_init_done (init_calib_complete)
);
end
WireDelay #
(
.Delay_g (TPROP_PCB_DATA),
.Delay_rd (TPROP_PCB_DATA_RD),
.ERR_INSERT ("OFF")
)
u_delay_dq_0
(
.A (ddr3_dq_fpga[0]),
.B (ddr3_dq_sdram[0]),
.reset (sys_rst_n),
.phy_init_done (init_calib_complete)
);
endgenerate
genvar dqswd;
generate
for (dqswd = 0;dqswd < DQS_WIDTH;dqswd = dqswd+1) begin : dqs_delay
WireDelay #
(
.Delay_g (TPROP_DQS),
.Delay_rd (TPROP_DQS_RD),
.ERR_INSERT ("OFF")
)
u_delay_dqs_p
(
.A (ddr3_dqs_p_fpga[dqswd]),
.B (ddr3_dqs_p_sdram[dqswd]),
.reset (sys_rst_n),
.phy_init_done (init_calib_complete)
);
WireDelay #
(
.Delay_g (TPROP_DQS),
.Delay_rd (TPROP_DQS_RD),
.ERR_INSERT ("OFF")
)
u_delay_dqs_n
(
.A (ddr3_dqs_n_fpga[dqswd]),
.B (ddr3_dqs_n_sdram[dqswd]),
.reset (sys_rst_n),
.phy_init_done (init_calib_complete)
);
end
endgenerate
//===========================================================================
// FPGA Memory Controller
//===========================================================================
top_design #(
.MEM_DATA_BITS (64 ),
.WRITE_DATA_BITS (16 ),
.ADDR_BITS (25 ),
.BUSRT_BITS (10 ),
.BURST_SIZE (64 ),
.ADDR_SEC1 (0 ),
.ADDR_SEC2 (2073600 ),
.ADDR_SEC3 (4147200 ),
.ADDR_SEC4 (6220800 )
)
u_top_design(
.clk_in (sys_clk_i ),
.clk_pix (sys_clk_pix_i ),
.clk_200m (sys_clk_200m_i ),
.clk_50m (sys_clk_50m_i ),
.clk_14p75m (sys_clk_14p75m_i ),
.rst_sys_pix (1'd0 ),
.dsp6678_spics1 (1'd1 ),
.dsp6678_spimosi (1'd1 ),
.DAC_DATA ( ),
.DA_CLK ( ),
.DA_SYNC ( ),
.DA_BLANK ( ),
.max706_mr_n ( ),
.gm7123_psave ( ),
.fpga_flash_cs_n ( ),
.fpga_flash_d ( ),
.ddr3_dq (ddr3_dq_fpga ),
.ddr3_dqs_n (ddr3_dqs_n_fpga ),
.ddr3_dqs_p (ddr3_dqs_p_fpga ),
.ddr3_addr (ddr3_addr_fpga ),
.ddr3_ba (ddr3_ba_fpga ),
.ddr3_ras_n (ddr3_ras_n_fpga ),
.ddr3_cas_n (ddr3_cas_n_fpga ),
.ddr3_we_n (ddr3_we_n_fpga ),
.ddr3_reset_n (ddr3_reset_n ),
.ddr3_ck_p (ddr3_ck_p_fpga ),
.ddr3_ck_n (ddr3_ck_n_fpga ),
.ddr3_cke (ddr3_cke_fpga ),
.ddr3_cs_n (ddr3_cs_n_fpga ),
.ddr3_dm (ddr3_dm_fpga ),
.ddr3_odt (ddr3_odt_fpga )
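// NOTE: top_design does not export init_calib_complete, so the test bench
// wire of that name stays undriven unless this port is added and connected.
// output init_calib_complete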
);
//**************************************************************************//
// Memory Models instantiations
//**************************************************************************//
genvar r,i;
generate
for (r = 0; r < CS_WIDTH; r = r + 1) begin: mem_rnk
if(DQ_WIDTH/16) begin: mem
for (i = 0; i < NUM_COMP; i = i + 1) begin: gen_mem
ddr3_model u_comp_ddr3
(
.rst_n (ddr3_reset_n),
.ck (ddr3_ck_p_sdram),
.ck_n (ddr3_ck_n_sdram),
.cke (ddr3_cke_sdram[r]),
.cs_n (ddr3_cs_n_sdram[r]),
.ras_n (ddr3_ras_n_sdram),
.cas_n (ddr3_cas_n_sdram),
.we_n (ddr3_we_n_sdram),
.dm_tdqs (ddr3_dm_sdram[(2*(i+1)-1):(2*i)]),
.ba (ddr3_ba_sdram[r]),
.addr (ddr3_addr_sdram[r]),
.dq (ddr3_dq_sdram[16*(i+1)-1:16*(i)]),
.dqs (ddr3_dqs_p_sdram[(2*(i+1)-1):(2*i)]),
.dqs_n (ddr3_dqs_n_sdram[(2*(i+1)-1):(2*i)]),
.tdqs_n (),
.odt (ddr3_odt_sdram[r])
);
end
end
if (DQ_WIDTH%16) begin: gen_mem_extrabits
ddr3_model u_comp_ddr3
(
.rst_n (ddr3_reset_n),
.ck (ddr3_ck_p_sdram),
.ck_n (ddr3_ck_n_sdram),
.cke (ddr3_cke_sdram[r]),
.cs_n (ddr3_cs_n_sdram[r]),
.ras_n (ddr3_ras_n_sdram),
.cas_n (ddr3_cas_n_sdram),
.we_n (ddr3_we_n_sdram),
.dm_tdqs ({ddr3_dm_sdram[DM_WIDTH-1],ddr3_dm_sdram[DM_WIDTH-1]}),
.ba (ddr3_ba_sdram[r]),
.addr (ddr3_addr_sdram[r]),
.dq ({ddr3_dq_sdram[DQ_WIDTH-1:(DQ_WIDTH-8)],
ddr3_dq_sdram[DQ_WIDTH-1:(DQ_WIDTH-8)]}),
.dqs ({ddr3_dqs_p_sdram[DQS_WIDTH-1],
ddr3_dqs_p_sdram[DQS_WIDTH-1]}),
.dqs_n ({ddr3_dqs_n_sdram[DQS_WIDTH-1],
ddr3_dqs_n_sdram[DQS_WIDTH-1]}),
.tdqs_n (),
.odt (ddr3_odt_sdram[r])
);
end
end
endgenerate
//***************************************************************************
// Reporting the test case status
// Status reporting logic exists both in simulation test bench (sim_tb_top)
// and sim.do file for ModelSim. Any update in simulation run time or time out
// in this file need to be updated in sim.do file as well.
//***************************************************************************
initial
begin : Logging
fork
begin : calibration_done
wait (init_calib_complete);
$display("Calibration Done");
#50000000.0;
if (!tg_compare_error) begin
$display("TEST PASSED");
end
else begin
$display("TEST FAILED: DATA ERROR");
end
disable calib_not_done;
$finish;
end
begin : calib_not_done
if (SIM_BYPASS_INIT_CAL == "SIM_INIT_CAL_FULL")
#2500000000.0;
else
#1000000000.0;
if (!init_calib_complete) begin
$display("TEST FAILED: INITIALIZATION DID NOT COMPLETE");
end
disable calibration_done;
$finish;
end
join
end
endmodule
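The numbers after `#` in the clock generation part of the test bench are half-periods in picoseconds, because the file uses `timescale 1ps/100fs`. As a quick check, here is a minimal sketch that recomputes them from the target frequencies. The module name `tb_clk_halfperiod` and the function `half_period_ps` are made up for illustration and are not part of the original project.

```verilog
`timescale 1ps/100fs
// Hypothetical helper, not part of the original project.
module tb_clk_halfperiod;

  // Half-period in ps of a clock running at f_mhz MHz:
  // period = 1e6 ps / f_mhz, half-period = period / 2.
  function real half_period_ps;
    input real f_mhz;
    begin
      half_period_ps = 1000000.0 / f_mhz / 2.0;
    end
  endfunction

  initial begin
    // The test bench uses #20000, #12500, #2500 and #10000 for these four.
    $display("sys_clk_i        25 MHz    : #%.1f", half_period_ps(25.0));
    $display("sys_clk_pix_i    40 MHz    : #%.1f", half_period_ps(40.0));
    $display("sys_clk_200m_i   200 MHz   : #%.1f", half_period_ps(200.0));
    $display("sys_clk_50m_i    50 MHz    : #%.1f", half_period_ps(50.0));
    // Exact value is about 33898 ps; the test bench rounds it to #33900.
    $display("sys_clk_14p75m_i 14.75 MHz : #%.1f", half_period_ps(14.75));
    $finish;
  end
endmodule
```

The `CLKIN_PERIOD2` to `CLKIN_PERIOD6` parameters hold the same periods in nanoseconds, which is why they cannot be used directly as delays under a 1 ps time unit.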
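2. Simulation of the image data source module (img_data_gen.v)
The code of this module was covered in the previous article and the module is fairly simple, so it is not repeated here; we go straight to the waveforms.
2.1 Global view waveform
The data source generates image data at 640×512 resolution: 512 lines of 640 pixels each, with a blanking interval of a few clock cycles between lines.
2.2 Detailed view waveform
The image data source (img_data_gen.v) outputs the frame sync signal (fval), the line valid signal (lval), the data valid signal (dval), and 16-bit data. The 16-bit data is a counter that runs from 0 to 639.
3. Simulation of the image write request module (img_write_req_gen.v)
The write request is derived from the rising edge of the frame sync signal and the ack signal fed back by the frame write module (frame_write). The write request stays high for 2 clock cycles. The write_addr_index signal indicates which region of the DDR3 SDRAM the current image is written to (this project divides the DDR3 SDRAM into 4 regions), and the read_addr_index signal indicates which region the image data is currently read from.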
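To make the request/acknowledge handshake concrete, here is a minimal sketch of how such a request generator could look. It is not the author's img_write_req_gen.v. The module name, the port list, the choice to let read_addr_index follow the previously written region, and the assumption that write_req_ack is already synchronized into the pixel clock domain are all mine.

```verilog
// Hypothetical sketch, not the author's img_write_req_gen.v.
// Assumes write_req_ack has already been synchronized to clk.
module write_req_gen_sketch (
  input  wire       clk,
  input  wire       rst,
  input  wire       fval,              // frame sync from the image source
  input  wire       write_req_ack,     // acknowledge from frame_write
  output reg        write_req,
  output reg  [1:0] write_addr_index,  // DDR3 region being written (0..3)
  output reg  [1:0] read_addr_index    // DDR3 region to read from (0..3)
);
  reg fval_d0;  // fval delayed by one clock, used for edge detection

  always @(posedge clk or posedge rst) begin
    if (rst) begin
      fval_d0          <= 1'b0;
      write_req        <= 1'b0;
      write_addr_index <= 2'd0;
      read_addr_index  <= 2'd0;
    end else begin
      fval_d0 <= fval;
      if (fval & ~fval_d0) begin
        // Rising edge of fval: a new frame starts, request a write
        // and move on to the next of the four regions.
        write_req        <= 1'b1;
        write_addr_index <= write_addr_index + 2'd1;
        read_addr_index  <= write_addr_index;  // assumption: read the region written last
      end else if (write_req_ack) begin
        // frame_write has seen the request, so drop it.
        write_req <= 1'b0;
      end
    end
  end
endmodule
```

In this sketch the request stays high until the acknowledge arrives, so its width depends on how quickly frame_write answers.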
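4. Simulation of the image frame write module (frame_write.v)
4.1 Global view waveform
As the global view shows, this module takes some effort to implement. frame_write has to do two things: first, convert the 16-bit image source data to the 64-bit AXI4 bus width (the previous article set the AXI4 data width of the MIG IP core to 64 bits); second, write the converted 64-bit image data into the MIG IP module using AXI4 burst transfers. Each part is analyzed with waveforms below.
4.2 Detailed view waveform
The frame write module is composed as frame_write = write_buf + frame_fifo_write. The waveforms of each submodule are described in turn.
4.2.1 write_buf
The write_buf submodule simply instantiates one asynchronous FIFO that handles both the clock domain crossing (wr_clk is 40 MHz, rd_clk is 100 MHz) and the data width conversion. The rd_en signal is controlled by the aq_axi_master module.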
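The width conversion itself is easy to picture: four 16-bit pixels are packed into one 64-bit word. Below is a single-clock sketch of that packing only. The real write_buf uses a FIFO IP that also crosses clock domains, which this sketch does not attempt. The module name and ports are hypothetical, and placing the first pixel in the most significant 16 bits is an assumption that should be checked against the FIFO configuration actually used.

```verilog
// Hypothetical sketch of the 16-bit to 64-bit packing (single clock only).
// Assumption: the first pixel written lands in bits [63:48] of the word.
module pack16to64_sketch (
  input  wire        clk,
  input  wire        rst,
  input  wire        din_valid,
  input  wire [15:0] din,
  output reg         dout_valid,
  output reg  [63:0] dout
);
  reg [1:0]  cnt;   // position of the incoming pixel inside the word
  reg [47:0] hold;  // the pixels collected so far

  always @(posedge clk or posedge rst) begin
    if (rst) begin
      cnt        <= 2'd0;
      hold       <= 48'd0;
      dout_valid <= 1'b0;
      dout       <= 64'd0;
    end else begin
      dout_valid <= 1'b0;            // one-cycle pulse per packed word
      if (din_valid) begin
        cnt <= cnt + 2'd1;           // wraps from 3 back to 0
        if (cnt == 2'd3) begin
          dout       <= {hold, din}; // fourth pixel completes the word
          dout_valid <= 1'b1;
        end else begin
          hold <= {hold[31:0], din}; // shift earlier pixels up
        end
      end
    end
  end
endmodule
```

Feeding the pixels 0, 1, 2, 3 would produce the word 64'h0000_0001_0002_0003 under this ordering assumption.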
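4.2.2 frame_fifo_write
This submodule writes one 640×512×16-bit image into the MIG IP over the AXI4 bus, and the MIG IP in turn writes it to the external DDR3 SDRAM attached to the FPGA. As noted earlier, the AXI4 data bus is 64 bits wide and one burst transfers 64 words, i.e. 64 bit × 64 = 16 bit × 256, so one burst carries 256 pixels. A whole image therefore needs 640 × 512 × 16 / 64 / 64 = 1280 bursts. How do we issue 1280 bursts? A state machine does the job; the implementation is given below: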
always@(posedge mem_clk or posedge rst)
begin
if(rst == 1'b1)
begin
state <= S_IDLE;
write_len_latch <= ZERO[ADDR_BITS - 1:0];
wr_burst_addr <= ZERO[ADDR_BITS - 1:0];
wr_burst_req <= 1'b0;
write_cnt <= ZERO[ADDR_BITS - 1:0];
fifo_aclr <= 1'b0;
write_req_ack <= 1'b0;
wr_burst_len <= ZERO[BUSRT_BITS - 1:0];
end
else
case(state)
S_IDLE:
begin
if(write_req_d2 == 1'b1)
begin
state <= S_ACK;
end
write_req_ack <= 1'b0;
end
S_ACK:
begin
if(write_req_d2 == 1'b0)begin
state <= S_CHECK_FIFO;
fifo_aclr <= 1'b0;
write_req_ack <= 1'b0;
end
else begin
write_req_ack <= 1'b1;
fifo_aclr <= 1'b1;
if(write_addr_index_d1 == 2'd0)
wr_burst_addr <= write_addr_0;
else if(write_addr_index_d1 == 2'd1)
wr_burst_addr <= write_addr_1;
else if(write_addr_index_d1 == 2'd2)
wr_burst_addr <= write_addr_2;
else if(write_addr_index_d1 == 2'd3)
wr_burst_addr <= write_addr_3;
write_len_latch <= write_len_d1;
end
write_cnt <= ZERO[ADDR_BITS - 1:0];
end
S_CHECK_FIFO:
begin
if(write_req_d2 == 1'b1)begin
state <= S_ACK;
end
else if(rdusedw >= BURST_SIZE)begin
state <= S_WRITE_BURST;
wr_burst_len <= BURST_SIZE[BUSRT_BITS - 1:0];
wr_burst_req <= 1'b1;
end
end
S_WRITE_BURST:
begin
if(wr_burst_finish == 1'b1)begin
wr_burst_req <= 1'b0;
state <= S_WRITE_BURST_END;
write_cnt <= write_cnt + BURST_SIZE[ADDR_BITS - 1:0];
wr_burst_addr <= wr_burst_addr + BURST_SIZE[ADDR_BITS - 1:0];
end
end
S_WRITE_BURST_END:
begin
if(write_req_d2 == 1'b1)begin
state <= S_ACK;
end
else if(write_cnt < write_len_latch)begin
state <= S_CHECK_FIFO;
end
else begin
state <= S_END;
end
end
S_END:
begin
state <= S_IDLE;
end
default:
state <= S_IDLE;
endcase
end
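The state machine starts in IDLE (0). When the write_req signal from the img_write_req_gen module arrives, it moves to ACK (1). In ACK it asserts the write request acknowledge and the reset of the FIFO in write_buf, and loads the AXI4 write address, getting ready to write data. After that it enters Check_FIFO (2), which checks whether the FIFO in write_buf already holds enough data for one burst (the 256 16-bit pixels computed above, i.e. 64 words of 64 bits). If not, it keeps waiting; if so, it moves directly to Write_Burst (3).
The figure below shows the entry into Write_Burst (3). wr_burst_req is driven high to the aq_axi_master module, telling it that data is ready and the write can start. The wr_burst_data_req signal then starts toggling. This signal is connected to rd_en of the write_buf submodule, so the 64-bit data is read out of the FIFO whenever wr_burst_data_req is high.
Once the 64 beats of the burst have been transferred (wr_burst_finish goes high), the state machine enters Write_Burst_End (4), as shown in the figure below. wr_burst_addr is advanced by 64 and the next round begins. Only after 1280 bursts has the first 640×512×16-bit image been completely written into the DDR3 SDRAM.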
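The burst arithmetic from section 4.2.2 can be checked with a few localparams. This is only a sketch with a made-up module name. It assumes that addresses and lengths are counted in 64-bit words, which is what the state machine suggests since wr_burst_addr advances by BURST_SIZE per burst, and that the region size equals the spacing of ADDR_SEC1 to ADDR_SEC4 in the test bench.

```verilog
// Hypothetical check of the burst arithmetic; not part of the original project.
module burst_math_sketch;
  localparam H_PIXELS        = 640;
  localparam V_LINES         = 512;
  localparam WRITE_DATA_BITS = 16;       // pixel width
  localparam MEM_DATA_BITS   = 64;       // AXI4 data width
  localparam BURST_SIZE      = 64;       // 64-bit words per burst
  localparam REGION_WORDS    = 2073600;  // assumed: ADDR_SEC2 - ADDR_SEC1

  localparam PIXELS_PER_WORD  = MEM_DATA_BITS / WRITE_DATA_BITS;
  localparam PIXELS_PER_BURST = PIXELS_PER_WORD * BURST_SIZE;
  localparam WORDS_PER_FRAME  = H_PIXELS * V_LINES / PIXELS_PER_WORD;
  localparam BURSTS_PER_FRAME = WORDS_PER_FRAME / BURST_SIZE;

  initial begin
    $display("pixels per burst : %0d", PIXELS_PER_BURST);  // 256
    $display("words per frame  : %0d", WORDS_PER_FRAME);   // 81920
    $display("bursts per frame : %0d", BURSTS_PER_FRAME);  // 1280
    if (WORDS_PER_FRAME > REGION_WORDS)
      $display("WARNING: one frame does not fit into one region");
    $finish;
  end
endmodule
```

If write_len is indeed given in 64-bit words, a value of 81920 would make the comparison write_cnt < write_len_latch end the frame after exactly 1280 bursts.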
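5. Simulation of the image channel write arbiter (mem_write_arbit.v) (optional)
This project writes several image streams into the DDR3 SDRAM, so an arbiter is needed. With only one image stream the arbiter is not required. The simulation timing diagram is shown below; the module can be implemented with a state machine.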
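As a rough idea of what such a state machine could look like, here is a sketch of a two-channel, fixed-priority arbiter. It is not the author's mem_write_arbit.v. The module name, the ports, the number of channels, and the priority scheme are all assumptions made for illustration.

```verilog
// Hypothetical two-channel write arbiter, fixed priority (channel 0 first).
module write_arbit_sketch (
  input  wire clk,
  input  wire rst,
  input  wire ch0_req,       // burst request from image channel 0
  input  wire ch1_req,       // burst request from image channel 1
  input  wire burst_finish,  // the granted burst has completed
  output reg  ch0_grant,
  output reg  ch1_grant
);
  localparam S_IDLE = 2'd0;
  localparam S_CH0  = 2'd1;
  localparam S_CH1  = 2'd2;

  reg [1:0] state;

  always @(posedge clk or posedge rst) begin
    if (rst) begin
      state     <= S_IDLE;
      ch0_grant <= 1'b0;
      ch1_grant <= 1'b0;
    end else begin
      case (state)
        S_IDLE: begin
          if (ch0_req) begin            // channel 0 wins when both request
            state     <= S_CH0;
            ch0_grant <= 1'b1;
          end else if (ch1_req) begin
            state     <= S_CH1;
            ch1_grant <= 1'b1;
          end
        end
        S_CH0: begin
          if (burst_finish) begin       // release the bus after one burst
            state     <= S_IDLE;
            ch0_grant <= 1'b0;
          end
        end
        S_CH1: begin
          if (burst_finish) begin
            state     <= S_IDLE;
            ch1_grant <= 1'b0;
          end
        end
        default: state <= S_IDLE;
      endcase
    end
  end
endmodule
```

Granting one burst at a time keeps any single channel from holding the bus for a whole frame, at the cost of re-arbitrating after every burst.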
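6. Simulation of the AXI master read/write module (aq_axi_master)
The aq_axi_master module uses state machines to drive the five AXI channels: write address, write data, write response, read address, and read data. The implementation is given below:
6.1 Write state machine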
always @(posedge ACLK or negedge ARESETN)
begin
if(!ARESETN) begin
wr_state <= S_WR_IDLE;
reg_wr_adrs[31 :0] <= 32'd0;
reg_wr_len[31 :0] <= 32'd0;
reg_awvalid <= 1'b0;
reg_wvalid <= 1'b0;
reg_w_last <= 1'b0;
reg_w_len[7 :0] <= 8'd0;
reg_w_stb[7 :0] <= 8'd0;
reg_wr_status[1 :0] <= 2'd0;
reg_w_count[3 :0] <= 4'd0;
reg_r_count[3 :0] <= 4'd0;
wr_chkdata <= 8'd0;
rd_chkdata <= 8'd0;
resp <= 2'd0;
rd_first_data <= 1'b0;
end
else begin
if(MASTER_RST) begin
wr_state <= S_WR_IDLE;
end
else begin
case(wr_state)
S_WR_IDLE: begin
if(WR_START) begin
wr_state <= S_WA_WAIT;
reg_wr_adrs[31:0] <= WR_ADRS[31:0];
reg_wr_len[31:0] <= WR_LEN[31:0] - 32'd1;
rd_first_data <= 1'b1;
end
reg_awvalid <= 1'b0;
reg_wvalid <= 1'b0;
reg_w_last <= 1'b0;
reg_w_len[7:0] <= 8'd0;
reg_w_stb[7:0] <= 8'd0;
reg_wr_status[1:0] <= 2'd0;
end
S_WA_WAIT: begin
if(!WR_FIFO_AEMPTY | (reg_wr_len[31:11] == 21'd0)) begin
wr_state <= S_WA_START;
end
rd_first_data <= 1'b0;
end
S_WA_START: begin
wr_state <= S_WD_WAIT;
reg_awvalid <= 1'b1;
reg_wr_len[31:11] <= reg_wr_len[31:11] - 21'd1;
if(reg_wr_len[31:11] != 21'd0) begin
reg_w_len[7:0] <= 8'hFF;
reg_w_last <= 1'b0;
reg_w_stb[7:0] <= 8'hFF;
end
else begin
reg_w_len[7:0] <= reg_wr_len[10:3];
reg_w_last <= 1'b1;
reg_w_stb[7:0] <= 8'hFF;
end
end
S_WD_WAIT: begin
if(M_AXI_AWREADY) begin
wr_state <= S_WD_PROC;
reg_awvalid <= 1'b0;
reg_wvalid <= 1'b1;
end
end
S_WD_PROC: begin
if(M_AXI_WREADY & ~WR_FIFO_EMPTY) begin
if(reg_w_len[7:0] == 8'd0) begin
wr_state <= S_WR_WAIT;
reg_wvalid <= 1'b0;
reg_w_stb[7:0] <= 8'h00;
end
else begin
reg_w_len[7:0] <= reg_w_len[7:0] -8'd1;
end
end
end
S_WR_WAIT: begin
if(M_AXI_BVALID) begin
reg_wr_status[1:0] <= reg_wr_status[1:0] | M_AXI_BRESP[1:0];
if(reg_w_last) begin
wr_state <= S_WR_DONE;
end
else begin
wr_state <= S_WA_WAIT;
reg_wr_adrs[31:0] <= reg_wr_adrs[31:0] + 32'd2048;
end
end
end
S_WR_DONE: begin
wr_state <= S_WR_IDLE;
end
default: begin
wr_state <= S_WR_IDLE;
end
endcase
end
end
end
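The slices reg_wr_len[31:11] and reg_wr_len[10:3] in the write state machine, together with the +2048 address step, suggest that WR_LEN is a byte count that is split into chunks of 2048 bytes (256 beats of 8 bytes). The following sketch mirrors that split for two lengths. The module and function names are made up, and the idea that one 64-word burst from frame_fifo_write arrives here as 512 bytes is an assumption about the top-level wiring.

```verilog
// Hypothetical check of the burst length split; names are illustrative only.
module axi_len_sketch;

  // Number of full 256-beat bursts issued before the final burst,
  // mirroring reg_wr_len[31:11] with reg_wr_len = WR_LEN - 1.
  function [20:0] full_bursts;
    input [31:0] len_bytes;
    reg   [31:0] len_m1;
    begin
      len_m1      = len_bytes - 32'd1;
      full_bursts = len_m1[31:11];
    end
  endfunction

  // AWLEN of the final burst (beats - 1), mirroring reg_wr_len[10:3].
  function [7:0] last_awlen;
    input [31:0] len_bytes;
    reg   [31:0] len_m1;
    begin
      len_m1     = len_bytes - 32'd1;
      last_awlen = len_m1[10:3];
    end
  endfunction

  initial begin
    // 64 words x 8 bytes = 512 bytes: full=0, last_awlen=63 (one 64-beat burst)
    $display("512 B  : full=%0d last_awlen=%0d",
             full_bursts(32'd512), last_awlen(32'd512));
    // 4096 bytes: full=1, last_awlen=255 (two 256-beat bursts)
    $display("4096 B : full=%0d last_awlen=%0d",
             full_bursts(32'd4096), last_awlen(32'd4096));
    $finish;
  end
endmodule
```

Subtracting 1 before slicing is what makes an exact multiple of 2048 bytes end with a full 256-beat burst instead of an extra empty one.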
6.2 Read state machine
always @(posedge ACLK or negedge ARESETN) begin
if(!ARESETN) begin
rd_state <= S_RD_IDLE;
reg_rd_adrs[31:0] <= 32'd0;
reg_rd_len[31:0] <= 32'd0;
reg_arvalid <= 1'b0;
reg_r_len[7:0] <= 8'd0;
end else begin
case(rd_state)
S_RD_IDLE: begin
if(RD_START) begin
rd_state <= S_RA_WAIT;
reg_rd_adrs[31:0] <= RD_ADRS[31:0];
reg_rd_len[31:0] <= RD_LEN[31:0] -32'd1;
end
reg_arvalid <= 1'b0;
reg_r_len[7:0] <= 8'd0;
end
S_RA_WAIT: begin
if(~RD_FIFO_AFULL) begin
rd_state <= S_RA_START;
end
end
S_RA_START: begin
rd_state <= S_RD_WAIT;
reg_arvalid <= 1'b1;
reg_rd_len[31:11] <= reg_rd_len[31:11] -21'd1;
if(reg_rd_len[31:11] != 21'd0) begin
reg_r_last <= 1'b0;
reg_r_len[7:0] <= 8'd255;
end else begin
reg_r_last <= 1'b1;
reg_r_len[7:0] <= reg_rd_len[10:3];
end
end
S_RD_WAIT: begin
if(M_AXI_ARREADY) begin
rd_state <= S_RD_PROC;
reg_arvalid <= 1'b0;
end
end
S_RD_PROC: begin
if(M_AXI_RVALID) begin
if(M_AXI_RLAST) begin
if(reg_r_last) begin
rd_state <= S_RD_DONE;
end else begin
rd_state <= S_RA_WAIT;
reg_rd_adrs[31:0] <= reg_rd_adrs[31:0] + 32'd2048;
end
end else begin
reg_r_len[7:0] <= reg_r_len[7:0] -8'd1;
end
end
end
S_RD_DONE:begin
rd_state <= S_RD_IDLE;
end
endcase
end
end
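6.3 Global view waveform
The aq_axi_master module handles the five AXI channels (write address, write data, write response, read address, and read data). It has many signals, but they follow a regular pattern, as shown in the figure below.
6.4 Detailed view waveform
6.4.1 Write address channel
The write channel ID is defined as 0. When M_AXI_AWREADY and M_AXI_AWVALID are both asserted, the address 2073600 is transferred. Since the AXI data bus is 64 bits wide, M_AXI_AWSIZE is set to 3 (8 bytes per beat); M_AXI_AWBURST is set to "01" (INCR burst type); M_AXI_AWLOCK is set to 0; M_AXI_AWCACHE is set to 0011; M_AXI_AWPROT is set to 000; M_AXI_AWQOS is set to 0000; M_AXI_AWUSER is set to 1.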
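The constant write address channel settings listed above can be written as a handful of assignments. This is a sketch, not the aq_axi_master source: the module name is made up, the port widths follow the AXI4 specification rather than the author's file, and the 1-bit AWUSER width is an assumption because that width is implementation-defined.

```verilog
// Hypothetical sketch of the constant AW channel settings.
module axi_aw_const_sketch #(
  parameter C_M_AXI_DATA_WIDTH = 64
) (
  output wire [2:0] M_AXI_AWSIZE,
  output wire [1:0] M_AXI_AWBURST,
  output wire       M_AXI_AWLOCK,
  output wire [3:0] M_AXI_AWCACHE,
  output wire [2:0] M_AXI_AWPROT,
  output wire [3:0] M_AXI_AWQOS,
  output wire       M_AXI_AWUSER   // width assumed to be 1 bit
);
  // AWSIZE encodes bytes per beat as a power of two:
  // 64 bits = 8 bytes = 2^3, so AWSIZE = 3.
  assign M_AXI_AWSIZE  = $clog2(C_M_AXI_DATA_WIDTH / 8);
  assign M_AXI_AWBURST = 2'b01;    // INCR: address increments every beat
  assign M_AXI_AWLOCK  = 1'b0;     // normal access
  assign M_AXI_AWCACHE = 4'b0011;  // normal, non-cacheable, bufferable
  assign M_AXI_AWPROT  = 3'b000;   // unprivileged, secure, data access
  assign M_AXI_AWQOS   = 4'b0000;  // no QoS scheme in use
  assign M_AXI_AWUSER  = 1'b1;
endmodule
```

Deriving AWSIZE from the data width keeps the setting correct if the bus width is ever changed, for example to 128 bits, where it would become 4.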
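6.4.2 Write data channel
When M_AXI_WREADY and M_AXI_WVALID are both asserted, the data is written to the corresponding address.
6.4.3 Write response channel
M_AXI_BID is 0 and M_AXI_BRESP is 00, i.e. the write transaction status is OKAY.
6.4.4 Read address channel
The read channel ID is defined as 0. When M_AXI_ARREADY and M_AXI_ARVALID are both asserted, the address is transferred on the read address channel. The other settings are similar to the write address channel and are not repeated here.
6.4.5 Read data channel
On the read data channel M_AXI_RID is 0 and M_AXI_RRESP is 0 (OKAY). The other settings are similar to the write data channel and are not repeated here.
7. Simulation of the MIG IP module
The MIG IP module is a direct instantiation of the IP core. It has many signals, as shown in the figure below, but they are regular and fall into clear groups. In this project it acts as the slave; the master is the aq_axi_master module described earlier. To use it, simply connect the aq_axi_master signals to the IP core signals in the top-level module. Some of the signals are explained below.
7.1 Signals related to ddr3
These signals go directly to the FPGA I/O pins and connect through the physical layer to the external DDR3 SDRAM. For the meaning of each signal, see the earlier article ("A Detailed Analysis of DDR3 SDRAM from an FPGA Perspective" on the 春风细雨无声 CSDN blog).
7.2 Local interface maintenance signals
These are the local-interface maintenance signals. Design them according to your needs; in most cases they can simply be left unused.
7.3 Signals related to read and write transactions
This group mainly contains the signals of the write address, write data, write response, read address, and read data channels. They were covered in detail in the aq_axi_master section and are not repeated here.
7.4 Initialization complete, clocks, and other signals
sys_clk_i is 200 MHz and ui_clk is 100 MHz (the MIG is configured for a 4:1 ratio). init_calib_complete is the ddr3 initialization and calibration complete signal; when it goes high, initialization and calibration have finished.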
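These clock figures agree with the parameters in the test bench, which a short calculation confirms. The sketch below only redoes that arithmetic; the module name is hypothetical, and the three parameter values are copied from the test bench (CLKIN_PERIOD, tCK, nCK_PER_CLK).

```verilog
// Hypothetical check of the MIG clock relationships.
module ui_clk_math_sketch;
  localparam CLKIN_PERIOD = 5000;  // ps, MIG input clock period
  localparam tCK          = 2500;  // ps, memory clock period
  localparam nCK_PER_CLK  = 4;     // memory clocks per fabric clock (4:1)

  localparam SYS_CLK_MHZ   = 1000000 / CLKIN_PERIOD;
  localparam MEM_CLK_MHZ   = 1000000 / tCK;
  localparam UI_CLK_PERIOD = tCK * nCK_PER_CLK;
  localparam UI_CLK_MHZ    = 1000000 / UI_CLK_PERIOD;

  initial begin
    $display("sys_clk_i    : %0d MHz", SYS_CLK_MHZ);  // 200
    $display("memory clock : %0d MHz", MEM_CLK_MHZ);  // 400
    $display("ui_clk       : %0d MHz", UI_CLK_MHZ);   // 100
    $finish;
  end
endmodule
```

The 100 MHz ui_clk is also the rd_clk of write_buf and the wr_clk of read_buf mentioned in sections 4.2.1 and 8.2.1.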
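8. Simulation of the image frame read module (frame_read.v)
8.1 Global view waveform
As the global view shows, this module takes some effort to implement. frame_read has to do two things: first, convert the 64-bit image data on the AXI4 bus to 16-bit image data on the user side (the previous article set the AXI4 data width of the MIG IP core to 64 bits); second, read the 64-bit image data from the MIG IP module using AXI4 burst transfers. Each part is analyzed with waveforms below.
8.2 Detailed view waveform
The frame read module is composed as frame_read = read_buf + frame_fifo_read. The waveforms of each submodule are described in turn.
8.2.1 read_buf
The read_buf submodule simply instantiates one asynchronous FIFO that handles both the clock domain crossing (wr_clk is 100 MHz, rd_clk is 40 MHz) and the data width conversion. The rd_en signal is controlled by the user module.
8.2.2 frame_fifo_read
This submodule reads the 64-bit image data out over the AXI4 bus. As noted earlier, the AXI4 data bus is 64 bits wide and one burst transfers 64 words, i.e. 64 bit × 64 = 16 bit × 256, so one burst carries 256 pixels. A whole image therefore needs 640 × 512 × 16 / 64 / 64 = 1280 bursts. How do we issue 1280 burst reads? A state machine does the job; the implementation is given below: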
always@(posedge mem_clk or posedge rst)
begin
if(rst == 1'b1)
begin
state <= S_IDLE;
read_len_latch <= ZERO[ADDR_BITS - 1:0];
rd_burst_addr <= ZERO[ADDR_BITS - 1:0];
rd_burst_req <= 1'b0;
read_cnt <= ZERO[ADDR_BITS - 1:0];
fifo_aclr <= 1'b0;
rd_burst_len <= ZERO[BUSRT_BITS - 1:0];
read_req_ack <= 1'b0;
end
else
case(state)
S_IDLE:
begin
if(read_req_d2 == 1'b1)
begin
state <= S_ACK;
end
read_req_ack <= 1'b0;
end
S_ACK:
begin
if(read_req_d2 == 1'b0)
begin
state <= S_CHECK_FIFO;
fifo_aclr <= 1'b0;
read_req_ack <= 1'b0;
end
else
begin
read_req_ack <= 1'b1;
fifo_aclr <= 1'b1;
if(read_addr_index_d1 == 2'd0)
rd_burst_addr <= read_addr_0;
else if(read_addr_index_d1 == 2'd1)
rd_burst_addr <= read_addr_1;
else if(read_addr_index_d1 == 2'd2)
rd_burst_addr <= read_addr_2;
else if(read_addr_index_d1 == 2'd3)
rd_burst_addr <= read_addr_3;
read_len_latch <= read_len_d1;
end
read_cnt <= ZERO[ADDR_BITS - 1:0];
end
S_CHECK_FIFO:
begin
if(read_req_d2 == 1'b1)
begin
state <= S_ACK;
end
else if(wrusedw < (FIFO_DEPTH - BURST_SIZE))
begin
state <= S_READ_BURST;
rd_burst_len <= BURST_SIZE[BUSRT_BITS - 1:0];
rd_burst_req <= 1'b1;
end
end
S_READ_BURST:
begin
if(rd_burst_data_valid)
rd_burst_req <= 1'b0;
if(rd_burst_finish == 1'b1)
begin
state <= S_READ_BURST_END;
read_cnt <= read_cnt + BURST_SIZE[ADDR_BITS - 1:0];
rd_burst_addr <= rd_burst_addr + BURST_SIZE[ADDR_BITS - 1:0];
end
end
S_READ_BURST_END:
begin
if(read_req_d2 == 1'b1)
begin
state <= S_ACK;
end
else if(read_cnt < read_len_latch)
begin
state <= S_CHECK_FIFO;
end
else
begin
state <= S_END;
end
end
S_END:
begin
state <= S_IDLE;
end
default:
state <= S_IDLE;
endcase
end
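The state machine starts in IDLE (0). When the read request signal read_req from the external module arrives, it moves to ACK (1). In ACK it asserts the read request acknowledge and the reset of the FIFO in read_buf, and loads the AXI4 read address, getting ready to read data. After that it enters Check_FIFO (2), which checks whether the FIFO in read_buf has enough free space for one burst of data. If not, it keeps waiting; if so, it raises the rd_burst_req request and moves directly to Read_Burst (3).
The figure below shows the entry into Read_Burst (3). rd_burst_req goes high, and the rd_burst_data_valid signal then starts toggling. This signal is connected to wr_en of the read_buf submodule, so the 64-bit image data is written into the FIFO whenever rd_burst_data_valid is high.
When rd_burst_finish goes high, the state machine enters Read_Burst_End (4), as shown in the figure below. rd_burst_addr is advanced by 64 and the next round begins. Only after 1280 bursts has the first image been completely read out of the DDR3 SDRAM.
9. Simulation of the image channel read arbiter (mem_read_arbit.v) (optional)
This project reads several image streams out of the DDR3 SDRAM, so an arbiter is needed. With only one image stream the arbiter is not required. The simulation timing diagram is shown below; the module can be implemented with a state machine.
10. Other parts
This part mainly generates the read request signal read_req and the read enable signal read_en. Design it according to when your own project needs to read data; it is not discussed further here.
This concludes the main part of the simulation of image read/write based on FPGA + MIG + AXI4.
Summary
This article used the third-party tool ModelSim to simulate each module in the project and showed the simulation waveforms, to help everyone better understand the whole process of reading and writing DDR3 SDRAM from an FPGA.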