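[Notice: All rights reserved. Reposting is welcome, but please do not use this for commercial purposes. Contact: feixiaoxing @163.com]
Back in school I read Computer Organization and took Computer Architecture, and our teachers covered all kinds of pipeline material, but there were few chances to practice, so the theory never quite turned into something concrete. After I started working, I did get to read a lot of open-source code, including open-source cpu code such as openrisc (https://github.com/openrisc/or1200/tree/master/rtl/verilog). But that code is quite complex, the learning curve is steep, and it is hard to get into. As a result, many students remain in a fog about how a pipeline and a cpu's instructions actually work.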

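As analyzed earlier, an instruction is carried out in five steps: fetch, decode, execute, memory access, and write-back. So today we can implement the ori instruction with a 5-stage pipeline.
1. A side note first
When we wrote the earlier article on instruction fetch, we talked about the difference between combinational logic and sequential logic. People who are new to verilog tend to equate wire with combinational logic and reg with sequential logic, which is wrong. Whether a piece of logic is combinational or sequential ultimately depends on its trigger condition, that is, the sensitivity list. Take the code from that article, slightly adjusted: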
- module test(clk, rst, in, out_a, out_b);
-
- input wire clk;
- input wire rst;
- input wire in;
-
- output wire out_a;
- output reg out_b;
-
- assign out_a = in;
-
- always@(*) begin
- if(rst)
- out_b <= 1'b0;
- else
- out_b <= in;
- end
- endmodule
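Judging by its type, out_b looks like a register. But its trigger condition is always@(*), which means that in this circuit description out_b changes whenever rst or in changes. Isn't that exactly how combinational logic behaves? As before, this can be confirmed by looking at the waveform in gtkwave.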

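A quick analysis: while rst is asserted, out_b stays at 0. Once rst is released, out_b follows every change of in. This example shows that combinational and sequential logic are ultimately told apart by the trigger condition. If you are still unsure, run a simulation and look at the waveform.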
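For contrast, here is a minimal sketch of the sequential counterpart. The module name test_seq is made up for illustration. The only real difference from the listing above is the trigger condition: out_b is now updated on the rising edge of clk instead of on any input change.

```verilog
// Hypothetical sequential variant of the example above (not from the original article).
module test_seq(clk, rst, in, out_b);

    input  wire clk;
    input  wire rst;
    input  wire in;
    output reg  out_b;

    // Triggered only by the rising clock edge, so this infers a flip-flop:
    // out_b holds its value between edges even if "in" toggles.
    always @(posedge clk) begin
        if (rst)
            out_b <= 1'b0;
        else
            out_b <= in;
    end

endmodule
```

In a waveform, this out_b would change only at clock edges, while the always@(*) version follows in immediately.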
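2. Writing the pipeline
Note 1: All verilog code for this chapter can be found at https://github.com/feixiaoxing/design_mips_cpu/tree/master/rtl/day03 .
Note 2: The code in this article comes from 《自己动手写cpu》; thanks to the original author, 雷思磊 (Lei Silei).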

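2.1 Instruction fetch (if)
Fetch was described in an earlier article (https://feixiaoxing.blog.csdn.net/article/details/127914989?spm=1001.2014.3001.5502). It consists of two parts: generating the address and reading data from the rom. Once both are done, fetch is complete. Generating the pc address is sequential logic, while reading the rom is combinational logic.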
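Since the fetch code is not repeated in this article, here is a rough sketch of what the two parts might look like. The module names pc_reg_sketch and inst_rom_sketch, the literal 32-bit widths, the 1024-word depth and the file name inst_rom.data are all assumptions for illustration; the real pc_reg and inst_rom in the day03 directory may differ. Only the port names (clk, rst, pc, ce, addr, inst) follow the instantiations shown later in this article.

```verilog
// Sketch only: address generation is sequential logic.
module pc_reg_sketch(
    input  wire        clk,
    input  wire        rst,
    output reg  [31:0] pc,
    output reg         ce
);
    // Chip enable goes high on the first clock edge after reset is released
    // (assuming reset is active high).
    always @(posedge clk) begin
        if (rst == 1'b1)
            ce <= 1'b0;
        else
            ce <= 1'b1;
    end

    // pc stays at 0 while the rom is disabled, then advances one word per cycle.
    always @(posedge clk) begin
        if (ce == 1'b0)
            pc <= 32'h00000000;
        else
            pc <= pc + 32'h4;
    end
endmodule

// Sketch only: the rom read is combinational logic.
module inst_rom_sketch(
    input  wire        ce,
    input  wire [31:0] addr,
    output reg  [31:0] inst
);
    reg [31:0] inst_mem [0:1023];

    // Assumed file name; one 32-bit hex word per line.
    initial $readmemh("inst_rom.data", inst_mem);

    always @(*) begin
        if (ce == 1'b0)
            inst = 32'h00000000;
        else
            inst = inst_mem[addr[11:2]];   // byte address -> word index
    end
endmodule
```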
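2.2 Instruction passing (if-id)
Passing the instruction on is sequential logic. This module hands the instruction to the next module, and there is not much to it: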
- `include "defines.v"
-
- module if_id(
- input wire clk,
- input wire rst,
-
- input wire[`InstAddrBus] if_pc,
- input wire[`InstBus] if_inst,
- output reg[`InstAddrBus] id_pc,
- output reg[`InstBus] id_inst
- );
-
- always @(posedge clk) begin
- if(rst == `RstEnable) begin
- id_pc <= `ZeroWord;
- id_inst <= `ZeroWord;
- end else begin
- id_pc <= if_pc;
- id_inst <= if_inst;
- end
- end
-
-
- endmodule
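2.3 Decode (id)
Decode is an important part of the pipeline. Its main purpose is to extract the relevant information from the instruction word. For example: is the current operation a register operation or a memory access? If it is a register operation, is it a logical or an arithmetic operation? If it is a logical operation, which is source operand 1, which is source operand 2, which is the destination operand, and which logical operation is it?
During decode, access to the cpu's general-purpose registers also matters. Here is that part: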
- `include "defines.v"
-
- module regfile(
- input wire clk,
- input wire rst,
-
- // write
- input wire we,
- input wire[`RegAddrBus] waddr,
- input wire[`RegBus] wdata,
-
- // read1
- input wire re1,
- input wire[`RegAddrBus] raddr1,
- output reg[`RegBus] rdata1,
-
- // read2
- input wire re2,
- input wire[`RegAddrBus] raddr2,
- output reg[`RegBus] rdata2
- );
-
- reg[`RegBus] regs[0:`RegNum-1];
-
- always @(posedge clk) begin
- if(rst == `RstDisable) begin
- if((we == `WriteEnable) && (waddr!= `RegNumLog2'h0)) begin
- regs[waddr] <= wdata;
- end
- end
- end
- always@(*) begin
- if(rst == `RstEnable) begin
- rdata1 <= `ZeroWord;
- end else if(raddr1 == `RegNumLog2'h0) begin
- rdata1 <= `ZeroWord;
- end else if((raddr1 == waddr) && (we == `WriteEnable) && (re1 == `ReadEnable)) begin
- rdata1 <= wdata;
- end else if(re1 == `ReadEnable) begin
- rdata1 <= regs[raddr1];
- end else begin
- rdata1 <= `ZeroWord;
- end
- end
-
- always@(*) begin
- if(rst == `RstEnable) begin
- rdata2 <= `ZeroWord;
- end else if(raddr2 == `RegNumLog2'h0) begin
- rdata2 <= `ZeroWord;
- end else if((raddr2 == waddr) && (we == `WriteEnable) && (re2 == `ReadEnable)) begin
- rdata2 <= wdata;
- end else if(re2 == `ReadEnable) begin
- rdata2 <= regs[raddr2];
- end else begin
- rdata2 <= `ZeroWord;
- end
- end
- endmodule
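From the code, regfile supports reading up to two registers and writing one register at the same time. Note that reads are combinational logic and writes are sequential logic; this is very important. The same holds beyond registers: for rom and ram operations as well, reads are generally combinational and only writes are sequential.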
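One detail in the read logic is easy to miss: when a read address equals the address currently being written, the read port returns wdata directly instead of the stored value. The testbench below is a small sketch for observing this. The name regfile_bypass_tb is made up, and it assumes that defines.v and the regfile module above are compiled together with it, that registers are 32 bits wide and that register addresses are 5 bits wide.

```verilog
`timescale 1ns/1ns
`include "defines.v"

// Hypothetical testbench, not part of the original code base.
module regfile_bypass_tb;

    reg                clk;
    reg                rst;
    reg                we;
    reg  [`RegAddrBus] waddr;
    reg  [`RegAddrBus] raddr1;
    reg  [`RegBus]     wdata;
    wire [`RegBus]     rdata1;
    wire [`RegBus]     rdata2;

    regfile dut(
        .clk(clk), .rst(rst),
        .we(we), .waddr(waddr), .wdata(wdata),
        .re1(`ReadEnable), .raddr1(raddr1), .rdata1(rdata1),
        .re2(`ReadEnable), .raddr2(5'd0),   .rdata2(rdata2)  // $0 always reads as zero
    );

    // 20 ns clock period, rising edges at 10, 30, 50, ...
    initial clk = 1'b0;
    always #10 clk = ~clk;

    initial begin
        rst    = `RstEnable;
        we     = `WriteDisable;
        waddr  = 5'd1;
        raddr1 = 5'd1;
        wdata  = 32'h00001100;

        #25 rst = `RstDisable;      // release reset between two clock edges
        #1  we  = `WriteEnable;     // request a write to register 1
        #1  $display("before the clock edge: rdata1=%h (bypassed from wdata)", rdata1);

        @(posedge clk);             // the write is committed on this edge
        #1  we = `WriteDisable;
        #1  $display("after the clock edge:  rdata1=%h (read from regs[1])", rdata1);
        $finish;
    end

endmodule
```

Both lines should report the value of wdata: the first through the bypass branch, the second from the register array after the clock edge.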
With register access covered, next comes the decode logic itself:
- `include "defines.v"
-
- module id(
- input wire rst,
- input wire[`InstAddrBus] pc_i,
- input wire[`InstBus] inst_i,
-
- input wire[`RegBus] reg1_data_i,
- input wire[`RegBus] reg2_data_i,
-
- output reg reg1_read_o, // read signal
- output reg reg2_read_o, // read signal
- output reg[`RegAddrBus] reg1_addr_o,
- output reg[`RegAddrBus] reg2_addr_o,
-
- output reg[`AluOpBus] aluop_o,
- output reg[`AluSelBus] alusel_o,
- output reg[`RegBus] reg1_o,
- output reg[`RegBus] reg2_o,
- output reg[`RegAddrBus] wd_o,
- output reg wreg_o
- );
-
- wire[5:0] op = inst_i[31:26];
- wire [4:0] op2 = inst_i[10:6];
- wire [5:0] op3 = inst_i[5:0];
- wire [4:0] op4 = inst_i[20:16];
-
- reg[`RegBus] imm;
-
- reg instvalid;
-
-
- always @(*) begin
- if(rst == `RstEnable) begin
- aluop_o <= `EXE_NOP_OP;
- alusel_o <= `EXE_RES_NOP;
- wd_o <= `NOPRegAddr;
- wreg_o <= `WriteDisable;
- instvalid <= `InstValid;
- reg1_read_o <= 1'b0;
- reg2_read_o <= 1'b0;
- reg1_addr_o <= `NOPRegAddr;
- reg2_addr_o <= `NOPRegAddr;
- imm <= 32'h0;
- end else begin
- aluop_o <= `EXE_NOP_OP;
- alusel_o <= `EXE_RES_NOP;
- wd_o <= inst_i[15:11];
- wreg_o <= `WriteDisable;
- instvalid <= `InstValid;
- reg1_read_o <= 1'b0;
- reg2_read_o <= 1'b0;
- reg1_addr_o <= inst_i[25:21];
- reg2_addr_o <= inst_i[20:16];
- imm <= `ZeroWord;
-
- case(op)
- `EXE_ORI: begin
- wreg_o <= `WriteEnable;
- aluop_o <= `EXE_OR_OP;
- alusel_o <= `EXE_RES_LOGIC;
- reg1_read_o <= 1'b1;
- reg2_read_o <= 1'b0;
- imm <= {16'h0, inst_i[15:0]};
- wd_o <= inst_i[20:16];
- instvalid <= `InstValid;
- end
-
- default: begin
- end
- endcase
- end
- end
-
- always @(*) begin
- if(rst == `RstEnable) begin
- reg1_o <= `ZeroWord;
- end else if(reg1_read_o == 1'b1) begin
- reg1_o <= reg1_data_i;
- end else if(reg1_read_o == 1'b0) begin
- reg1_o <= imm;
- end else begin
- reg1_o <= `ZeroWord;
- end
- end
-
- always @(*) begin
- if(rst == `RstEnable) begin
- reg2_o <= `ZeroWord;
- end else if(reg2_read_o == 1'b1) begin
- reg2_o <= reg2_data_i;
- end else if(reg2_read_o == 1'b0) begin
- reg2_o <= imm;
- end else begin
- reg2_o <= `ZeroWord;
- end
- end
-
-
-
- endmodule
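Decode really has two parts. The first is the instruction parsing discussed above. The instruction handled here is EXE_ORI, hence the assignments to wreg_o, aluop_o, alusel_o and so on. The second is fetching the operands: whether reg1_o and reg2_o come from the register file or directly from the immediate imm depends on the instruction. For EXE_ORI, reg1_o comes from reg1_data_i and reg2_o comes from imm.
2.4 Operand passing (id-exe)
Operand passing is also sequential logic. Its main job is to hand the operands that were read, the operation type and the write address to the exe module: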
- `include "defines.v"
-
- module id_ex(
-
- input wire clk,
- input wire rst,
-
- input wire[`AluOpBus] id_aluop,
- input wire[`AluSelBus] id_alusel,
- input wire[`RegBus] id_reg1,
- input wire[`RegBus] id_reg2,
- input wire[`RegAddrBus] id_wd,
- input wire id_wreg,
-
- output reg[`AluOpBus] ex_aluop,
- output reg[`AluSelBus] ex_alusel,
- output reg[`RegBus] ex_reg1,
- output reg[`RegBus] ex_reg2,
- output reg[`RegAddrBus] ex_wd,
- output reg ex_wreg
- );
-
-
- always @(posedge clk) begin
- if(rst == `RstEnable) begin
- ex_aluop <= `EXE_NOP_OP;
- ex_alusel <= `EXE_RES_NOP;
- ex_reg1 <= `ZeroWord;
- ex_reg2 <= `ZeroWord;
- ex_wd <= `NOPRegAddr;
- ex_wreg <= `WriteDisable;
- end else begin
- ex_aluop <= id_aluop;
- ex_alusel <= id_alusel;
- ex_reg1 <= id_reg1;
- ex_reg2 <= id_reg2;
- ex_wd <= id_wd;
- ex_wreg <= id_wreg;
- end
- end
-
- endmodule
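2.5 Execute (exe)
With the operands and the write address from the id module, this stage only has to perform the corresponding operation. Note that exe is combinational logic: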
- `include "defines.v"
-
- module ex(
- input wire rst,
-
- input wire[`AluOpBus] aluop_i,
- input wire[`AluSelBus] alusel_i,
- input wire[`RegBus] reg1_i,
- input wire[`RegBus] reg2_i,
- input wire[`RegAddrBus] wd_i,
- input wire wreg_i,
-
- output reg[`RegAddrBus] wd_o,
- output reg wreg_o,
- output reg[`RegBus] wdata_o
- );
-
- reg[`RegBus] logicout;
-
- always@(*) begin
- if(rst == `RstEnable) begin
- logicout <= `ZeroWord;
- end else begin
- case (aluop_i)
- `EXE_OR_OP: begin
- logicout <= reg1_i | reg2_i;
- end
-
- default: begin
- logicout <= `ZeroWord;
- end
- endcase
- end
- end
-
- always@(*) begin
- wd_o <= wd_i;
- wreg_o <= wreg_i;
-
- case(alusel_i)
- `EXE_RES_LOGIC: begin
- wdata_o <= logicout;
- end
- default: begin
- wdata_o <= `ZeroWord;
- end
- endcase
- end
-
-
- endmodule
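Because exe may contain logical, arithmetic, shift and other operations, the usual approach is to compute each category separately and then select the final result. In the code above, logicout is the result computed before selection, and the final output is wdata_o.
2.6 Execute passing (ex-mem)
After exe, the result has to be passed to the mem module. Some may ask: no memory access is needed here, so why pass it to mem? The reason is that the design is a 5-stage pipeline, so even if this stage ends up doing nothing, the data still has to pass through it.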
- `include "defines.v"
-
- module ex_mem(
- input wire clk,
- input wire rst,
-
- input wire[`RegAddrBus] ex_wd,
- input wire ex_wreg,
- input wire[`RegBus] ex_wdata,
-
- output reg[`RegAddrBus] mem_wd,
- output reg mem_wreg,
- output reg[`RegBus] mem_wdata
-
- );
-
- always @(posedge clk) begin
- if(rst ==`RstEnable) begin
- mem_wd <= `NOPRegAddr;
- mem_wreg <= `WriteDisable;
- mem_wdata <= `ZeroWord;
- end else begin
- mem_wd <= ex_wd;
- mem_wreg <= ex_wreg;
- mem_wdata <= ex_wdata;
- end
- end
-
- endmodule
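2.7 Memory access (mem)
As said above, this stage is not really needed for ori. Since the pipeline is already laid out, all it has to do is keep passing the data through. Here is the verilog code: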
- `include "defines.v"
-
- module mem(
- input wire rst,
-
- input wire[`RegAddrBus] wd_i,
- input wire wreg_i,
- input wire[`RegBus] wdata_i,
-
- output reg[`RegAddrBus] wd_o,
- output reg wreg_o,
- output reg[`RegBus] wdata_o
-
- );
-
- always @(*) begin
- if(rst ==`RstEnable) begin
- wd_o <= `NOPRegAddr;
- wreg_o <= `WriteDisable;
- wdata_o <= `ZeroWord;
- end else begin
- wd_o <= wd_i;
- wreg_o <= wreg_i;
- wdata_o <= wdata_i;
- end
- end
-
- endmodule
2.8 Memory-access passing (mem-wb)
After the mem stage, the result can be handed over to wb. Note that mem is combinational logic, while mem-wb is sequential logic.
- `include "defines.v"
-
- module mem_wb(
- input wire clk,
- input wire rst,
-
- input wire[`RegAddrBus] mem_wd,
- input wire mem_wreg,
- input wire[`RegBus] mem_wdata,
-
- output reg[`RegAddrBus] wb_wd,
- output reg wb_wreg,
- output reg[`RegBus] wb_wdata
-
- );
-
- always @(posedge clk) begin
- if(rst ==`RstEnable) begin
- wb_wd <= `NOPRegAddr;
- wb_wreg <= `WriteDisable;
- wb_wdata <= `ZeroWord;
- end else begin
- wb_wd <= mem_wd;
- wb_wreg <= mem_wreg;
- wb_wdata <= mem_wdata;
- end
- end
-
- endmodule
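2.9 Write-back (wb)
Many will ask why there is no combinational or sequential module for wb. In general, the wb stage simply writes the data into a register, so it is usually straightforward and needs no module of its own. The write-back data and register address go straight to the regfile module, as can be seen in openmips.v: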
- `include "defines.v"
-
- module openmips(
- input wire clk,
- input wire rst,
-
- input wire[`RegBus] rom_data_i,
- output wire[`RegBus] rom_addr_o,
- output wire rom_ce_o
- );
-
- wire[`InstAddrBus] pc;
- wire[`InstAddrBus] id_pc_i;
- wire[`InstBus] id_inst_i;
-
- wire[`AluOpBus] id_aluop_o;
- wire[`AluSelBus] id_alusel_o;
- wire[`RegBus] id_reg1_o;
- wire[`RegBus] id_reg2_o;
- wire id_wreg_o;
- wire[`RegAddrBus] id_wd_o;
-
- wire[`AluOpBus] ex_aluop_i;
- wire[`AluSelBus] ex_alusel_i;
- wire[`RegBus] ex_reg1_i;
- wire[`RegBus] ex_reg2_i;
- wire ex_wreg_i;
- wire[`RegAddrBus] ex_wd_i;
-
- wire ex_wreg_o;
- wire[`RegAddrBus] ex_wd_o;
- wire[`RegBus] ex_wdata_o;
-
- wire mem_wreg_i;
- wire[`RegAddrBus] mem_wd_i;
- wire[`RegBus] mem_wdata_i;
-
- wire mem_wreg_o;
- wire[`RegAddrBus] mem_wd_o;
- wire[`RegBus] mem_wdata_o;
-
- wire wb_wreg_i;
- wire[`RegAddrBus] wb_wd_i;
- wire[`RegBus] wb_wdata_i;
-
- wire reg1_read;
- wire reg2_read;
- wire[`RegBus] reg1_data;
- wire[`RegBus] reg2_data;
- wire[`RegAddrBus] reg1_addr;
- wire[`RegAddrBus] reg2_addr;
-
- // initialize pc_reg
- pc_reg pc_reg0(
- .clk(clk),
- .rst(rst),
- .pc(pc),
- .ce(rom_ce_o)
- );
-
- assign rom_addr_o = pc;
-
-
- // initialize if_id
-
- if_id if_id0(
- .clk(clk),
- .rst(rst),
-
- .if_pc(pc),
- .if_inst(rom_data_i),
-
- .id_pc(id_pc_i),
- .id_inst(id_inst_i)
- );
-
- // initialize id
- id id0(
- .rst(rst),
- .pc_i(id_pc_i),
- .inst_i(id_inst_i),
-
- .reg1_data_i(reg1_data),
- .reg2_data_i(reg2_data),
-
- .reg1_read_o(reg1_read),
- .reg2_read_o(reg2_read),
- .reg1_addr_o(reg1_addr),
- .reg2_addr_o(reg2_addr),
-
- .aluop_o(id_aluop_o),
- .alusel_o(id_alusel_o),
- .reg1_o(id_reg1_o),
- .reg2_o(id_reg2_o),
- .wd_o(id_wd_o),
- .wreg_o(id_wreg_o)
- );
-
- // initialize regfile
-
- regfile regfile1(
- .clk(clk),
- .rst(rst),
-
- .we(wb_wreg_i),
- .waddr(wb_wd_i),
- .wdata(wb_wdata_i),
-
- .re1(reg1_read),
- .raddr1(reg1_addr),
- .rdata1(reg1_data),
-
- .re2(reg2_read),
- .raddr2(reg2_addr),
- .rdata2(reg2_data)
- );
-
- // initialize id_ex
-
- id_ex id_ex0(
- .clk(clk),
- .rst(rst),
-
- .id_aluop(id_aluop_o),
- .id_alusel(id_alusel_o),
- .id_reg1(id_reg1_o),
- .id_reg2(id_reg2_o),
- .id_wd(id_wd_o),
- .id_wreg(id_wreg_o),
-
- .ex_aluop(ex_aluop_i),
- .ex_alusel(ex_alusel_i),
- .ex_reg1(ex_reg1_i),
- .ex_reg2(ex_reg2_i),
- .ex_wd(ex_wd_i),
- .ex_wreg(ex_wreg_i)
- );
-
- // initialize ex
-
- ex ex0(
- .rst(rst),
-
- .aluop_i(ex_aluop_i),
- .alusel_i(ex_alusel_i),
- .reg1_i(ex_reg1_i),
- .reg2_i(ex_reg2_i),
- .wd_i(ex_wd_i),
- .wreg_i(ex_wreg_i),
-
- .wd_o(ex_wd_o),
- .wreg_o(ex_wreg_o),
- .wdata_o(ex_wdata_o)
- );
-
-
- // initialize ex_mem
-
-
- ex_mem ex_mem0(
-
- .clk(clk),
- .rst(rst),
-
- .ex_wd(ex_wd_o),
- .ex_wreg(ex_wreg_o),
- .ex_wdata(ex_wdata_o),
-
- .mem_wd(mem_wd_i),
- .mem_wreg(mem_wreg_i),
- .mem_wdata(mem_wdata_i)
-
- );
-
-
-
- // initialize mem
-
- mem mem0(
-
- .rst(rst),
-
- .wd_i(mem_wd_i),
- .wreg_i(mem_wreg_i),
- .wdata_i(mem_wdata_i),
-
- .wd_o(mem_wd_o),
- .wreg_o(mem_wreg_o),
- .wdata_o(mem_wdata_o)
-
- );
-
- // initialize mem_wb
-
- mem_wb mem_wb0(
-
- .clk(clk),
- .rst(rst),
-
- .mem_wd(mem_wd_o),
- .mem_wreg(mem_wreg_o),
- .mem_wdata(mem_wdata_o),
-
- .wb_wd(wb_wd_i),
- .wb_wreg(wb_wreg_i),
- .wb_wdata(wb_wdata_i)
-
- );
-
- endmodule
-
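Looking at the wb_wd_i, wb_wreg_i and wb_wdata_i signals, they are fed back into the regfile1 instance, which formally closes the loop of the pipeline.
2.10 Simulation and testing
Simulation and testing take two steps. First, prepare a small soc module for openmips that contains the cpu and the rom, like this: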
- `include "defines.v"
-
- module openmips_min_sopc(
-
- input wire clk,
- input wire rst
- );
-
- wire[`InstAddrBus] inst_addr;
- wire [`InstBus] inst;
- wire rom_ce;
-
- openmips openmips0(
- .clk(clk),
- .rst(rst),
- .rom_addr_o(inst_addr),
- .rom_data_i(inst),
- .rom_ce_o(rom_ce)
- );
-
- inst_rom inst_rom0(
- .ce(rom_ce),
- .addr(inst_addr),
- .inst(inst)
- );
-
-
- endmodule
Second, write a testbench for this soc module to drive the stimulus signals.
- `include "defines.v"
- `timescale 1ns/1ns
-
- module openmips_min_sopc_tb();
-
- reg CLOCK_50;
- reg rst;
-
- initial begin
- CLOCK_50 = 1'b0;
- forever #10 CLOCK_50=~CLOCK_50;
- end
-
- initial begin
- rst = `RstEnable;
- #195 rst = `RstDisable;
- #1000 $stop;
- end
-
- openmips_min_sopc openmips_min_sopc0(
- .clk(CLOCK_50),
- .rst(rst)
- );
-
- initial
- begin
- $dumpfile("hello.vcd");
- $dumpvars(0, openmips_min_sopc_tb);
- end
-
- endmodule
Note that the test rom data has also changed:
- 34011100
- 34020020
- 3403ff00
- 3404ffff
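To make these four words easier to read, the small stand-alone module below splits each one into the fields that the id module uses (inst[31:26], inst[25:21], inst[20:16], inst[15:0]). The module name ori_decode_demo is made up for illustration, and it does not depend on any other file from this article.

```verilog
// Hypothetical helper for inspecting the test program; not part of the cpu.
module ori_decode_demo;

    reg [31:0] rom [0:3];
    integer    i;

    initial begin
        rom[0] = 32'h34011100;
        rom[1] = 32'h34020020;
        rom[2] = 32'h3403ff00;
        rom[3] = 32'h3404ffff;

        for (i = 0; i < 4; i = i + 1) begin
            // op = opcode, rs = source register, rt = destination register for ori
            $display("inst=%h op=%b rs=%0d rt=%0d imm=%h",
                     rom[i], rom[i][31:26], rom[i][25:21],
                     rom[i][20:16], rom[i][15:0]);
        end
        $finish;
    end

endmodule
```

For the first word this prints inst=34011100 op=001101 rs=0 rt=1 imm=1100, that is, an ori that ORs register 0 with 0x1100 and writes the result to register 1. The other three words follow the same pattern with registers 2, 3 and 4.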
With these two steps done, testing can start with iverilog and gtkwave.

The waveform above does not actually show the register contents of regfile1. To expose them, add some code to regfile.v, for example:
- wire[`RegBus] regs0_wire;
- wire[`RegBus] regs1_wire;
- wire[`RegBus] regs2_wire;
- wire[`RegBus] regs3_wire;
- assign regs0_wire = regs[0];
- assign regs1_wire = regs[1];
- assign regs2_wire = regs[2];
- assign regs3_wire = regs[3];
With these wires in place, the waveform is much easier to read.
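As an alternative to reading the waveform, the registers can also be printed from the simulation. The sketch below is only a suggestion: the module name reg_probe and the 600 ns delay are arbitrary choices, and the hierarchical path assumes the instance names used in the listings above (openmips_min_sopc_tb, openmips_min_sopc0, openmips0, regfile1, regs). It is meant to be compiled together with the testbench as an additional top-level module.

```verilog
`timescale 1ns/1ns

// Hypothetical probe module; compile it alongside openmips_min_sopc_tb.
module reg_probe;

    integer i;

    initial begin
        // Reset is released at 195 ns in the testbench; by 600 ns the four
        // ori instructions should have passed through all five stages.
        #600;
        for (i = 1; i <= 4; i = i + 1) begin
            $display("regs[%0d] = %h", i,
                     openmips_min_sopc_tb.openmips_min_sopc0.openmips0.regfile1.regs[i]);
        end
    end

endmodule
```

If the pipeline works as described, registers 1 to 4 should hold the four immediates from the test program, zero-extended to 32 bits.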

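3. Debugging approach
When debugging, test the registers first, that is, the sequential logic. If the sequential logic is fine, then test and verify the combinational logic. Testing tends to be an iterative process that has to be repeated, and the key is to find the root cause.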