• [Computer Organization and Design] Chapter 5: Memory Hierarchy (Part 2)


    5.3 The Basics of Caches

    Direct-mapped cache

    There are three common cache mapping schemes: direct mapped, set associative, and fully associative. This section covers direct mapping.

    With direct mapping, each memory location can be stored in exactly one fixed position in the cache.

    That position is computed from the address as follows: (Block address) modulo (Number of blocks in the cache).

    Because the unit of storage in a cache is the block (also called a cache line), the formula uses the block address. The low-order bits of the block address determine which cache line the block is stored in.

    Since multiple memory locations map to the same cache position, we need a way to tell which memory location currently occupies it. The field used for this comparison is the tag, which is normally taken from the high-order bits of the address.
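The mapping rule above can be sketched in a few lines. This is an illustrative sketch, not code from the book; the block size and line count are made-up parameters:

```python
BLOCK_SIZE_BYTES = 64    # assumed: one cache line holds 64 bytes
NUM_CACHE_LINES = 256    # assumed: the cache has 256 lines

def cache_line_for(address):
    """Cache line a byte address maps to: low bits of the block address."""
    block_address = address // BLOCK_SIZE_BYTES
    return block_address % NUM_CACHE_LINES

def tag_for(address):
    """The high-order bits left after removing the index and offset."""
    block_address = address // BLOCK_SIZE_BYTES
    return block_address // NUM_CACHE_LINES
```

Two addresses one full cache apart (here 16 KB) land on the same line but carry different tags, which is exactly what the tag comparison distinguishes.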

    Valid bit

    Each cache line has a valid bit that indicates whether the line holds valid data.

    The hit rates of this cache prediction on modern computers are often above 95%.

    Each cache line stores:

    1. Data(block)
    2. Tag
    3. Valid bit

    The direct-mapped address mapping process splits the address into three parts:

    1. Tag: a tag field, which is compared against the value of the tag field stored in the cache
    2. Index: a cache index, which is used to select the block
    3. Offset: the byte offset within the block

    The cache in the figure above assumes:

    • 64-bit addresses
    • A direct-mapped cache
    • The cache size is 2^n blocks, so n bits are used for the index
    • The block size is 2^m words (2^(m+2) bytes), so m bits are used for the word within the block, and two bits are used for the byte part of the address

    The size of the tag field is 64 - (n + m + 2).

    The total number of bits in a direct-mapped cache is 2^n × (block size + tag size + valid field size) = 2^n × (2^m × 32 + (64 - n - m - 2) + 1).
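The field widths and total storage cost can be computed directly from n and m. A small sketch of the arithmetic above (function name is my own):

```python
def cache_field_sizes(n, m, address_bits=64):
    """Bit widths for a direct-mapped cache with 2^n blocks and
    2^m 32-bit words per block (i.e. 2^(m+2) bytes per block)."""
    index_bits = n
    word_offset_bits = m
    byte_offset_bits = 2
    tag_bits = address_bits - (n + m + 2)
    # Each line stores the data block, the tag, and one valid bit.
    total_bits = 2**n * (2**m * 32 + tag_bits + 1)
    return index_bits, word_offset_bits, byte_offset_bits, tag_bits, total_bits
```

For example, with n = 10 and m = 0 (1024 one-word blocks), the tag is 64 - 12 = 52 bits and the cache needs 1024 × (32 + 52 + 1) = 87040 bits in total.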

    Hit rate and miss rate

    hit rate The fraction of memory accesses found in a level of the memory hierarchy.

    miss rate The fraction of memory accesses not found in a level of the memory hierarchy.

    Miss penalty

    miss penalty The time required to fetch a block into a level of the memory hierarchy from the lower level, including the time to access the block, transmit it from one level to the other, insert it in the level that experienced the miss, and then pass the block to the requestor.

    Hit time

    hit time The time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss.
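These three quantities combine into the standard average memory access time (AMAT) formula from this chapter: every access pays the hit time, and misses additionally pay the miss penalty. A one-line sketch:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same units as the inputs
    (typically clock cycles)."""
    return hit_time + miss_rate * miss_penalty

# e.g. a 1-cycle hit, 5% miss rate, and 100-cycle miss penalty
# give an average of 1 + 0.05 * 100 = 6 cycles per access.
```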

    Relationship of hit rate, penalty and block size

    The larger the cache block size, the higher the hit rate, but the larger the penalty when a miss does occur, because more time is needed to move the block from the lower level of the memory hierarchy to the higher level. (If blocks become too large relative to the cache, the miss rate can also rise again, since fewer blocks fit in the cache.)

    Techniques for reducing the miss penalty

    Early restart

    resume execution as soon as the requested word of the block is returned, rather than wait for the entire block

    Requested word first or critical word first

    the requested word is transferred from the memory to the cache first. The remainder of the block is then transferred, starting with the address after the requested word and wrapping around to the beginning of the block.
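The wrap-around transfer order described above is easy to sketch. The function below (name and parameters are my own) lists the order in which the words of a block would arrive under critical-word-first:

```python
def critical_word_first_order(requested_word, words_per_block):
    """Transfer order for a block: the requested word first, then the
    rest of the block, wrapping around to the beginning."""
    return [(requested_word + i) % words_per_block
            for i in range(words_per_block)]

# For an 8-word block where word 5 missed, the words arrive as
# 5, 6, 7, 0, 1, 2, 3, 4 — so the CPU can restart after the first one.
```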

    Cache miss

    When a cache miss occurs, an in-order processor stalls the pipeline and waits for the miss to be handled, that is, for the corresponding block to be brought from memory into the cache.

    An out-of-order processor can continue executing other instructions while the miss is outstanding.

    The steps for handling an instruction cache miss are as follows (a data cache miss is handled similarly):

    1. Send the original PC value to the memory.
    2. Instruct main memory to perform a read and wait for the memory to complete its access.
    3. Write the cache entry, putting the data from memory in the data portion of the entry, writing the upper bits of the address (from the ALU) into the tag field, and turning the valid bit on.
    4. Restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache.
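The miss-handling steps above can be sketched as a minimal read-only direct-mapped cache. This is an illustrative model, not the book's hardware description; memory is modeled as a dict from block address to block contents:

```python
class DirectMappedCache:
    """Minimal direct-mapped cache model for read accesses."""

    def __init__(self, num_lines=256, block_bytes=64):
        self.num_lines = num_lines
        self.block_bytes = block_bytes
        self.valid = [False] * num_lines
        self.tags = [None] * num_lines
        self.data = [None] * num_lines

    def read(self, address, memory):
        """Return (block, hit?). On a miss, fetch the block from memory,
        fill the entry, write the tag, and set the valid bit (steps 2-3)."""
        block_addr = address // self.block_bytes
        index = block_addr % self.num_lines
        tag = block_addr // self.num_lines
        if self.valid[index] and self.tags[index] == tag:
            return self.data[index], True          # hit
        block = memory[block_addr]                 # fetch from main memory
        self.data[index] = block
        self.tags[index] = tag
        self.valid[index] = True
        return block, False                        # miss handled; restart
```

Reading the same address twice gives a hit the second time, while two blocks that share an index evict each other, matching the direct-mapped conflict behavior described earlier.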

    Write through and write back

    Write-through and write-back are the two common cache write policies.

    Write through

    With write-through, every time the CPU modifies a word in the cache, it also writes that word to memory, keeping the cache and memory consistent.

    Only the modified word is written to memory, not the whole cache line.

    Under write-through, every store produces a memory write access, which is slow and hurts performance.

    Write buffer

    A write buffer addresses the write-through problem that every store must wait for the memory access to complete. The CPU writes the data into both the cache and the write buffer, and can then continue executing the program. Once an entry's data has been written to memory, that write-buffer entry is freed. If the write buffer is full, the CPU must stall until an entry frees up, write the data into the buffer, and only then continue.

    There are two ways the write buffer can fill up:

    • The CPU's memory store rate exceeds the rate at which the buffer drains to memory. The buffer is then always full, and the write buffer provides no benefit.
    • During a long write burst, the buffer fills up temporarily. This case can be mitigated by increasing the buffer depth beyond a single cache line entry.
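The buffer's behavior can be sketched as a bounded FIFO queue. This is an illustrative model (class name and depth are my own choices), showing the CPU stalling only when the buffer is full:

```python
from collections import deque

class WriteBuffer:
    """Write-through write buffer model: a bounded FIFO of pending stores."""

    def __init__(self, depth=4):
        self.depth = depth
        self.entries = deque()

    def store(self, address, value):
        """CPU-side store. Returns True if the CPU can continue,
        False if it must stall because the buffer is full."""
        if len(self.entries) >= self.depth:
            return False
        self.entries.append((address, value))
        return True

    def drain_one(self, memory):
        """Memory-side: retire the oldest buffered write to memory."""
        if self.entries:
            addr, val = self.entries.popleft()
            memory[addr] = val
```

A store that finds the buffer full fails until `drain_one` frees an entry, which is exactly the stall condition described above.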

    Write back

    With write-back, a modified cache line is written back to memory only when it is about to be replaced by another block.

    Write-back is harder to implement than write-through, especially on multicore processors, where all cores must see a consistent view of memory.

    Write allocate and no-write allocate

    Write allocate:

    On a write miss, the block is first read from memory and placed into the cache, and the write is then performed in the cache. With a write-through policy, the written data must also be written to memory.

    No-write allocate:

    On a write miss, the data is written directly to memory without bringing the block into the cache.

    Replacing a cache line

    For a write-through cache, the line can simply be overwritten, because the cache and memory copies of the block are identical.

    For a write-back cache, the line's dirty bit must be checked first; if the line is dirty, it must be written back to memory before the line is replaced.

    Write-back caches can also use a write buffer: the line being replaced is moved into a write buffer (one cache line in size), and the new block is then read from memory into the cache.
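The replacement decision can be sketched as follows. This is a simplified illustration (function name and line representation are my own); for brevity the tag alone stands in for the block's memory address when flushing:

```python
def replace_line(line, new_tag, new_block, memory, write_back=True):
    """Replace one cache line, represented as a dict with
    'tag', 'data', and 'dirty' fields.

    Write-back: a dirty line must be flushed to memory first.
    Write-through (write_back=False): the line can just be dropped,
    since memory already has the same data."""
    if write_back and line.get("dirty"):
        memory[line["tag"]] = line["data"]   # flush the modified block
    line.update(tag=new_tag, data=new_block, dirty=False)
```

Only the write-back dirty case touches memory; a clean or write-through line is replaced with no memory traffic, matching the two cases above.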

    Cache Example

    Consider the following cache: the line size is 16 words, i.e. 64 bytes, the total cache size is 16 KB, and addresses are 32 bits wide.

    The offset is therefore 6 bits; the low 2 bits are the byte offset within a word (ignored for word-aligned accesses), and bits 5-2 select the word within the line.

    Index: selects the cache line. 16 KB / 64 bytes = 2^8 lines, so 8 bits form the index.

    Tag: the top 18 bits (32 - 8 - 6) form the tag, used for comparison.
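The field split for this example can be checked with a few bit operations (function name is my own):

```python
def split_address(addr, index_bits=8, offset_bits=6):
    """Split a 32-bit address for the 16 KB cache above:
    6 offset bits, 8 index bits, and 32 - 8 - 6 = 18 tag bits."""
    offset = addr & ((1 << offset_bits) - 1)            # bits 5..0
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # bits 13..6
    tag = addr >> (offset_bits + index_bits)            # bits 31..14
    return tag, index, offset
```

An all-ones 32-bit address yields an 18-bit tag of 2^18 - 1 = 262143, index 255, and offset 63, confirming the 18/8/6 split.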

  • Original post: https://blog.csdn.net/m0_38037810/article/details/126639393