Final F20 Solution

© EECS 370, University of Michigan
EECS 370 Final Exam Fall 2020 Solution

This exam is presented with modifications. It was originally released through Gradescope in a virtual semester; some comments have been added to reflect modifications for a paper exam.

Question 1: Short Answer T/F

Specify whether statements a-t are True or False. (On the exam, a random set of 10 questions was chosen.)

a. There are 32-bit two's complement integers that cannot be represented in a 32-bit IEEE-754 single precision number. T
b. Deeper pipelines will generally have a shorter clock period. T
c. It is possible for a write-back and a write-through cache to perform the same number of writes. T
d. For a given cache size and block size, increasing associativity will reduce cache misses for any program. F
e. Resolving data hazards with detect and stall and control hazards with speculate and squash will produce the least number of errors. T
f. Direction prediction for jalr is worse than for beq. F
g. The fetch stage can read an instruction from memory when there is a lw/sw in the MEM stage. F
h. Physical memory and disk are typically large enough to hold all of a process's virtual memory. F
i. DRAM is a non-volatile memory. F
j. You have scrolled through the whole exam to understand the point distribution and allocate your time as such. T
k. If a program accesses the same cache block repeatedly, it has higher temporal locality. T
l. Multi-level page tables will always take up less space than single-level page tables. F
m. Cache is about the same size as main memory. F
n. Dirty bits are needed for write-through caches but not for write-back caches. F
o. Assuming we have an allocate-on-write store policy, an infinitely large cache would have only one miss per cache block. T
p. The number of block offset bits is equal to log2(size of the cache). F
q. For a given cache configuration, a write-through cache may write fewer bytes to memory than a write-back cache running the same program. T
r. Virtual memory offers the illusion of infinite memory space. T
s. A disadvantage of having a deeper pipeline is that there will be more stalls from data and control hazards. T
t. For a given cache size and block size, increasing associativity will reduce tag size. F
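Statement (a) can be checked directly: an IEEE-754 single precision value keeps only a 24-bit significand, so some 32-bit two's complement integers round when converted. A minimal Python sketch (not part of the exam) demonstrating this:

```python
import struct

def round_trip_f32(x: int) -> float:
    # Pack into IEEE-754 single precision and unpack again.
    return struct.unpack('<f', struct.pack('<f', float(x)))[0]

# float32 keeps a 24-bit significand (23 stored bits + an implicit 1),
# so 2**24 + 1 is the smallest positive integer it cannot hold exactly.
n = 2**24 + 1                          # 16777217, well inside 32-bit two's complement
print(round_trip_f32(2**24) == 2**24)  # True  -- still exact
print(round_trip_f32(n) == n)          # False -- rounds to 16777216.0
```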
Question 2: Points _/8

Assume the following LC2K program is executed on the 5-stage pipeline from lecture until it halts.

V1
 1        lw    0 1 neg1
 2        lw    0 2 pos1
 3        lw    0 3 five
 4        lw    0 4 count
 5  loop  add   4 2 4
 6        add   3 1 3
 7        sw    0 4 count
 8        beq   0 3 end
 9        beq   0 0 loop
10  end   halt
11  neg1  .fill -1
12  pos1  .fill 1
13  five  .fill 5
14  count .fill 0

V2
 1        lw    0 1 four
 2        lw    0 2 count
 3        lw    0 3 neg1
 4        lw    0 4 pos1
 5  loop  add   2 4 2
 6        add   1 3 1
 7        sw    0 2 count
 8        beq   0 1 end
 9        beq   0 0 loop
10  end   halt
11  neg1  .fill -1
12  pos1  .fill 1
13  four  .fill 4
14  count .fill 0

V3
 1        lw    0 1 neg1
 2        lw    0 2 pos1
 3        lw    0 3 four
 4        lw    0 4 count
 5  loop  add   4 2 4
 6        add   3 1 3
 7        sw    0 4 count
 8        beq   0 3 end
 9        beq   0 0 loop
10  end   halt
11  neg1  .fill -1
12  pos1  .fill 1
13  four  .fill 4
14  count .fill 0

V4
 1        lw    0 1 six
 2        lw    0 2 count
 3        lw    0 3 neg1
 4        lw    0 4 pos1
 5  loop  add   2 4 2
 6        add   1 3 1
 7        sw    0 2 count
 8        beq   0 1 end
 9        beq   0 0 loop
10  end   halt
11  neg1  .fill -1
12  pos1  .fill 1
13  six   .fill 6
14  count .fill 0

V5
 1        lw    0 1 five
 2        lw    0 2 count
 3        lw    0 3 neg1
 4        lw    0 4 pos1
 5  loop  add   2 4 2
 6        add   1 3 1
 7        sw    0 2 count
 8        beq   0 1 end
 9        beq   0 0 loop
10  end   halt
11  neg1  .fill -1
12  pos1  .fill 1
13  five  .fill 5
14  count .fill 0

A. Select all registers that cause a RAW data hazard in a detect-and-stall pipeline.
a. reg0   b. reg1   c. reg2   d. reg3   e. reg4   f. reg5   g. reg6

Lines 4-5 need 2 stalls; lines 5-7 need 1 stall.
V1: reg4
V2: reg4 and reg2
V3: reg4
V4: reg4 and reg2
V5: reg4 and reg2

B. Using detect-and-stall to resolve data hazards, what is the number of stalls that occur due to data hazards?

V1: 7 = 2 + 5 * 1. 2 stalls from lines 4-5; 1 stall per loop iteration for lines 5-7 (lines 6-8 are NOT a dependency).
V2: 6 = 2 + 4 * 1. Same breakdown.
V3: 6 = 2 + 4 * 1. Same breakdown.
V4: 8 = 2 + 6 * 1. Same breakdown.
V5: 7 = 2 + 5 * 1. Same breakdown.

C. Using detect-and-forward to resolve data hazards, what is the number of stalls that occur due to data hazards?

All versions: 1 stall is required to resolve the lw dependency on lines 4-5.

D. Using speculate-and-squash, along with predicting branches always-not-taken to resolve control hazards, what is the number of stalls that occur due to branch mispredictions?

Each misprediction costs 3 cycles. You can calculate the number of mispredictions by looking at how many times each branch is taken.
V1: 15 = 5 * 3. Line 9 (beq 0 0 loop) is taken 4 times; line 8 (beq 0 3 end) is taken 1 time.
V2: 12 = 4 * 3. Line 9 is taken 3 times; line 8 is taken 1 time.
V3: 12 = 4 * 3. Line 9 is taken 3 times; line 8 is taken 1 time.
V4: 18 = 6 * 3. Line 9 is taken 5 times; line 8 is taken 1 time.
V5: 15 = 5 * 3. Line 9 is taken 4 times; line 8 is taken 1 time.

Question 3: Pipeline Design (11 Points)
Consider the LC2K pipeline with the following change: the Execution (EX) stage is split into two stages (EX1, EX2). EX1 calculates the branch destination and performs the equality check of beq. EX2 utilizes the ALU for the add/nor/lw/sw instructions; EX2 also updates the PC for branches, if necessary. Assume all values are forwarded to EX1. Control hazards are resolved using speculate and squash, where branches are predicted to be not-taken. Data hazards are resolved using detect and forward.

Consider the following LC2K program:

1. lw  0 1 0
2. lw  0 2 1
3. beq 2 2 1
4. add 1 1 1
5. add 1 2 1
6. add 2 2 3
7. halt

You are a hacker who has somehow managed to take the following snapshot of the pipeline registers while executing the above program before cycle C. Your goal is to determine a secret number that is stored in virtual memory address 0.

IF/ID     instr: X X X X       PC+1: X
ID/EX1    instr: X X X X       PC+1: X         regA val:          regB val:       offset:
EX1/EX2   instr: (A) _ X X X   PC Plus 1: X    regA val: X        regB val: X
EX2/MEM   instr: (B) _ _ _ _   AluResult: 36   regB val: (C) _    branchTarget: X
MEM/WB    instr: (D) add _ _ _                 writeData: 60
WB/END    instr: noop X X X                    writeData: X
List all the data hazards. Specify each data hazard in the following format: line #X, line #Y, number-of-cycles-stalled, where X, Y are lines in the LC2K program that have a RAW dependency causing the data hazard.

Line #2, Line #3, 2 cycles

List all the control hazards. Specify each control hazard in the following format: line #X, number-of-cycles-stalled, where X is a line in the LC2K program that causes the control hazard.

Line #3, 3 cycles

How many instructions does the LC2K program execute? (enter only the number here)

6 instructions

Fill in the missing blanks in the highlighted boxes from the pipeline snapshot above.

EX1/EX2 - (A): halt
EX2/MEM - (B): add 2 2 3
EX2/MEM - (C): 18
MEM/WB  - (D): add 1 2 1

If we assume that the first instruction of the program is in the fetch stage during Cycle 0, before which cycle (C) was the pipeline snapshot taken? (show your work)

13 = 2 data hazard stalls + 3 control hazard stalls + 6 instructions + 2 cycles for halt to reach EX1
What is the secret number at memory location 0? (enter only the number here)

42

Question 4: Pipeline Performance (6 points) V1

Consider a normal 5-stage LC2K pipeline as discussed in class with the following features:
Using detect-and-forward to handle data hazards.
Using speculate-and-squash to handle control hazards and always predict "Not Taken". Branches are resolved in the MEM stage.
Data memory access (and the critical timing path in the MEM stage) is 10 ns, while the critical path in every other stage is 6 ns.

Assume a benchmark with the following characteristics will be run on this pipeline:
add/nor: 50%   beq: 15%   lw: 30%   sw: 5%
40% of all branches are Taken.
20% of lw instructions are immediately followed by a dependent instruction.
10% of lw instructions (disjoint from the previous percentage) are followed by a dependent instruction, but have a single non-dependent instruction between the lw and the dependent instruction.

What is the CPI of this pipeline when running this benchmark?

1 + 0.15 * 0.40 * 3 (beq) + 0.30 * 0.20 * 1 (lw) = 1.24 CPI

Now, the MEM stage is split into two stages. It reduces the cycle time by splitting the data memory access latency equally between the MEM stages. Branches are resolved in the first MEM stage. What is the new CPI?

1 + 0.15 * 0.40 * 3 (beq) + 0.30 * 0.20 * 2 (lw imm dep) + 0.30 * 0.10 * 1 (lw gap dep) = 1.33 CPI

Assuming 10 billion instructions execute, what is the total execution time for both the original (subquestion 1 above) and modified (subquestion 2 above) pipeline? Show your work for credit.

Old execution time: 1.24 * 10^10 * 10 ns = 124 s
New execution time: 1.33 * 10^10 * 6 ns = 79.8 s
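Every variant of Question 4 uses the same bookkeeping: CPI = 1 + taken-branch squash cycles + lw-use stall cycles. A small Python sketch of that formula (not part of the exam; the mix numbers below are illustrative and happen to match the V2 variant):

```python
def cpi(beq, taken, lw, imm_dep, gap_dep=0.0, branch_penalty=3, lw_stall=1):
    """CPI = 1 + taken-branch penalty + lw-use stalls.

    beq, lw: instruction-mix fractions; taken: fraction of branches taken;
    imm_dep / gap_dep: fractions of lw's with a dependent instruction
    immediately after / one instruction later. A one-gap dependent only
    costs a stall once MEM is split (lw_stall = 2 for the split pipeline).
    """
    stalls = beq * taken * branch_penalty + lw * imm_dep * lw_stall
    if lw_stall > 1:              # split MEM: one-gap dependents stall too
        stalls += lw * gap_dep * 1
    return 1 + stalls

# beq 25%, 30% taken, lw 15%, 15% immediately dependent, 10% one-gap dependent
old_cpi = cpi(0.25, 0.30, 0.15, 0.15)                            # ~1.2475
new_cpi = cpi(0.25, 0.30, 0.15, 0.15, gap_dep=0.10, lw_stall=2)  # ~1.285

insns, old_clk, new_clk = 10e9, 10e-9, 6e-9   # 10 billion insns; 10 ns vs 6 ns cycles
print(old_cpi * insns * old_clk)              # ~124.75 s
print(new_cpi * insns * new_clk)              # ~77.1 s
```

Execution time is then just CPI * instruction count * clock period, as in the worked answers.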
Question 4: Pipeline Performance (6 points) V2

Consider a normal 5-stage LC2K pipeline as discussed in class with the following features:
Using detect-and-forward to handle data hazards.
Using speculate-and-squash to handle control hazards and always predict "Not Taken". Branches are resolved in the MEM stage.
Data memory access (and the critical timing path in the MEM stage) is 10 ns, while the critical path in every other stage is 6 ns.

Assume a benchmark with the following characteristics will be run on this pipeline:
add/nor: 50%   beq: 25%   lw: 15%   sw: 10%
30% of all branches are Taken.
15% of lw instructions are immediately followed by a dependent instruction.
10% of lw instructions (disjoint from the previous percentage) are followed by a dependent instruction, but have a single non-dependent instruction between the lw and the dependent instruction.

What is the CPI of this pipeline when running this benchmark?

1 + 0.25 * 0.3 * 3 (beq) + 0.15 * 0.15 * 1 (lw) = 1.2475 CPI

Now, the MEM stage is split into two stages. It reduces the cycle time by splitting the data memory access latency equally between the MEM stages. Branches are resolved in the first MEM stage. What is the new CPI?

1 + 0.25 * 0.3 * 3 (beq) + 0.15 * 0.15 * 2 (lw imm dep) + 0.10 * 0.15 * 1 (lw gap dep) = 1.285 CPI

Assuming 10 billion instructions execute, what is the total execution time for both the original (subquestion 1 above) and modified (subquestion 2 above) pipeline? Show your work for credit.

Old execution time: 1.2475 * 10^10 * 10 ns = 124.75 s
New execution time: 1.285 * 10^10 * 6 ns = 77.1 s

Question 4: Pipeline Performance (6 points) V3

Consider a normal 5-stage LC2K pipeline as discussed in class with the following features:
Using detect-and-forward to handle data hazards.
Using speculate-and-squash to handle control hazards and always predict "Not Taken".
Branches are resolved in the MEM stage.
Data memory access (and the critical timing path in the MEM stage) is 10 ns, while the critical path in every other stage is 6 ns.

Assume a benchmark with the following characteristics will be run on this pipeline:
add/nor: 40%   beq: 30%   lw: 20%   sw: 10%
30% of all branches are Taken.
20% of lw instructions are immediately followed by a dependent instruction.
10% of lw instructions (disjoint from the previous percentage) are followed by a dependent instruction, but have a single non-dependent instruction between the lw and the dependent instruction.

What is the CPI of this pipeline when running this benchmark?

1 + 0.3 * 0.3 * 3 (beq) + 0.20 * 0.20 * 1 (lw) = 1.31 CPI

Now, the MEM stage is split into two stages. It reduces the cycle time by splitting the data memory access latency equally between the MEM stages. Branches are resolved in the first MEM stage. What is the new CPI?

1 + 0.3 * 0.3 * 3 (beq) + 0.20 * 0.20 * 2 (lw imm dep) + 0.10 * 0.20 * 1 (lw gap dep) = 1.37 CPI

Assuming 10 billion instructions execute, what is the total execution time for both the original (subquestion 1 above) and modified (subquestion 2 above) pipeline? Show your work for credit.

Old execution time: 1.31 * 10^10 * 10 ns = 131 s
New execution time: 1.37 * 10^10 * 6 ns = 82.2 s

Question 4: Pipeline Performance (6 points) V4

Consider a normal 5-stage LC2K pipeline as discussed in class with the following features:
Using detect-and-forward to handle data hazards.
Using speculate-and-squash to handle control hazards and always predict "Not Taken". Branches are resolved in the MEM stage.
Data memory access (and the critical timing path in the MEM stage) is 15 ns, while the critical path in every other stage is 12 ns.

Assume a benchmark with the following characteristics will be run on this pipeline:
add/nor: 50%   sw: 5%   lw: 30%   beq: 15%
40% of all branches are Not Taken.
20% of lw instructions are immediately followed by a dependent instruction.
10% of lw instructions (disjoint from the previous percentage) are followed by a dependent instruction, but have a single non-dependent instruction between the lw and the dependent instruction.
What is the CPI of this pipeline when running this benchmark?

1 + 0.15 * 0.60 * 3 (beq) + 0.30 * 0.20 * 1 (lw) = 1.33 CPI

Now, the MEM stage is split into two stages. It reduces the cycle time by splitting the data memory access latency equally between the MEM stages. Branches are resolved in the MEM stage. What is the new CPI?

(Resolved in first MEM)  1 + 0.15 * 0.60 * 3 (beq) + 0.30 * 0.20 * 2 (lw imm dep) + 0.30 * 0.10 * 1 (lw gap dep) = 1.42 CPI
(Resolved in second MEM) 1 + 0.15 * 0.60 * 4 (beq) + 0.30 * 0.20 * 2 (lw imm dep) + 0.30 * 0.10 * 1 (lw gap dep) = 1.51 CPI

Assuming 10 billion instructions execute, what is the total execution time for both the original (subquestion 1 above) and modified (subquestion 2 above) pipeline? Show your work for credit.

Old execution time: 1.33 * 10^10 * 15 ns = 199.5 s
(Resolved in first MEM)  New execution time: 1.42 * 10^10 * 12 ns = 170.4 s
(Resolved in second MEM) New execution time: 1.51 * 10^10 * 12 ns = 181.2 s

Question 4: Pipeline Performance (6 points) V5

Consider a normal 5-stage LC2K pipeline as discussed in class with the following features:
Using detect-and-forward to handle data hazards.
Using speculate-and-squash to handle control hazards and always predict "Not Taken". Branches are resolved in the MEM stage.
Data memory access (and the critical timing path in the MEM stage) is 10 ns, while the critical path in every other stage is 6 ns.

Assume a benchmark with the following characteristics will be run on this pipeline:
add/nor: 55%   beq: 15%   lw: 20%   sw: 10%
30% of all branches are Taken.
20% of lw instructions are immediately followed by a dependent instruction.
10% of lw instructions (disjoint from the previous percentage) are followed by a dependent instruction, but have a single non-dependent instruction between the lw and the dependent instruction.

What is the CPI of this pipeline when running this benchmark?

CPI = 1 + 0.15 * 0.3 * 3 (beq) + 0.2 * 0.2 * 1 (lw) = 1.175
Now, the MEM stage is split into two stages. It reduces the cycle time by splitting the data memory access latency equally between the MEM stages. Branches are resolved in the first MEM stage. What is the new CPI?

CPI = 1 + 0.15 * 0.3 * 3 (beq) + 0.2 * 0.2 * 2 (lw imm dep) + 0.2 * 0.1 * 1 (lw gap dep) = 1.235

Assuming 10 billion instructions execute, what is the total execution time for both the original (subquestion 1 above) and modified (subquestion 2 above) pipeline? Show your work for credit.

Old execution time: 1.175 * 10^10 * 10 ns = 117.5 s
New execution time: 1.235 * 10^10 * 6 ns = 74.1 s

Question 5: Reverse Engineering the Cache (14 points)

Assume the following:
8-bit byte-addressable ISA
Cache size: <= 32 B (not including overhead)
Associativity: fully-associative cache
Cache block size and number of sets are both powers of two.
Cache is initially empty.

Given the following cache outcomes (hit/miss), determine the cache block size and number of cache blocks by answering the following questions.

Access #   Address   Hit/Miss
 1         0x80      M
 2         0xB4      M
 3         0x81      H
 4         0xAF      M
 5         0xAC      H
 6         0xB3      M
 7         0x81      M
 8         0x87      H
 9         0x75      M
10         0x79      M

5.1) Cache block size is greater than or equal to (>=) N bytes. Determine the maximum value for N and enter only a number for N here:

8 B. "Maximum" referenced the largest lower bound possible (i.e., 8 is a larger lower bound than 2); partial credit was awarded when applicable due to confusion in wording.

Your choice of (>=) cache block size above is known because of which access numbers from the table (choose exactly two)?

Accesses #7 & #8

Cache block size is less than (<) N bytes. Determine the minimum value for N and enter only a number for N here:

16 B. "Minimum" referenced the smallest upper bound possible (i.e., 32 is a larger upper bound than 16); partial credit was awarded when applicable due to confusion in wording.

Your choice of (<) cache block size above is known because of which access numbers from the table (choose exactly two)?

Accesses #9 & #10

Therefore, cache block size is N bytes (enter only a number for N here):

8 B

5.2) Number of cache blocks is greater than or equal to (>=) N blocks (enter only a number for N here):

The question was intended to match the wording of 5.1 and should have said "greater than or equal to". 2 blocks; >= 1 was also accepted due to the typo.

Your choice of (>=) number of cache blocks above is known because of which access numbers from the table? Choose exactly two accesses with the same tag:

Accesses #1 & #3
Number of cache blocks is less than or equal to (<=) N blocks (enter only a number for N here):

The question was intended to match the wording of 5.1 and should have said "less than" (<). < 4 blocks; 3 was also accepted even though it is not a power of 2, and 2 was also accepted due to the typo saying <=.

Your choice of (<=) number of cache blocks above is known because of which access numbers from the table? Choose exactly two accesses with the same tag:

Accesses #3 & #7, or #2 & #6 (same-tag requirement). Accesses #1 & #3, or the pairs above, were accepted for students who chose 2 blocks because of the typo.

Therefore, the number of cache blocks is N (enter only a number for N here)?

2

Note: Typo corrections for this question were marked on the original; due to these errors, there was flexibility in grading based on the properties of the cache and demonstration of understanding in student answers. For more context on the cache and how to go about this problem, see below:

Accesses 1 & 2 have only their first two bits in common, telling students that the tag is at least two bits; however, the resulting 6-bit block offset would exceed the cache size given in the specs.
Access 3 tells us the tag can be at most 7 bits long (the block offset being only 1 bit if this is true).
Access 4 tells students that the tag must be at least 4 bits (if the tag were 3 bits or less, access 2 would have made this a hit).
Access 5 tells students that the block offset has to be at least two bits for this to be a hit given access 4 (meaning the tag can be at most 6 bits).
Access 7 tells us that there cannot be more than two blocks OR that the tag is 6 bits; also, this being 0x81 again and a miss tells students this cache block was evicted.
Access 8 tells students the block offset must be 3 bits for this to be a hit.
Accesses 9 & 10 being misses confirm that the block offset must be 3 bits and that the tag must be 5 bits.
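The derived parameters (8 B blocks, 2 blocks, fully associative) can be sanity-checked with a short simulator. This is a sketch, not part of the exam; it assumes LRU replacement, which is consistent with the eviction described at access 7:

```python
from collections import OrderedDict

def simulate(addresses, block_size=8, num_blocks=2):
    """Fully associative LRU cache; returns 'H'/'M' per access.

    Defaults are the parameters derived in Question 5.
    """
    cache = OrderedDict()               # block number -> None, ordered by recency
    outcomes = []
    for addr in addresses:
        block = addr // block_size
        if block in cache:
            cache.move_to_end(block)    # refresh recency on a hit
            outcomes.append('H')
        else:
            if len(cache) >= num_blocks:
                cache.popitem(last=False)   # evict the least recently used block
            cache[block] = None
            outcomes.append('M')
    return outcomes

accesses = [0x80, 0xB4, 0x81, 0xAF, 0xAC, 0xB3, 0x81, 0x87, 0x75, 0x79]
print(''.join(simulate(accesses)))      # MMHMHMMHMM, matching the table
```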
Question 6: 3 C's (2 Points)

Consider a scenario where we increase the cache block size in a set-associative cache. Cache size and associativity are kept unchanged. Choose the answer for each of the following 4 sub-questions.

6.1) Number of sets (increases / decreases / remains the same): decreases
6.2) Number of cache blocks (increases / decreases / remains the same): decreases
6.3) Compulsory misses (increases / decreases / remains the same): decreases
6.4) Exploits spatial locality (better / worse / about the same): better

Question 7: 3 C's Part II (6 Points)

Consider a byte-addressable architecture with the following cache:
Cache size: 64 bytes
Cache block size: 16 bytes
Associativity: 2-way set-associative

Assume that the cache is initially empty and uses an LRU replacement policy. The lower way is chosen when the set is empty. Answer the following questions for the list of memory references shown in this table.

Reference   Tag     Set Index   Hit/Miss   Type (if miss)   Tag of evicted block (if applicable)
0x0DD5      0x6E    0x1         miss       compulsory       -
0x1F07      0xF8    0x0         miss       compulsory       -
0x02AE      0x15    0x0         miss       compulsory       -
0x0509      0x28    0x0         miss       compulsory       0xF8
0x2EDF      0x176   0x1         miss       compulsory       -
0x1F0F      0xF8    0x0         miss       conflict         0x15
0x1F01      0xF8    0x0         hit        -                -
0x48D7      0x246   0x1         miss       compulsory       0x6E
0x0DDA      0x6E    0x1         miss       capacity         0x176
0x0380      0x1C    0x0         miss       compulsory       0x28

7.1) List all the memory references that incur conflict misses in the cache. State each memory reference in the following format: ROW, address (0xZZZZ), tag (0xZZZ), set-index (0xZ), tag-of-replaced-block-if-any (0xZZZ). Each letter "Z" above stands for one hexadecimal digit (e.g., ROW-X, 0x1234, 0x120, 0x1, 0x000).

ROW-F, 0x1F0F, 0xF8, 0x0, 0x15

7.2) List all the memory references that incur capacity misses in the cache. State each memory reference in the same format as 7.1.

ROW-I, 0x0DDA, 0x6E, 0x1, 0x176

7.3) At the end of simulating the above address sequence, specify the final state of the 2-way set-associative cache using the tags of the cache blocks stored in the cache.

Way 0, Set 0 Tag: 0x01C
Way 0, Set 1 Tag: 0x246
Way 1, Set 0 Tag: 0x0F8
Way 1, Set 1 Tag: 0x06E

Question 8: Hierarchical Page Tables (5 Points)

Assume a 48-bit byte-addressable ISA system that supports virtual memory with the following specifications:
3-level page table
Page size: 4 KB
Physical memory size: 16 GB
Size of each 3rd-level page table: 8 pages
Size of each 2nd-level page table: 1 page
Page table entry size: 4 bytes

Determine the following:

Page offset size, # bits (enter only the number here): 12 bits
Size of physical page number (PPN), # bits (enter only the number here): 22 bits
Size of 3rd-level page table index, # bits (enter only the number here): 13 bits
Size of each 3rd-level page table, 2^N bytes (enter only the exponent N here): 2^15 bytes
Size of 2nd-level page table index, # bits (enter only the number here): 10 bits
Size of each 2nd-level page table, 2^N bytes (enter only the exponent N here): 2^12 bytes
Size of 1st-level page table index, # bits (enter only the number here): 13 bits
Size of 1st-level page table, 2^N bytes (enter only the exponent N here): 2^15 bytes (32 KB)

In the worst case, how much memory would this 3-level page table occupy? State your answer in the form 2^1 + 2^2 + ... + 2^n bytes.

2^15 + 2^25 + 2^38 bytes

For the same page size, if this system used a single-level page table instead of a three-level page table, how much memory would it occupy in the worst case? State your answer in the form 2^1 + 2^2 + ... + 2^n bytes.

2^38 bytes

Question 9: Hierarchical Page Tables (6.5 Points)

Assume a 16-bit ISA that uses a 3-level page table. Each virtual memory address is split into the following fields:
1st-level index size: 4 bits
2nd-level index size: 6 bits
3rd-level index size: 4 bits
Physical page offset size: 2 bits

Consider the following virtual addresses accessed in the order given below. Determine the 1st, 2nd, and 3rd level indices for each access.

Virtual Address   1st-level index   2nd-level index   3rd-level index
0xA7E4            1010              011111            1001
0xB09B            1011              000010            0110
0xA78F            1010              011110            0011
0xC78F            1100              011110            0011
0xB098            1011              000010            0110

What goes in the three empty boxes for Row A, Virtual Address 0xA7E4?
1010 011111 1001
What goes in the three empty boxes for Row B, Virtual Address 0xB09B?
1011 000010 0110
What goes in the three empty boxes for Row C, Virtual Address 0xA78F?
1010 011110 0011
What goes in the three empty boxes for Row D, Virtual Address 0xC78F?
1100 011110 0011
What goes in the three empty boxes for Row E, Virtual Address 0xB098?
1011 000010 0110

How many first-level page tables will have been allocated after these 5 memory accesses? (enter only the number here)
1
How many second-level page tables will have been allocated after these 5 memory accesses? (enter only the number here)
3
How many third-level page tables will have been allocated after these 5 memory accesses? (enter only the number here)
4

Question 10: Virtual Memory (11 Points)

Assume a 16-bit ISA and the following system configuration.

Cache:
Physically addressed, 2-way set associative
Block size: 4 bytes
Cache size: 16 bytes

Latency for each memory component:
Disk: 1000 ns
Physical memory: 50 ns
TLB: 2 ns
Cache: 1 ns

Memory system:
Physical memory size: 4 KB
Single-level page table
The TLB has 2 entries and is fully associative

Assume the following:
The first 8 pages (PPN 0-7) in physical memory are empty and free to use; the rest are reserved for the OS.
The page table is pinned in physical memory. It is not cached, except in the TLB.
Pages that are not in memory are on disk.
All updates are in parallel: upon retrieval from cache, main memory, or disk, the data is sent immediately to the CPU while other updates occur in parallel.

A process accesses the following virtual addresses in order. The latency for each memory access is given. Based on the access latencies, determine the outcomes of the cache and TLB accesses, and whether each access causes a page fault.

Access #   Virtual Address   Latency (ns)   TLB (H/M/NA)   Cache (H/M/NA)   Page Fault (Y/N)
1          0x126C            1052           M              NA               Y
2          0x122F            1052           M              NA               Y
3          0x122B            53             H              M                N
4          0x126D            3              H              H                N
5          0x35AC            1052           M              NA               Y
6          0x122A            53             M              H                N
6          0x1220            103            M              M                N
7          0x125B            103            M              M                N

Based on the above accesses and their latencies, determine the page size by answering the following questions.

Page size is greater than or equal to (>=) 2^N bytes. Determine the maximum value for N, and enter only the exponent N here:

6

Your choice of page size (>=) above is known because of which access
numbers from the table (choose exactly two):

Because of access #1 / #4 and access #7.

Page size is less than or equal to (<=) 2^N bytes. Determine the minimum value for N, and enter only the exponent N here:

6

Your choice of page size (<=) above is known because of which access numbers from the table (choose exactly two):

Because of access #1 / #4 / #7 and access #2 / #3 / #6.

Therefore, page size is 2^N bytes (enter only the exponent N here):

6

Question 11: Stack-Based ISA (12 Points)

Consider a new Stack-ISA with the following instructions. Its semantics are defined by the following two ISA-visible components:
A stack memory
A special register called COMPARE, which can store a value of 1 or 0.

Assembly Instruction   Execution semantics
pushi immediate        stack.push(immediate)   (labels resolve to their addresses)
push addr              stack.push(mem[addr])
less                   first = stack.pop(); second = stack.pop()
                       if (first < second): COMPARE = 1 else: COMPARE = 0
greater                first = stack.pop(); second = stack.pop()
                       if (first > second): COMPARE = 1 else: COMPARE = 0
bcmp                   destination = stack.pop()
                       if (COMPARE == 1): COMPARE = 0; PC = destination

11.1) Use the instructions in Stack-ISA to implement "unconditional branch to address 20".

Branch to PC = 20 (0 < 1 always sets COMPARE, so bcmp is always taken):
     pushi 20
     pushi 1
     pushi 0
     less
     bcmp

11.2) Use the instructions in Stack-ISA to implement "conditional branch to 20 if mem[5] is equal to mem[8]". (Hint: you should use part a in your implementation.)

if (mem[5] == mem[8]) { Branch to PC = 20 }

SOLN 1:
     pushi neq
     push 5
     push 8
     less
     bcmp
     pushi neq
     push 5
     push 8
     greater
     bcmp
eq:  pushi 20
     pushi 1
     pushi 0
     less
     bcmp
neq: noop

SOLN 2 is identical except each push pair is in the opposite order (push 8 before push 5). Either order branches to neq exactly when the two values differ, so control only reaches the unconditional branch at eq when mem[5] == mem[8].
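The semantics table above can be turned into a tiny interpreter to check answers like 11.1. This is a sketch, not part of the exam; it assumes labels have already been resolved to instruction addresses, and the demo branches to address 6 instead of 20 only to keep the program short:

```python
def run(program, mem):
    """Interpret the Stack-ISA; returns the list of executed addresses."""
    stack, compare, pc, trace = [], 0, 0, []
    while pc < len(program):
        op, *args = program[pc]
        trace.append(pc)
        pc += 1
        if op == 'pushi':
            stack.append(args[0])           # labels are pre-resolved to addresses
        elif op == 'push':
            stack.append(mem[args[0]])
        elif op in ('less', 'greater'):
            first, second = stack.pop(), stack.pop()
            compare = int(first < second if op == 'less' else first > second)
        elif op == 'bcmp':
            dest = stack.pop()
            if compare == 1:
                compare = 0
                pc = dest
    return trace                            # falling off the end stops execution

# 11.1's trick: 0 < 1 always sets COMPARE, so pushi dest / pushi 1 /
# pushi 0 / less / bcmp is an unconditional branch.
prog = [
    ('pushi', 6),       # destination (would be 20 in the exam answer)
    ('pushi', 1),
    ('pushi', 0),
    ('less',),
    ('bcmp',),
    ('pushi', 0),       # skipped by the taken branch
    ('pushi', 0),       # branch target, address 6
]
print(run(prog, mem={}))   # [0, 1, 2, 3, 4, 6] -- address 5 is skipped
```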