© EECS 370, University of Michigan
EECS 370 Final Exam
Fall 2020 Solution
This exam is presented with modifications. It was originally released through Gradescope in a virtual semester; some comments have been added to reflect modifications for a paper exam.
Question 1: Short Answer T/F
Specify whether statements a-t are True or False. (On the exam, a random set of 10 questions was chosen.)
a. There are 32-bit two's complement integers that cannot be represented in a 32-bit IEEE-754 single precision number. (True)
b. Deeper pipelines will generally have a shorter clock period. (True)
c. It is possible for a write-back and a write-through cache to perform the same number of writes. (True)
d. For a given cache size and block size, increasing associativity will reduce cache misses for any program. (False)
e. Resolving data hazards with detect-and-stall and control hazards with speculate-and-squash will produce the least number of errors. (True)
f. Direction prediction for jalr is worse than for beq. (False)
g. The fetch stage can read an instruction from memory when there is a lw/sw in the MEM stage. (False)
h. Physical memory and disk are typically large enough to hold all of a process's virtual memory. (False)
i. DRAM is a non-volatile memory. (False)
j. You have scrolled through the whole exam to understand the point distribution and allocate your time as such. (True)
k. If a program accesses the same cache block repeatedly, it has higher temporal locality. (True)
l. Multi-level page tables will always take up less space than single-level page tables. (False)
m. The cache is about the same size as main memory. (False)
n. Dirty bits are needed for write-through caches but not for write-back caches. (False)
o. Assuming we have an allocate-on-write store policy, an infinitely large cache would have only one miss per cache block. (True)
p. The number of block offset bits is equal to log2(size of the cache). (False)
q. For a given cache configuration, a write-through cache may write fewer bytes to memory than a write-back cache running the same program. (True)
r. Virtual memory offers the illusion of infinite memory space. (True)
s. A disadvantage of having a deeper pipeline is that there will be more stalls from data and control hazards. (True)
t. For a given cache size and block size, increasing associativity will reduce tag size. (False)
Question 2: Points _/8
Assume the following LC2K program (one of versions V1-V5) is executed on the 5-stage pipeline from lecture until it halts.

V1
1        lw    0 1 neg1
2        lw    0 2 pos1
3        lw    0 3 five
4        lw    0 4 count
5  loop  add   4 2 4
6        add   3 1 3
7        sw    0 4 count
8        beq   0 3 end
9        beq   0 0 loop
10 end   halt
11 neg1  .fill -1
12 pos1  .fill 1
13 five  .fill 5
14 count .fill 0

V2
1        lw    0 1 four
2        lw    0 2 count
3        lw    0 3 neg1
4        lw    0 4 pos1
5  loop  add   2 4 2
6        add   1 3 1
7        sw    0 2 count
8        beq   0 1 end
9        beq   0 0 loop
10 end   halt
11 neg1  .fill -1
12 pos1  .fill 1
13 four  .fill 4
14 count .fill 0

V3
1        lw    0 1 neg1
2        lw    0 2 pos1
3        lw    0 3 four
4        lw    0 4 count
5  loop  add   4 2 4
6        add   3 1 3
7        sw    0 4 count
8        beq   0 3 end
9        beq   0 0 loop
10 end   halt
11 neg1  .fill -1
12 pos1  .fill 1
13 four  .fill 4
14 count .fill 0

V4
1        lw    0 1 six
2        lw    0 2 count
3        lw    0 3 neg1
4        lw    0 4 pos1
5  loop  add   2 4 2
6        add   1 3 1
7        sw    0 2 count
8        beq   0 1 end
9        beq   0 0 loop
10 end   halt
11 neg1  .fill -1
12 pos1  .fill 1
13 six   .fill 6
14 count .fill 0

V5
1        lw    0 1 five
2        lw    0 2 count
3        lw    0 3 neg1
4        lw    0 4 pos1
5  loop  add   2 4 2
6        add   1 3 1
7        sw    0 2 count
8        beq   0 1 end
9        beq   0 0 loop
10 end   halt
11 neg1  .fill -1
12 pos1  .fill 1
13 five  .fill 5
14 count .fill 0
A. Select all registers that cause a RAW data hazard in a detect-and-stall pipeline.
a. reg0   b. reg1   c. reg2   d. reg3   e. reg4   f. reg5   g. reg6
Lines 4-5 need 2 stalls; lines 5-7 need 1 stall.
V1: reg4
V2: reg4 and reg2
V3: reg4
V4: reg4 and reg2
V5: reg4 and reg2
B. Using detect-and-stall to resolve data hazards, what is the number of stalls that occur due to data hazards?
V1: 7 = 2 + 5 * 1. 2 stalls from lines 4-5; 1 stall per iteration for lines 5-7 (lines 6-8 do not cause an additional stall).
V2: 6 = 2 + 4 * 1. 2 stalls from lines 4-5; 1 stall per iteration for lines 5-7 (lines 6-8 do not cause an additional stall).
V3: 6 = 2 + 4 * 1. 2 stalls from lines 4-5; 1 stall per iteration for lines 5-7 (lines 6-8 do not cause an additional stall).
V4: 8 = 2 + 6 * 1. 2 stalls from lines 4-5; 1 stall per iteration for lines 5-7 (lines 6-8 do not cause an additional stall).
V5: 7 = 2 + 5 * 1. 2 stalls from lines 4-5; 1 stall per iteration for lines 5-7 (lines 6-8 do not cause an additional stall).
C. Using detect-and-forward to resolve data hazards, what is the number of stalls that occur due to data hazards?
All versions: 1 stall is required to resolve the lw dependency on lines 4-5.
D. Using speculate-and-squash, along with predicting branches always-not-taken, to resolve control hazards, what is the number of stalls that occur due to branch mispredictions?
Each misprediction costs 3 cycles. You can calculate the number of mispredictions by looking at how many times each branch is taken.
V1: 15 = 5 * 3. Line 9, beq 0 0, is taken 4 times. Line 8 is taken 1 time.
V2: 12 = 4 * 3. Line 9, beq 0 0, is taken 3 times. Line 8 is taken 1 time.
V3: 12 = 4 * 3. Line 9, beq 0 0, is taken 3 times. Line 8 is taken 1 time.
V4: 18 = 6 * 3. Line 9, beq 0 0, is taken 5 times. Line 8 is taken 1 time.
V5: 15 = 5 * 3. Line 9, beq 0 0, is taken 4 times. Line 8 is taken 1 time.
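The taken-branch counts above can be cross-checked with a quick architectural simulation. This is my own sketch (not part of the original solution) of V1's loop: reg3 counts down from five while reg4 accumulates, so the exit branch fires once and the backward branch fires on every other pass.

```python
# Architectural sketch of V1: count how many times each branch is taken.
reg = [0] * 8
reg[1], reg[2], reg[3], reg[4] = -1, 1, 5, 0   # lw neg1, pos1, five, count
count = 0
taken_loop = taken_end = 0
while True:
    reg[4] += reg[2]          # add 4 2 4
    reg[3] += reg[1]          # add 3 1 3
    count = reg[4]            # sw 0 4 count
    if reg[0] == reg[3]:      # beq 0 3 end: exit once reg3 reaches 0
        taken_end += 1
        break
    taken_loop += 1           # beq 0 0 loop: taken on every other pass
mispredict_stalls = (taken_loop + taken_end) * 3
print(taken_loop, taken_end, mispredict_stalls)  # 4 1 15
```

This reproduces the V1 answer: 4 + 1 taken branches, 15 squash cycles.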
Question 3: Pipeline Design (11 Points)
Consider the LC2K pipeline with the following changes:
● The Execution (EX) stage is split into two (EX1, EX2).
● EX1 calculates the branch destination and performs the equality check of beq.
● EX2 utilizes the ALU for the add/nor/lw/sw instructions. EX2 also updates the PC for branches, if necessary.
● Assume all values are forwarded to EX1.
Control hazards are resolved using speculate-and-squash, where branches are predicted not-taken. Data hazards are resolved using detect-and-forward.
Consider the following LC2K program:
1. lw   0 1 0
2. lw   0 2 1
3. beq  2 2 1
4. add  1 1 1
5. add  1 2 1
6. add  2 2 3
7. halt
You are a hacker that has somehow managed to take the following snapshot of the pipeline registers while executing the above program, before cycle C. Your goal is to determine a secret number that is stored in virtual memory address 0.
(X denotes a hidden value; _ denotes a blank to fill in.)
IF/ID:    instr: X X X X | PC+1: X
ID/EX1:   instr: X X X X | PC+1: X | regA val, regB val, offset: (not shown)
EX1/EX2:  instr: (A) _ X X X | PC+1: X | regA val: X | regB val: X | branchTarget: X
EX2/MEM:  instr: (B) _ _ _ _ | AluResult: 36 | regB val: (C) _
MEM/WB:   instr: (D) add _ _ _ | writeData: 60
WB/END:   instr: noop X X X | writeData: X
List all the data hazards. Specify each data hazard in the following format:
line #X, line #Y, number-of-cycles-stalled
where X, Y are lines in the LC2K program that have a RAW dependency causing the data hazard.
Line #2, Line #3, 2 cycles
List all the control hazards. Specify each control hazard in the following format:
line #X, number-of-cycles-stalled
where X is a line in the LC2K program that causes the control hazard.
Line #3, 3 cycles
How many instructions does the LC2K program execute? (enter only the number here)
6 instructions
Fill in the missing blanks in the highlighted boxes from the pipeline snapshot above.
EX1/EX2 - (A): halt
EX2/MEM - (B): add 2 2 3
EX2/MEM - (C): 18
MEM/WB - (D): (add) 1 2 1
If we assume that the first instruction of the program is in the fetch stage during Cycle 0, before which cycle (C) was the pipeline snapshot taken? (show your work)
C = 13: 2 data hazard stalls + 3 control hazard stalls + 6 instructions + 2 cycles for halt to get to EX1
What is the secret number at memory location 0? (enter only the number here)
42
Question 4: Pipeline Performance (6 points) V1
Consider a normal 5-stage LC2K pipeline as discussed in class with the following features:
● Using detect-and-forward to handle data hazards.
● Using speculate-and-squash to handle control hazards, always predicting "Not Taken".
● Branches are resolved in the MEM stage.
● Data memory access (and the critical timing path in the MEM stage) is 10 ns, while the critical path in every other stage is 6 ns.
Assume a benchmark with the following characteristics will be run on this pipeline:
● add/nor: 50%
● beq: 15%
● lw: 30%
● sw: 5%
● 40% of all branches are Taken
● 20% of lw instructions are immediately followed by a dependent instruction.
● 10% of lw instructions (disjoint from the previous percentage) are followed by a dependent instruction, but have a single non-dependent instruction between the lw and the dependent instruction.
What is the CPI of this pipeline when running this benchmark?
1 + 0.15 * 0.40 * 3 (beq) + 0.30 * 0.20 * 1 (lw) = 1.24 CPI
Now, the MEM stage is split into two stages. This reduces the cycle time by splitting the data memory access latency equally between the MEM stages.
Branches are resolved in the first MEM stage.
What is the new CPI?
1 + 0.15 * 0.40 * 3 (beq) + 0.30 * 0.20 * 2 (lw imm dep) + 0.30 * 0.10 * 1 (lw gap dep) = 1.33 CPI
Assuming 10 billion instructions execute, what is the total execution time for both the original (subquestion 1 above) and modified (subquestion 2 above) pipeline? Show your work for credit.
Old execution time: 1.24 * 10^10 cycles * 10 ns = 124 s
New execution time: 1.33 * 10^10 cycles * 6 ns = 79.8 s
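All five versions of this question use the same CPI structure: a base of 1, plus 3 squash cycles per taken beq, plus the lw stall terms. A small helper (my own sketch, not part of the original solution) makes the pattern explicit; it is shown here checked against the V2 numbers.

```python
# CPI model used throughout Question 4: base 1 + taken-branch squashes + lw stalls.
def cpi(f_beq, f_taken, f_lw, f_imm_dep, f_gap_dep, imm_cost, gap_cost):
    return (1
            + f_beq * f_taken * 3            # 3-cycle squash per taken beq
            + f_lw * f_imm_dep * imm_cost    # lw immediately followed by a dependent
            + f_lw * f_gap_dep * gap_cost)   # lw with one independent instr between

# V2 numbers: 25% beq, 30% taken, 15% lw, 15% immediate dep, 10% one-gap dep.
old = cpi(0.25, 0.30, 0.15, 0.15, 0.10, imm_cost=1, gap_cost=0)  # 5-stage pipeline
new = cpi(0.25, 0.30, 0.15, 0.15, 0.10, imm_cost=2, gap_cost=1)  # split-MEM pipeline
print(round(old, 4), round(new, 4))  # 1.2475 1.285
```

In the split-MEM pipeline the lw result arrives one stage later, so an immediately dependent instruction now stalls 2 cycles and a one-gap dependent stalls 1; the taken-branch penalty stays at 3 because branches resolve in the first MEM stage.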
Question 4: Pipeline Performance (6 points) V2
Consider a normal 5-stage LC2K pipeline as discussed in class with the following features:
● Using detect-and-forward to handle data hazards.
● Using speculate-and-squash to handle control hazards, always predicting "Not Taken".
● Branches are resolved in the MEM stage.
● Data memory access (and the critical timing path in the MEM stage) is 10 ns, while the critical path in every other stage is 6 ns.
Assume a benchmark with the following characteristics will be run on this pipeline:
● add/nor: 50%
● beq: 25%
● lw: 15%
● sw: 10%
● 30% of all branches are Taken
● 15% of lw instructions are immediately followed by a dependent instruction.
● 10% of lw instructions (disjoint from the previous percentage) are followed by a dependent instruction, but have a single non-dependent instruction between the lw and the dependent instruction.
What is the CPI of this pipeline when running this benchmark?
1 + 0.25 * 0.3 * 3 (beq) + 0.15 * 0.15 * 1 (lw) = 1.2475 CPI
Now, the MEM stage is split into two stages. This reduces the cycle time by splitting the data memory access latency equally between the MEM stages.
Branches are resolved in the first MEM stage.
What is the new CPI?
1 + 0.25 * 0.3 * 3 (beq) + 0.15 * 0.15 * 2 (lw imm dep) + 0.15 * 0.10 * 1 (lw gap dep) = 1.285 CPI
Assuming 10 billion instructions execute, what is the total execution time for both the original (subquestion 1 above) and modified (subquestion 2 above) pipeline? Show your work for credit.
Old execution time: 1.2475 * 10^10 cycles * 10 ns = 124.75 s
New execution time: 1.285 * 10^10 cycles * 6 ns = 77.1 s
Question 4: Pipeline Performance (6 points) V3
Consider a normal 5-stage LC2K pipeline as discussed in class with the following features:
● Using detect-and-forward to handle data hazards.
● Using speculate-and-squash to handle control hazards, always predicting "Not Taken".
● Branches are resolved in the MEM stage.
● Data memory access (and the critical timing path in the MEM stage) is 10 ns, while the critical path in every other stage is 6 ns.
Assume a benchmark with the following characteristics will be run on this pipeline:
● add/nor: 40%
● beq: 30%
● lw: 20%
● sw: 10%
● 30% of all branches are Taken
● 20% of lw instructions are immediately followed by a dependent instruction.
● 10% of lw instructions (disjoint from the previous percentage) are followed by a dependent instruction, but have a single non-dependent instruction between the lw and the dependent instruction.
What is the CPI of this pipeline when running this benchmark?
1 + 0.3 * 0.3 * 3 (beq) + 0.20 * 0.20 * 1 (lw) = 1.31 CPI
Now, the MEM stage is split into two stages. This reduces the cycle time by splitting the data memory access latency equally between the MEM stages.
Branches are resolved in the first MEM stage.
What is the new CPI?
1 + 0.3 * 0.3 * 3 (beq) + 0.20 * 0.20 * 2 (lw imm dep) + 0.10 * 0.20 * 1 (lw gap dep) = 1.37 CPI
Assuming 10 billion instructions execute, what is the total execution time for both the original (subquestion 1 above) and modified (subquestion 2 above) pipeline? Show your work for credit.
Old execution time: 1.31 * 10^10 cycles * 10 ns = 131 s
New execution time: 1.37 * 10^10 cycles * 6 ns = 82.2 s
Question 4: Pipeline Performance (6 points) V4
Consider a normal 5-stage LC2K pipeline as discussed in class with the following features:
● Using detect-and-forward to handle data hazards.
● Using speculate-and-squash to handle control hazards, always predicting "Not Taken".
● Branches are resolved in the MEM stage.
● Data memory access (and the critical timing path in the MEM stage) is 15 ns, while the critical path in every other stage is 12 ns.
Assume a benchmark with the following characteristics will be run on this pipeline:
● add/nor: 50%
● sw: 5%
● lw: 30%
● beq: 15%
● 40% of all branches are Not Taken
● 20% of lw instructions are immediately followed by a dependent instruction.
● 10% of lw instructions (disjoint from the previous percentage) are followed by a dependent instruction, but have a single non-dependent instruction between the lw and the dependent instruction.
What is the CPI of this pipeline when running this benchmark?
1 + 0.15 * 0.60 * 3 (beq) + 0.30 * 0.20 * 1 (lw) = 1.33 CPI
Now, the MEM stage is split into two stages. This reduces the cycle time by splitting the data memory access latency equally between the MEM stages.
Branches are resolved in the MEM stage.
What is the new CPI?
(resolved in first MEM) 1 + 0.15 * 0.60 * 3 (beq) + 0.30 * 0.20 * 2 (lw imm dep) + 0.30 * 0.10 * 1 (lw gap dep) = 1.42 CPI
(resolved in second MEM) 1 + 0.15 * 0.60 * 4 (beq) + 0.30 * 0.20 * 2 (lw imm dep) + 0.30 * 0.10 * 1 (lw gap dep) = 1.51 CPI
Assuming 10 billion instructions execute, what is the total execution time for both the original (subquestion 1 above) and modified (subquestion 2 above) pipeline? Show your work for credit.
Old execution time: 1.33 * 10^10 cycles * 15 ns = 199.5 s
(resolved in first MEM) New execution time: 1.42 * 10^10 cycles * 12 ns = 170.4 s
(resolved in second MEM) New execution time: 1.51 * 10^10 cycles * 12 ns = 181.2 s
Question 4: Pipeline Performance (6 points) V5
Consider a normal 5-stage LC2K pipeline as discussed in class with the following features:
● Using detect-and-forward to handle data hazards.
● Using speculate-and-squash to handle control hazards, always predicting "Not Taken".
● Branches are resolved in the MEM stage.
● Data memory access (and the critical timing path in the MEM stage) is 10 ns, while the critical path in every other stage is 6 ns.
Assume a benchmark with the following characteristics will be run on this pipeline:
● add/nor: 55%
● beq: 15%
● lw: 20%
● sw: 10%
● 30% of all branches are Taken
● 20% of lw instructions are immediately followed by a dependent instruction.
● 10% of lw instructions (disjoint from the previous percentage) are followed by a dependent instruction, but have a single non-dependent instruction between the lw and the dependent instruction.
What is the CPI of this pipeline when running this benchmark?
CPI = 1 + 0.15 * 0.3 * 3 (beq) + 0.2 * 0.2 * 1 (lw) = 1.175
Now, the MEM stage is split into two stages. This reduces the cycle time by splitting the data memory access latency equally between the MEM stages.
Branches are resolved in the first MEM stage.
What is the new CPI?
CPI = 1 + 0.15 * 0.3 * 3 (beq) + 0.2 * 0.2 * 2 (lw imm dep) + 0.2 * 0.1 * 1 (lw gap dep) = 1.235
Assuming 10 billion instructions execute, what is the total execution time for both the original (subquestion 1 above) and modified (subquestion 2 above) pipeline? Show your work for credit.
Old execution time: 1.175 * 10^10 cycles * 10 ns = 117.5 s
New execution time: 1.235 * 10^10 cycles * 6 ns = 74.1 s
Question 5: Reverse Engineering the Cache (14 points)
Assume the following:
● 8-bit byte-addressable ISA
● Cache size: <= 32 B (not including overhead)
● Associativity: fully-associative cache
● Cache block size and number of sets are both powers of two.
● Cache is initially empty.
Given the following cache outcomes (hit/miss), determine the cache block size and number of cache blocks by answering the following questions.

Access # | Address | Cache Hit/Miss
1        | 0x80    | M
2        | 0xB4    | M
3        | 0x81    | H
4        | 0xAF    | M
5        | 0xAC    | H
6        | 0xB3    | M
7        | 0x81    | M
8        | 0x87    | H
9        | 0x75    | M
10       | 0x79    | M
5.1)
Cache block size is greater than or equal to (>=) N bytes. Determine the maximum value for N and enter only a number for N here:
8 ("maximum" referenced the largest lower bound possible, i.e., 8 is a larger lower bound than 2; partial credit was awarded when applicable due to confusion in wording)
Your choice of (>=) cache block size above is known because of which access numbers from the table (choose exactly two)?
Access #7 & #8
Cache block size is less than (<) N bytes. Determine the minimum value for N and enter only a number for N here:
16 ("minimum" referenced the smallest upper bound possible, i.e., 32 is a larger upper bound than 16; partial credit was awarded when applicable due to confusion in wording)
Your choice of (<) cache block size above is known because of which access numbers from the table (choose exactly two)?
Access #9 & #10
Therefore, cache block size is N bytes (enter only a number for N here):
8
5.2)
Number of cache blocks is greater than (>=) N blocks (enter only a number for N here). (The question was intended to match the wording of 5.1 and should have said "greater than or equal to".)
2 blocks (>=)
● 1 was also accepted due to the typo.
Your choice of (>=) number of cache blocks above is known because of which access numbers from the table? Choose exactly two accesses with the same tag:
Access #1 & #3
Number of cache blocks is less than or equal to (<=) N blocks (enter only a number for N here). (The question was intended to match the wording of 5.1 and should have said "less than (<)".)
< 4 blocks
● 3 was also accepted, even though it is not a power of 2.
● 2 was also accepted due to the typo saying <=.
Your choice of (<=) number of cache blocks above is known because of which access numbers from the table? Choose exactly two accesses with the same tag:
Access #3 & #7, or #2 & #6 (same tag required)
● Access #1 & #3, or the pairs above, were accepted for students who chose 2 blocks because of the typo.
Therefore, the number of cache blocks is N (enter only a number for N here):
2
Note: Typo corrections for this question are in red; due to these errors there was flexibility in grading based on the properties of the cache and the demonstration of understanding in student answers. For more context on the cache and how to approach this problem, see below:
● Accesses 1 & 2 have only the first two bits in common, telling students that the tag is at least two bits; however, with the block offset being 6 bits this would exceed the cache size given in the specs.
● Access 3 tells us the tag can be at most 7 bits long (the block offset being only 1 bit if this is true).
● Access 4 tells students that the tag must be at least 4 bits (if the tag were 3 bits or fewer, there would be a hit given access 2).
● Access 5 tells students that the block offset has to be at least two bits for this to be a hit with access 4 (meaning the tag can be at most 6 bits).
● Access 7 tells us that there cannot be more than two blocks OR that the tag is 6 bits; also, this being 0x81 again and being a miss tells students this cache block was evicted.
● Access 8 tells students the block offset must be 3 bits for this to be a hit.
● Accesses 9 & 10 being misses confirm that the block offset must be 3 bits and that the tag must be 5 bits.
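The derived configuration (8-byte blocks, 2 blocks, fully associative) can be verified against the whole access table. Below is my own sketch (not part of the original solution) of a fully-associative LRU simulator; it reproduces the hit/miss column exactly.

```python
# Simulate a fully-associative LRU cache with 8 B blocks and 2 blocks,
# the configuration derived in 5.1/5.2, on the Question 5 access stream.
BLOCK_SIZE, NUM_BLOCKS = 8, 2
accesses = [0x80, 0xB4, 0x81, 0xAF, 0xAC, 0xB3, 0x81, 0x87, 0x75, 0x79]
cache = []       # resident block numbers, most recently used last
outcomes = []
for addr in accesses:
    block = addr // BLOCK_SIZE
    if block in cache:
        cache.remove(block)       # hit: refresh LRU position
        outcomes.append("H")
    else:
        if len(cache) == NUM_BLOCKS:
            cache.pop(0)          # evict the least recently used block
        outcomes.append("M")
    cache.append(block)
print("".join(outcomes))  # MMHMHMMHMM, matching the table
```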
Question 6: 3 C’s (2 Points)
Consider a scenario where we increase the cache block size in a set-associative cache. Cache size and associativity are kept unchanged. Choose the answer for each of the following 4 sub-questions.
6.1) Number of sets: ❏ increases ☑ decreases ❏ remains the same
6.2) Number of cache blocks: ❏ increases ☑ decreases ❏ remains the same
6.3) Compulsory misses: ❏ increases ☑ decreases ❏ remains the same
6.4) Exploits spatial locality: ☑ better ❏ worse ❏ about the same
Question 7: 3 C’s Part II (6 Points)
Consider a byte-addressable architecture with the following cache:
Cache size: 64 bytes
Cache block size: 16 bytes
Associativity: 2-way set-associative
Assume that the cache is initially empty and uses an LRU replacement policy. The lower way is filled first when the set has an empty way.
Answer the following questions for the list of memory references shown in this table.

Row | Reference | Tag   | Set Index | Hit/Miss | Type (if miss) | Tag of evicted block (if applicable)
A   | 0x0DD5    | 0x6E  | 0x1       | miss     | compulsory     | -
B   | 0x1F07    | 0xF8  | 0x0       | miss     | compulsory     | -
C   | 0x02AE    | 0x15  | 0x0       | miss     | compulsory     | -
D   | 0x0509    | 0x28  | 0x0       | miss     | compulsory     | 0xF8
E   | 0x2EDF    | 0x176 | 0x1       | miss     | compulsory     | -
F   | 0x1F0F    | 0xF8  | 0x0       | miss     | conflict       | 0x15
G   | 0x1F01    | 0xF8  | 0x0       | hit      | -              | -
H   | 0x48D7    | 0x246 | 0x1       | miss     | compulsory     | 0x6E
I   | 0x0DDA    | 0x6E  | 0x1       | miss     | capacity       | 0x176
J   | 0x0380    | 0x1C  | 0x0       | miss     | compulsory     | 0x28
7.1) List all the memory references that incur conflict misses in the cache. State each memory reference in the following format:
ROW, address (0xZZZZ), tag (0xZZZ), set-index (0xZ), tag-of-replaced-block-if-any (0xZZZ)
Each letter "Z" above stands for one hexadecimal digit. (e.g., ROW-X, 0x1234, 0x120, 0x1, 0x000)
ROW-F, 0x1F0F, 0xF8, 0x0, 0x15
7.2) List all the memory references that incur capacity misses in the cache. State each memory reference in the following format:
ROW, address (0xZZZZ), tag (0xZZZ), set-index (0xZ), tag-of-replaced-block-if-any (0xZZZ)
Each letter "Z" above stands for one hexadecimal digit. (e.g., ROW-X, 0x1234, 0x120, 0x1, 0x000)
ROW-I, 0x0DDA, 0x6E, 0x1, 0x176
7.3) At the end of simulating the above address sequence, specify the final state of the 2-way set-associative cache using the tags of the cache blocks stored in the cache.
Way 0, Set 0 Tag: 0x01C
Way 0, Set 1 Tag: 0x246
Way 1, Set 0 Tag: 0x0F8
Way 1, Set 1 Tag: 0x06E
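The final state can be double-checked mechanically. This is my own sketch (not part of the original solution) of the 64 B, 2-way, 16 B-block LRU cache run over the reference stream; it tracks tags per set but not which physical way each tag lands in.

```python
# Simulate the Question 7 cache: 2 sets, 2 ways, 16 B blocks, LRU replacement.
refs = [0x0DD5, 0x1F07, 0x02AE, 0x0509, 0x2EDF,
        0x1F0F, 0x1F01, 0x48D7, 0x0DDA, 0x0380]
BLOCK_SIZE, NUM_SETS = 16, 2          # 64 B / 16 B per block / 2 ways = 2 sets
sets = [[] for _ in range(NUM_SETS)]  # per set: resident tags, MRU last
hits = 0
for addr in refs:
    block = addr // BLOCK_SIZE
    set_idx, tag = block % NUM_SETS, block // NUM_SETS
    ways = sets[set_idx]
    if tag in ways:
        ways.remove(tag)              # hit: move tag to MRU position
        hits += 1
    elif len(ways) == 2:
        ways.pop(0)                   # miss in a full set: evict the LRU tag
    ways.append(tag)
print(hits)                                            # 1 (row G only)
print([hex(t) for t in sets[0]], [hex(t) for t in sets[1]])
```

The final tags are {0x1C, 0xF8} in set 0 and {0x246, 0x6E} in set 1, matching the Way/Set answers above (way assignment aside).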
Question 8: Hierarchical Page Tables (5 Points)
Assume a 48-bit byte-addressable ISA system that supports virtual memory with the following specifications:
● 3-level page table
● Page size: 4 KB
● Physical memory size: 16 GB
● Size of each 3rd-level page table: 8 pages
● Size of each 2nd-level page table: 1 page
● Page table entry size: 4 bytes
Determine the following:
Page offset size in # bits (enter only the number here): 12 bits
Size of physical page number (PPN) in # bits (enter only the number here): 22 bits
Size of 3rd-level page table index in # bits (enter only the number here): 13 bits
Size of each 3rd-level page table, 2^N bytes (enter only the exponent number N here): 2^15 bytes (N = 15)
Size of 2nd-level page table index in # bits (enter only the number here): 10 bits
Size of each 2nd-level page table, 2^N bytes (enter only the exponent number N here): 2^12 bytes (N = 12)
Size of 1st-level page table index in # bits (enter only the number here): 13 bits
Size of 1st-level page table, 2^N bytes (enter only the exponent number N here): 2^15 bytes (32 KB, N = 15)
In the worst case, how much memory would this 3-level page table occupy? State your answer in the form 2^1 + 2^2 + … + 2^n bytes.
2^15 + 2^25 + 2^38 bytes
For the same page size, if this system used a single-level page table instead of a three-level page table, how much memory would it occupy in the worst case? State your answer in the form 2^1 + 2^2 + … + 2^n bytes.
2^38 bytes
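The field sizes and worst-case footprint above all follow from the given specs. Here is my own arithmetic re-check (not part of the exam solution):

```python
# Re-derive the Question 8 page-table field sizes and worst-case footprint.
VA_BITS, PAGE_SIZE, PTE_SIZE = 48, 4096, 4
PHYS_MEM = 16 * 2**30                                     # 16 GB = 2^34 B

offset_bits = PAGE_SIZE.bit_length() - 1                  # log2(4 KB) = 12
ppn_bits = (PHYS_MEM.bit_length() - 1) - offset_bits      # 34 - 12 = 22
l3_bits = (8 * PAGE_SIZE // PTE_SIZE).bit_length() - 1    # 8 pages of PTEs -> 13
l2_bits = (PAGE_SIZE // PTE_SIZE).bit_length() - 1        # 1 page of PTEs -> 10
l1_bits = VA_BITS - offset_bits - l3_bits - l2_bits       # remaining VPN bits -> 13
assert (offset_bits, ppn_bits, l3_bits, l2_bits, l1_bits) == (12, 22, 13, 10, 13)

# Worst case: the single L1 table plus every L2 and L3 table allocated.
l1_size = 2**l1_bits * PTE_SIZE                             # 2^15 B
l2_size = 2**l1_bits * (2**l2_bits * PTE_SIZE)              # 2^13 tables * 2^12 B = 2^25 B
l3_size = 2**(l1_bits + l2_bits) * (2**l3_bits * PTE_SIZE)  # 2^23 tables * 2^15 B = 2^38 B
single_level = 2**(VA_BITS - offset_bits) * PTE_SIZE        # 2^36 PTEs * 4 B = 2^38 B
assert (l1_size, l2_size, l3_size) == (2**15, 2**25, 2**38)
assert single_level == 2**38
```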
Question 9: Hierarchical Page Tables (6.5 Points)
Assume a 16-bit ISA that uses a 3-level page table. Each virtual memory address is split into the following fields:
1st-level index size: 4 bits
2nd-level index size: 6 bits
3rd-level index size: 4 bits
Physical page offset size: 2 bits
Consider the following virtual addresses accessed in the order given below. Determine the 1st, 2nd and 3rd level indices for each access.

Row | Virtual Address | 1st-level index | 2nd-level index | 3rd-level index
A   | 0xA7E4          | 1010            | 011111          | 1001
B   | 0xB09B          | 1011            | 000010          | 0110
C   | 0xA78F          | 1010            | 011110          | 0011
D   | 0xC78F          | 1100            | 011110          | 0011
E   | 0xB098          | 1011            | 000010          | 0110
What goes in the three empty boxes for Row A, Virtual Address 0xA7E4? 1010, 011111, 1001
What goes in the three empty boxes for Row B, Virtual Address 0xB09B? 1011, 000010, 0110
What goes in the three empty boxes for Row C, Virtual Address 0xA78F? 1010, 011110, 0011
What goes in the three empty boxes for Row D, Virtual Address 0xC78F? 1100, 011110, 0011
What goes in the three empty boxes for Row E, Virtual Address 0xB098? 1011, 000010, 0110
How many first-level page tables will have been allocated after these 5 memory accesses? (enter only the number here)
1
How many second-level page tables will have been allocated after these 5 memory accesses? (enter only the number here)
3
How many third-level page tables will have been allocated after these 5 memory accesses? (enter only the number here)
4
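The index splits and table counts can be checked mechanically. This is my own sketch (not part of the exam solution): shift out the 2-bit page offset, then peel off the 4-, 6-, and 4-bit indices; each distinct index prefix corresponds to one allocated table at the next level.

```python
# Split each 16-bit VA into 4/6/4-bit page-table indices above a 2-bit offset,
# then count distinct index prefixes = allocated tables per level.
addrs = [0xA7E4, 0xB09B, 0xA78F, 0xC78F, 0xB098]
rows = []
for va in addrs:
    l1 = (va >> 12) & 0xF      # top 4 bits
    l2 = (va >> 6) & 0x3F      # next 6 bits
    l3 = (va >> 2) & 0xF       # next 4 bits (2-bit page offset below)
    rows.append((l1, l2, l3))
    print(f"{va:#06x}: {l1:04b} {l2:06b} {l3:04b}")
l1_tables = 1                               # always exactly one root table
l2_tables = len({r[0] for r in rows})       # one L2 table per distinct L1 index
l3_tables = len({(r[0], r[1]) for r in rows})  # one L3 table per distinct (L1, L2)
print(l1_tables, l2_tables, l3_tables)  # 1 3 4
```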
Question 10: Virtual Memory (11 Points)
Assume a 16-bit ISA and the following system configuration.
Cache:
● Physically addressed
● 2-way set associative
● Block size: 4 bytes
● Cache size: 16 bytes
Latency for each memory component:
● Disk: 1000 ns
● Physical memory: 50 ns
● TLB: 2 ns
● Cache: 1 ns
Memory system:
● Physical memory size: 4 KB
● Single-level page table
● The TLB has 2 entries and is fully associative
Assume the following:
● The first 8 pages (PPN 0-7) in physical memory are empty and free to use; the rest are reserved for the OS. The page table is pinned in physical memory. It is not cached, except in the TLB.
● Pages that are not in memory are on disk.
● All updates are in parallel. Upon retrieval from cache, main memory, or disk, the data is sent immediately to the CPU, while other updates occur in parallel.
A process accesses the following virtual addresses in order. The latency for each memory access is given. Based on the access latencies, determine the outcomes of the cache and TLB accesses, and whether each access is a page fault or not.
Access # | Virtual Address | Latency (ns) | TLB (H/M/NA) | Cache (H/M/NA) | Page Fault (Y/N)
1        | 0x126C          | 1052         | M            | NA             | Y
2        | 0x122F          | 1052         | M            | NA             | Y
3        | 0x122B          | 53           | H            | M              | N
4        | 0x126D          | 3            | H            | H              | N
5        | 0x35AC          | 1052         | M            | NA             | Y
6        | 0x122A          | 53           | M            | H              | N
7        | 0x1220          | 103          | M            | M              | N
8        | 0x125B          | 103          | M            | M              | N

Based on the above accesses and their latencies, determine the page size by answering the following questions.
Page size is greater than or equal to (>=) 2^N bytes. Determine the maximum value for N, and enter only the exponent number N here:
6
Your choice of page size (>=) above is known because of which access numbers from the table (choose exactly two):
Because of access #1 (or #4) and access #8.
Page size is less than or equal to (<=) 2^N bytes. Determine the minimum value for N, and enter only the exponent number N here:
6
Your choice of page size (<=) above is known because of which access numbers from the table (choose exactly two):
Because of access #1 (or #4 or #8) and access #2 (or #3 or #6).
Therefore, page size is 2^N bytes (enter only the exponent number N here):
6
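Each latency in the table decomposes into the component costs given in the question. This is my own breakdown (assumed, not spelled out in the solution): the TLB is checked first, and a TLB miss adds one 50 ns page-table walk in physical memory before the cache/memory access proceeds.

```python
# Decompose the observed Question 10 latencies into component costs (ns).
TLB, CACHE, MEM, DISK = 2, 1, 50, 1000
tlb_hit_cache_hit   = TLB + CACHE               # 3 ns   (access 4)
tlb_hit_cache_miss  = TLB + MEM + CACHE         # 53 ns  (access 3)
tlb_miss_cache_hit  = TLB + MEM + CACHE         # 53 ns  (access 6: walk, then hit)
tlb_miss_cache_miss = TLB + MEM + MEM + CACHE   # 103 ns (accesses 7, 8)
page_fault          = TLB + MEM + DISK          # 1052 ns (accesses 1, 2, 5)
print(tlb_hit_cache_hit, tlb_hit_cache_miss, tlb_miss_cache_hit,
      tlb_miss_cache_miss, page_fault)  # 3 53 53 103 1052
```

Note that 53 ns is ambiguous on its own (TLB hit + cache miss, or TLB miss + cache hit), which is why the table lists different TLB/cache outcomes for accesses 3 and 6.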
Question 11: Stack-Based ISA (12 Points)
Consider a new Stack-ISA with the following instructions. Its semantics are defined by the following two ISA-visible components:
● A stack memory
● A special register called COMPARE, which can store a value of 1 or 0.

Assembly Instruction | Execution semantics
pushi immediate | stack.push(immediate)   (labels resolve to their addresses)
push addr | stack.push(mem[addr])
less | first = stack.pop(); second = stack.pop(); if (first < second): COMPARE = 1 else: COMPARE = 0
greater | first = stack.pop(); second = stack.pop(); if (first > second): COMPARE = 1 else: COMPARE = 0
bcmp | destination = stack.pop(); if (COMPARE == 1): COMPARE = 0; PC = destination
11.1) Use the instructions in Stack-ISA to implement "Unconditional branch to address 20" (Branch to PC = 20).
pushi 20
pushi 1
pushi 0
less
bcmp
11.2) Use the instructions in Stack-ISA to implement "conditional branch to 20 if mem[5] is equal to mem[8]". (Hint: you should use part a in your implementation.)
if (mem[5] == mem[8]) { Branch to PC = 20 }

        pushi neq
        push 5        (SOLN 2: push 8)
        push 8        (SOLN 2: push 5)
        less
        bcmp
        pushi neq
        push 5        (SOLN 2: push 8)
        push 8        (SOLN 2: push 5)
        greater
        bcmp
eq      pushi 20
        pushi 1
        pushi 0
        less
        bcmp
neq     noop
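The 11.2 sequence can be sanity-checked with a small interpreter. This is my own sketch (not part of the exam solution): labels are pre-resolved to instruction addresses, the trailing noop sits at address 15, and address 20 lies past the program, so finishing at PC = 20 means the conditional branch fired.

```python
# Minimal Stack-ISA interpreter following the semantics table above.
def run(program, mem):
    stack, compare, pc = [], 0, 0
    while 0 <= pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "pushi":
            stack.append(args[0])
        elif op == "push":
            stack.append(mem[args[0]])
        elif op in ("less", "greater"):
            first, second = stack.pop(), stack.pop()
            compare = int(first < second if op == "less" else first > second)
        elif op == "bcmp":
            dest = stack.pop()
            if compare == 1:
                compare = 0
                pc = dest
        elif op == "noop":
            pass
    return pc

NEQ = 15  # address of the trailing noop
prog = [("pushi", NEQ), ("push", 5), ("push", 8), ("less",), ("bcmp",),
        ("pushi", NEQ), ("push", 5), ("push", 8), ("greater",), ("bcmp",),
        ("pushi", 20), ("pushi", 1), ("pushi", 0), ("less",), ("bcmp",),
        ("noop",)]
print(run(prog, {5: 7, 8: 7}))   # 20: equal values take the branch
print(run(prog, {5: 3, 8: 9}))   # 16: unequal values fall through to neq
```

Note the pop order: after `push 5; push 8`, `less` pops mem[8] as `first`, so the two blocks together branch to neq whenever the values differ in either direction, leaving only the equal case to reach the unconditional branch from 11.1.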