Computer Systems: A Programmer's Perspective (3rd Edition)
3rd Edition
ISBN: 9780134092669
Author: Bryant, Randal E. Bryant, David R. O'Hallaron, David R., Randal E.; O'Hallaron, Bryant/O'hallaron
Publisher: PEARSON
expand_more
expand_more
format_list_bulleted
Expert Solution & Answer
Chapter 6.6, Problem 6.21PP
Explanation of Solution
Estimating time in CPU cycles:
It is given that the sustained throughput using large strides from L1 is: 12,000 MB/s.
Clock frequency= 2,100 MHz
Individual read accesses= 8-byte long
Hence, one can estimate that it takes roughly
Expert Solution & Answer
Want to see the full answer?
Check out a sample textbook solutionStudents have asked these similar questions
1. We wish to compare the performance of two different machines: M1 and M2. The following measurements have been made on these machines:
Program
Time on M1
Time on M2
1
10 seconds
5 seconds
2
3 seconds
4 seconds
Which machine is faster for each program, and by how much?
2. For M1 and M2 of problem 1, the following additional measurements are made:. Find the instruction execution rate (instructions per second) for each machine when running program 1.
Program
Instructions executed on M1
Instructions executed on M2
1
200 x 106
160 x 106
3. For M1 and M2 of problem 1, if the clock rates are 200 MHz and 300 MHz, respectively, find the CPI for program 1 on both machines using the data provided in problems 1 and 2.
4. You are going to enhance a machine, and there are two possible improvements: either make multiply instructions run four times faster than before or make memory access instructions run two times faster than before. You…
1. Develop a mathematical model for measuring performance based on overall memory
access time with a neat diagram for the following memory design and derive the
formula to calculate the Overall Memory Access Time.
Main Memory : 1
Internal Cache : 1
External Cache: 1
Register S and Register B have fastest access time:
Data Search order [ Registers – Internal Cache – External Cache – Memory]
[Hint: Register access time is considered negligible]
4.19.16: [5] <COD §4.6>.
In this exercise, we examine how pipelining affects the clock cycle time of the processor. Problems in this exercise assume that individual stages of the datapath have the following latencies:
Also, assume that instructions executed by the processor are broken down as follows:
(a)
What is the clock cycle time in a pipelined and non-pipelined processor?
(b)
What is the total latency of an lw instruction in a pipelined and non-pipelined processor?
(c)
If we can split one stage of the pipelined datapath into two new stages, each with half the latency of the original stage, which stage would you split and what is the new clock cycle time of the processor?
(d)
Assuming there are no stalls or hazards, what is the utilization of the data memory?
(e)
Assuming there are no stalls or hazards, what is the utilization of the write-register port of the "Registers" unit?
No hand written and fast answer with explanation
Chapter 6 Solutions
Computer Systems: A Programmer's Perspective (3rd Edition)
Ch. 6.1 - Prob. 6.1PPCh. 6.1 - Prob. 6.2PPCh. 6.1 - Prob. 6.3PPCh. 6.1 - Prob. 6.4PPCh. 6.1 - Prob. 6.5PPCh. 6.1 - Prob. 6.6PPCh. 6.2 - Prob. 6.7PPCh. 6.2 - Prob. 6.8PPCh. 6.4 - Prob. 6.9PPCh. 6.4 - Prob. 6.10PP
Ch. 6.4 - Prob. 6.11PPCh. 6.4 - Prob. 6.12PPCh. 6.4 - Prob. 6.13PPCh. 6.4 - Prob. 6.14PPCh. 6.4 - Prob. 6.15PPCh. 6.4 - Prob. 6.16PPCh. 6.5 - Prob. 6.17PPCh. 6.5 - Prob. 6.18PPCh. 6.5 - Prob. 6.19PPCh. 6.5 - Prob. 6.20PPCh. 6.6 - Prob. 6.21PPCh. 6 - Prob. 6.22HWCh. 6 - Prob. 6.23HWCh. 6 - Suppose that a 2 MB file consisting of 512-byte...Ch. 6 - The following table gives the parameters for a...Ch. 6 - The following table gives the parameters for a...Ch. 6 - Prob. 6.27HWCh. 6 - This problem concerns the cache in Practice...Ch. 6 - Suppose we have a system with the following...Ch. 6 - Suppose we have a system with following...Ch. 6 - Suppose that a program using the cache in Problem...Ch. 6 - Repeat Problem 6.31 for memory address0x16E8 A....Ch. 6 - Prob. 6.33HWCh. 6 - Prob. 6.34HWCh. 6 - Prob. 6.35HWCh. 6 - Prob. 6.36HWCh. 6 - Prob. 6.37HWCh. 6 - Prob. 6.38HWCh. 6 - Prob. 6.39HWCh. 6 - Given the assumptions in Problem 6.38, determine...Ch. 6 - You are writing a new 3D game that you hope will...Ch. 6 - Prob. 6.42HWCh. 6 - Prob. 6.43HWCh. 6 - Prob. 6.45HWCh. 6 - Prob. 6.46HW
Knowledge Booster
Similar questions
- Section 1.0 cites as a pitfall the utilization of a subset of the performace equation as a performance metric. To illustrate this, consider the following two processors. P1 has a clock rate of 4GHz, average CPI of 0.9, and requires the execution of 5.0E9 instructions. P2 has a clock rate of 3GHz, an average CPI of 0.75, and requires the execution of 1.0E9 instructions. (1) A common fallacy is to use MIPS to compare the performace of two different processors, and consider that the processor with the largest MIPS has the largest performance. Check if this is true for P1 and P2. (2) Another common performace figure is MFLOPS, defined as MFLOPS = No. FP operations / (execution time x 1E6) but this figure has the same problems as MIPS. Assume that 40% of the instructions executed on both P1 and P2 are floating-point instructions. Find the MFLOPS figures for the processors.arrow_forwardProblem 4: Give a block diagram for a 8M x 32 memory using 512K x 8 memory ch book] [Hints: Figure 5.10 in thearrow_forwardI ONLY NEED 3 AND 4 Suppose memory has 256KB, OS use low address 20KB, there is one program sequence: Prog1 request 80KB, prog2 request 16KB, Prog3 request 140KB Prog1 finish, Prog3 finish; Prog4 request 80KB, Prog5 request 120kb Use first match and best match to deal with this sequence (from high address when allocated) (1)Draw allocation state when prog1,2,3 are loaded into memory? (2)Draw allocation state when prog1, 3 finish? (3)use these two algorithms to draw the structure of free queue after prog1 , 3 finish (4) Which algorithm is suitable for this sequence ? Describe the allocation process?arrow_forward
- 1.BL=00, after instruction DEC BL is executed, CF =? 2.CH=80H; after ROL CH, 1; CH=?arrow_forward4.22 [5] <§4.5> Consider the fragment of LEGv8 assembly below: STUR X16, [X6, #12] LDUR X16, [X6, #8] SUB X7, X5, X4 CBZ X7, Label ADD X5, X1, X4 SUB X5, X15, X4 Suppose we modify the pipeline so that it has only one memory (that handles both instructions and data). In this case, there will be a structural hazard every time a program needs to fetch an instruction during the same cycle in which another instruction accesses data. 4.22.1 [5] <§4.5> Draw a pipeline diagram to show were the code above will stall. 4.22.2 [5] <§4.5> In general, is it possible to reduce the number of stalls/NOPs resulting from this structural hazard by reordering code? 4.22.3 [5] <§4.5> Must this structural hazard be handled in hardware? We have seen that data hazards can be eliminated by adding NOPs to the code. Can you do the same with this structural hazard? If so, explain how. If not, explain why not. 4.22.4 [5] <§4.5> Approximately how many stalls would you expect this…arrow_forward4.18 [5] <COD §4.5> Assume that X1 is initialized to 11 and X2 is initialized to 22. Suppose you executed the code below on a version of the pipeline from COD Section 4.5 (An overview of pipelining) that does not handle data hazards (i.e., the programmer is responsible for addressing data hazards by inserting NOP instructions where necessary). What would the final values of registers X3 and X4 be? ADDI X1, X2, #5 ADD X3, X1, X2 ADDI X4, X1, #15arrow_forward
- A program has the following breakdown: 25% ld (50% of them directly followed by a dependent instruction),25% sd, 30% r_type, 20% beq (80% of them are taken. Branches are calculated in the third cycle. What is the average CPI of the program when run on the pipelined RISC V implementation in the textbook?arrow_forward15.a) Consider an instruction pipeline with four stages with the stage delays 5 nsec, 6 nsec, 11 nsec, and 8 nsec respectively. The delay of an inter-stage register stage of the pipeline is 1 nsec. What is the approximate speedup of the pipeline in the steady-state underideal conditions as compared to the corresponding non-pipelined implementation? b) Discuss structural hazards and control hazards with examples pls answer both subparrts,,its urgent thanksarrow_forwardI am trying to better understand memory access in computers, please answer the sample question below. Assume that the page table can held in registers of the MMU. It takes 8 ms (milliseconds)to service a page fault if there is an empty frame or if the replaced page is not altered, and20 ms if the replaced page is altered. Memory access time is 100 ns (nanoseconds). It has been empirically measured that the page to be replaced is altered 75% of the time.Obtain the maximum probability of page fault for an effective memory access time ≤ 200ns.arrow_forward
- (15pt) Assume that instruction cache miss rate is 2%, data cache miss rate is 10%, CPI (clock cycle per instruction) is 2 without any memory stall, and miss penalty is 100 cycles. In addition, assume that the frequency of loads/stores is 30%. (a) Compute CPI with memory stall. (b) When CPI without any memory stall becomes 1, compute CPI with memory stall. (c) If the CPU clock rate is doubled with the same memory when CPI without memory stall is 2, compute CPI with memory stall.arrow_forwardIn this exercise, we examine how resource hazards, control hazards, and Instruction Set Architecture (ISA) design can affect pipelined execution. Problems in this exercise refer to the following fragment of MIPS code: sw r16,12(r6)lw r16,8(r6)beq r5,r4,Label # Assume r5!=r4add r5,r1,r4slt r5,r15,r4 Assume that individual pipeline stages have the following latencies: IF ID EX MEM WB200ps 120ps 150ps 190ps 100ps 1.1, For this problem, assume that all branches are perfectly predicted (this eliminates all control hazards) and that no delay slots are used. If we only have one memory (for both instructions and data), there is a structural hazard every time we need to fetch an instruction in the same cycle in which another instruction accesses data. To guarantee forward progress, this hazard must always be resolved in favor of the instruction that accesses data. What is the total execution time of this instruction sequence in the 5-stage pipeline that only…arrow_forward1. In this exercise we examine in detail how an instruction is executed in a single cycle datapath. Problems in this exercise refer to a clock cycle in which the processor fetches the following instruction word: 10001100101001100000000000111000 Assume that the data memory is all zeros and that the processor’s registers have the following values at the beginning of the cycle in which the above instruction word is fetched: R0 R1 R2 R3 R4 R5 R6 R8 R12 R3 1 0 1 -2 4 -6 4 -10 -12 -14 31 a. What are the outputs of the sign-extend and the jump “Shift-Left-2” (near the top of the following Figure) for this instruction word? b. What are the values of ALU control unit’s inputs (ALUOp and Instruction operation) for this instruction? c. For the ALU and the two add units, what are their data input values? ALU Add (PC+4) Add (Branch) Input#1 Input#2 Input#1 Input#2 Input#1 Input#2arrow_forward
arrow_back_ios
SEE MORE QUESTIONS
arrow_forward_ios
Recommended textbooks for you
- C++ for Engineers and ScientistsComputer ScienceISBN:9781133187844Author:Bronson, Gary J.Publisher:Course Technology Ptr
C++ for Engineers and Scientists
Computer Science
ISBN:9781133187844
Author:Bronson, Gary J.
Publisher:Course Technology Ptr