Please provide a typed (computer-written) answer.

Question 1:

Suppose we wish to write a procedure that computes the inner product of two vectors u and v. An abstract version of the function has a CPE of 14–18 with x86-64 for different types of integer and floating-point data. By doing the same sort of transformations we did to transform the abstract program combine1 into the more efficient combine4, we get the following code:
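
For reference, a combine4-style inner product might look like the following minimal sketch. It assumes plain C arrays and an explicit length instead of the textbook's vector type; the name inner4_sketch and the data_t typedef are illustrative assumptions, not the original listing.

```c
#include <stddef.h>

typedef double data_t;   /* assumption: the element type under test */

/* Hypothetical combine4-style inner product: the length and base pointers
   are passed in directly, and the result accumulates in a local variable
   instead of being written through a pointer on every iteration. */
data_t inner4_sketch(const data_t *udata, const data_t *vdata, size_t length)
{
    data_t sum = (data_t) 0;
    for (size_t i = 0; i < length; i++) {
        /* The multiply is independent across iterations; only the add
           into sum carries a dependency from one iteration to the next. */
        sum = sum + udata[i] * vdata[i];
    }
    return sum;
}
```

In this form the only value carried from one iteration to the next (apart from the index) is sum, which is what the critical-path questions below turn on.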

 

 

Our measurements show that this function has CPEs of 1.50 for integer data and 3.00 for floating-point data. For data type double, the x86-64 assembly code for the inner loop is as follows:

 

Assume that the functional units have the characteristics listed in Figure 5.12.

 

**See last page for figures

 

A. Diagram how this instruction sequence would be decoded into operations and show how the data dependencies between them would create a critical path of operations, in the style of textbook Figures 5.13 and 5.14.

 

      vmovsd   ->  1. Get udata(i)
      vmovsd   ->  2. Load vdata(i)
      vmulsd   ->  3. Multiply
      vaddsd   ->  4. Add to sum
                   5. Increment i
                   6. Compare limit

 

 

B. For data type double, what lower bound on the CPE is determined by the critical path?

 

C. Assuming similar instruction sequences for the integer code as well, what lower bound on the CPE is determined by the critical path for integer data?

 

D. Explain how the floating-point versions can have CPEs of 3.00, even though the multiplication operation requires 5 clock cycles.

 

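The floating-point multiplication is not on the critical path. Each product udata[i] * vdata[i] depends only on the two loaded values, not on any result from the previous iteration, so successive multiplications can be pipelined: the multiplier can start a new multiplication every cycle even though each one takes 5 cycles to finish. The only loop-carried dependency is the addition into sum, so the latency of the floating-point addition bounds the CPE, which matches the measured 3.00. Having multiple functional units raises throughput, but it does not shorten the latency of the critical path.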

 

Question 2:

 

Write a version of the inner product procedure described in Question 1 that uses 6 × 6 loop unrolling. Our measurements for this function with x86-64 give a CPE of 1.06 for integer data and 1.01 for floating-point data.

 

What factor limits the performance to a CPE of 1.00?
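
One way the 6 × 6 version could be written is sketched below, assuming plain C arrays and an explicit length instead of the textbook's vector type. The name inner_6x6_sketch and the data_t typedef are hypothetical. Six independent accumulators give six separate dependency chains, so consecutive additions no longer wait on each other.

```c
#include <stddef.h>

typedef double data_t;   /* assumption: the element type under test */

/* Hypothetical 6 x 6 unrolling: six elements per iteration,
   six independent accumulators. */
data_t inner_6x6_sketch(const data_t *udata, const data_t *vdata, size_t length)
{
    data_t sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0, sum5 = 0;
    size_t i = 0;

    /* Main loop: each accumulator forms its own dependency chain. */
    for (; i + 5 < length; i += 6) {
        sum0 += udata[i]     * vdata[i];
        sum1 += udata[i + 1] * vdata[i + 1];
        sum2 += udata[i + 2] * vdata[i + 2];
        sum3 += udata[i + 3] * vdata[i + 3];
        sum4 += udata[i + 4] * vdata[i + 4];
        sum5 += udata[i + 5] * vdata[i + 5];
    }

    /* Finish any remaining elements one at a time. */
    for (; i < length; i++) {
        sum0 += udata[i] * vdata[i];
    }

    return (sum0 + sum1) + (sum2 + sum3) + (sum4 + sum5);
}
```

For floating-point data this changes the order of the additions, so the result can differ from the sequential version in the last bits. As for the limiting factor: every element needs two loads, one from each vector, so if the reference machine has two load units that can each start one load per cycle (an assumption about Figure 5.12, which is not reproduced here), load throughput caps the CPE at 1.00 regardless of how many accumulators are used.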

 

Question 3:

 

Write a version of the inner product procedure described in Question 1 that uses 6 × 1a loop unrolling to enable greater parallelism. Our measurements for this function give a CPE of 1.10 for integer data and 1.05 for floating-point data.
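
A 6 × 1a version might look like the sketch below, again assuming plain C arrays and an explicit length; the name inner_6x1a_sketch and the data_t typedef are hypothetical. It keeps a single accumulator but reassociates the sum, so the six products are combined with each other first and only one addition per iteration sits on the loop-carried chain through sum.

```c
#include <stddef.h>

typedef double data_t;   /* assumption: the element type under test */

/* Hypothetical 6 x 1a unrolling: six elements per iteration, one
   accumulator, with the additions reassociated by the parentheses. */
data_t inner_6x1a_sketch(const data_t *udata, const data_t *vdata, size_t length)
{
    data_t sum = 0;
    size_t i = 0;

    for (; i + 5 < length; i += 6) {
        /* The parenthesized expression does not depend on sum, so it can
           be computed while the previous iteration's add is in flight. */
        sum = sum + (udata[i]     * vdata[i]
                   + udata[i + 1] * vdata[i + 1]
                   + udata[i + 2] * vdata[i + 2]
                   + udata[i + 3] * vdata[i + 3]
                   + udata[i + 4] * vdata[i + 4]
                   + udata[i + 5] * vdata[i + 5]);
    }

    /* Finish any remaining elements one at a time. */
    for (; i < length; i++) {
        sum = sum + udata[i] * vdata[i];
    }

    return sum;
}
```

As with multiple accumulators, reassociating floating-point additions can change rounding, so the result may differ slightly from the sequential version.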
