In this article we will discuss the parallel matrix product, a simple yet efficient parallel algorithm for computing the product of two matrices. Matrix multiplication is used so widely that it is well worth parallelizing.
Here we can see the code:
This program introduces more complexity than the codes analyzed in the previous articles about the parallel computation of PI and the parallel matrix-vector product.
The difficulties in writing a parallel program for matrix multiplication come from memory accesses that do not exploit data locality. More specifically, the computation of PI uses only scalar variables, so there is no use of large arrays at all.
The matrix-vector product uses arrays that are accessed preserving memory locality. Here in the matrix-matrix product, we are addressing two-dimensional arrays that are not accessed preserving data locality, and this exhibits memory problems that impact performance.
In the code presented above, we implement the basic matrix product, accumulating the products of rows and columns into the result matrix a. As we multiply rows by columns, the major problem with this code appears: how can we perform optimized memory accesses at the same time for matrices b and c? Assume that the matrices are stored in memory in row-major order.
We access matrix b by rows and matrix c by columns, so the two memory access patterns are antagonistic. The key point is that if we optimize the memory access for one matrix, we get a bad memory access pattern for the other. The same holds for column-major storage, with the roles of the two matrices swapped.
Apart from memory and locality issues, how can we obtain better performance from this code? The answer is to make it parallel. Here we can see the parallel implementation using OpenMP.