## October 30, 2009

### Mini-project: Basic Systolic Algorithms

Here are two very basic simulation applications for two systolic algorithms (aka. Instruction Systolic Arrays). Their main purpose is to illustrate how such systolic algorithms work on a given data set. Both simulators have been designed for the Systolic Algorithms course I've attended at my university.

Application no.1: Systolic Array Sorting

The first algorithm I have implemented aims to solve the simple problem of Systolic Array Sorting. The idea of the problem is the following: given an uni-dimensional systolic system large enough to store the entire contents of an arbitrary array, devise a method of sorting the array in linear time (i.e. algorithm complexity of O(n)).
The systolic system that solves this problem is extremely simple:

1.Each systolic cell can store at most two values (a base value and an auxiliary transport value).
2.Each cell communicates with its left and right neighbors (it can receive a value from the left neighbor and can pass a value to its right neighbor = data flow model). When a cell receives a value from the left it stores it as a transport value.
3.At each step (system tact) each systolic cell (that holds both a base and a transport value) takes a simple decision: it passes to its left neighbor the lowest value between the base value and the transport value and stores the highest of the two.If the cell holds only a base value then it does not take any action.
The systolic system is initialized by loading the initial unsorted array (each array element is loaded as the base value of a cell cell). The output of the system (i.e. the right-most cell) should be connected to a data structure where we wish to store the sorted array.
.
At each step we simply input into the systolic system a predefined maximal value (i.e. a value greater than any value in the unsorted array) by "communicating" it to the left-most cell of the systolic system.
.
The entire key (and idea behind any systolic algorithm) is that the "communication" process is parallel.
.
If we have an array of size n the sorting process will take exactly 2*n steps which translates into a complexity of O(n).

Application no.2: Systolic Matrix Multiplication

The second application simulates systolic matrix multiplication. The systolic architecture needed to solve this problem is a bit more complicated than the one used in the first example. The cells are organized as a matrix (the size of which is equal to the one of the expected result matrix). Each cell can communicate with its 4 neighbors (receives values from the cells to the left and above and sends them to the cells to the right and below). Also, the individual organization of each cell is more complicated, as each cell can hold 3 values and execute 2 operations (addition and multiplication). For a detailed explanation of the systolic matrix multiplication algorithm please see this page.
.
The complexity of this systolic algorithm is O(n) as it would take 3*n steps to multiply two matrices of size nxn. In order to simplify the observation of the process, in the simulation application, each step is further divided into 3 sub-operations.

Although fairly simple, these two examples can easily show the power of systolic architectures as they offer linear solutions to problems which normally have a higher complexity (O(n*log(n)) for sorting and O(n^2.376) for matrix multiplication)).
.