ExDataScientia

Three ways of implementing a loop

Loops are an essential part of many programming applications, from simple file-operation algorithms to complex numerical models. While inefficient, some operations clearly depend on the use of loops.

Loops may be written in a diverse range of ways, and finding a method that is convenient for you can make everyday programming business a lot easier.

The most classic way of writing a (nested) loop consists of writing several paragraphs of the initializing statement of the nested loops, with the nesting levels indicated via indentation. This is followed by one or more paragraphs that perform an operation on the accordingly indexed objects. In case local objects need to be created on a lower nesting level, the generating operations are of course inserted between the initializing paragraphs at the specific nesting position. In some programming languages, like R, several paragraphs of "closing statements" follow the final line of operations: These are simply the closing brackets corresponding to the opening brackets of the initializing statements. If lower-level operations are required to follow the highest-level operations, these are inserted at the according position between the closing statements (or simply added as new paragraphs at the correct indentation level, as in Python).

In the following example the cells of an empty three-dimensional array are filled step by step with the product of two scalars (indexed elements of two corresponding vectors) with a vector:

The logic behind this is easy to understand: "For each element indexed with "i" of the first dimension and for each element indexed with "j" on the second dimension do something". This line of argumentation is very well represented by the paragraph-based writing style. However, it is also a style that requires a lot of space, and can be tedious to write.

A much more elegant, though somewhat less intuitive way of writing loops is supported by Python: Here, we simply work with nested lists of indexed operations. We first name the operation to be done, and then supply the "initializing statement" known from the paragraph-based style. The inner-most list contains the operations to be performed at the highest level of nesting, while the outer-most list performs operations at the lowest level of nesting. The logic behind this approach can be expressed as follows: "do something for each element (indexed with j) within the subset indexed with i for each subset (indexed with i) within the full set". This approach has the disadvantage that we can't perform assignments within the loop (that is, we are not allowed to use the "=" in the loop). But we can generate a whole numpy array from the "loop object":, which formally is just a nested list. If you have ever looked at a multi-dimensional array in the variable explorer of the Spyder editor, you may have noticed that these are always structured as nested lists!

Again, an array is filled with the products of two scalars and one vector. As described above, in this case, we don't declare an empty array before applying the loop. Instead, we simply transform the resulting arrangement of nested lists into a numpy array. If you replicate the code, you will notice that it is identical to the arrays generated with the loops above:

There is no direct equivalent to this in R. The closest analogue is the application of some of the many "apply" functions. Vapply returns a vector as a result of a looping operation. The logic here is: "here we have a list of indices, and for each of these, apply a function that incorporates an indexed object". Indeed, the first argument passed to vapply is a vector of indices, and the second argument is a function that requires as input an index. This function contains the operation or sequence of operations that we want to apply on the indexed object. Vapply requires one further argument, the "FUN.VALUE", where we have to state the length (vapply can only return vectors, not matrices or arrays) and the data type returned by the function. This can be checked easily by running the function once outside of the vapply context with a random index value. As with the nested-list loops in Python, we can't use an assignment operator within vapply, which limits its use to some extent.

If we want to create our usual array using vapply, we have to nest several vapply calls, as we did with the nested lists in Python. A higher-level vapply call is then the function of the next-lower-level vapply call, and so on. As with the nested lists, where each list operated with its specific index letter, here every instance of vapply is connected to a specific index letter. Nested vapply calls still return one long vector. We can reshape this vector by changing the "dim" attribute of this vector, but in order to get the values in the correct arrangement, we have to build or nestings in the correct order. Here, the highest array dimension (or "axis") corresponds to the inner-most vapply call, and the lowest dimension to the outer-most call. Remember that when working with matrices (two-dimensional arrays), the rows represent the lower dimension, and the columns the higher dimension. When implementing it like the following, the resulting array will be identical to the array generated above:

There are also "apply" functions that return lists ("lapply") or matrices ("mapply"), but these are not of interest to us for the moment.

One final way of implementing a loop can be achieved in Python by writing self-referential functions. These are functions that require an index as input, perform one or more operations on indexed objects, and then increase the provided index value by one. Then they continue by calling themselves, providing the updated index value as the required argument. The result is that the function will now repeat the operation(s) just completed, but now with the differently-indexed object. Without incorporating some kind of break, such a self-referential function might, in some cases, run indefinitely (or until some internal control puts an end), or throw an error message, depending on the context of its application. If we declare an empty array of fixed dimensions to be filled, we will get an error break when the index created by the function goes beyond the limits of the corresponding array dimension. To avoid the printing of this error message, we can simply add a "try-except-pass" statement to the function.

To generate our usual array using self-referential functions, we actually require a function nested within a function: One iterates along index j until the end of the dimension is reached (and "pass" is called instead of the index-based operation), the other iterates along i. To start the entire complex, we simply have to call the outermost function and passing 0 as the first index value. The inner function is called like this, as well, as the first operation of the outer function following the declaration of the inner function, before the update of the outer index and the self-call:

my_vec_1 = np.random.choice(np.arange(0,100,1), 20)
my_vec_2 = np.random.choice(np.arange(0,100,1), 20)
my_vec_3 = np.random.choice(np.arange(0,100,1), 3)

my_arr = np.zeros([20,20,3])

def self_ref_out(i):
    def self_ref_in(j):
        my_arr[i,j] = my_vec_1[i] * my_vec_2[j] * my_vec_3
        j = j+1
        try:self_ref_in(j)
        except:pass
    self_ref_in(0)
    i += 1
    try:self_ref_out(i)
    except:pass

self_ref_out(0)

Ex Data, Scientia