index = i0 + i1 * N0 + i2 * N0*N1 + i3 * N0*N1*N2 so if iterating over dimension 2, for i2 = 0, 1 2, ... index = i0 + i1*N0 + 0 + i3 * N0*N1*N2 = i0 + i1*N0 + 1*N0*N1 + i3 * N0*N1*N2 = i0 + i1*N0 + 2*N0*N1 + i3 * N0*N1*N2 so the step size is N0*N1, or more formally, the step size is: prod( N(1:(n-1)) ) but the ``base address'', i0 + i1*N0 + i3*N0*N1*N2 doesn't occupy some range 0 .. N, it occupies a pain in the arse range 0 .. N0*N1-1, N0*N1*N2 .. N0*N1*N2 + N0*N1-1, 2*N0*N1*N2 .. 2*N0*N1*N2 + N0*N1-1, etc or more formally i_end*prod(N(1:(end-1)) + (0 .. prod(N(1:(n-1)) big bit twiddly bit but the maximum value the base address takes is: N0-1 + (N1-1)*N0 + (N3-1)*N0*N1*N2 = N0*N1*N2*N3 - N0*N1*N2 + N0*N1 - 1 so how can we easily iterate over all these? we know there are 1 .. N0*N1*N3 rows to iterate over take j = 0 .. N0*N1*N3-1 then (j % (N0*N1)) can correspond to the twiddly bit and floor(j / (N0*N1)) * N0*N1*N2 can correspond to the big bit floor(j / (N0*N1)) is in the range 0 ... (N3-1) so floor(j / (N0*N1))*N0*N1*N2 is in the range 0 .. (N3-1)*N0*N1*N2 = 0 .. N0*N1*N2*N3 - N0*N1*N2 so floor(j / (N0*N1))*N0*N1*N2 + (j%(N0*N1)) is in the range 0 .. N0*N1*N2*N3 - N0*N1*N2 + N0*N1 - 1 so we should use base address = floor(j/(N0*N1))*N0*N1*N2 + (j%(N0*N1))