Optimizing tensor multiplication functions, including:
- ordering memory accesses to match the tensor storage format, for better spatial data locality
- parallelizing the loops with OpenMP
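
The two optimizations can be illustrated with a minimal sketch. This is not the project's actual code: the function name `ttm_last_mode`, the choice of a third-order tensor contracted with a matrix along its last mode, and the dense row-major layout are all assumptions made for illustration. The loop order keeps the innermost loop at unit stride over both `B` and `C`, and the OpenMP pragma distributes the independent `(i, j)` rows across threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example (not from the original project):
 *   C[i][j][l] = sum_k A[i][j][k] * B[k][l]
 * Assumption: A (I x J x K), B (K x L) and C (I x J x L) are dense and
 * stored row-major, so the last index is contiguous in memory. */
void ttm_last_mode(const double *A, const double *B, double *C,
                   size_t I, size_t J, size_t K, size_t L)
{
    assert(A && B && C);

    /* Each (i, j) pair writes its own row of C, so the iterations are
     * independent and can be shared among threads without locking. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (size_t i = 0; i < I; ++i) {
        for (size_t j = 0; j < J; ++j) {
            const double *a = A + (i * J + j) * K;  /* fiber A[i][j][:] */
            double *c = C + (i * J + j) * L;        /* row   C[i][j][:] */

            for (size_t l = 0; l < L; ++l)
                c[l] = 0.0;

            /* k outside, l inside: both B[k][:] and C[i][j][:] are
             * traversed at unit stride, matching the row-major layout. */
            for (size_t k = 0; k < K; ++k) {
                const double a_k = a[k];
                const double *b = B + k * L;        /* row B[k][:] */
                for (size_t l = 0; l < L; ++l)
                    c[l] += a_k * b[l];
            }
        }
    }
}
```

With GCC or Clang, compiling with `-fopenmp` enables the pragma; without that flag the pragma is ignored and the same code runs serially. For a column-major storage format the loop nest would need to be reordered so that the innermost loop again follows the contiguous index.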