Efficient Calculation of Upper Triangle from Vector
Using R, and Rcpp, efficiently calculate the upper-triangle of the matrix created by taking the dot-product of a single vector by its transposed self.
Given the following vector and its transposition
$$A = \begin{bmatrix}1 & 2 & 3 & 4\end{bmatrix}$$
$$B = A^T = \begin{bmatrix}1\\ 2\\ 3\\ 4\end{bmatrix}$$
The result of the dot-product:
$$A \bullet B = \begin{bmatrix}
1 & \pmb{2} & \pmb{3} & \pmb{4}\\
2 & 4 & \pmb{6} & \pmb{8}\\
3 & 6 & 9 & \pmb{12}\\
4 & 8 & 12 & 16\end{bmatrix}$$
The upper-triangle of the resulting matrix is indicating by those numbers in bold.
$$U = \begin{bmatrix}2 & 3 & 6 & 4 & 8 & 12\end{bmatrix}$$
Now, there are two common, base functions in R to compute this matrix and return its upper-triangle: %*%
and tcrossprod(x)
Here are a couple wrapper functions, that return the resulting upper-triangle (using upper.tri()
):
R.upper.tri <- function(v) {
matrix = v %*% t(v)
matrix[upper.tri(matrix)]
}
R.upper.tri.2 <- function(v) {
matrix = tcrossprod(v)
matrix[upper.tri(matrix)]
}
And the results:
> R.upper.tri(c(1,2,3,4))
[1] 2 3 6 4 8 12
> R.upper.tri.2(c(1,2,3,4))
[1] 2 3 6 4 8 12
You see we get exactly what we were looking for as U
.
Now, both of these are very straight-forward and perfect for standard use cases. However, when it comes to performing this operation frequently, over larger vectors, the potential for performance improvements becomes increasingly noticeable.
# Using random numbers in the range 1:10, of increasing lengths, tested 100 times each:
> microbenchmark::microbenchmark(
+ R.upper.tri(runif(3, 1, 10)),
+ R.upper.tri(runif(10, 1, 10)),
+ R.upper.tri(runif(100, 1, 10)),
+ R.upper.tri(runif(300, 1, 10))
+ ,times = 100
+ )
Unit: microseconds
expr min lq mean median uq max neval cld
R.upper.tri(runif(3, 1, 10)) 15.261 17.4010 32.96587 20.572 37.4495 146.260 100 a
R.upper.tri(runif(10, 1, 10)) 17.245 19.3895 37.44702 23.358 46.8425 140.080 100 a
R.upper.tri(runif(100, 1, 10)) 89.000 95.2420 223.28620 114.420 170.8850 6603.911 100 a
R.upper.tri(runif(300, 1, 10)) 785.662 1913.4350 2201.49373 2028.131 2123.3885 8881.646 100 b
The first place I tried was to implement a simple Rcpp function using Armadillo’s matrix functions:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
#include <Rcpp.h>
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
arma::rowvec vector_to_ut_mul(arma::vec v) {
arma::mat mat = v * trans(v);
mat.diag().zeros();
return(trans(mat.elem(find(trans(trimatl(mat))))));
}
Test our vector A
:
> vector_to_ut_mul(c(1,2,3,4))
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 2 3 6 4 8 12
Now the same performance tests:
> microbenchmark::microbenchmark(
+ vector_to_ut_mul(runif(3, 1, 10)),
+ vector_to_ut_mul(runif(10, 1, 10)),
+ vector_to_ut_mul(runif(100, 1, 10)),
+ vector_to_ut_mul(runif(300, 1, 10)),
+ times = 100
+ )
Unit: microseconds
expr min lq mean median uq max neval cld
vector_to_ut_mul(runif(3, 1, 10)) 3.449 4.0235 5.36643 4.7860 5.6395 34.781 100 a
vector_to_ut_mul(runif(10, 1, 10)) 4.688 5.7665 7.47304 6.6505 7.7330 28.154 100 a
vector_to_ut_mul(runif(100, 1, 10)) 41.205 44.6345 49.28298 47.0500 50.9950 143.942 100 b
vector_to_ut_mul(runif(300, 1, 10)) 359.396 535.3850 580.06845 567.2695 602.2450 1000.952 100 c
So there is some noticeable improvements, but in the larger vectors we see the performance, expectedly, start to degrade.
So to make it faster, we can actually skip performing the full multiplication, rather iterate the vector itself only multiplying the necessary indices that make up the upper-triangle of the would-be matrix.
#include <RcppArmadillo.h>
using namespace Rcpp;
using namespace arma;
//' Calculates the upper triangle of a vector of integers if it
//' were converted to a matrix. This actually skips creating the
//' matrix, by only multiplying the necesseary indices of the
//' vector.
//'
//' @param v - A vector of integers
//' @export
// [[Rcpp::export]]
std::vector<int> vector_to_ut(std::vector<int> v) {
int vL = v.size();
int vS = ( (vL * (vL + 1)) / 2) - vL ;
int s = 0;
std::vector<int> vR( vS );
for( int i = 2; i <= vL; i++ ) {
for (int j = 0; j < i-1; j++ ) {
vR[s] = v[j] * v[i-1];
s++;
}
}
return vR;
}
Again, test against our vector A
:
> vector_to_ut(c(1,2,3,4))
[1] 2 3 6 4 8 12
Again with the same performance tests:
Unit: microseconds
expr min lq mean median uq max neval cld
vector_to_ut(runif(3, 1, 10)) 3.369 4.0515 10.44103 5.4200 7.6670 199.673 100 ab
vector_to_ut(runif(10, 1, 10)) 3.738 4.4430 8.00061 6.0465 9.9895 40.958 100 a
vector_to_ut(runif(100, 1, 10)) 10.069 12.4600 22.76412 17.4190 24.0350 112.748 100 b
vector_to_ut(runif(300, 1, 10)) 124.615 139.1790 175.37466 153.3485 195.0290 689.874 100 c
And the results of the larger operations, all together for comparison:
benchmark_ut(1:300,1000)
Unit: microseconds
expr min lq mean median uq max neval
R.upper.tri(v) 668.556 1291.3745 2483.8589 1616.2400 2126.235 112633.59 1000
R.upper.tri.2(v) 674.230 1306.0820 2428.9174 1645.2025 2224.076 69641.66 1000
vector_to_ut_mul(v) 342.695 628.1725 1375.3058 1101.8775 1358.649 16224.55 1000
vector_to_ut(v) 42.420 118.7545 237.3842 190.1145 224.825 17368.39 1000