i mean, before we consider hardware based implementations for bigger
matrices, this change seems like a very easy win
On Fri, Apr 17, 2015 at 5:26 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> Spent an hour on this today.
>
> What i am doing: simply reimplementing pairwise dotproduct algorithm in
> stock dense matrix times().
>
> However, equipping every matrix with structure "flavor" (i.e. dense(...)
> reports rowwise , and dense(...).t reports column wise, dense().t.t
> reports rowwise again, etc.)
>
> Next, wrote a binary operator that switches on combination of operand
> orientation and flips misaligned operand(s) (if any) to match most "speedy"
> orientation RWCW. here are result for 300x300 dense matrix pairs:
>
> Ad %*% Bd: (107.125,46.375)
> Ad' %*% Bd: (206.475,39.325)
> Ad %*% Bd': (37.2,42.65)
> Ad' %*% Bd': (100.95,38.025)
> Ad'' %*% Bd'': (120.125,43.3)
>
> these results are for transpose combinations of original 300x300 dense
> random matrices, averaged over 40 runs (so standard error should be well
> controlled), in ms. First number is stock times() application (i.e. what
> we'd do with %*% operator now), and second number is ms with rewriting
> matrices into RWCW orientation.
>
> For example, AB reorients B only, just like A''B'', AB' reorients nothing,
> and worst case A'B reorients both (I also tried to run sum of outer
> products for A'B case without reorientation  apparently L1 misses far
> outweigh costs of reorientation there, i got very bad results there for
> outer product sum).
>
> as we can see, stock times() version does pretty bad for even dense
> operands for any orientation except for the optimal.
>
> Given that, i am inclined just to add orientationdriven structure
> optimization here and replace all stock calls with just orientation
> adjustment.
>
> Of course i will need to append this matrix with sparse and sparse row
> matrix combination (quite a bit of those i guess) and see what happens
> compared to stock sparse multiplications.
>
> But even that seems like a big win to me (basically, just doing
> reorientation optimization seems to give 3x speed up on average in
> matrixmatrix multiplication in 3 cases out of 4, and ties in 1 case).
>
