I have an Accumulo table and I want to efficiently compute a
list ranking of the rowIds. More precisely, let r(i) for i=0 to n-1 be
the ordered (sorted) rowIds in the table; I want to create a column,
say numId, such that for each row r(i) the value in column numId is i.
I would like to do this in one map round. To do so I need to get the
total number of rows in each tablet and an id for the tablet that has
an ordering (sorting) consistent with the rowIds ordering. Assume
there are t tablets and that (tabId(j), size(j)) for j=0 to t-1 are the
tablet id and row count of tablet j. Then I can compute a prefix sum of
size(j), giving (tabId(0), 0), (tabId(1), size(0)), (tabId(2),
size(0)+size(1)), and so on; let offset(j) be the offset computed for
tabId(j). I can then use t mappers to set the numId of the rows in
tablet j to offset(j), offset(j) + 1, …
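
To make the offset computation concrete, here is a minimal sketch in plain Python. It does not touch Accumulo at all: the (tabId, size) pairs are made-up sample data, and the names tablet_offsets and num_ids are hypothetical helpers introduced only for illustration. It assumes the tablet ids sort in the same order as the rowIds.

```python
from itertools import accumulate


def tablet_offsets(tablet_sizes):
    """Exclusive prefix sum over (tab_id, size) pairs.

    Assumes tab_id values sort consistently with the rowId ordering,
    so sorting the pairs puts the tablets in row order.
    """
    ordered = sorted(tablet_sizes)
    sizes = [size for _, size in ordered]
    # offset(j) = size(0) + ... + size(j-1), with offset(0) = 0
    starts = [0] + list(accumulate(sizes))[:-1]
    return {tab_id: start for (tab_id, _), start in zip(ordered, starts)}


def num_ids(offset, row_count):
    """numId values a mapper would assign to the rows of one tablet."""
    return range(offset, offset + row_count)


# Hypothetical sample data: three tablets identified by their first rowId.
tablets = [("a", 3), ("g", 2), ("m", 4)]
offsets = tablet_offsets(tablets)
second_tablet_ids = list(num_ids(offsets["g"], 2))
```

With these sample sizes, the tablet starting at "g" gets offset 3, so its two rows would be numbered 3 and 4.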
I can do all of this in two map rounds by using the first rowId in
each tablet as the tabId and simply counting the number of rows, but I
was hoping to avoid having to do this.
Can I get the size of each tablet and an id for it consistent with the
row ordering? If so, how?
