accumulo-user mailing list archives

From Jeff Kubina <jeff.kub...@gmail.com>
Subject How to efficiently compute a list-ranking of rows in Accumulo?
Date Fri, 12 Oct 2012 15:38:25 GMT
I have an Accumulo table and I want to efficiently compute a
list-ranking of the rowIds. More precisely, let r(i) for i=0 to n-1 be
the ordered (sorted) rowIds in the table; I want to create a column,
say numId, such that for each row r(i) the value in column numId is i.

I would like to do this in one map round. To do so I need the total
number of rows in each tablet and an id for each tablet whose ordering
(sorting) is consistent with the rowId ordering. Assume there are t
tablets and let (tabId(j), size(j)) for j=0 to t-1 be the tablet id
and row count of tablet j. Then I can compute a prefix sum of size(j),
such as (tabId(0), 0), (tabId(1), size(0)), (tabId(2),
size(0)+size(1)), and so on; let offset(j) be the offset computed for
tabId(j). I can then use t mappers, one per tablet, to set the numId
of the rows in tablet j to offset(j), offset(j) + 1, …
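
The offset computation described above can be sketched in a few lines
of plain Python. This is only an illustration of the exclusive prefix
sum; it makes no Accumulo calls, and the function name tablet_offsets
and the sample tablet ids and sizes are hypothetical. It assumes the
(tabId, size) pairs are already in row order.

```python
from itertools import accumulate


def tablet_offsets(tablets):
    """Return (tab_id, offset) pairs for tablets given as (tab_id, size).

    Assumes `tablets` is already sorted consistently with the rowId order.
    offset(j) is the exclusive prefix sum size(0) + ... + size(j-1).
    """
    sizes = [size for _, size in tablets]
    # Inclusive running totals, shifted right by one to make them exclusive.
    offsets = [0] + list(accumulate(sizes))[:-1]
    return [(tab_id, off) for (tab_id, _), off in zip(tablets, offsets)]


# Hypothetical tablets: ids and row counts are made up for illustration.
example = [("tab0", 3), ("tab1", 5), ("tab2", 2)]
print(tablet_offsets(example))
```

A mapper handling tablet j would then write offset(j) + k as the numId
of the k-th row it reads.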

I can do all of this in two map rounds by using the first rowId in
each tablet as the tabId and simply counting the number of rows, but I
was hoping to avoid having to do this.
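
For comparison, the two-round approach can be simulated in memory as
below. This is a rough sketch, not Accumulo or MapReduce code: each
tablet is modeled as a Python list of sorted rowIds, the function names
count_round and rank_round are hypothetical, and Python string ordering
stands in for Accumulo's row ordering.

```python
def count_round(tablets):
    """Round 1: per tablet, emit (first rowId, number of rows)."""
    return [(rows[0], len(rows)) for rows in tablets if rows]


def rank_round(tablets, counts):
    """Round 2: assign numId = offset of the tablet + position in tablet."""
    # Sort tablets by their first rowId and take an exclusive prefix sum.
    offsets = {}
    running = 0
    for first_row, n in sorted(counts):
        offsets[first_row] = running
        running += n
    # Each "mapper" looks up its tablet's offset and numbers its rows.
    num_ids = {}
    for rows in tablets:
        if not rows:
            continue
        base = offsets[rows[0]]
        for k, row_id in enumerate(rows):
            num_ids[row_id] = base + k
    return num_ids


# Hypothetical tablets, deliberately listed out of row order.
tablets = [["m", "n"], ["a", "b", "c"], ["x"]]
ranks = rank_round(tablets, count_round(tablets))
print(sorted(ranks.items(), key=lambda kv: kv[1]))
```

The first round exists only to produce the per-tablet counts, which is
the work a single-round solution would need to get from tablet metadata
instead.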

Can I get the size of each tablet and an id for it consistent with the
row ordering? If so, how?
