accumulo-user mailing list archives

From Dylan Hutchison <dhutc...@cs.washington.edu>
Subject Re: Making a RowCounterIterator
Date Fri, 15 Jul 2016 20:27:38 GMT
Hi Mario,
  You can reuse or adapt the RowCountingIterator code from Graphulo:
<https://github.com/Accla/graphulo/blob/master/src/main/java/edu/mit/ll/graphulo/skvi/RowCountingIterator.java>

The main trick is understanding that each tablet must emit its result under
a row that lies within its seek range.  An iterator should not emit an entry
whose row lies outside the seek range of the tablet the iterator is running
on.  Instead, you can emit *partial sums* whose row stays within the seek
range.  Each tablet emits its own partial sum (the iterator runs per tablet,
on whichever tablet server hosts that tablet), and the client then adds the
partial sums together.
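
As a rough illustration of the partial-sum idea, here is a self-contained
Java sketch that uses no Accumulo classes. The names PartialRowCounts,
Partial, countTablet and sumPartials are hypothetical, and keying each
partial sum by the last row in the tablet is only one assumed way of keeping
the emitted row inside the seek range; the linked iterator may do it
differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PartialRowCounts {

    /** Hypothetical stand-in for one emitted entry: a row and a count. */
    static final class Partial {
        final String row;
        final long count;

        Partial(String row, long count) {
            this.row = row;
            this.count = count;
        }
    }

    /**
     * Simulates the iterator on one tablet. The input holds the distinct rows
     * inside that tablet's seek range. The count is emitted under the last
     * of those rows, so the emitted row never leaves the range.
     */
    static Partial countTablet(TreeSet<String> rowsInRange) {
        if (rowsInRange.isEmpty()) {
            return null; // an empty tablet emits nothing
        }
        return new Partial(rowsInRange.last(), rowsInRange.size());
    }

    /** Client side: add up whatever partial sums the scan returned. */
    static long sumPartials(List<Partial> partials) {
        long total = 0;
        for (Partial p : partials) {
            if (p != null) {
                total += p.count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Partial> partials = new ArrayList<>();
        partials.add(countTablet(new TreeSet<String>(Arrays.asList("a", "b"))));
        partials.add(countTablet(new TreeSet<String>(Arrays.asList("m"))));
        partials.add(countTablet(new TreeSet<String>()));
        System.out.println(sumPartials(partials)); // prints 3
    }
}
```

With a real scanner, the client-side step would be the same loop over the
returned entries, decoding each value into a number before adding it.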

Cheers, Dylan


On Fri, Jul 15, 2016 at 1:02 PM, William Slacum <wslacum@gmail.com> wrote:

> The iterator in the gist also counts cells/entries/KV pairs, not unique
> rows. You'll want to have some way to skip to the next row value if you
> want the count to be reflective of the number of rows being read.
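
To make the difference between counting entries and counting rows concrete,
here is a small self-contained Java sketch without any Accumulo classes.
Each cell is modeled as a {row, column} pair in sorted order; the class and
method names are hypothetical. In a real iterator the "skip ahead" step would
consume or seek past the remaining entries of the current row.

```java
import java.util.Arrays;
import java.util.List;

public class RowsVersusCells {

    /** Counts every entry, which is what a plain per-entry counter sees. */
    static long countCells(List<String[]> sortedCells) {
        return sortedCells.size();
    }

    /** Counts rows by skipping past every cell that shares the current row. */
    static long countRows(List<String[]> sortedCells) {
        long rows = 0;
        int i = 0;
        while (i < sortedCells.size()) {
            String currentRow = sortedCells.get(i)[0];
            rows++;
            // Skip the remaining cells of this row before counting again.
            while (i < sortedCells.size()
                    && sortedCells.get(i)[0].equals(currentRow)) {
                i++;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> cells = Arrays.asList(
            new String[] {"row1", "c1"},
            new String[] {"row1", "c2"},
            new String[] {"row2", "c1"});
        System.out.println(countCells(cells)); // prints 3
        System.out.println(countRows(cells));  // prints 2
    }
}
```
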
>
> On Fri, Jul 15, 2016 at 3:34 PM, Shawn Walker <accumulo@shawn-walker.net>
> wrote:
>
>> My read is that you're mistaking the sequence of calls Accumulo will be
>> making to your iterator.  The sequence isn't quite the same as a Java
>> iterator (initially positioned "before" the first element), and is more
>> like a C++ iterator:
>>
>> 0. Accumulo calls seek(...)
>> 1. Is there more data? Accumulo calls hasTop(). You return yes.
>> 2. Ok, so there's data.  Accumulo calls getTopKey(), getTopValue() to
>> retrieve the data. You return a key indicating 0 columns seen (since next()
>> hasn't yet been called)
>> 3. First datum done, Accumulo calls next()
>> ...
>>
>> I imagine that if you pull the second item out of your scan result, it'll
>> have the number you expect.  Alternately, you might consider performing the
>> count computation during an override of the seek(...) method, instead of in
>> the next(...) method.
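
The call order above, and the suggestion to count inside seek(...), can be
sketched with a simplified model. This is plain Java with no Accumulo
classes: CountOnSeek and scan are hypothetical names, the methods take no
arguments, and a list of strings stands in for the underlying source. The
point is only that the count is ready before the first read of the top entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SeekCountSketch {

    /** Simplified stand-in for the seek/hasTop/getTop/next protocol. */
    static final class CountOnSeek {
        private final List<String> source;
        private Long top; // null means hasTop() is false

        CountOnSeek(List<String> source) {
            this.source = source;
        }

        /** All counting happens here, before anything is read. */
        void seek() {
            top = source.isEmpty() ? null : Long.valueOf(source.size());
        }

        boolean hasTop() {
            return top != null;
        }

        long getTopValue() {
            return top;
        }

        /** The single result has been consumed; nothing is left. */
        void next() {
            top = null;
        }
    }

    /** Drives the iterator in the order listed: seek, hasTop, getTop, next. */
    static List<Long> scan(CountOnSeek it) {
        List<Long> out = new ArrayList<>();
        it.seek();
        while (it.hasTop()) {
            out.add(it.getTopValue());
            it.next();
        }
        return out;
    }

    public static void main(String[] args) {
        CountOnSeek it = new CountOnSeek(Arrays.asList("r1", "r2", "r3"));
        System.out.println(scan(it)); // prints [3]
    }
}
```

In this model an empty source yields no entry at all rather than a count of
0; that is a design choice of the sketch, not something the thread settles.
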
>>
>> --
>> Shawn Walker
>>
>>
>>
>> On Fri, Jul 15, 2016 at 2:24 PM, Mario Pastorelli <
>> mario.pastorelli@teralytics.ch> wrote:
>>
>>> I'm trying to create a RowCounterIterator that counts all the rows and
>>> returns only one key-value with the counter inside. The problem is that I
>>> can't get it to work. The Scala code is available in the gist
>>> <https://gist.github.com/melrief/5f2ca248f1a980ddead2f2eeb19e6389>
>>> together with some pseudo-code of a test. If I add an entry to my table,
>>> this iterator returns 0 instead of 1, apparently because super.hasTop()
>>> is always false. Without the iterator, the scanner returns 1 element.
>>> Any idea what I'm doing wrong here? Is WrappingIterator the right class
>>> to extend for this kind of behaviour?
>>>
>>> Thanks,
>>> Mario
>>>
>>> --
>>> Mario Pastorelli | TERALYTICS
>>>
>>> *software engineer*
>>>
>>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>>> phone: +41794381682
>>> email: mario.pastorelli@teralytics.ch
>>> www.teralytics.net
>>>
>>> Company registration number: CH-020.3.037.709-7 | Trade register Canton
>>> Zurich
>>> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
>>> Yann de Vries
>>>
>>> This e-mail message contains confidential information which is for the
>>> sole attention and use of the intended recipient. Please notify us at once
>>> if you think that it may not be intended for you and delete it immediately.
>>>
>>
>>
>
