accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: Making a RowCounterIterator
Date Fri, 15 Jul 2016 22:18:05 GMT
Dylan, that would make a great contribution to Accumulo :)

On Fri, Jul 15, 2016, 16:28 Dylan Hutchison <dhutchis@cs.washington.edu>
wrote:

> Hi Mario,
>   You can reuse or adapt the RowCountingIterator
> <https://github.com/Accla/graphulo/blob/master/src/main/java/edu/mit/ll/graphulo/skvi/RowCountingIterator.java>
> code here.
>
> The main trick is understanding how each tablet needs to emit a row within
> its seek range.  An iterator should not emit an entry whose row lies
> outside the seek range of the tablet the iterator is running on.  Instead,
> you can emit *partial sums* whose row stays within the seek range.  Each
> tablet server communicates one partial sum.  Then sum the partial sums at
> the client.  (I am probably mixing up tablet vs. tablet server.)
>
> Cheers, Dylan
>
>
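A minimal sketch of the partial-sum idea in Dylan's message, in plain Java with no Accumulo dependency. TabletSketch and PartialSumSketch are hypothetical names used only for illustration; they are not Accumulo or Graphulo classes. Each stand-in tablet emits a single (row, count) pair keyed by a row it actually holds, so the emitted row stays inside that tablet's range, and the client adds the partial counts together.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Client side: sums the partial counts returned by each tablet. */
class PartialSumSketch {
    static long totalRows(List<TabletSketch> tablets) {
        long total = 0;
        for (TabletSketch tablet : tablets) {
            Map.Entry<String, Long> partial = tablet.partialCount();
            if (partial != null) {          // empty tablets emit nothing
                total += partial.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<TabletSketch> tablets = Arrays.asList(
            new TabletSketch(Arrays.asList("a", "b", "c")),
            new TabletSketch(Arrays.asList("m", "n")));
        System.out.println(totalRows(tablets)); // prints 5
    }
}

/** Hypothetical stand-in for one tablet: the sorted, distinct row IDs it holds. */
class TabletSketch {
    private final List<String> rows;

    TabletSketch(List<String> rows) {
        this.rows = rows;
    }

    /** Emits one (row, count) pair keyed by a row this tablet holds, or null if it is empty. */
    Map.Entry<String, Long> partialCount() {
        if (rows.isEmpty()) {
            return null;
        }
        String lastRow = rows.get(rows.size() - 1); // a row inside this tablet's own range
        return new SimpleEntry<String, Long>(lastRow, Long.valueOf(rows.size()));
    }
}
```

In a real iterator the pair would be an Accumulo Key and Value and the client-side sum would run over the scanner's results; the sketch only shows the shape of the computation.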
> On Fri, Jul 15, 2016 at 1:02 PM, William Slacum <wslacum@gmail.com> wrote:
>
>> The iterator in the gist also counts cells/entries/KV pairs, not unique
>> rows. You'll want to have some way to skip to the next row value if you
>> want the count to be reflective of the number of rows being read.
>>
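The difference William points out between counting entries and counting rows can be sketched in plain Java. RowVsEntryCount is a hypothetical name, and each entry is modelled as a {row, column} pair; the sketch assumes entries arrive sorted by row, as they do in an Accumulo scan. It compares each row with the previous one, whereas a real iterator would more likely seek past the rest of the current row.

```java
import java.util.Arrays;
import java.util.List;

class RowVsEntryCount {
    /** Counts every entry (cell), regardless of which row it belongs to. */
    static long countEntries(List<String[]> sortedEntries) {
        return sortedEntries.size();
    }

    /** Counts distinct rows; relies on the entries being sorted by row. */
    static long countRows(List<String[]> sortedEntries) {
        long rows = 0;
        String currentRow = null;
        for (String[] entry : sortedEntries) {
            String row = entry[0];
            if (!row.equals(currentRow)) { // first entry of a new row
                rows++;
                currentRow = row;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> entries = Arrays.asList(
            new String[] {"row1", "colA"},
            new String[] {"row1", "colB"},
            new String[] {"row2", "colA"});
        System.out.println(countEntries(entries)); // prints 3
        System.out.println(countRows(entries));    // prints 2
    }
}
```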
>> On Fri, Jul 15, 2016 at 3:34 PM, Shawn Walker <accumulo@shawn-walker.net>
>> wrote:
>>
>>> My read is that you're mistaking the sequence of calls Accumulo will be
>>> making to your iterator.  The sequence isn't quite the same as a Java
>>> iterator (initially positioned "before" the first element), and is more
>>> like a C++ iterator:
>>>
>>> 0. Accumulo calls seek(...)
>>> 1. Is there more data? Accumulo calls hasTop(). You return yes.
>>> 2. Ok, so there's data.  Accumulo calls getTopKey(), getTopValue() to
>>> retrieve the data. You return a key indicating 0 columns seen (since next()
>>> hasn't yet been called)
>>> 3. First datum done, Accumulo calls next()
>>> ...
>>>
>>> I imagine that if you pull the second item out of your scan result,
>>> it'll have the number you expect.  Alternately, you might consider
>>> performing the count computation during an override of the seek(...)
>>> method, instead of in the next(...) method.
>>>
>>> --
>>> Shawn Walker
>>>
>>>
>>>
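The call sequence Shawn lists can be modelled with a small plain-Java sketch. TopIterator is a hypothetical, pared-down interface and is not Accumulo's SortedKeyValueIterator; CountInNext is a guess at the behaviour Shawn describes, not the code from Mario's gist. The driver calls seek(), then loops over hasTop(), getTopValue(), and next() in that order. Counting lazily in next() makes the first result 0, while counting in seek() makes the first result the full count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Drives an iterator in the order described: seek, hasTop, getTopValue, next. */
class ScanDriver {
    static List<Long> scan(TopIterator it) {
        List<Long> results = new ArrayList<Long>();
        it.seek();
        while (it.hasTop()) {
            results.add(it.getTopValue());
            it.next();
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> oneEntry = Arrays.asList("row1");
        System.out.println(scan(new CountInNext(oneEntry))); // prints [0, 1]
        System.out.println(scan(new CountInSeek(oneEntry))); // prints [1]
    }
}

/** Hypothetical, simplified model of the iterator protocol (not the Accumulo API). */
interface TopIterator {
    void seek();
    boolean hasTop();
    long getTopValue();
    void next();
}

/** Counts only when next() is called, so the first top is read before any counting. */
class CountInNext implements TopIterator {
    private final List<String> source;
    private long count;
    private boolean counted;
    private boolean hasTop;

    CountInNext(List<String> source) { this.source = source; }

    public void seek() { count = 0; counted = false; hasTop = true; }
    public boolean hasTop() { return hasTop; }
    public long getTopValue() { return count; }
    public void next() {
        if (!counted) {
            count = source.size(); // the count only becomes available here
            counted = true;
        } else {
            hasTop = false;
        }
    }
}

/** Counts inside seek(), so the single emitted value already holds the full count. */
class CountInSeek implements TopIterator {
    private final List<String> source;
    private long count;
    private boolean hasTop;

    CountInSeek(List<String> source) { this.source = source; }

    public void seek() { count = source.size(); hasTop = true; }
    public boolean hasTop() { return hasTop; }
    public long getTopValue() { return count; }
    public void next() { hasTop = false; } // only one result to emit
}
```

The [0, 1] result from CountInNext matches Shawn's remark that the second item of the scan would carry the expected number.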
>>> On Fri, Jul 15, 2016 at 2:24 PM, Mario Pastorelli <
>>> mario.pastorelli@teralytics.ch> wrote:
>>>
>>>> I'm trying to create a RowCounterIterator that counts all the rows and
>>>> returns only one key-value with the counter inside. The problem is that I
>>>> can't get it to work. The Scala code is available in the gist
>>>> <https://gist.github.com/melrief/5f2ca248f1a980ddead2f2eeb19e6389>
>>>> together with some pseudo-code of a test. The problem is that if I add an
>>>> entry to my table, this iterator will return 0 instead of 1 and apparently
>>>> the reason is that super.hasTop() is always false. I've tried without the
>>>> iterator and the scanner returns 1 element. Any idea of what I'm doing
>>>> wrong here? Is WrappingIterator the right class to extend for this kind of
>>>> behaviour?
>>>>
>>>> Thanks,
>>>> Mario
>>>>
>>>
>>>
>>
>
