accumulo-user mailing list archives

From Mario Pastorelli <mario.pastore...@teralytics.ch>
Subject Re: Making a RowCounterIterator
Date Fri, 15 Jul 2016 22:20:19 GMT
I'm actually using this after a WholeRowIterator, which groups all the
entries that share a row ID into a single key-value pair, so each entry my
iterator sees already corresponds to one row.

On Fri, Jul 15, 2016 at 10:02 PM, William Slacum <wslacum@gmail.com> wrote:

> The iterator in the gist also counts cells (entries, or KV pairs), not unique
> rows. You'll want some way to skip to the next row if you want the count to
> reflect the number of rows being read.
>
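To make the entries-versus-rows distinction concrete, here is a minimal sketch in Java (Accumulo's implementation language, not the Scala of the gist). It assumes a hypothetical `Cell` type that holds only a row ID and a column (it is not Accumulo's `Key` class) and an input already sorted by row. Counting entries gives one increment per cell; counting rows skips past every remaining cell of a row once that row has been counted.

```java
import java.util.List;

public class RowVsEntryCount {

    // Hypothetical stand-in for one sorted key/value entry: only the row ID
    // and a column name are kept. This is NOT Accumulo's Key class.
    static final class Cell {
        final String row;
        final String column;

        Cell(String row, String column) {
            this.row = row;
            this.column = column;
        }
    }

    // Counts every entry: one increment per cell, regardless of its row.
    static int countEntries(List<Cell> sorted) {
        return sorted.size();
    }

    // Counts distinct rows. After counting a row, it skips every following
    // cell with the same row ID; this relies on the input being sorted by row.
    static int countRows(List<Cell> sorted) {
        int rows = 0;
        int i = 0;
        while (i < sorted.size()) {
            String currentRow = sorted.get(i).row;
            rows++;
            // Skip ahead to the first cell of the next row.
            while (i < sorted.size() && sorted.get(i).row.equals(currentRow)) {
                i++;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
                new Cell("r1", "a"), new Cell("r1", "b"),
                new Cell("r2", "a"),
                new Cell("r3", "a"), new Cell("r3", "b"), new Cell("r3", "c"));
        System.out.println("entries=" + countEntries(cells)); // entries=6
        System.out.println("rows=" + countRows(cells));       // rows=3
    }
}
```

In a real Accumulo iterator the skip would more likely be done by re-seeking the source past the current row than by stepping with next(); the loop here only illustrates the counting logic. When a WholeRowIterator runs first, each entry already represents a whole row, so the two counts coincide.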
> On Fri, Jul 15, 2016 at 3:34 PM, Shawn Walker <accumulo@shawn-walker.net>
> wrote:
>
>> My read is that you're misunderstanding the sequence of calls Accumulo will
>> make to your iterator.  The sequence isn't quite the same as a Java
>> iterator (initially positioned "before" the first element), and is more
>> like a C++ iterator:
>>
>> 0. Accumulo calls seek(...)
>> 1. Is there more data? Accumulo calls hasTop(). You return yes.
>> 2. Ok, so there's data.  Accumulo calls getTopKey(), getTopValue() to
>> retrieve the data. You return a key indicating 0 columns seen (since next()
>> hasn't yet been called)
>> 3. First datum done, Accumulo calls next()
>> ...
>>
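The call sequence above can be simulated in a few lines of Java. This is a sketch under stated assumptions: `MiniIterator`, `ListSource` and `NextCountingIterator` are hypothetical simplifications (String keys and values, a `seek()` with no `Range`), not Accumulo's `SortedKeyValueIterator` API; only the order of calls follows the list. `NextCountingIterator` does its counting in `next()`, so the first value the caller reads is 0 and the expected count only appears in the second entry.

```java
import java.util.ArrayList;
import java.util.List;

public class CallSequenceDemo {

    // Hypothetical, heavily simplified stand-in for Accumulo's
    // SortedKeyValueIterator: String keys/values, seek() without a Range.
    // Only the order of calls is meant to match.
    interface MiniIterator {
        void seek();
        boolean hasTop();
        String getTopKey();
        String getTopValue();
        void next();
    }

    // A source that simply walks a list of row keys.
    static final class ListSource implements MiniIterator {
        private final List<String> rows;
        private int pos;
        ListSource(List<String> rows) { this.rows = rows; }
        public void seek() { pos = 0; }
        public boolean hasTop() { return pos < rows.size(); }
        public String getTopKey() { return rows.get(pos); }
        public String getTopValue() { return "v"; }
        public void next() { pos++; }
    }

    // Counts the source entries inside next(). The caller reads the top
    // key/value BEFORE the first next() call, so the first value it sees is 0.
    static final class NextCountingIterator implements MiniIterator {
        private final MiniIterator source;
        private long count;
        private boolean counted;
        private boolean done;
        NextCountingIterator(MiniIterator source) { this.source = source; }
        public void seek() { source.seek(); count = 0; counted = false; done = false; }
        public boolean hasTop() { return !done; }
        public String getTopKey() { return "count"; }
        public String getTopValue() { return Long.toString(count); }
        public void next() {
            if (counted) { done = true; return; }
            while (source.hasTop()) { count++; source.next(); }
            counted = true;
        }
    }

    // Drives an iterator in the order listed: seek, then repeatedly
    // hasTop, getTopKey/getTopValue, next.
    static List<String> scan(MiniIterator it) {
        List<String> results = new ArrayList<>();
        it.seek();                                                // step 0
        while (it.hasTop()) {                                     // step 1
            results.add(it.getTopKey() + "=" + it.getTopValue()); // step 2
            it.next();                                            // step 3
        }
        return results;
    }

    public static void main(String[] args) {
        MiniIterator it = new NextCountingIterator(new ListSource(List.of("row1")));
        System.out.println(scan(it)); // [count=0, count=1]
    }
}
```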
>> I imagine that if you pull the second item out of your scan result, it'll
>> have the number you expect.  Alternatively, you might consider performing
>> the count computation in an override of the seek(...) method instead of in
>> the next() method.
>>
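The suggestion to count inside seek(...) can be sketched with the same kind of hypothetical, simplified protocol (String keys and values, no `Range` argument; these are not Accumulo's real signatures). `SeekCountingIterator` consumes the whole source during `seek()`, so the single entry the caller reads already holds the final count.

```java
import java.util.ArrayList;
import java.util.List;

public class SeekCountingDemo {

    // Hypothetical, simplified stand-in for Accumulo's iterator protocol:
    // String keys/values and a seek() without a Range. Only the call order
    // (seek, then hasTop / getTopKey / getTopValue / next) is meant to match.
    interface MiniIterator {
        void seek();
        boolean hasTop();
        String getTopKey();
        String getTopValue();
        void next();
    }

    // A source that walks a list of row keys.
    static final class ListSource implements MiniIterator {
        private final List<String> rows;
        private int pos;
        ListSource(List<String> rows) { this.rows = rows; }
        public void seek() { pos = 0; }
        public boolean hasTop() { return pos < rows.size(); }
        public String getTopKey() { return rows.get(pos); }
        public String getTopValue() { return "v"; }
        public void next() { pos++; }
    }

    // Counts the whole source inside seek(), so the single entry it exposes
    // is already correct when the caller first reads it.
    static final class SeekCountingIterator implements MiniIterator {
        private final MiniIterator source;
        private long count;
        private boolean pending;
        SeekCountingIterator(MiniIterator source) { this.source = source; }
        public void seek() {
            source.seek();
            count = 0;
            while (source.hasTop()) {
                count++;
                source.next();
            }
            pending = true; // emit one entry, even for an empty range
        }
        public boolean hasTop() { return pending; }
        public String getTopKey() { return "count"; }
        public String getTopValue() { return Long.toString(count); }
        public void next() { pending = false; } // only one entry to emit
    }

    // Drives an iterator in the call order described in the thread.
    static List<String> scan(MiniIterator it) {
        List<String> results = new ArrayList<>();
        it.seek();
        while (it.hasTop()) {
            results.add(it.getTopKey() + "=" + it.getTopValue());
            it.next();
        }
        return results;
    }

    public static void main(String[] args) {
        MiniIterator it = new SeekCountingIterator(new ListSource(List.of("row1")));
        System.out.println(scan(it)); // [count=1]
    }
}
```

Emitting an entry for an empty range is a choice made here so the client always gets a number back. In a real table the iterator typically runs separately for each tablet, so the client would still need to add up the per-tablet counts; this sketch does not model that.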
>> --
>> Shawn Walker
>>
>>
>>
>> On Fri, Jul 15, 2016 at 2:24 PM, Mario Pastorelli <
>> mario.pastorelli@teralytics.ch> wrote:
>>
>>> I'm trying to create a RowCounterIterator that counts all the rows and
>>> returns only one key-value pair with the counter inside. The problem is
>>> that I can't get it to work. The Scala code is available in the gist
>>> <https://gist.github.com/melrief/5f2ca248f1a980ddead2f2eeb19e6389>
>>> together with some pseudo-code of a test. If I add an entry to my table,
>>> this iterator returns 0 instead of 1, and apparently the reason is that
>>> super.hasTop() is always false. Without the iterator, the scanner returns
>>> 1 element. Any idea what I'm doing wrong here? Is WrappingIterator the
>>> right class to extend for this kind of behaviour?
>>>
>>> Thanks,
>>> Mario
>>>
>>> --
>>> Mario Pastorelli | TERALYTICS
>>>
>>> *software engineer*
>>>
>>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>>> phone: +41794381682
>>> email: mario.pastorelli@teralytics.ch
>>> www.teralytics.net
>>>
>>> Company registration number: CH-020.3.037.709-7 | Trade register Canton
>>> Zurich
>>> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
>>> Yann de Vries
>>>
>>> This e-mail message contains confidential information which is for the
>>> sole attention and use of the intended recipient. Please notify us at once
>>> if you think that it may not be intended for you and delete it immediately.
>>>
>>
>>
>

