accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: Making a RowCounterIterator
Date Fri, 15 Jul 2016 22:29:07 GMT
It'd be more efficient to use the FirstEntryInRowIterator to grab just one
entry from each row, rather than the WholeRowIterator, which could use up a
lot of memory unnecessarily.
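The difference Christopher points out can be sketched in a few lines. The Java below is a simplified, hypothetical model, not Accumulo code: `FirstEntryPerRow`, `Entry` and `firstEntryInEachRow` are illustrative names, and entries are plain (row, column, value) triples already sorted by row, standing in for Accumulo `Key`/`Value` pairs. The filter keeps only the first entry of each row and carries nothing between entries except the previous row ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Simplified model, NOT the Accumulo API: entries are (row, column, value)
// triples already sorted by row, as a tablet server would read them.
public class FirstEntryPerRow {
    static final class Entry {
        final String row;
        final String column;
        final String value;

        Entry(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Keeps only the first entry of each row. Apart from the output list,
    // the only state carried between entries is the previous row ID, whereas
    // whole-row grouping must buffer every entry of a row at once.
    static List<Entry> firstEntryInEachRow(List<Entry> sorted) {
        List<Entry> out = new ArrayList<>();
        String lastRow = null;
        for (Entry e : sorted) {
            if (!Objects.equals(e.row, lastRow)) {
                out.add(e);
                lastRow = e.row;
            }
            // Any further entries of the same row are skipped.
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> data = List.of(
                new Entry("r1", "a", "1"),
                new Entry("r1", "b", "2"),
                new Entry("r2", "a", "3"),
                new Entry("r3", "a", "4"),
                new Entry("r3", "c", "5"));
        // After the filter, counting entries is the same as counting rows.
        System.out.println(firstEntryInEachRow(data).size()); // prints 3
    }
}
```

Since each row contributes exactly one entry after this filter, a counter placed after it counts rows rather than cells, without holding a whole row in memory. A real Accumulo iterator can also jump past the rest of a row with a seek instead of reading every entry, as this linear model does.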

On Fri, Jul 15, 2016 at 6:20 PM Mario Pastorelli <
mario.pastorelli@teralytics.ch> wrote:

> I'm actually using this after a WholeRowIterator, which I use to collapse
> all entries with the same row ID into a single entry.
>
> On Fri, Jul 15, 2016 at 10:02 PM, William Slacum <wslacum@gmail.com>
> wrote:
>
>> The iterator in the gist also counts cells/entries/KV pairs, not unique
>> rows. You'll want some way to skip ahead to the next row if you want the
>> count to reflect the number of rows being read.
>>
>> On Fri, Jul 15, 2016 at 3:34 PM, Shawn Walker <accumulo@shawn-walker.net>
>> wrote:
>>
>>> I think you're misreading the sequence of calls Accumulo makes to your
>>> iterator. The sequence isn't quite the same as for a Java iterator
>>> (initially positioned "before" the first element); it's more like a C++
>>> iterator:
>>>
>>> 0. Accumulo calls seek(...).
>>> 1. Is there more data? Accumulo calls hasTop(). You return yes.
>>> 2. OK, so there's data. Accumulo calls getTopKey() and getTopValue() to
>>> retrieve it. You return a key indicating 0 columns seen (since next()
>>> hasn't been called yet).
>>> 3. First datum done; Accumulo calls next().
>>> ...
>>>
>>> I imagine that if you pull the second item out of your scan result,
>>> it'll have the number you expect. Alternatively, you might consider
>>> performing the count computation in an override of the seek(...)
>>> method, instead of in the next() method.
>>>
>>> --
>>> Shawn Walker
>>>
>>>
>>>
>>> On Fri, Jul 15, 2016 at 2:24 PM, Mario Pastorelli <
>>> mario.pastorelli@teralytics.ch> wrote:
>>>
>>>> I'm trying to create a RowCounterIterator that counts all the rows and
>>>> returns only one key-value pair with the counter inside, but I can't get
>>>> it to work. The Scala code is available in the gist
>>>> <https://gist.github.com/melrief/5f2ca248f1a980ddead2f2eeb19e6389>
>>>> together with some pseudo-code of a test. If I add one entry to my
>>>> table, this iterator returns 0 instead of 1, and apparently the reason
>>>> is that super.hasTop() is always false. Without the iterator, the
>>>> scanner returns 1 element. Any idea what I'm doing wrong here? Is
>>>> WrappingIterator the right class to extend for this kind of behaviour?
>>>>
>>>> Thanks,
>>>> Mario
>>>>
>>>> --
>>>> Mario Pastorelli | TERALYTICS
>>>>
>>>> *software engineer*
>>>>
>>>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>>>> phone: +41794381682
>>>> email: mario.pastorelli@teralytics.ch
>>>> www.teralytics.net
>>>>
>>>> Company registration number: CH-020.3.037.709-7 | Trade register Canton
>>>> Zurich
>>>> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
>>>> Yann de Vries
>>>>
>>>> This e-mail message contains confidential information which is for the
>>>> sole attention and use of the intended recipient. Please notify us at once
>>>> if you think that it may not be intended for you and delete it immediately.
>>>>
>>>
>>>
>>
>
>
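To make the call order Shawn describes concrete, and to count rows rather than entries as William suggests, here is a hypothetical, simplified model in Java. None of these types are Accumulo classes: `RowCountModel`, `KvIterator`, `ListSource`, `RowCounter` and `scan` are illustrative names, keys are reduced to bare row IDs, and `seek()` takes no arguments (a real `SortedKeyValueIterator` receives a `Range`, column families and an inclusive flag). The counter drains its source inside `seek()`, so its single result is already in place when `hasTop()` and `getTopValue()` are first called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the call order described in the thread.
// NOT Accumulo classes; keys are reduced to row IDs.
public class RowCountModel {
    interface KvIterator {
        void seek();
        boolean hasTop();
        String getTopKey();
        String getTopValue();
        void next();
    }

    // Source: one entry per list element, sorted by row; a row may repeat
    // (several columns in the same row).
    static final class ListSource implements KvIterator {
        private final List<String> rows;
        private int pos;

        ListSource(List<String> rows) { this.rows = rows; }

        public void seek() { pos = 0; }
        public boolean hasTop() { return pos < rows.size(); }
        public String getTopKey() { return rows.get(pos); }
        public String getTopValue() { return "v" + pos; }
        public void next() { pos++; }
    }

    // Does all the counting in seek(), so the result is ready before the
    // first hasTop()/getTopValue() calls. A row is counted only when the
    // row ID changes, so several entries in one row count once.
    static final class RowCounter implements KvIterator {
        private final KvIterator source;
        private long rowCount;
        private boolean hasTop;

        RowCounter(KvIterator source) { this.source = source; }

        public void seek() {
            source.seek();
            rowCount = 0;
            String lastRow = null;
            while (source.hasTop()) {
                String row = source.getTopKey();
                if (!row.equals(lastRow)) {
                    rowCount++;
                    lastRow = row;
                }
                source.next();
            }
            hasTop = true; // exactly one result entry, even for a count of 0
        }

        public boolean hasTop() { return hasTop; }
        public String getTopKey() { return "rowCount"; }
        public String getTopValue() { return Long.toString(rowCount); }
        public void next() { hasTop = false; }
    }

    // Drives an iterator in the order listed in the thread: seek, then
    // hasTop, getTopKey/getTopValue, next, until hasTop() returns false.
    static List<String> scan(KvIterator it) {
        List<String> results = new ArrayList<>();
        it.seek();
        while (it.hasTop()) {
            results.add(it.getTopKey() + "=" + it.getTopValue());
            it.next();
        }
        return results;
    }

    public static void main(String[] args) {
        KvIterator source = new ListSource(List.of("r1", "r1", "r2"));
        System.out.println(scan(new RowCounter(source))); // prints [rowCount=2]
    }
}
```

One consequence of counting in seek() is that the count only covers the data that particular seek sees. Accumulo runs the iterator stack per tablet, so a scan spanning several tablets would return one count per tablet, which the client would then sum; a real implementation should also honour the Range passed to seek().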
