accumulo-user mailing list archives

From Mario Pastorelli <mario.pastore...@teralytics.ch>
Subject Re: Making a RowCounterIterator
Date Fri, 15 Jul 2016 22:35:16 GMT
The WholeRowIterator is for filtering: I need all the columns that the
filter requires so that the filter can see whether the row matches the
query. That's the only proper way I found to implement logical operators on
predicates over columns of the same row.
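
To illustrate why the filter needs the whole row at once, here is a minimal sketch in plain Java. It does not use the Accumulo API: `RowFilterSketch`, `matchingRows`, and the `{rowId, column, value}` array layout are hypothetical names chosen for this example, and the input is assumed to be sorted by rowId.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical helper, not part of Accumulo.
class RowFilterSketch {
    // Each entry is {rowId, column, value}; the list is assumed sorted by rowId.
    static List<String> matchingRows(List<String[]> sorted,
                                     Predicate<Map<String, String>> rowPredicate) {
        // Group all columns of a row together; LinkedHashMap keeps row order.
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        for (String[] e : sorted) {
            rows.computeIfAbsent(e[0], k -> new HashMap<>()).put(e[1], e[2]);
        }
        // The predicate sees every column of the row, so it can combine
        // conditions on different columns with AND / OR.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> r : rows.entrySet()) {
            if (rowPredicate.test(r.getValue())) {
                out.add(r.getKey());
            }
        }
        return out;
    }
}
```

A predicate such as `cols -> "red".equals(cols.get("color")) && "L".equals(cols.get("size"))` can only be evaluated once both columns of the row are in hand, which is what grouping by row provides.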

Actually, I do have a question about the WholeRowIterator while we are talking
about it. Does it make sense when used with a BatchScanner? My guess is
yes: while the BatchScanner can return data to the client unsorted, the data
is sorted when it is scanning a single tablet. Because the data of the same
rowId is never split across tablets (right?), there should be no problem in
using a WholeRowIterator with a BatchScanner. Is this correct? I really
can't find much documentation for Accumulo, and the book doesn't help enough.

On Sat, Jul 16, 2016 at 12:29 AM, Christopher <ctubbsii@apache.org> wrote:

> It'd be more efficient to use the FirstEntryInRowIterator to just grab one
> each, rather than the WholeRowIterator which could use up a lot of memory
> unnecessarily.
>
> On Fri, Jul 15, 2016 at 6:20 PM Mario Pastorelli <
> mario.pastorelli@teralytics.ch> wrote:
>
>> I'm actually using this after a WholeRowIterator, which is used to filter
>> rows with the same rowId.
>>
>> On Fri, Jul 15, 2016 at 10:02 PM, William Slacum <wslacum@gmail.com>
>> wrote:
>>
>>> The iterator in the gist also counts cells/entries/KV pairs, not unique
>>> rows. You'll want to have some way to skip to the next row value if you
>>> want the count to be reflective of the number of rows being read.
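
The difference between counting entries and counting rows can be shown with a small plain-Java sketch. It does not use the Accumulo API: `RowVsEntryCount` and the `{rowId, column}` array layout are assumptions made for illustration, and the input is assumed to be sorted by rowId.

```java
import java.util.List;

// Hypothetical helper, not part of Accumulo.
class RowVsEntryCount {
    // Counts every key-value pair, regardless of which row it belongs to.
    static long countEntries(List<String[]> sorted) {
        return sorted.size();
    }

    // Counts distinct rows. Because the input is sorted by rowId, a new row
    // starts exactly where the rowId differs from the previous entry's rowId.
    static long countRows(List<String[]> sorted) {
        long rows = 0;
        String previousRow = null;
        for (String[] e : sorted) {
            if (!e[0].equals(previousRow)) {
                rows++;
                previousRow = e[0];
            }
        }
        return rows;
    }
}
```

For the entries `{r1, a}`, `{r1, b}`, `{r2, a}`, `countEntries` gives 3 while `countRows` gives 2.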
>>>
>>> On Fri, Jul 15, 2016 at 3:34 PM, Shawn Walker <accumulo@shawn-walker.net>
>>> wrote:
>>>
>>>> My read is that you're mistaking the sequence of calls Accumulo will be
>>>> making to your iterator.  The sequence isn't quite the same as a Java
>>>> iterator (initially positioned "before" the first element), and is more
>>>> like a C++ iterator:
>>>>
>>>> 0. Accumulo calls seek(...)
>>>> 1. Is there more data? Accumulo calls hasTop(). You return yes.
>>>> 2. Ok, so there's data.  Accumulo calls getTopKey(), getTopValue() to
>>>> retrieve the data. You return a key indicating 0 columns seen (since next()
>>>> hasn't yet been called)
>>>> 3. First datum done, Accumulo calls next()
>>>> ...
>>>>
>>>> I imagine that if you pull the second item out of your scan result,
>>>> it'll have the number you expect.  Alternately, you might consider
>>>> performing the count computation during an override of the seek(...)
>>>> method, instead of in the next(...) method.
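
The call order described above can be sketched in plain Java. This is a minimal model, not Accumulo code: `KVIterator`, `ListSource`, `CountingIterator`, and `ScanDriver` are hypothetical names, and the interface is a simplified stand-in for the real iterator contract (no ranges, column families, or deep copies). It shows the count being computed inside seek(), so the value is already available on the first getTopValue() call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the iterator contract.
interface KVIterator {
    void seek();          // position on the first entry, if any
    boolean hasTop();     // is there a current entry?
    String getTopKey();   // current key; valid only if hasTop()
    long getTopValue();   // current value; valid only if hasTop()
    void next();          // advance past the current entry
}

// Source over a pre-sorted list of keys; every entry carries the value 1.
class ListSource implements KVIterator {
    private final List<String> keys;
    private int pos = -1; // -1 means seek() has not been called yet
    ListSource(List<String> keys) { this.keys = keys; }
    public void seek() { pos = 0; }
    public boolean hasTop() { return pos >= 0 && pos < keys.size(); }
    public String getTopKey() { return keys.get(pos); }
    public long getTopValue() { return 1L; }
    public void next() { pos++; }
}

// Wrapper that drains its source inside seek(), so the count is ready
// before the caller asks hasTop() or getTopValue().
class CountingIterator implements KVIterator {
    private final KVIterator source;
    private long count;
    private boolean top;
    CountingIterator(KVIterator source) { this.source = source; }
    public void seek() {
        source.seek();
        count = 0;
        while (source.hasTop()) {
            count++;
            source.next();
        }
        top = true; // emit exactly one entry, even when the count is 0
    }
    public boolean hasTop() { return top; }
    public String getTopKey() { return "count"; }
    public long getTopValue() { return count; }
    public void next() { top = false; } // the single entry has been consumed
}

class ScanDriver {
    // Mimics the caller: seek first, then loop on hasTop / getTop / next.
    static List<Long> scan(KVIterator it) {
        List<Long> out = new ArrayList<>();
        it.seek();
        while (it.hasTop()) {
            out.add(it.getTopValue());
            it.next();
        }
        return out;
    }
}
```

With three entries in the source, `ScanDriver.scan(new CountingIterator(new ListSource(Arrays.asList("a", "b", "c"))))` yields a single value, 3. Emitting one entry even when the count is 0 is a design choice of this sketch.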
>>>>
>>>> --
>>>> Shawn Walker
>>>>
>>>>
>>>>
>>>> On Fri, Jul 15, 2016 at 2:24 PM, Mario Pastorelli <
>>>> mario.pastorelli@teralytics.ch> wrote:
>>>>
>>>>> I'm trying to create a RowCounterIterator that counts all the rows and
>>>>> returns only one key-value with the counter inside. The problem is that I
>>>>> can't get it to work. The Scala code is available in the gist
>>>>> <https://gist.github.com/melrief/5f2ca248f1a980ddead2f2eeb19e6389>
>>>>> together with some pseudo-code of a test. The problem is that if I add an
>>>>> entry to my table, this iterator will return 0 instead of 1 and apparently
>>>>> the reason is that super.hasTop() is always false. I've tried without the
>>>>> iterator and the scanner returns 1 element. Any idea of what I'm doing
>>>>> wrong here? Is WrappingIterator the right class to extend for this kind of
>>>>> behaviour?
>>>>>
>>>>> Thanks,
>>>>> Mario
>>>>>
>>>>> --
>>>>> Mario Pastorelli | TERALYTICS
>>>>>
>>>>> *software engineer*
>>>>>
>>>>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>>>>> phone: +41794381682
>>>>> email: mario.pastorelli@teralytics.ch
>>>>> www.teralytics.net
>>>>>
>>>>> Company registration number: CH-020.3.037.709-7 | Trade register
>>>>> Canton Zurich
>>>>> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz,
>>>>> Yann de Vries
>>>>>
>>>>> This e-mail message contains confidential information which is for the
>>>>> sole attention and use of the intended recipient. Please notify us at once
>>>>> if you think that it may not be intended for you and delete it immediately.
>>>>>
>>>>
>>>>
>>>
>>
>


