accumulo-user mailing list archives

From Josh Elser <>
Subject Re: Abnormal behaviour of custom iterator in getting entries
Date Fri, 19 Jun 2015 18:26:09 GMT
Also, apparently I wrote something similar to your problem a long time ago:

The above implementation does assume large contiguous ranges. Thought it 
might be helpful anyways.

Josh Elser wrote:
> Good, I'm glad you found it useful.
> The important thing to always remember is that your data is split across
> many tablet servers and that Iterators run local to each tablet server.
> As such, you cannot compute a single sum via an iterator; you can, at
> best, compute N intermediate sums -- one for each tablet server the
> batch scanner had to talk to.
> Also ignore my previous comment about a second iterator. I had assumed
> you were doing something fancier than selecting a single column
> qualifier from a row.
> Since you're passing in what are likely multiple, disjoint ranges, I'm
> not sure you're going to get much of a performance optimization out of a
> custom iterator in this case. After each seek, your iterator would need
> to return the entries that it summed in the provided Range (the Iterator
> framework isn't designed to know the overall state of the scan -- you
> might have more data to read or you might be done. You must return the
> data when the data you're reading moves outside of the current range).
> The way that you'd see the real optimization an Iterator provides is if
> you are scanning over a large, contiguous set of rows specified by a
> single Range (you can get the reduction of reading many key/values into
> a single pair returned).
> If I mis-stated your situation, please do let me know.
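
The point about N intermediate sums can be sketched in plain Java. This is only an illustration and does not use the Accumulo API: each `Map` is a hypothetical stand-in for the entries hosted by one tablet server, `sumOnOneServer` stands in for what a server-side iterator could compute locally, and `clientTotal` stands in for the final summation the client still has to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartialSums {

    // Stand-in for a server-side iterator: it sees only the entries hosted
    // by one tablet server and produces one intermediate sum.
    static long sumOnOneServer(Map<String, Long> localEntries) {
        long sum = 0;
        for (long v : localEntries.values()) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the client: it receives one partial sum per server and
    // performs the final summation itself.
    static long clientTotal(List<Map<String, Long>> servers) {
        long total = 0;
        for (Map<String, Long> server : servers) {
            total += sumOnOneServer(server);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical data: rows and values are made up for illustration.
        Map<String, Long> serverA = new TreeMap<>();
        serverA.put("row1", 10L);
        serverA.put("row2", 20L);
        Map<String, Long> serverB = new TreeMap<>();
        serverB.put("row7", 5L);

        List<Map<String, Long>> servers = new ArrayList<>();
        servers.add(serverA);
        servers.add(serverB);

        System.out.println("partial A = " + sumOnOneServer(serverA));
        System.out.println("partial B = " + sumOnOneServer(serverB));
        System.out.println("total     = " + clientTotal(servers));
    }
}
```

No single server ever sees both partial sums, which is why the last addition has to happen on the client side.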
> madhvi wrote:
>> Hi,
>> Thanks for the blog you shared. I found it quite useful for my
>> requirement.
>> "How are you passing these IDs to the batch scanner?"
>> I am passing row ids received as a previous query result from another
>> table as 'new Range(entry.getKey().getRow())' in a Range type list and
>> passing that list to batch Scanner.
>> "Are you trying to sum across all rows that you queried? "
>> Yes, we need to sum a particular column qualifier across the row ids
>> passed to the batch scanner. How can the summation be done across the
>> rows, as you said "you can put a second iterator "above" the first"?
>> Thanks
>> Madhvi
>> On Wednesday 17 June 2015 08:43 PM, Josh Elser wrote:
>>> Madhvi,
>>> Understood. A few more questions..
>>> How are you passing these IDs to the batch scanner? Are you providing
>>> individual Ranges for each ID (e.g. `new Range(new Key("row1", "",
>>> "id1"), true, new Key("row1", "", "id1\x00"), false))`)? Or are you
>>> providing an entire row (or set of rows) and using the
>>> fetchColumn(Text,Text) method (or similar) on the BatchScanner?
>>> Are you trying to sum across all rows that you queried? Or is your sum
>>> per-row? If the former, that is going to cause you problems. The quick
>>> explanation is that you can't reliably know the tablet boundaries so
>>> you should try to perform an initial sum, per row. If you want, you
>>> can put a second iterator "above" the first and do a summation across
>>> all rows to reduce the amount of data sent to a client. However, if
>>> you use a BatchScanner, you will still have to perform a final
>>> summation at the client.
>>> Check out
>>> for more details on that..
>>> madhvi wrote:
>>>> Hi Josh,
>>>> Sorry, my company policy doesn't allow me to share the full source. What
>>>> we are trying to do is sum over a unique field stored in the column
>>>> qualifier for the IDs passed to the batch scanner. Can you suggest how
>>>> it can be done in Accumulo?
>>>> Thanks
>>>> Madhvi
>>>> On Wednesday 17 June 2015 10:32 AM, Josh Elser wrote:
>>>>> You put random values in the family and qualifier? Do I misunderstand
>>>>> you?
>>>>> Also, if you can put up the full source for the iterator, that will be
>>>>> much easier if you need help debugging it. It's hard for us to guess
>>>>> at why your code might not be working as you expect.
>>>>> madhvi wrote:
>>>>>> Hi Josh,
>>>>>> I have changed HashMap to TreeMap which sorts lexicographically and
>>>>>> have inserted random values in column family and qualifier. Value
>>>>>> TreeMap in value.
>>>>>> Used scanner and batch scanner but getting results only with scanner.
>>>>>> Thanks
>>>>>> Madhvi
>>>>>> On Tuesday 16 June 2015 08:42 PM, Josh Elser wrote:
>>>>>>> Additionally, you're placing the Value into the ColumnQualifier and
>>>>>>> dropping the ColumnFamily completely. Granted, that may not be a
>>>>>>> problem for the specific data in your table, but it's not going to
>>>>>>> work for arbitrary data.
>>>>>>> Christopher wrote:
>>>>>>>> You're iterating over a HashMap. That's not sorted.
>>>>>>>> --
>>>>>>>> Christopher L Tubbs II
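
To make the HashMap remark concrete, here is a small plain-Java sketch. It assumes String keys as a stand-in for Accumulo Keys, and the class and method names (`SortedHolder`, `iterationOrder`, `isSorted`) are hypothetical. A `HashMap` makes no promise about iteration order, while a `TreeMap` iterates its keys in ascending order, which is what a holder needs if its contents are handed back entry by entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedHolder {

    // Returns the keys in whatever order the given map iterates them.
    static List<String> iterationOrder(Map<String, String> holder) {
        return new ArrayList<>(holder.keySet());
    }

    // True when every key is >= the key before it.
    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> hashHolder = new HashMap<>();
        Map<String, String> treeHolder = new TreeMap<>();
        // Insert the same made-up entries, deliberately out of order.
        String[] rows = {"row3", "row1", "row2"};
        for (String row : rows) {
            hashHolder.put(row, "value");
            treeHolder.put(row, "value");
        }

        // HashMap order is unspecified: it may or may not look sorted.
        System.out.println("HashMap: " + iterationOrder(hashHolder));
        // TreeMap order is always ascending by key.
        System.out.println("TreeMap: " + iterationOrder(treeHolder)
                + " sorted=" + isSorted(iterationOrder(treeHolder)));
    }
}
```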
>>>>>>>> On Tue, Jun 16, 2015 at 1:58 AM, madhvi<>
>>>>>>>> wrote:
>>>>>>>>> Hi Josh,
>>>>>>>>> Thanks for replying. I will enable remote debugger on my Accumulo
>>>>>>>>> server.
>>>>>>>>> However I am slightly confused with your statement "you are not
>>>>>>>>> returning your data in sorted order". Can you point out the part in
>>>>>>>>> my iterator code which seems inappropriate and any possible solution
>>>>>>>>> for that?
>>>>>>>>> Thanks
>>>>>>>>> Madhvi
>>>>>>>>> On Tuesday 16 June 2015 11:07 AM, Josh Elser wrote:
>>>>>>>>>> //matched the condition and put values to holder
