accumulo-user mailing list archives

From Dylan Hutchison
Subject Re: Abnormal behaviour of custom iterator in getting entries
Date Wed, 24 Jun 2015 08:56:10 GMT
Chiming in on one of Josh's comments:

> Since you're passing in what are likely multiple, disjoint ranges, I'm not
> sure you're going to get much of a performance optimization out of a custom
> iterator in this case. After each seek, your iterator would need to return
> the entries that it summed in the provided Range (the Iterator framework
> isn't designed to know the overall state of the scan -- you might have more
> data to read or you might be done. You must return the data when the data
> you're reading moves outside of the current range).
> The way that you'd see the real optimization an Iterator provides is if
> you are scanning over a large, contiguous set of rows specified by a single
> Range (you can get the reduction of reading many key/values into a single
> pair returned).

FYI, it is possible to obtain better custom iterator performance when
scanning with multiple, disjoint ranges.  The trick is to call
BatchScanner's setRanges() with an infinite range, causing Accumulo to run
your iterator on every tablet.  Then pass your desired ranges to the
iterator directly via iterator options, and let the iterator control
seeking itself.  This is fairly advanced and needs more detailed study,
but you can see a prototype of how I do it in the Graphulo library.
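
The approach described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the Graphulo implementation: no Accumulo classes are used, a NavigableMap stands in for one tablet's sorted data, and the "ranges" option name and its "start:end" encoding are hypothetical. In a real iterator the options would arrive through init() and the seeking would go through the source iterator's seek().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Plain-Java stand-in for an iterator that receives its ranges as an option.
class SelfSeekingSum {
    /** Parses a hypothetical encoding such as "a:c,m:p" into [start, end) row pairs. */
    static List<String[]> parseRanges(String option) {
        List<String[]> ranges = new ArrayList<>();
        for (String part : option.split(",")) {
            String[] bounds = part.split(":");
            ranges.add(new String[] {bounds[0], bounds[1]});
        }
        return ranges;
    }

    /** Sums the values in one tablet's sorted data, visiting only the requested ranges. */
    static long sum(NavigableMap<String, Long> tablet, Map<String, String> options) {
        long total = 0;
        for (String[] r : parseRanges(options.get("ranges"))) {
            // subMap plays the role of a seek to the start of each range.
            for (long v : tablet.subMap(r[0], true, r[1], false).values()) {
                total += v;
            }
        }
        return total;
    }
}
```

Because the scan itself covers everything, each tablet produces one partial result for all of the ranges it holds, instead of one result per range.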


Cheers, Dylan

On Tue, Jun 23, 2015 at 6:53 AM, madhvi wrote:

> Thanks Josh. It really worked for me.
> On Wednesday 17 June 2015 08:43 PM, Josh Elser wrote:
>> Madhvi,
>> Understood. A few more questions..
>> How are you passing these IDs to the batch scanner? Are you providing
>> individual Ranges for each ID (e.g. `new Range(new Key("row1", "", "id1"),
>> true, new Key("row1", "", "id1\0"), false)`)? Or are you providing an
>> entire row (or set of rows) and using the fetchColumns(Text,Text) method
>> (or similar) on the BatchScanner?
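
The per-ID range in the question above works because appending a zero byte to a qualifier yields the smallest value that sorts after it, so the half-open interval matches only that qualifier. A minimal plain-Java sketch of that property follows; it uses a sorted set of strings as a stand-in for column qualifiers, the class and method names are made up for illustration, and it assumes ASCII identifiers so that String ordering agrees with byte ordering.

```java
import java.util.NavigableSet;
import java.util.SortedSet;

class ExactMatchRange {
    /** Returns the entries in [id, id + "\0"), which can only be the exact match. */
    static SortedSet<String> exact(NavigableSet<String> qualifiers, String id) {
        // "id1\0" sorts before "id10", "id1a" and every other extension of "id1".
        return qualifiers.subSet(id, true, id + "\0", false);
    }
}
```

With the qualifiers id1, id10, id1a and id2 in the set, asking for id1 returns only id1.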
>> Are you trying to sum across all rows that you queried? Or is your sum
>> per-row? If the former, that is going to cause you problems. The quick
>> explanation is that you can't reliably know the tablet boundaries so you
>> should try to perform an initial sum, per row. If you want, you can put a
>> second iterator "above" the first and do a summation across all rows to
>> reduce the amount of data sent to a client. However, if you use a
>> BatchScanner, you will still have to perform a final summation at the
>> client.
>> Check out
>> for more details on that..
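
The two-stage summation described in that message can be sketched as follows. This is a plain-Java illustration rather than Accumulo iterator code: each list of entries stands in for the data one tablet holds, sumPerRow stands in for the server-side per-row sum, and finalSum is the step the client still has to perform with a BatchScanner. All names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartialSums {
    /** Server-side stand-in: sums values per row within a single tablet. */
    static Map<String, Long> sumPerRow(List<String[]> tabletEntries) {
        Map<String, Long> sums = new TreeMap<>();
        for (String[] e : tabletEntries) {
            // e[0] is the row, e[1] is the numeric value stored as text.
            sums.merge(e[0], Long.parseLong(e[1]), Long::sum);
        }
        return sums;
    }

    /** Client-side step: adds up the partial results returned for each tablet. */
    static long finalSum(List<Map<String, Long>> partials) {
        long total = 0;
        for (Map<String, Long> partial : partials) {
            for (long v : partial.values()) {
                total += v;
            }
        }
        return total;
    }
}
```

The server side never needs to know where tablet boundaries fall, because it only ever combines entries that share a row; anything that crosses rows is combined on the client.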
>> madhvi wrote:
>>> Hi Josh,
>>> Sorry, my company policy doesn't allow me to share the full source. What we
>>> are trying to do is sum over a unique field stored in the column
>>> qualifier for IDs passed to the batch scanner. Can you suggest how it can be
>>> done in Accumulo?
>>> Thanks
>>> Madhvi
>>> On Wednesday 17 June 2015 10:32 AM, Josh Elser wrote:
>>>> You put random values in the family and qualifier? Do I misunderstand
>>>> you?
>>>> Also, if you can put up the full source for the iterator, that will be
>>>> much easier if you need help debugging it. It's hard for us to guess
>>>> at why your code might not be working as you expect.
>>>> madhvi wrote:
>>>>> Hi Josh,
>>>>> I have changed the HashMap to a TreeMap, which sorts lexicographically, and I
>>>>> have inserted random values in the column family and qualifier. The
>>>>> TreeMap's value goes into the value.
>>>>> I used both a scanner and a batch scanner, but I get results only with the scanner.
>>>>> Thanks
>>>>> Madhvi
>>>>> On Tuesday 16 June 2015 08:42 PM, Josh Elser wrote:
>>>>>> Additionally, you're placing the Value into the ColumnQualifier and
>>>>>> dropping the ColumnFamily completely. Granted, that may not be a
>>>>>> problem for the specific data in your table, but it's not going to
>>>>>> work for arbitrary data.
>>>>>> Christopher wrote:
>>>>>>> You're iterating over a HashMap. That's not sorted.
>>>>>>> --
>>>>>>> Christopher L Tubbs II
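
The point about ordering can be illustrated with a short plain-Java sketch. It assumes the holder map is keyed by strings and that its contents are emitted in iteration order; the class and method names are hypothetical. A HashMap makes no promise about iteration order, while a TreeMap iterates in key order, which is what an iterator returning entries needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedHolder {
    /** Returns the map's keys in the order its iterator yields them. */
    static List<String> keysInIterationOrder(Map<String, String> holder) {
        return new ArrayList<>(holder.keySet());
    }

    /** Copies any holder map into a TreeMap so iteration follows key order. */
    static Map<String, String> sorted(Map<String, String> holder) {
        return new TreeMap<>(holder);
    }
}
```

Note that String ordering matches Accumulo's byte-wise ordering only for plain ASCII keys; a real iterator would compare Key objects instead.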
>>>>>>> On Tue, Jun 16, 2015 at 1:58 AM, madhvi wrote:
>>>>>>>> Hi Josh,
>>>>>>>> Thanks for replying. I will enable the remote debugger on my server.
>>>>>>>> However, I am slightly confused by your statement "you are returning
>>>>>>>> your data in sorted order". Can you point out the part of my code that
>>>>>>>> seems inappropriate and any possible solution for it?
>>>>>>>> Thanks
>>>>>>>> Madhvi
>>>>>>>> On Tuesday 16 June 2015 11:07 AM, Josh Elser wrote:
>>>>>>>>> //matched the condition and put values to holder map.
