accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Abnormal behaviour of custom iterator in getting entries
Date Thu, 18 Jun 2015 16:41:42 GMT
Good, I'm glad you found it useful.

The important thing to always remember is that your data is split across
many tablet servers and that Iterators run local to each tablet server.
As such, you cannot compute a single sum via an iterator. At best, you
can compute N intermediate sums, one for each tablet server the
BatchScanner had to talk to, and add them up at the client.
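
As a rough illustration of that final client-side step, here is a minimal
sketch in plain Java with no Accumulo classes involved. It assumes each value
coming back from the BatchScanner is one intermediate sum encoded as a decimal
string. That encoding, the class name PartialSumCombiner, and the method
combine are assumptions made for the example, not part of any real API.

```java
import java.util.Arrays;
import java.util.List;

class PartialSumCombiner {

    // Adds up the intermediate sums produced on the server side.
    // Assumption: each element is one partial sum encoded as a decimal string.
    static long combine(Iterable<String> partialSums) {
        long total = 0L;
        for (String partial : partialSums) {
            total += Long.parseLong(partial.trim());
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for the values a BatchScanner would hand back:
        // three intermediate sums arriving in no particular order.
        List<String> fromServers = Arrays.asList("10", "25", "7");
        System.out.println(combine(fromServers)); // prints 42
    }
}
```

Because addition is commutative, the unordered results of a BatchScanner are
not a problem for this step.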

Also ignore my previous comment about a second iterator. I had assumed 
you were doing something fancier than selecting a single column 
qualifier from a row.

Since you're passing in what are likely multiple, disjoint ranges, I'm
not sure you're going to get much of a performance optimization out of a
custom iterator in this case. After each seek, your iterator would need
to return the entries that it summed in the provided Range. The Iterator
framework isn't designed to know the overall state of the scan: you
might have more data to read or you might be done. So you must return
your result as soon as the data you're reading moves outside of the
current range.
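
To picture the "one result per range" behaviour, here is a small plain-Java
model. It is not an Accumulo iterator: a TreeMap stands in for the sorted
table, a pair of row strings stands in for a Range, and the names PerRangeSum,
sumRange and sumEachRange are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

class PerRangeSum {

    // Models one seek: sums every value whose row lies in [startRow, endRow]
    // and returns that single number before the next range is looked at.
    static long sumRange(NavigableMap<String, Long> sortedData,
                         String startRow, String endRow) {
        long sum = 0L;
        for (long value : sortedData.subMap(startRow, true, endRow, true).values()) {
            sum += value;
        }
        return sum;
    }

    // Produces one result per range; no state carries over between ranges.
    static List<Long> sumEachRange(NavigableMap<String, Long> sortedData,
                                   List<String[]> ranges) {
        List<Long> results = new ArrayList<>();
        for (String[] range : ranges) {
            results.add(sumRange(sortedData, range[0], range[1]));
        }
        return results;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> table = new TreeMap<>();
        table.put("row1", 5L);
        table.put("row2", 7L);
        table.put("row3", 11L);
        table.put("row4", 13L);

        // Two disjoint ranges: a single row, then a two-row span.
        List<String[]> ranges = Arrays.asList(
                new String[] {"row1", "row1"},
                new String[] {"row3", "row4"});

        System.out.println(sumEachRange(table, ranges)); // prints [5, 24]
    }
}
```

With many single-row ranges, each result covers just one row, so little is
saved. With one wide range, many entries collapse into one number.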

Where an Iterator gives you a real optimization is when you scan over a
large, contiguous set of rows specified by a single Range: many
key/values are read on the server and reduced to a single pair that is
returned.

If I mis-stated your situation, please do let me know.

madhvi wrote:
> Hi,
>
> Thanks for the blog you shared. I found it quite useful for my requirement.
> "How are you passing these IDs to the batch scanner?"
> I am passing row ids received as a previous query result from another
> table as 'new Range(entry.getKey().getRow())' in a Range type list and
> passing that list to batch Scanner.
>
> "Are you trying to sum across all rows that you queried? "
> Yes, we need to sum a particular column qualifier across the row IDs
> passed to the batch scanner. How can the summation be done across the
> rows, as you said "you can put a second iterator "above" the first"?
>
> Thanks
> Madhvi
> On Wednesday 17 June 2015 08:43 PM, Josh Elser wrote:
>> Madhvi,
>>
>> Understood. A few more questions..
>>
>> How are you passing these IDs to the batch scanner? Are you providing
>> individual Ranges for each ID (e.g. `new Range(new Key("row1", "",
>> "id1"), true, new Key("row1", "", "id1\x00"), false))`)? Or are you
>> providing an entire row (or set of rows) and using the
>> fetchColumns(Text,Text) method (or similar) on the BatchScanner?
>>
>> Are you trying to sum across all rows that you queried? Or is your sum
>> per-row? If the former, that is going to cause you problems. The quick
>> explanation is that you can't reliably know the tablet boundaries so
>> you should try to perform an initial sum, per row. If you want, you
>> can put a second iterator "above" the first and do a summation across
>> all rows to reduce the amount of data sent to a client. However, if
>> you use a BatchScanner, you will still have to perform a final
>> summation at the client.
>>
>> Check out
>> https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo
>> for more details on that..
>>
>> madhvi wrote:
>>> Hi Josh,
>>>
>>> Sorry, my company policy doesn't allow me to share the full source. What
>>> we are trying to do is sum over a unique field stored in the column
>>> qualifier for IDs passed to the batch scanner. Can you suggest how it can
>>> be done in Accumulo?
>>>
>>> Thanks
>>> Madhvi
>>> On Wednesday 17 June 2015 10:32 AM, Josh Elser wrote:
>>>> You put random values in the family and qualifier? Do I misunderstand
>>>> you?
>>>>
>>>> Also, if you can put up the full source for the iterator, that will be
>>>> much easier if you need help debugging it. It's hard for us to guess
>>>> at why your code might not be working as you expect.
>>>>
>>>> madhvi wrote:
>>>>> Hi Josh,
>>>>>
>>>>> I have changed HashMap to TreeMap which sorts lexicographically and I
>>>>> have inserted random values in column family and qualifier. Value of
>>>>> TreeMap in value.
>>>>> Used scanner and batch scanner but getting results only with scanner.
>>>>>
>>>>> Thanks
>>>>> Madhvi
>>>>>
>>>>> On Tuesday 16 June 2015 08:42 PM, Josh Elser wrote:
>>>>>> Additionally, you're placing the Value into the ColumnQualifier and
>>>>>> dropping the ColumnFamily completely. Granted, that may not be a
>>>>>> problem for the specific data in your table, but it's not going to
>>>>>> work for any data.
>>>>>>
>>>>>> Christopher wrote:
>>>>>>> You're iterating over a HashMap. That's not sorted.
>>>>>>>
>>>>>>> --
>>>>>>> Christopher L Tubbs II
>>>>>>> http://gravatar.com/ctubbsii
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 16, 2015 at 1:58 AM, madhvi<madhvi.gupta@orkash.com>
>>>>>>> wrote:
>>>>>>>> Hi Josh,
>>>>>>>> Thanks for replying. I will enable remote debugger on my Accumulo
>>>>>>>> server.
>>>>>>>>
>>>>>>>> However I am slightly confused with your statement "you are not
>>>>>>>> returning your data in sorted order". Can you point out the part in
>>>>>>>> my iterator code which seems inappropriate and any possible solution
>>>>>>>> for that?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Madhvi
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tuesday 16 June 2015 11:07 AM, Josh Elser wrote:
>>>>>>>>> //matched the condition and put values to holder map.
>>>>>>>>
>>>>>
>>>
>
