accumulo-user mailing list archives

From Sven Hodapp <sven.hod...@scai.fraunhofer.de>
Subject Re: IntersectingIterator and Ranges
Date Fri, 18 Dec 2015 13:34:26 GMT
Hi Billie,

I've read the following in the source code documentation:

    This iterator is commonly used with BatchScanner or AccumuloInputFormat,
    to parallelize the search over all shardIDs.

Does this mean that both key1 and key2 (the shardIDs) should be searched? Or am I misunderstanding?
Shouldn't the IndexedDocIterator also search across all shardIDs?

Thanks!

Regards,
Sven

-- 
Sven Hodapp, M.Sc.,
Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
Department of Bioinformatics
Schloss Birlinghoven, 53754 Sankt Augustin, Germany
sven.hodapp@scai.fraunhofer.de
www.scai.fraunhofer.de

----- Original Message -----
> From: "Billie Rinaldi" <billie.rinaldi@gmail.com>
> To: "user" <user@accumulo.apache.org>
> Sent: Wednesday, 18 November 2015 15:57:15
> Subject: Re: IntersectingIterator and Ranges

> Yes, that is the correct behavior. The IntersectingIterator intersects
> columns within a row, on a single tablet server. To get the results you
> want, you should make sure all the terms for a document are inserted with
> the same key / row. In this case, all the doc1 entries should have key1 as
> their row.
> 
> Billie
> On Nov 18, 2015 7:08 AM, "Sven Hodapp" <sven.hodapp@scai.fraunhofer.de>
> wrote:
> 
>> Hello everyone,
>>
>> I'm currently using Accumulo 1.7 (on a single node for now) with the
>> IntersectingIterator.
>> My current index schema for the IntersectingIterator looks like this, for
>> example:
>>
>>     key1 : term1 : doc1
>>     key1 : term2 : doc1
>>     key2 : term3 : doc1
>>
>> I've noticed that I can't intersect terms that are in distinct key ranges.
>> Is that the correct behavior, or am I doing something wrong?
>>
>> An extract of my code (Scala) as an example:
>>
>>     val bs = conn.createBatchScanner(tableName, authorizations, numQueryThreads)
>>     val terms = List(new Text("term1"), new Text("term2")).toArray
>>
>>     val ii = new IteratorSetting(priority, name, iteratorClass)
>>     IntersectingIterator.setColumnFamilies(ii, terms)
>>     bs.addScanIterator(ii)
>>
>>     bs.setRanges(Collections.singleton(new Range()))  // all ranges
>>
>>     for (entry <- bs.asScala.take(100)) yield {
>>       entry.getKey.getColumnQualifier.toString
>>     }
>>
>> This will yield "doc1" as expected.
>>
>> But if I choose the terms like this:
>>
>>     // ...
>>     val terms = List(new Text("term1"), new Text("term3")).toArray
>>     // ...
>>
>> it yields "null", but I would expect "doc1" here as well.
>> I've also tried setting a list of Range.exact ranges,
>> but I still get "null".
>>
>> Am I doing something wrong?
>>
>> Thank you in advance!
>>
>> Regards,
>> Sven
>>
>> --
>> Sven Hodapp, M.Sc.,
>> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
>> Department of Bioinformatics
>> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
>> sven.hodapp@scai.fraunhofer.de
>> www.scai.fraunhofer.de
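
To illustrate Billie's reply above, here is a minimal plain-Scala sketch of the behavior it describes: terms are intersected within each row (shard) on its own, and the per-row results are then combined. It does not use the Accumulo API. `Entry`, `ShardIntersection`, `intersect`, `original` and `fixed` are hypothetical names made up for this illustration, and the model is an assumption based on the explanation in this thread, not the actual IntersectingIterator implementation.

```scala
// Plain-Scala model of the index from this thread; no Accumulo dependency.
// Entry mirrors the "row : column family : column qualifier" layout (key : term : doc).
final case class Entry(row: String, term: String, doc: String)

object ShardIntersection {
  // For each row (shard) separately, keep the docs listed under every requested term,
  // then combine the per-row results. Terms are never matched across rows.
  def intersect(index: Seq[Entry], terms: Set[String]): Set[String] =
    index.groupBy(_.row).values.flatMap { shard =>
      val docsPerTerm = terms.toSeq.map(t => shard.filter(_.term == t).map(_.doc).toSet)
      docsPerTerm.reduceOption(_ intersect _).getOrElse(Set.empty[String])
    }.toSet

  // The schema from the original question: term3 lives in a different row than term1.
  val original: Seq[Entry] = Seq(
    Entry("key1", "term1", "doc1"),
    Entry("key1", "term2", "doc1"),
    Entry("key2", "term3", "doc1")
  )

  // The suggested fix: every entry for doc1 uses key1 as its row.
  val fixed: Seq[Entry] =
    original.map(e => if (e.doc == "doc1") e.copy(row = "key1") else e)
}
```

In this model, `intersect(original, Set("term1", "term2"))` gives `Set("doc1")`, `intersect(original, Set("term1", "term3"))` gives an empty set because no single row holds both terms, and `intersect(fixed, Set("term1", "term3"))` gives `Set("doc1")` again. Under the same assumption, a BatchScanner over all shardIDs still visits key1 and key2, but each row is evaluated independently.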
