accumulo-user mailing list archives

From Billie Rinaldi <billie.rina...@gmail.com>
Subject Re: IntersectingIterator and Ranges
Date Fri, 18 Dec 2015 15:12:01 GMT
Yes, all shardIDs will be searched to find documents containing term1 and
term2.  Data will not be passed from one shardID to another, so each
document must appear in only one shard.  You can read more about
document-partitioned indexing at [1] and [2].

[1]:
https://accumulo.apache.org/1.7/accumulo_user_manual.html#_document_partitioned_indexing
[2]:
http://nlp.stanford.edu/IR-book/html/htmledition/distributing-indexes-1.html
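
To make the single-shard requirement concrete, here is a minimal sketch in
plain Scala. It is an in-memory model, not Accumulo code: Entry, shardFor,
index, and intersect are hypothetical names, and choosing the shard by hashing
the docID is just one possible assumption. It only mimics the idea described
above: terms are intersected inside each shard and never across shards.

```scala
// Hypothetical in-memory model of a document-partitioned index.
// Entry mirrors the schema from the thread:
// row = shardID, column family = term, column qualifier = docID.
object ShardedIndexSketch {
  final case class Entry(shardId: String, term: String, docId: String)

  // Assumption: the shard is derived from the docID alone, so every term of
  // a document lands in the same shard (row).
  def shardFor(docId: String, numShards: Int): String =
    "shard" + Math.floorMod(docId.hashCode, numShards)

  // Build index entries for a map of docID -> terms.
  def index(docs: Map[String, Set[String]], numShards: Int): Seq[Entry] =
    for {
      (docId, terms) <- docs.toSeq
      term <- terms.toSeq
    } yield Entry(shardFor(docId, numShards), term, docId)

  // Intersect the terms independently inside each shard; no data is passed
  // from one shard to another.
  def intersect(entries: Seq[Entry], terms: Set[String]): Set[String] =
    entries.groupBy(_.shardId).values.flatMap { shardEntries =>
      val docsPerTerm = terms.toSeq.map { t =>
        shardEntries.filter(_.term == t).map(_.docId).toSet
      }
      docsPerTerm.reduceOption(_ intersect _).getOrElse(Set.empty[String])
    }.toSet
}
```

With the layout from the original question (term3 stored under key2),
intersecting term1 and term3 in this model finds nothing. Once all of doc1's
terms are indexed under one shard, the same query finds doc1.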


On Fri, Dec 18, 2015 at 5:34 AM, Sven Hodapp <sven.hodapp@scai.fraunhofer.de> wrote:

> Hi Billie,
>
> I read the following in the source code documentation:
>
>     This iterator is commonly used with BatchScanner or
>     AccumuloInputFormat, to parallelize the search over all shardIDs.
>
> Does this mean that both key1 and key2 (the shardIDs) will be searched, or
> have I misunderstood?
> Should the IndexedDocIterator also search all shardIDs?
>
> Thanks!
>
> Regards,
> Sven
>
> --
> Sven Hodapp, M.Sc.,
> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> Department of Bioinformatics
> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> sven.hodapp@scai.fraunhofer.de
> www.scai.fraunhofer.de
>
> ----- Original Message -----
> > From: "Billie Rinaldi" <billie.rinaldi@gmail.com>
> > To: "user" <user@accumulo.apache.org>
> > Sent: Wednesday, 18 November 2015 15:57:15
> > Subject: Re: IntersectingIterator and Ranges
>
> > Yes, that is the correct behavior. The IntersectingIterator intersects
> > columns within a row, on a single tablet server. To get the results you
> > want, you should make sure all the terms for a document are inserted with
> > the same key / row. In this case, all the doc1 entries should have key1 as
> > their row.
> >
> > Billie
> > On Nov 18, 2015 7:08 AM, "Sven Hodapp" <sven.hodapp@scai.fraunhofer.de>
> > wrote:
> >
> >> Hello everyone,
> >>
> >> I'm currently using Accumulo 1.7 (a single node for now) with the
> >> IntersectingIterator.
> >> The current index schema for the IntersectingIterator looks like this, for
> >> example:
> >>
> >>     key1 : term1 : doc1
> >>     key1 : term2 : doc1
> >>     key2 : term3 : doc1
> >>
> >> I've noticed that I can't intersect terms that are in distinct key ranges.
> >> Is that the correct behavior, or am I doing something wrong?
> >>
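> >> An extract of my code (Scala) as an example:
> >>
> >>     import java.util.Collections
> >>     import scala.collection.JavaConverters._  // provides asScala
> >>     import org.apache.accumulo.core.client.IteratorSetting
> >>     import org.apache.accumulo.core.data.Range
> >>     import org.apache.accumulo.core.iterators.user.IntersectingIterator
> >>     import org.apache.hadoop.io.Text
> >>
> >>     val bs = conn.createBatchScanner(tableName, authorizations, numQueryThreads)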
> >>     val terms = List(new Text("term1"), new Text("term2")).toArray
> >>
> >>     val ii = new IteratorSetting(priority, name, iteratorClass)
> >>     IntersectingIterator.setColumnFamilies(ii, terms)
> >>     bs.addScanIterator(ii)
> >>
> >>     bs.setRanges(Collections.singleton(new Range()))  // all ranges
> >>
> >>     for (entry <- bs.asScala.take(100)) yield {
> >>       entry.getKey.getColumnQualifier.toString
> >>     }
> >>
> >> This will yield "doc1" as expected.
> >>
> >> But if I choose the terms like this:
> >>
> >>     // ...
> >>     val terms = List(new Text("term1"), new Text("term3")).toArray
> >>     // ...
> >>
> >> It yields "null", but I would expect "doc1" here as well.
> >> I've also tried setting a list of Range.exact ranges,
> >> but I still get "null".
> >>
> >> Am I doing something wrong?
> >>
> >> Thank you in advance!
> >>
> >> Regards,
> >> Sven
> >>
> >> --
> >> Sven Hodapp, M.Sc.,
> >> Fraunhofer Institute for Algorithms and Scientific Computing SCAI,
> >> Department of Bioinformatics
> >> Schloss Birlinghoven, 53754 Sankt Augustin, Germany
> >> sven.hodapp@scai.fraunhofer.de
> >> www.scai.fraunhofer.de
>
