lucene-java-user mailing list archives

From Stephen Howe <silentgun...@gmail.com>
Subject Re: Lucene 4 getSpans not retrieving spans
Date Wed, 25 Jan 2012 23:28:47 GMT
Thanks for the reply, wrapping with the SlowMultiReaderWrapper worked.
Also, thanks for the overview on the direction of index readers!

On Wed, Jan 25, 2012 at 5:21 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> > Goofing off with my index, I ran across this example
> > http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/
> > for using span queries to see what else is around a word that hits.
> > Noticeably, there's a nice getSpans(IndexReader) method that just takes in
> > the index reader and returns all the span objects, something not present
> > in Lucene 4. I'm trying to replicate this in Lucene 4.0 to see how viable
> > it is and despite having my span query hit on 10 documents, I cannot
> > retrieve any spans. The API for doing this got remarkably more complex!
> >
> > My code reads as follows:
> > IndexReader ir = search.getIndexReader();
> > TermContext tmctxt = TermContext.build(ir.getTopReaderContext(), testSpan.getTerm(), false);
> > Map termMap = new HashMap();
> > termMap.put(testSpan.getTerm(), tmctxt);
> > AtomicReaderContext ac = new IndexReader.AtomicReaderContext(ir);
>
> Don't do this. To get a top-level IndexReader context, use
> IR.getTopReaderContext(). What you are doing here is creating an atomic
> context on an index reader that might not be atomic; this can be the reason
> for the failures. It should also throw random exceptions.
>
> BTW: There is currently lots of work being done to refactor IndexReader
> into two separate classes (CompositeIndexReader and AtomicIndexReader), so
> the many methods that throw UnsupportedOperationException will go away; see
> https://issues.apache.org/jira/browse/LUCENE-2858. You can then only get
> and execute spans/queries/filters/termsenum/docsenum on AtomicIndexReader,
> and the corresponding contexts will be type safe. Currently this is one of
> the parts of the Lucene API that is very inconsistent and programmer
> unfriendly, because most IndexReaders in Lucene (like DirectoryReader or
> MultiReader) are composite readers that no longer have low-level
> terms/postings APIs. The new API will strictly separate the two types. Also,
> stuff like reopen will move away from the abstract IndexReader interface.
>
> The above code will completely fail to compile after the IR refactoring :-)
> The problem here is that you get the IndexReader, which is a composite
> reader, from the IndexSearcher, but you try to execute queries on it. This
> is no longer possible. You have to ask the reader for the index segments and
> do the search on the low-level atomic SegmentReaders separately.
> Alternatively, wrap your IR with SlowMultiReaderWrapper, which creates an
> atomic "view" on an index. It is simply slow, but it emulates the behavior
> still possible in Lucene 3.x [which is also slow there] :-)
>
> > Bits bits = new Bits.MatchAllBits(0);
> > Spans spans = testSpan.getSpans(ac, bits, termMap);
>
> This asks for spans with no deleted documents on an index of size 0, which
> cannot work.
>
> > However, spans never returns a spans object, spans.next() always returns
> > false.
> >
> > Am I missing anything?
> >
> > Thanks!
> > Stephen

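For anyone reading this thread later, the per-segment approach described in the quoted reply can be illustrated with a minimal, Lucene-free sketch. Everything named here (Leaf, docBase, localHits, globalHits) is a hypothetical stand-in, not a Lucene class or method. The only assumption about Lucene is that each atomic segment reports segment-local document ids, which have to be offset by that segment's doc base to get top-level ids.

```java
import java.util.ArrayList;
import java.util.List;

public class PerSegmentSketch {

    /** Hypothetical stand-in for one atomic segment of a composite index. */
    static final class Leaf {
        final int docBase;      // offset of this segment within the whole index
        final int[] localHits;  // segment-local ids of matching documents

        Leaf(int docBase, int[] localHits) {
            this.docBase = docBase;
            this.localHits = localHits;
        }
    }

    /** Visits every segment separately and maps local ids to top-level ids. */
    static List<Integer> globalHits(List<Leaf> leaves) {
        List<Integer> out = new ArrayList<>();
        for (Leaf leaf : leaves) {
            for (int local : leaf.localHits) {
                out.add(leaf.docBase + local);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Leaf> leaves = new ArrayList<>();
        leaves.add(new Leaf(0, new int[] {1, 3})); // first segment starts at 0
        leaves.add(new Leaf(5, new int[] {0, 2})); // second segment starts at 5
        System.out.println(globalHits(leaves));    // prints [1, 3, 5, 7]
    }
}
```

In real code the inner loop would be replaced by whatever the span query returns for each atomic segment; the sketch only shows the shape of the iteration and the id mapping.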