lucene-java-user mailing list archives

From "Uwe Schindler" <>
Subject RE: Statically store sub-collections for search (faceted search?)
Date Mon, 15 Apr 2013 11:43:16 GMT

> Hi again,
> >>> You are somehow "misusing" acceptDocs and DocIdSet here, so you have
> >>> to take care, semantics are different:
> >>> - For acceptDocs "null" means "all documents allowed" -> no deleted
> >>>   documents
> >>> - For DocIdSet "null" means "no documents matched"
> >>
> >> Okay, as described above, I would now pass either the result of
> >> getLiveDocs() or Bits.MatchAllDocuments() as the acceptDocs argument
> >> to getDocIdSet():
> >>
> >> Map<Term, TermContext> termContexts = new HashMap<>();
> >> AtomicReaderContext atomic = ...
> >> ChainedFilter filter = ...
> >
> > You just pass getLiveDocs(), no null check needed. Using your code would
> > bring a slowdown for indexes without deletions.
> This makes sense to me, but now I get zero matches in all searches using the
> filter. I am pondering this remark in the documentation of
> Filter.getDocIdSet(AtomicReaderContext context, Bits acceptDocs):
> "acceptDocs - Bits that represent the allowable docs to match (typically
> deleted docs but possibly filtering other documents)"

This just means that you can pass the liveDocs as returned by the AtomicReader (live == inverse of
deleted docs), but you can also pass any other Bits implementation that removes further documents
from the results. This is what you are doing with spans.
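To make the "any other Bits implementation" point concrete, here is a small plain-Java model, not Lucene code. The Bits interface below only mirrors the shape of Lucene's org.apache.lucene.util.Bits (get/length); the class CombinedAcceptDocs and its helpers restrict() and allowedDocs() are hypothetical. It intersects a segment's live docs (null meaning "no deletions") with an extra restriction, and the combined Bits is what would be handed down as acceptDocs.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model, NOT Lucene code. Bits mirrors the shape of
// org.apache.lucene.util.Bits; restrict() and allowedDocs() are hypothetical.
final class CombinedAcceptDocs {

    interface Bits {
        boolean get(int index);
        int length();
    }

    // Bits backed by a boolean array: true = document allowed.
    static Bits fromArray(boolean[] bits) {
        return new Bits() {
            public boolean get(int index) { return bits[index]; }
            public int length() { return bits.length; }
        };
    }

    // Intersects live docs with an extra restriction. A null argument
    // means "no restriction from this side" (e.g. a segment without deletions).
    static Bits restrict(Bits liveDocs, Bits extra) {
        if (liveDocs == null) {
            return extra;
        }
        if (extra == null) {
            return liveDocs;
        }
        return new Bits() {
            public boolean get(int index) { return liveDocs.get(index) && extra.get(index); }
            public int length() { return liveDocs.length(); }
        };
    }

    // Lists the doc ids a consumer would accept; null acceptDocs = all docs.
    static List<Integer> allowedDocs(Bits acceptDocs, int maxDoc) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            if (acceptDocs == null || acceptDocs.get(doc)) {
                docs.add(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Bits live = fromArray(new boolean[] {true, false, true, true});   // doc 1 deleted
        Bits extra = fromArray(new boolean[] {true, true, false, true});  // excludes doc 2
        System.out.println(allowedDocs(restrict(live, extra), 4));  // [0, 3]
        System.out.println(allowedDocs(restrict(null, extra), 4));  // [0, 1, 3]
    }
}
```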

Passing null means all documents are allowed. If this were not the case, Lucene's queries and
filters would not work at all, so if you get 0 docs, you must have missed something else; otherwise
your filter may be behaving incorrectly. Look at e.g. FilteredQuery, IndexSearcher or any other query
in Lucene that handles acceptDocs: those pass getLiveDocs() down, and if it is null, all documents
are allowed. The javadocs on Scorer/Filter/... should be clearer about this. Can you open an issue
about the Javadocs?
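The two null conventions can be modelled in a few lines of plain Java. This is an illustrative sketch, not Lucene's implementation: Bits is a minimal stand-in for org.apache.lucene.util.Bits, and NullSemantics.matchEvenDocs() is a hypothetical filter that returns a List<Integer> in place of a real DocIdSet. It shows why the live docs can be passed straight through without a null check.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model, NOT Lucene code. Bits mirrors the shape of
// org.apache.lucene.util.Bits; matchEvenDocs() is a hypothetical filter
// whose List<Integer> result stands in for a DocIdSet.
final class NullSemantics {

    interface Bits {
        boolean get(int index);
        int length();
    }

    static Bits fromArray(boolean[] bits) {
        return new Bits() {
            public boolean get(int index) { return bits[index]; }
            public int length() { return bits.length; }
        };
    }

    // acceptDocs == null -> every document is allowed (no deletions).
    // Returns null       -> no document matched (DocIdSet convention).
    static List<Integer> matchEvenDocs(int maxDoc, Bits acceptDocs) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc += 2) {
            if (acceptDocs == null || acceptDocs.get(doc)) {
                hits.add(doc);
            }
        }
        return hits.isEmpty() ? null : hits;
    }

    public static void main(String[] args) {
        // No deletions: getLiveDocs() would return null, pass it as-is.
        System.out.println(matchEvenDocs(6, null));  // [0, 2, 4]
        // Doc 2 deleted.
        Bits live = fromArray(new boolean[] {true, true, false, true, true, true});
        System.out.println(matchEvenDocs(6, live));  // [0, 4]
        // Every even doc deleted: the filter reports "no match" as null.
        Bits odd = fromArray(new boolean[] {false, true, false, true, false, true});
        System.out.println(matchEvenDocs(6, odd));   // null
    }
}
```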

> I understand that getLiveDocs() returns the document bits set that represent
> NON-deleted documents which seems to match the first part of the
> description (allowable docs). However, why does it say in brackets "typically
> deleted docs"? I had ignored this so far, but as I get zero results now, this
> might be relevant.

See above.

> I am also thinking about how to possibly make use of a BitsFilteredDocIdSet
> in the following way:
>
> ChainedFilter filter = ...
> AtomicReaderContext atomic = ...
> Bits alldocs = atomic.reader().getLiveDocs();
> DocIdSet docids = filter.getDocIdSet(atomic, alldocs);
> BitsFilteredDocIdSet filtered = new BitsFilteredDocIdSet(docids, alldocs);
> Spans luceneSpans = sq.getSpans(atomic, filtered.bits(), termContexts);
>
> However, the documentation of the constructor public
> BitsFilteredDocIdSet(DocIdSet innerSet, Bits acceptDocs) does not make it
> clear to me whether I am applying the arguments correctly. I especially fail
> to understand the acceptDocs argument again:
> "acceptDocs - Allowed docs, all docids not in this set will not be returned by
> this DocIdSet"

You should use BitsFilteredDocIdSet.wrap(); the constructor does not do null checks.
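The null handling that wrap() adds on top of the constructor amounts, as far as I understand the Lucene 4.x source, to returning the inner set unchanged when either argument is null. The sketch below models that in plain Java and is not Lucene's code: a List<Integer> stands in for the DocIdSet and is filtered eagerly (the real BitsFilteredDocIdSet filters lazily while iterating), and WrapModel.filterUnchecked() is a hypothetical stand-in for the unchecked constructor path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model, NOT Lucene code. A List<Integer> stands in for a
// DocIdSet and is filtered eagerly; Bits mirrors org.apache.lucene.util.Bits.
// filterUnchecked() is a hypothetical stand-in for the unchecked constructor.
final class WrapModel {

    interface Bits {
        boolean get(int index);
        int length();
    }

    static Bits fromArray(boolean[] bits) {
        return new Bits() {
            public boolean get(int index) { return bits[index]; }
            public int length() { return bits.length; }
        };
    }

    // Like the constructor: assumes both arguments are non-null.
    static List<Integer> filterUnchecked(List<Integer> set, Bits acceptDocs) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : set) {               // NPE if set is null
            if (acceptDocs.get(doc)) {      // NPE if acceptDocs is null
                kept.add(doc);
            }
        }
        return kept;
    }

    // Like wrap(): a null set ("nothing matched") or null acceptDocs
    // ("everything allowed") needs no filtering, so the set is returned as-is.
    static List<Integer> wrap(List<Integer> set, Bits acceptDocs) {
        return (set == null || acceptDocs == null) ? set : filterUnchecked(set, acceptDocs);
    }

    public static void main(String[] args) {
        List<Integer> matches = Arrays.asList(1, 3, 5);
        Bits live = fromArray(new boolean[] {true, true, true, false, true, true}); // doc 3 deleted
        System.out.println(wrap(matches, live));  // [1, 5]
        System.out.println(wrap(matches, null));  // [1, 3, 5]
        System.out.println(wrap(null, live));     // null
        try {
            filterUnchecked(matches, null);
        } catch (NullPointerException e) {
            System.out.println("unchecked path fails on null acceptDocs");
        }
    }
}
```

In the snippet from the question, this would mean replacing the constructor call with BitsFilteredDocIdSet.wrap(docids, alldocs); its result may itself be null and needs to be checked before use.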

> Would this be the correct way to apply a filter on a SpanQuery?

Why not simply use new FilteredQuery(SpanQuery, Filter)?

> Thanks!
> Carsten
> --
> Institut für Deutsche Sprache |
> Projekt KorAP                 |
> Tel. +49-(0)621-43740789      |
> Korpusanalyseplattform der nächsten Generation Next Generation Corpus
> Analysis Platform
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
