lucene-java-user mailing list archives

From aberdee...@yahoo.com
Subject Re: QueryWrapperFilter and DocIdSetIterator
Date Tue, 20 Sep 2011 18:49:37 GMT
I've created https://issues.apache.org/jira/browse/LUCENE-3442 to document this.

Thanks for your help,
Dan



----- Original Message -----
From: Uwe Schindler <uwe@thetaphi.de>
To: java-user@lucene.apache.org
Cc: 
Sent: Tuesday, September 20, 2011 11:01 AM
Subject: RE: QueryWrapperFilter and DocIdSetIterator

I investigated your problem:

It's a 3.x bug in an optimization in TermQuery. All other queries work.

TermQuery assumes that the IndexReader passed into its Scorer method is
atomic (meaning it is a segment reader). This is not the case for your example
code. It uses a hash-based cache to cache document frequencies, but this
cache is only.

Searching at the top level has not been done in Lucene since 2.9,
but the 3.x API still supports it (trunk, aka 4.0, no longer does).

Can you open an issue for the 3.4 version? I already have a fix for
TermQuery.java, it contains a wrong assumption.
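
As a rough illustration of the per-segment execution mentioned above, here is
a minimal plain-Java sketch. It deliberately does not use the Lucene API:
Segment, maxDoc, localMatches and collectTopLevelDocIds are hypothetical
stand-ins invented for this example. It only models two ideas from this
thread: each segment is visited separately, and a null iterator is treated as
"no matching documents".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Plain-Java model of per-segment filter evaluation.
// None of these types are Lucene classes; they are assumptions for illustration.
public class PerSegmentSketch {

    // Hypothetical stand-in for one index segment: the number of documents it
    // holds and the segment-local ids of the documents that match the filter.
    static final class Segment {
        final int maxDoc;
        final int[] localMatches;

        Segment(int maxDoc, int... localMatches) {
            this.maxDoc = maxDoc;
            this.localMatches = localMatches;
        }

        // Mirrors the convention discussed in the thread: null means "no matches".
        Iterator<Integer> iterator() {
            if (localMatches.length == 0) {
                return null;
            }
            List<Integer> ids = new ArrayList<Integer>();
            for (int id : localMatches) {
                ids.add(id);
            }
            return ids.iterator();
        }
    }

    // Visits each segment separately and rebases segment-local ids to top-level ids.
    static List<Integer> collectTopLevelDocIds(List<Segment> segments) {
        List<Integer> result = new ArrayList<Integer>();
        int docBase = 0; // id offset of the current segment within the whole index
        for (Segment segment : segments) {
            Iterator<Integer> it = segment.iterator();
            if (it != null) { // a null iterator is treated like an empty one
                while (it.hasNext()) {
                    result.add(docBase + it.next());
                }
            }
            docBase += segment.maxDoc;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
                new Segment(3, 0, 2), // 3 docs, local ids 0 and 2 match
                new Segment(2),       // 2 docs, nothing matches (null iterator)
                new Segment(4, 1));   // 4 docs, local id 1 matches
        System.out.println(collectTopLevelDocIds(segments)); // prints [0, 2, 6]
    }
}
```

In real Lucene 3.x code the same shape would apply to the sub-readers of the
top-level IndexReader, but the exact calls are left out here on purpose.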

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Tuesday, September 20, 2011 7:33 PM
> To: java-user@lucene.apache.org; aberdeen61@yahoo.com
> Subject: RE: QueryWrapperFilter and DocIdSetIterator
> 
> Hi,
> 
> I don't see a problem in your code:
> If you look at the source code of QueryWrapperFilter, it will never return
> NULL, so it always returns a DocIdSet that itself returns the Scorer of the
> query as Iterator.
> 
>   @Override
>   public DocIdSet getDocIdSet(final IndexReader reader) throws IOException {
>     final Weight weight = new IndexSearcher(reader).createNormalizedWeight(query);
>     return new DocIdSet() {
>       @Override
>       public DocIdSetIterator iterator() throws IOException {
>         return weight.scorer(reader, true, false);
>       }
>       @Override
>       public boolean isCacheable() { return false; }
>     };
>   }
> 
> The only case in which the DISI returned by iterator() is null is when
> the underlying query returns a null scorer (which can happen if no documents
> match the query).
> 
> One thing is different in your type of execution: Since Lucene 2.9,
> IndexSearcher executes the query per-segment, but you are executing the
> filter on the top-level IndexReader (not separately for each segment). This
> should not be an issue in Lucene 3.x, but with Lucene trunk this will throw
> UnsupportedOperationException.
> 
> I think your query really does return no documents; I have no idea
> why.
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
> 
> 
> > -----Original Message-----
> > From: aberdeen61@yahoo.com [mailto:aberdeen61@yahoo.com]
> > Sent: Tuesday, September 20, 2011 7:09 PM
> > To: java-user@lucene.apache.org
> > Subject: QueryWrapperFilter and DocIdSetIterator
> >
> > I've been trying to use the QueryWrapperFilter as part of composing a set of
> > filters. Are there limitations on the types of queries it can wrap? When I try to
> > get the DocIdSetIterator for the filter it comes up null. This happens even when
> > the query is a simple TermQuery.
> >
> > The following code shows that the iterator for a QueryWrapperFilter returns
> > null rather than an iterator with the same document as a search using the
> > query.
> > This was run using lucene-core-3.4.0.jar on Java 1.6.0_27.
> > Am I using this incorrectly? Are there constraints or additional information on
> > how a reader is supposed to be passed to the method to get a DocIdSet?
> >
> > On a related note, I examined the TestQueryWrapperFilter source code in
> > Lucene 3.4.0, which indicates that the QueryWrapperFilter can be used with
> > primitive, complex primitive and non-primitive Queries. I did note that the test
> > for complex primitive query generates a BooleanQuery, but doesn't use it in
> > the test. However, even when I corrected that it passed the test, so I'm unclear
> > on the difference in the usage in the published test case and my example
> > below.
> >
> > Thanks,
> > Dan
> >
> > ==============================
> > import java.io.IOException;
> >
> > import org.apache.lucene.analysis.WhitespaceAnalyzer;
> > import org.apache.lucene.document.Document;
> > import org.apache.lucene.document.Field;
> > import org.apache.lucene.document.Field.Index;
> > import org.apache.lucene.document.Field.Store;
> > import org.apache.lucene.index.IndexReader;
> > import org.apache.lucene.index.IndexWriter;
> > import org.apache.lucene.index.IndexWriterConfig;
> > import org.apache.lucene.index.Term;
> > import org.apache.lucene.store.RAMDirectory;
> > import org.apache.lucene.util.Version;
> > import org.apache.lucene.search.DocIdSet;
> > import org.apache.lucene.search.DocIdSetIterator;
> > import org.apache.lucene.search.Filter;
> > import org.apache.lucene.search.IndexSearcher;
> > import org.apache.lucene.search.QueryWrapperFilter;
> > import org.apache.lucene.search.TermQuery;
> > import org.apache.lucene.search.TopDocs;
> >
> > public class TestQueryWrapperFilterIterator {
> >   public static void main(String[] args) {
> >     try {
> >       IndexWriterConfig iwconfig = new IndexWriterConfig(Version.LUCENE_34,
> >           new WhitespaceAnalyzer(Version.LUCENE_34));
> >       iwconfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
> >       RAMDirectory dir = new RAMDirectory();
> >       IndexWriter writer = new IndexWriter(dir, iwconfig);
> >       Document d = new Document();
> >       d.add(new Field("id", "1001", Store.YES, Index.NOT_ANALYZED));
> >       d.add(new Field("text", "headline one group one", Store.YES, Index.ANALYZED));
> >       d.add(new Field("group", "grp1", Store.YES, Index.NOT_ANALYZED));
> >       writer.addDocument(d);
> >       writer.commit();
> >       writer.close();
> >       IndexReader rdr = IndexReader.open(dir);
> >       IndexSearcher searcher = new IndexSearcher(rdr);
> >       TermQuery tq = new TermQuery(new Term("text", "headline"));
> >       TopDocs results = searcher.search(tq, 5);
> >       System.out.println("Number of search results: " + results.totalHits);
> >       Filter f = new QueryWrapperFilter(tq);
> >       DocIdSet dis = f.getDocIdSet(rdr);
> >       DocIdSetIterator it = dis.iterator();
> >       if (it != null) {
> >         int docId = it.nextDoc();
> >         while (docId != DocIdSetIterator.NO_MORE_DOCS) {
> >           Document doc = rdr.document(docId);
> >           System.out.println("Iterator doc: " + doc.get("id"));
> >           docId = it.nextDoc();
> >         }
> >       } else {
> >         System.out.println("Iterator was null: ");
> >       }
> >       searcher.close();
> >       rdr.close();
> >     } catch (IOException ioe) {
> >       ioe.printStackTrace();
> >     }
> >   }
> > }
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 