lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Function writing using lucene
Date Wed, 05 Jul 2006 13:02:12 GMT
Amit:

You can make arbitrarily complex boolean clauses, see BooleanQuery. For
that, you don't need a filter. You can add boolean clauses with MUST, SHOULD
and MUST NOT (AND, OR,  NOT). Filters are for restricting queries that
create (under the covers) a large BooleanQuery. You shouldn't think about
filters unless there is a demonstrated need. You'll know that when you get a
TooManyClauses exception <G>. There is a discussion of this in the book
Lucene In Action, a book that I highly recommend.

For instance, something like should get the documents that have both
"lucene" and "apache" in the contents field:
BooleanQuery bq = new BooleanQuery;
bq.add(new TermQuery(new Term("content", "lucene")),
BooleanClause.Occur.MUST);
bq.add(new TermQuery(new Term("content", "apache")),
BooleanClause.Occur.MUST);
Hits hits = Searcher.search(bq);

For that matter, the QueryParser will do all this for you if you give it the
right string. It's up to you whether you let QueryParser construct the
clause for you or construct the BooleanQuery yourself.

Note that you can sort however you want using the Searcher.search(Query,
Sort) form of the search call. Sorting takes time, though.....

I don't understand why you care about "filtering out document IDs". You
submit a query and get a Hits object back. Then you fetch the Document from
the Hits object that contains your data. Document IDs are an integral part
of the document (and assigned by Lucene at index time). I have a hard time
imagining why you'd even want to try to filter them out, I'm not sure the
question makes sense in the Lucene context.

The default return order is by relevance, doc id has nothing to do with it.

You can certainly write a custom filter, but be really sure you need it
first. Wildcard queries and prefix queries are prime candidates for filters
in my experience.

If you haven't gotten a copy of Lucene In Action, I strongly recommend that
you do. It explains a LOT about how Lucene works and should be used.

Also, get a copy of Luke to examine your indexes and allow you to play
around with the query syntax. It'll save you a LOT of time and effort.

Hope this helps
Erick

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message