lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <..@incani.com>
Subject RE: Catching BooleanQuery.TooManyClauses
Date Mon, 17 Apr 2006 12:45:44 GMT
Thanks Erick & Paul,

I also found a great example of a custom filter in LIA (6.4 Using a custom
filter)

Here's my updated testcase if anybody is interested...

===== QueryParserTest.java ================================================ 
...
public class QueryParserTest extends LuceneTestCase {
	...
	private static int MAX_HITS = 10;

	public void testCatchTooManyClauses() throws Exception {
		System.out.println("===>testCatchTooManyClauses");
		Vector docList = null;
		try {
			causeTooManyClauses();
		}
		catch(BooleanQuery.TooManyClauses ex) {
			Term term = new Term(field, queryStr);
			final BitSet bs = new BitSet(reader.maxDoc());
			TermDocs termDocs = reader.termDocs();
			WildcardTermEnum wte = new WildcardTermEnum(reader,
term);
			int cnt = 0;
			docList = new Vector(MAX_HITS);
			/*
			the methods termDocs.next() and reader.document() go
to different places in 
			the Lucene index so this will send the disk head up
and down.
			see
http://lucene.apache.org/java/docs/fileformats.html
			*/
			for (term = null; (term = wte.term()) != null && cnt
< MAX_HITS; wte.next()) {
				// get doc ids from .frq file
				termDocs.seek(term);
				while (termDocs.next() && cnt++ < MAX_HITS)
{
					bs.set(termDocs.doc());
				}
			}
			termDocs.close();
			// retrieve the Document's in numerical order
			for(int i=bs.nextSetBit(0); i>=0;
i=bs.nextSetBit(i+1)) {
				docList.add(reader.document(i));
			}
		}
		System.out.println("found:" + docList.size());
		assertTrue(docList.size() == MAX_HITS);
	} 
...
===== QueryParserTest.java ================================================
 

> -----Original Message-----
> From: Paul Elschot [mailto:paul.elschot@xs4all.nl] 
> Sent: Sunday, 16 April 2006 5:13 AM
> To: java-user@lucene.apache.org
> Subject: Re: Catching BooleanQuery.TooManyClauses
> 
> 
> On Saturday 15 April 2006 13:44, Erick Erickson wrote:
> > With the warning that I'm not the most experienced Lucene 
> user in the
> > world...
> > 
> > I *think*, that rather than search for each term, it's more 
> efficient to
> > just use IndexReader.termDocs..... i.e.
> > 
> > Indexreader ir = <whatever>;
> > TermDocs termDocs = ir.TermDocs();
> > WildcardTermEnum wildEnum = <whatever>;
> > 
> > for (Term term = null; (term = wildEnum.term()) != null; 
> wildEnum.next()) {
> >       termDocs.seek(term);
> 
> This avoids the buffer space needed for each TermDocs by 
> using each term
> separately. A BooleanQuery over all the terms will use the 
> termDocs.next() and
> termDocs.doc() for all terms at the same time. It has to, 
> because more terms
> might match each document and it has to compute the query 
> score for each
> document.
> 
> >       while (termDocs.next()) {
> >             Document doc = reader.document(termDocs.doc())
> 
> The methods termDocs.next() and reader.document()
> go to different places in the Lucene index (see the index format),
> so this will send the disk head up and down.
> It's better to collect the termDocs.doc() values first,  for 
> example in a
> BitSet, and then retrieve the Document's in numerical order.
> Btw., this is what the ConstantScoreRangeQuery does to avoid 
> using all terms
> at the same time.
> 
> >       }
> > }
> > 
> > I know that for loop looks odd, but I just peeked at the 
> source code for the
> > TermEnum classes and see why it works.
> > 
> > One warning, as the folks on the board have pointed out to 
> me is that the
> > Hits object is not entirely efficient when you fetch lots 
> of docs (more than
> > 100 has been mentioned) and you should think about TopDocs 
> or some such.
> > 
> > Also, if you can avoid fetching the document (i.e. get 
> everything you want
> > from the index) you'll add efficiency. I have no clue how 
> much you're
> > returning to the user, so I don't know whether that would 
> work for you.....
> 
> In other words, one can use the above BitSet in a Filter lateron
> during an IndexSearcher.search() (or in a ConstantScoreQuery),
> and use Hits or TopDocs for document retrieval.
> 
> Regards,
> Paul Elschot.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> -- 
> No virus found in this incoming message.
> Checked by AVG Free Edition.
> Version: 7.1.385 / Virus Database: 268.4.1/313 - Release 
> Date: 15/04/2006
>  
> 

-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.385 / Virus Database: 268.4.2/314 - Release Date: 16/04/2006
 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message