lucene-java-user mailing list archives

From Jeremy Meyer <jakmcb...@gmail.com>
Subject Re: Weird time results doing wildcard queries
Date Thu, 08 Sep 2005 22:15:40 GMT
The issue isn't with multiple wildcards exactly. Specifically, the problem
is a query that starts with a wildcard. When a query starts with a
wildcard, Lucene has no option but to go linearly over every term in the
index to see if it matches your pattern. It must visit every single term in
the index. If the query doesn't start with a wildcard, Lucene can skip to the
relevant part of the index and visit only the relevant terms. For this
reason, many people who use Lucene choose to disallow a wildcard at the
start of a search term. This is discussed in the "Lucene in Action" book.
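
To make the difference concrete, here is a minimal sketch in plain Java. It is not Lucene code: a sorted TreeSet stands in for the index's sorted term list, and the class name WildcardScanSketch and all of its methods are made up for illustration. It only counts how many terms each kind of pattern has to look at.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class WildcardScanSketch {
    // Assumption: a sorted set stands in for the index's sorted term list.
    private final TreeSet<String> terms = new TreeSet<>();
    // Number of terms examined by the most recent search.
    private int visited;

    public WildcardScanSketch(List<String> input) {
        terms.addAll(input);
    }

    public int visited() {
        return visited;
    }

    // Pattern "prefix*": seek to the prefix, stop at the first term without it.
    public List<String> prefixSearch(String prefix) {
        visited = 0;
        List<String> out = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            visited++;
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            out.add(term);
        }
        return out;
    }

    // Pattern "*infix*": nothing to seek to, so every term is checked.
    public List<String> infixSearch(String infix) {
        visited = 0;
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            visited++;
            if (term.contains(infix)) {
                out.add(term);
            }
        }
        return out;
    }

    // Hypothetical guard an application could apply before parsing a term.
    public static boolean startsWithWildcard(String term) {
        return !term.isEmpty() && (term.charAt(0) == '*' || term.charAt(0) == '?');
    }

    public static void main(String[] args) {
        WildcardScanSketch index = new WildcardScanSketch(
                Arrays.asList("abcd", "weqrew", "weqrewa", "xabcdx", "zebra"));
        System.out.println(index.prefixSearch("weqrew") + " visited=" + index.visited());
        System.out.println(index.infixSearch("abcd") + " visited=" + index.visited());
        System.out.println("reject *abcd*? " + startsWithWildcard("*abcd*"));
    }
}
```

With a prefix, the number of terms visited depends on how many terms share that prefix; with a leading wildcard it always equals the total number of terms, which on an index with millions of terms is where the cost would come from.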

~Jack~

>>Hello All,
>>I am getting some weird time results when retrieving documents back from a
>>hits object. I am just timing this bit of code:
>>Hits hits = searcher.search(query);
>>long startTime = System.currentTimeMillis();
>>for (int i = 0; i < hits.length(); i++) {
>>    Document doc = hits.doc(i);
>>    String field = doc.get(defaultField);
>>}
>>System.out.println("Cycle Time: " + (System.currentTimeMillis() - startTime));
>>
>>It seems that when I have a wildcard query like *abcd* vs. weqrew*, the *abcd*
>>query will always take longer to retrieve the documents, even if the
>>result sizes are similar. We are talking about a big difference: 1 second
>>vs. 16. It is consistent no matter what order I run the queries in; terms
>>with multiple wildcards always take longer to retrieve the documents. I am
>>not counting the time of the query.
>>
>>The index is 2.18 GB, 9 fields per document, 10,694,190 documents,
>>25,538,793 terms and has been optimized.
>>
>>I am not sure if this is a real or just a perceived issue. We cannot
>>figure out why the type of query would affect the time it takes to retrieve
>>each document. We have run this on both Windows XP and Linux, with the same
>>results. Also of note, we did watch GC, and it did not have any
>>significant impact that we could see.
>>
>>We are trying to figure out what could cause this and how we can work
>>around it.
>>
>>
>>Thanks,
>>Richard
