lucene-java-user mailing list archives

From Richard Krenek <richard.kre...@gmail.com>
Subject Re: Weird time results doing wildcard queries
Date Thu, 08 Sep 2005 22:31:47 GMT
I understand that for the query, but why does it matter once you have the 
Hits object? That is the part I'm baffled by. The query with the wildcard in 
front takes a lot longer, but we expected that.

On 9/8/05, Jeremy Meyer <jakmcbane@gmail.com> wrote:
> 
> The issue isn't with multiple wildcards exactly. Specifically, the problem 
> is a query that starts with a wildcard. In that case, Lucene has no option 
> but to scan linearly over every term in the index to see if it matches your 
> pattern. It must visit every single term in the index. If the query doesn't 
> start with a wildcard, Lucene can skip to the relevant part of the index and 
> visit only the relevant terms. For this reason, many people who use Lucene 
> choose to disallow a wildcard at the start of a search term. This is 
> discussed in the "Lucene in Action" book.
> 
> ~Jack~
> 
> >>Hello All,
> >>I am getting some weird time results when retrieving documents back from 
> >>a Hits object. I am just timing this bit of code:
> >>
> >>Hits hits = searcher.search(query);
> >>long startTime = System.currentTimeMillis();
> >>for (int i = 0; i < hits.length(); i++) {
> >>    Document doc = hits.doc(i);
> >>    String field = doc.get(defaultField);
> >>}
> >>System.out.println("Cycle Time: " + (System.currentTimeMillis() - startTime));
> >>
> >>It seems when I have a wildcard query like *abcd* vs weqrew*, the *abcd* 
> >>query will always take longer to retrieve the documents, even if the result 
> >>sets are of similar size. We are talking about a big difference: 1 second 
> >>vs 16. It is consistent no matter what order I run the queries in; terms 
> >>with multiple wildcards always take longer to retrieve the documents. I am 
> >>not counting the time of the query.
> >>
> >>The index is 2.18 GB, 9 fields per document, 10,694,190 documents,
> >>25,538,793 terms and has been optimized.
> >>
> >>I am not sure if this is a real or just a perceived issue. We cannot 
> >>figure out why the type of query would affect the time it takes to retrieve 
> >>each document. We have run this on both Windows XP and Linux, with the same 
> >>results. Also to note, we did watch GC and it did not have any significant 
> >>impact that we could see.
> >>
> >>We are trying to figure out what could cause this and how we can work 
> >>around it.
> >>
> >>
> >>Thanks,
> >>Richard 
>
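
The difference described in the quoted reply, seeking to a prefix versus testing every term, can be sketched roughly as below. This is only an illustration under stated assumptions: a `TreeSet` stands in for the sorted term dictionary, and `TermScanSketch`, `prefixScan`, and `containsScan` are hypothetical names, not Lucene classes or Lucene's actual implementation. It shows only the term-enumeration cost of the query and does not by itself explain the slower document retrieval from the Hits object asked about above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class TermScanSketch {

    /** Matching terms plus the number of dictionary entries examined. */
    public static final class Result {
        public final List<String> matches = new ArrayList<>();
        public int visited = 0;
    }

    /** Pattern like "weqrew*": seek to the prefix, stop at the first non-match. */
    public static Result prefixScan(TreeSet<String> terms, String prefix) {
        Result r = new Result();
        // tailSet(prefix, true) starts iteration at the first term >= prefix.
        for (String term : terms.tailSet(prefix, true)) {
            r.visited++;
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            r.matches.add(term);
        }
        return r;
    }

    /** Pattern like "*abcd*": no usable prefix, so every term is tested. */
    public static Result containsScan(TreeSet<String> terms, String infix) {
        Result r = new Result();
        for (String term : terms) {
            r.visited++;
            if (term.contains(infix)) {
                r.matches.add(term);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        // Toy term dictionary (made-up data, not taken from the index in this thread).
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "abcd", "apple", "banana", "weqrew",
                "weqrewa", "weqrewb", "xabcdx", "zebra"));

        Result prefix = prefixScan(terms, "weqrew");
        Result infix = containsScan(terms, "abcd");

        System.out.println("weqrew*  matches=" + prefix.matches + " visited=" + prefix.visited);
        System.out.println("*abcd*   matches=" + infix.matches + " visited=" + infix.visited);
    }
}
```

On this toy data the prefix scan examines only the terms around the prefix, while the leading-wildcard scan examines all of them; with 25,538,793 terms the gap would be correspondingly larger.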
