lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yonik Seeley <ysee...@gmail.com>
Subject Re: Weird time results doing wildcard queries
Date Thu, 08 Sep 2005 22:57:54 GMT
The Hits class collects the document ids from the query in batches. If you 
iterate beyond what was collected, the query is re-executed to collect more 
ids.

You can use the expert level search methods on IndexSearcher if this isn't 
what you want.

-Yonik

On 9/8/05, Richard Krenek <richard.krenek@gmail.com> wrote:
> 
> I understand that for the query, but why does it matter once you have the
> Hits object? That is the part I'm baffled on. The query with the wildcard 
> in
> the front takes a lot longer, but we expected that.
> 
> On 9/8/05, Jeremy Meyer <jakmcbane@gmail.com> wrote:
> >
> > The issue isn't with multiple wildcards exactly. Specifically, the 
> problem
> > is if the query starts with a wildcard. In the case where it starts with 
> a
> > wildcard, lucene has no option but to linearly go over every term in the
> > index to see if it matches your pattern. It must visit every singe term 
> in
> > the index. If it doesn't start with a wildcard, lucene can skip to the
> > relevant part of the index and only visit the relevant terms. For this
> > reason, many people that use Lucene choose to disable having wildcard at 
> the
> > start of a search term. This is discussed in the "Lucene in Action" 
> book.
> >
> > ~Jack~
> >
> > >>Hello All,
> > >>I am getting some weird time results when retrieving documents back 
> from
> > a hits object. I am just timing this bit of code:
> > >>Hits hits = searcher.search(query);
> > >>long startTime = System.currentTimeMillis(); for (int i = 0; i <
> > hits.length(); i++) { Document doc = hits.doc(i); String field = doc.get
> (defaultField);
> > } System.out.println("Cycle Time: "+(System.currentTimeMillis
> > ()-startTime));
> > >>
> > >>It seems when I have a wilcard query like *abcd* vs weqrew*, the 
> *abcd*
> > query will always take longer to retrieve the documents even if they are 
> of
> > simular result sizes. We are talking a big difference 1 second vs 16. It 
> is
> > consistent no matter >>what order I run the queries in, terms with 
> multiple
> > wildcards always take longer to retrieve the documents. I am not 
> counting
> > the time of the query.
> > >>
> > >>The index is 2.18 GB, 9 fields per document, 10,694,190 documents,
> > >>25,538,793 terms and has been optimized.
> > >>
> > >>I am not sure if this is a real or just a percieved issue. We cannot
> > figure out why the type of query would affect the speed it takes to 
> retrieve
> > each document. We have run this on both Windows XP and Linux. With the 
> same
> > results. Also to >>note we did watch GC and this did not have any
> > significant impact that we could se.
> > >>
> > >>We are trying to figure out what could cause this and how we can work
> > around it.
> > >>
> > >>
> > >>Thanks,
> > >>Richard
> >
> 
> 


-- 
-Yonik
Now hiring -- http://tinyurl.com/7m67g

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message