lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Weird time results doing wildcard queries
Date Thu, 08 Sep 2005 22:40:17 GMT
: is if the query starts with a wildcard. In the case where it starts with a
: wildcard, lucene has no option but to linearly go over every term in the
: index to see if it matches your pattern. It must visit every singe term in

That would explain why the search itself takes a while, but not why
accessing the hits after the call to search would take a while.  Note
where the timing code is in his example.

There are two possible explanations i can think of...

: >>It seems when I have a wilcard query like *abcd* vs weqrew*, the *abcd*
: query will always take longer to retrieve the documents even if they are of
: simular result sizes. We are talking a big difference 1 second vs 16. It is

1) How similar, and how many? ... If i remember correctly, the Hits
constructor does some work to pre-fetch the first 100 results.  So if you
are iterating over all of the results, the first 100 are free.  On the
101st iteration the prefetching method is called again to fetch N more (i
don't remember what N is off the top of my head).

What this means is that if you are only timing the method calls on Hits,
then the first 100 documents are free -- if one wildcard search returns 99
results, and the other returns 105 results, those numbers may not seem that
different, but in the first case the code you are timing is accessing
nothing but memory, and in the second case it has to read from disk.
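
To make the 99 vs 105 case concrete, here is a toy model of a result list
that fetches in batches.  It is only a sketch: BatchedResults is a made-up
class, not Lucene's Hits, and the batch size of 100 for every fetch is an
assumption (the size of the later batches is exactly the N not remembered
above).  It counts how many fetches a full iteration triggers.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a batched result list; NOT Lucene's Hits class. */
public class BatchedResults {
    private final int total;       // total number of matching results
    private final int batchSize;   // hypothetical: same size for every batch
    private final List<Integer> cached = new ArrayList<>();
    private int fetchCalls = 0;    // how many times we had to go back for more

    public BatchedResults(int total, int batchSize) {
        if (total < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and batchSize > 0");
        }
        this.total = total;
        this.batchSize = batchSize;
        fetchMore();               // the constructor pre-fetches the first batch
    }

    /** Stands in for the expensive step (re-reading from the index). */
    private void fetchMore() {
        fetchCalls++;
        int end = Math.min(total, cached.size() + batchSize);
        for (int i = cached.size(); i < end; i++) {
            cached.add(i);         // pretend result i has document id i
        }
    }

    public int length() {
        return total;
    }

    public int id(int i) {
        if (i < 0 || i >= total) {
            throw new IndexOutOfBoundsException("no result at position " + i);
        }
        while (i >= cached.size()) {
            fetchMore();           // only happens once we run past the cache
        }
        return cached.get(i);
    }

    public int fetchCalls() {
        return fetchCalls;
    }

    /** Iterates over every result and reports how many fetches were needed. */
    public static int fetchesToIterate(int total, int batchSize) {
        BatchedResults results = new BatchedResults(total, batchSize);
        for (int i = 0; i < results.length(); i++) {
            results.id(i);
        }
        return results.fetchCalls();
    }

    public static void main(String[] args) {
        System.out.println("99 results:  " + fetchesToIterate(99, 100) + " fetch(es)");
        System.out.println("105 results: " + fetchesToIterate(105, 100) + " fetch(es)");
    }
}
```

In this model the 99-result loop never leaves the batch fetched by the
constructor, while the 105-result loop pays for a second fetch inside the
code being timed.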

2) The second idea also requires you to answer a question: the number of
results returned for each query might be identical, but are the
results themselves identical?

I'm guessing that either the documents from the "slow" case are
much bigger (ie: larger stored fields) or the results from the fast case
are all documents that are "near" each other on disk, so fetching back all
of the stored fields would require less IO than if the results are stored
farther apart.  If i remember correctly, the stored fields of documents
are kept in the order that the documents are added, so hypothetically, if
the query you did was on a "name" field, and the documents were added to
the index in alphabetical order by "name", then by definition the results
for "weqrew*" will all be close together, while the results for "*abcd*"
will be spread out throughout the index.
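
A small illustration of that hypothetical, with nothing Lucene-specific in
it: DocIdSpread and the sample names are invented for the example, and a
document's "id" is simply its position in insertion order.  It measures how
wide a range of ids each kind of pattern touches when the names were added
alphabetically.

```java
import java.util.Arrays;
import java.util.function.Predicate;

/** Toy illustration only; the names and the class are made up. */
public class DocIdSpread {
    // Hypothetical "name" values; sorted below to model alphabetical insertion.
    public static final String[] NAMES = {
        "aabcd", "bravo", "charlie", "kabcdz", "mike",
        "weqrew1", "weqrew2", "weqrew3", "zabcd"
    };

    static {
        Arrays.sort(NAMES);  // document id == position after sorting
    }

    /** Number of matching names. */
    public static int count(String[] names, Predicate<String> match) {
        int n = 0;
        for (String name : names) {
            if (match.test(name)) {
                n++;
            }
        }
        return n;
    }

    /** Width of the id range covering all matches (0 if nothing matches). */
    public static int span(String[] names, Predicate<String> match) {
        int first = -1;
        int last = -1;
        for (int id = 0; id < names.length; id++) {
            if (match.test(names[id])) {
                if (first < 0) {
                    first = id;
                }
                last = id;
            }
        }
        return first < 0 ? 0 : last - first + 1;
    }

    public static void main(String[] args) {
        Predicate<String> prefix = n -> n.startsWith("weqrew");  // like weqrew*
        Predicate<String> infix = n -> n.contains("abcd");       // like *abcd*
        System.out.println("weqrew*: " + count(NAMES, prefix)
                + " matches, id span " + span(NAMES, prefix));
        System.out.println("*abcd*:  " + count(NAMES, infix)
                + " matches, id span " + span(NAMES, infix));
    }
}
```

Both patterns match three names here, but the prefix matches sit in three
adjacent ids while the infix matches stretch across all nine.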

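An easy way to disprove that 2nd theory would be to change your timing
code so the loop only asks for each document id, which doesn't load any
stored fields.  If the big gap between the two queries is still there,
stored-field IO isn't the cause.  Change it to this and see what happens...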


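  Hits hits = searcher.search(query);
  long startTime = System.currentTimeMillis();
  for (int i = 0; i < hits.length(); i++) {
    int id = hits.id(i);  // document number only, no stored fields loaded
  }
  long elapsed = System.currentTimeMillis() - startTime;
  System.out.println("iterating over ids took " + elapsed + " ms");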



-Hoss

