lucene-dev mailing list archives

From Vikas Gupta <vgu...@cs.utexas.edu>
Subject Exact search algorithm used in lucene
Date Thu, 09 Dec 2004 10:17:09 GMT
Hi developers,

    I am a new user/developer of Lucene. I read Doug Cutting's paper
"Space Optimizations for Total Ranking". It has a number of algorithms for
searching in an index.

    I was curious which one(s) Lucene implements.

    Does it have something like "parallel merge" (Figure 4 in the paper)?
I think it wouldn't use the simple inverted index search (Figure 2 in the
paper), because that is costlier: it takes O(N) space, where N is the
number of documents in the collection.
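
    To make the question concrete, here is how I picture a parallel merge,
as a rough Java sketch. This is only my reading of the idea, not code from
Lucene or from the paper: the class ParallelMergeSketch, the Cursor and Hit
types, and the topK method are all hypothetical names, and the score is
assumed to be a plain sum of per-term weights. Each query term's postings
are walked in document order through a priority queue, so memory is bounded
by the number of query terms plus the k best hits rather than by N.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch; none of these names come from Lucene or the paper. */
class ParallelMergeSketch {

    /** Cursor over one term's postings: doc ids ascending, one weight per doc. */
    static final class Cursor {
        final int[] docs;
        final float[] weights;
        int pos = 0;

        Cursor(int[] docs, float[] weights) {
            this.docs = docs;
            this.weights = weights;
        }

        int doc() { return docs[pos]; }
        float weight() { return weights[pos]; }

        /** Moves to the next posting; returns false when the list is exhausted. */
        boolean advance() { return ++pos < docs.length; }
    }

    /** A scored document. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Returns the k highest-scoring documents, best first. */
    static List<Hit> topK(List<Cursor> terms, int k) {
        // Cursors ordered by the doc id they currently point at.
        PriorityQueue<Cursor> merge =
            new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (Cursor c : terms) {
            if (c.docs.length > 0) merge.add(c);
        }

        // Min-heap on score: the weakest of the current best k sits on top.
        PriorityQueue<Hit> best =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

        while (!merge.isEmpty()) {
            int doc = merge.peek().doc();
            float score = 0f;
            // Gather every term that contains this doc, then move those cursors on.
            while (!merge.isEmpty() && merge.peek().doc() == doc) {
                Cursor c = merge.poll();
                score += c.weight();  // assumption: score is a sum of term weights
                if (c.advance()) merge.add(c);
            }
            best.add(new Hit(doc, score));
            if (best.size() > k) best.poll();  // drop the weakest hit
        }

        List<Hit> out = new ArrayList<>(best);
        out.sort((a, b) -> Float.compare(b.score, a.score));
        return out;
    }
}
```

    By contrast, the simple search as I understand it would keep one score
accumulator per document in the collection, which is where the O(N) space
comes from.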

    I have been able to drive Lucene through Nutch (in the Eclipse Java
debugger). I am trying to find out the search algorithm used for phrase
queries and regular queries. Which file has the code to read in the next
posting? If I could find the function that actually reads the indices
during search, then I could set a breakpoint there and understand that
code. I have been able to follow a search up to this point:

----    IndexSearcher.java file---------
  public TopDocs search(Query query, Filter filter, final int nDocs)
       throws IOException {
    Scorer scorer = query.weight(this).scorer(reader);
    ...
    return new TopDocs(totalHits[0], scoreDocs);
  }


I realized that the first line actually does the core search - i.e.
getting the list of relevant documents.

Scorer scorer = query.weight(this).scorer(reader);

Is that correct? Things get a little hazy after I step into this function.
Can you point me to what's happening with buckets, scorers, weights? Can
someone write a small paragraph about the basic strategy being used here?

Since I am not fully sure about the big picture, i.e. the exact algorithm
being used, it is difficult to follow the code.


Also, I was curious whether there is some sort of getting started guide for
developers? The getting started docs and FAQs for Lucene users are very
extensive.

Thanks for reading this, and for your time.

 ____________________________________________________________________
 Vikas Gupta                   Email: vgupta@cs.utexas.edu
 Masters Student (Graduating in 2 weeks)
 Dept. of Computer Sciences,   http://www.cs.utexas.edu/users/vgupta
 Univ. of Texas at Austin, USA
 ____________________________________________________________________
