lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <>
Subject Re: Using Lucene for searching tokens, not storing them.
Date Sat, 15 Apr 2006 19:32:18 GMT
On Saturday 15 April 2006 19:25, karl wettin wrote:
> 14 apr 2006 kl. 18.31 skrev Doug Cutting:
> > karl wettin wrote:
> >> I would like to store all in my application rather than using the   
> >> Lucene persistency mechanism for tokens. I only want the search   
> >> mechanism. I do not need the IndexReader and IndexWriter as that  
> >> will  be a natural part of my application. I only want to use the  
> >> Searchable.
> >
> > Implement the IndexReader API, overriding all of the abstract  
> > methods. That will enable you to search your index using Lucene's  
> > search code.
> This was not even half as tough I thought it would be. I'm however  
> not certain about a couple of methods:
> 1. TermPositions. It returns the next position of *what* in the  
> document? It would make sence to me if it returned a start/end  
> offset, but this just confuses me.
> implements TermPositions {
>          /** Returns next position in the current document.  It is an  
> error to call
>           this more than {@link #freq()} times
>           without calling {@link #next()}<p> This is
>           invalid until {@link #next()} is called for
>           the first time.
>           */
>          public int nextPosition() throws IOException {
>              return 0; // todo
>          }

This enumerates all positions of the Term in the document
as returned by the Tokenizer used by the Analyzer (as normally
used by IndexWriter). The Tokenizer provides all terms as
analyzed, but here only the position of one term are enumerated.
Btw. this is why the index is called an inverted term index.

> 2. Norms. I've been looking in other code, but I honestly don't  
> understand what data they are storing, thus it's really hard for me  
> to implement :-) I read it as it contains the boost of each document  
> per field? So what does the byte represent then?

What is stored is a byte representing the inverse of the number of
indexed terms in a field of a document, as returned by a Tokenizer.

Paul Elschot

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message