lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Indexing documents through a set of (word, weight) pairs
Date Tue, 06 Jul 2004 15:30:01 GMT
Bok Igor,

For the first use case, you are really looking for what's called a
Forward Index.  If my memory serves me well, there was a project that
used Lucene at MIT called Haystack, and its author developed code that
worked with Lucene and created forward indices.  I never actually tried
it.

If that doesn't do it for you, see this:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#getTermFreqVectors(int)

For the second use case - well, that is precisely what Lucene does. :)

It sounds like you are going in the right direction.  I suggest reading
a few Lucene articles (links on the Lucene Wiki) to get started.

Otis



--- Igor Perisic <iperisic@entopia.com> wrote:
> Hi Lucene experts:
> 
>    We are trying to build a simple document index on top of Lucene. 
> 
>    We have: 
> 	Given a document, there is a list of terms (e.g. word and weight
> pairs). 
> 	The queries we want to be able to handle are:
> 		* Given a document, what are the terms?
> 		* Given some terms, what are the documents? 
> 		Note here that the above weights are used for our own customized
> scoring.
> 
> We want to use Lucene as much as possible (not wanting to reinvent
> the wheel), what are our options?
> 
> We can reuse some of the classes such as TermInfosWriter/Reader to
> store our lexicon, but is there more stuff in Lucene we can take
> advantage of? Are we going in the right direction?
> 
> 
> 
> Cheers,
> 
> 		Igor
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message