lucene-java-user mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: AW: Special field values
Date Wed, 13 Oct 2004 07:50:22 GMT
Michael,

On Wednesday 13 October 2004 08:45, Michael Hartmann wrote:
> | -----Original Message-----
> | From:
> | lucene-user-return-10665-michael.hartmann=web.de@jakarta.apache.org
> | [mailto:lucene-user-return-10665-michael.hartmann=web.de@jakarta.apache.org]
> | On behalf of Paul Elschot
>
...
> |
> | A Lucene index can easily be used to determine whether or not
> | a term is in a field of a document:
> |
> | IndexReader.open(indexName).termDocs(new Term(term,
> | field)).skipTo(documentNr)
> |
> | returns the boolean indicating that.

(See also the correction I sent later).

> | What do you need the {0,1} values for?
> |
> | Regards,
> | Paul Elschot.
>
> Hi Paul,
>
> Thanks for your answer. The field should store a "vector" of values that
> indicate whether or not a term exists in a document. Just pure
> vanilla vector space model. I've read that Lucene has some kind of VSM but
> currently I don't understand how to handle that.

You may be able to use a TermDocs, which accesses the index by term
and walks the documents while providing the in-document term frequencies.
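
The term-oriented access described above can be illustrated with a minimal
plain-Java sketch. It is not the Lucene API: the class PostingsSketch and its
methods addDocument(), frequencies() and contains() are hypothetical names
invented for this illustration. It assumes that each term keeps a list of
(document number, frequency) pairs ordered by document number, and it shows
how walking that list yields both the frequencies and the {0,1} membership
values. The membership check skips forward to the first document at or after
the target and then compares document numbers explicitly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of term-oriented index access; NOT the Lucene API.
// All names here are hypothetical and exist only for illustration.
class PostingsSketch {
    // One posting: a document number and the term's frequency in that document.
    static final class Posting {
        final int doc;
        final int freq;
        Posting(int doc, int freq) { this.doc = doc; this.freq = freq; }
    }

    // term -> postings, ordered by increasing document number
    private final Map<String, List<Posting>> postings = new TreeMap<>();
    private int docCount = 0;

    // Adds an already tokenized document and returns its document number.
    int addDocument(String... tokens) {
        int doc = docCount++;
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(new Posting(doc, e.getValue()));
        }
        return doc;
    }

    // Walks the postings of one term and returns its frequency in every
    // document, with 0 for documents that do not contain the term.
    int[] frequencies(String term) {
        int[] freqs = new int[docCount];
        for (Posting p : postings.getOrDefault(term, Collections.emptyList())) {
            freqs[p.doc] = p.freq;
        }
        return freqs;
    }

    // Membership test: skip to the first posting whose document number is
    // at or after the target, then check that it is exactly the target.
    boolean contains(String term, int targetDoc) {
        for (Posting p : postings.getOrDefault(term, Collections.emptyList())) {
            if (p.doc >= targetDoc) {
                return p.doc == targetDoc;
            }
        }
        return false;
    }
}
```

Calling frequencies() once per vocabulary term gives one row of a
term-by-document matrix; replacing every nonzero entry with 1 gives the
{0,1} values asked about.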

In case you need to access the terms by document,
there are term vectors in the development version of Lucene.
The getTermFreqVector() method of IndexReader returns
a TermFreqVector, which gives access to the terms in a document field
and their frequencies, not just {0,1}. It has an even better taste than vanilla.
To enable this, choose one of the Field.TermVector options for the
Field of the Lucene Document before the document is added to the index
by IndexWriter.addDocument().
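
To make the document-oriented view concrete, here is a minimal plain-Java
sketch of what a per-document term frequency vector holds. It is not Lucene
code: the class TermVectorSketch and its freq() and toBinary() helpers are
hypothetical, and the parallel arrays of sorted terms and their counts are
only an assumption about a convenient layout, with accessor names chosen to
echo the getTerms() and getTermFrequencies() style of access.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a term frequency vector for one document field;
// NOT the Lucene API. All names are hypothetical illustrations.
class TermVectorSketch {
    private final String[] terms; // distinct terms in sorted order
    private final int[] freqs;    // freqs[i] is the count of terms[i]

    TermVectorSketch(String... tokens) {
        // A TreeMap keeps the terms sorted, so keys and values line up.
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        terms = counts.keySet().toArray(new String[0]);
        freqs = new int[terms.length];
        int i = 0;
        for (int count : counts.values()) {
            freqs[i++] = count;
        }
    }

    String[] getTerms() { return terms.clone(); }

    int[] getTermFrequencies() { return freqs.clone(); }

    // Frequency of one term in this document, 0 when the term is absent.
    int freq(String term) {
        int i = Arrays.binarySearch(terms, term);
        return i >= 0 ? freqs[i] : 0;
    }

    // Projects the vector onto a fixed vocabulary as {0,1} values.
    int[] toBinary(String... vocabulary) {
        int[] bits = new int[vocabulary.length];
        for (int i = 0; i < vocabulary.length; i++) {
            bits[i] = freq(vocabulary[i]) > 0 ? 1 : 0;
        }
        return bits;
    }
}
```

The frequencies carry strictly more information than the {0,1} values, which
can always be derived from them as toBinary() does.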


Regards,
Paul Elschot

