lucene-java-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: Question about Field.setOmitTermFreqAndPositions(true)
Date Mon, 31 May 2010 10:45:40 GMT
What about TermVector? It says in "Lucene in Action":

Term vectors are something of a mix between an indexed field and a
stored field. They are similar to a stored field because you can
quickly retrieve all term vector fields for a given document: term
vectors are keyed first by document ID. But then, they are keyed
secondarily by term, meaning they store a miniature inverted index for
that one document. Unlike a stored field, where the original String
content is stored verbatim, term vectors store the actual separate
terms that were produced by the Analyzer. This allows you to retrieve
all terms, and the frequency of their occurrence within the document,
sorted in lexicographic order, for a particular indexed Field of a
particular Document.
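A minimal plain-Java sketch of the per-document view the quote describes: the distinct terms of one field of one document, with their in-document frequencies, in lexicographic order. It has no Lucene dependency; the class and method names are hypothetical, and splitting on whitespace stands in for WhitespaceAnalyzer.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not Lucene's API: models the term/frequency
// view of a term vector for a single field of a single document.
class TermFreqSketch {

    // Splits on whitespace (a stand-in for WhitespaceAnalyzer) and counts
    // each distinct term. TreeMap keeps the keys in lexicographic order.
    static TreeMap<String, Integer> termFreqVector(String fieldText) {
        TreeMap<String, Integer> freqs = new TreeMap<>();
        for (String token : fieldText.trim().split("\\s+")) {
            if (!token.isEmpty()) { // blank input splits into one empty token
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    public static void main(String[] args) {
        Map<String, Integer> tv = termFreqVector("it is a good day    good night");
        System.out.println(tv);
    }
}
```

Run on document 1 from the question, this prints `{a=1, day=1, good=2, is=1, it=1, night=1}`: the original text is gone, only the analyzed terms and their counts remain.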
  TermVector.YES – record the unique terms that occurred, and their
      counts, in each document, but do not store any position or
      offset information.
  TermVector.WITH_POSITIONS – record the unique terms and their counts,
      and also the positions of each occurrence of every term, but no
      offsets.
  TermVector.WITH_OFFSETS – record the unique terms and their counts,
      with the offsets (start & end character position) of each
      occurrence of every term, but no positions.
  TermVector.WITH_POSITIONS_OFFSETS – store unique terms and their
      counts, along with positions and offsets.
  TermVector.NO – do not store term vectors.
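The options above differ only in which per-occurrence details are kept alongside each term's count. The plain-Java sketch below models that difference; the `Option` enum merely mirrors the names listed and is not Lucene's `Field.TermVector` class, and treating offsets as start/end character indexes with an exclusive end is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of what each term vector option records; not Lucene code.
class TermVectorOptionsSketch {

    // Mirrors the option names above: is anything stored, and are
    // positions and/or character offsets kept per occurrence?
    enum Option {
        NO(false, false, false),
        YES(true, false, false),
        WITH_POSITIONS(true, true, false),
        WITH_OFFSETS(true, false, true),
        WITH_POSITIONS_OFFSETS(true, true, true);

        final boolean stored;
        final boolean positions;
        final boolean offsets;

        Option(boolean stored, boolean positions, boolean offsets) {
            this.stored = stored;
            this.positions = positions;
            this.offsets = offsets;
        }
    }

    // One term's entry: the count is always kept; the lists stay empty
    // unless the option asks for them.
    static final class Entry {
        int freq;
        final List<Integer> positions = new ArrayList<>();
        final List<String> offsets = new ArrayList<>(); // "start-end", end exclusive

        @Override
        public String toString() {
            return "freq=" + freq + " positions=" + positions + " offsets=" + offsets;
        }
    }

    // Tokenizes on whitespace (stand-in for WhitespaceAnalyzer) and records
    // what the chosen option asks for, keyed by term in sorted order.
    static Map<String, Entry> build(String text, Option opt) {
        Map<String, Entry> vector = new TreeMap<>();
        if (!opt.stored) {
            return vector; // NO: nothing is recorded
        }
        Matcher m = Pattern.compile("\\S+").matcher(text);
        int position = 0;
        while (m.find()) {
            Entry e = vector.computeIfAbsent(m.group(), k -> new Entry());
            e.freq++;
            if (opt.positions) {
                e.positions.add(position);
            }
            if (opt.offsets) {
                e.offsets.add(m.start() + "-" + m.end());
            }
            position++;
        }
        return vector;
    }

    public static void main(String[] args) {
        String doc = "it is a good day    good night";
        for (Option opt : Option.values()) {
            System.out.println(opt + ": " + build(doc, opt).get("good"));
        }
    }
}
```

For WITH_POSITIONS_OFFSETS the entry for "good" is `freq=2 positions=[3, 5] offsets=[8-12, 20-24]`. Positions describe token order, while offsets point back into the original characters, which is what a highlighter needs in order to insert markup around a match.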

I am confused: what's the difference between TermVector and Index?
In the index we can save position information, and we can also save
it in a TermVector. If I want to support phrase queries, I must save
positions in the index. And if I want to support the fast highlighter
and similar features, I have to save term vectors.
How is this information stored?
E.g. there are 2 docs, analyzed with WhitespaceAnalyzer:
1: it is a good day    good night
2: you are a good man

The index's data structure seems to be like this:
  good -> doc1 2(tf) 3 5; doc2 1(tf) 3
What about the term vector? What does it look like? "Lucene in Action"
says it is indexed first by doc ID and then by term. I can't picture it.
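Both structures can hold the same facts (term, document, positions); what differs is the key order. Below is a minimal plain-Java sketch using the two example documents, with whitespace splitting standing in for WhitespaceAnalyzer and 0-based positions, as in the `good -> doc1 2(tf) 3 5` line. The class and method names are hypothetical, and it models only the lookup shape, not Lucene's on-disk file formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the two key orders; not Lucene's storage format.
class IndexVsTermVectorSketch {

    // Inverted index (postings): term -> docId -> positions.
    // tf for a (term, doc) pair is the size of the position list.
    static Map<String, Map<Integer, List<Integer>>> postings(Map<Integer, String> docs) {
        Map<String, Map<Integer, List<Integer>>> index = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            String[] tokens = doc.getValue().trim().split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                index.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                     .computeIfAbsent(doc.getKey(), d -> new ArrayList<>())
                     .add(pos);
            }
        }
        return index;
    }

    // Term vectors: docId -> term -> positions. Same facts, opposite key
    // order: one miniature inverted index per document.
    static Map<Integer, Map<String, List<Integer>>> termVectors(Map<Integer, String> docs) {
        Map<Integer, Map<String, List<Integer>>> vectors = new TreeMap<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            Map<String, List<Integer>> tv = new TreeMap<>();
            String[] tokens = doc.getValue().trim().split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                tv.computeIfAbsent(tokens[pos], t -> new ArrayList<>()).add(pos);
            }
            vectors.put(doc.getKey(), tv);
        }
        return vectors;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new TreeMap<>();
        docs.put(1, "it is a good day    good night");
        docs.put(2, "you are a good man");
        System.out.println("postings[good]   = " + postings(docs).get("good"));
        System.out.println("termVector[doc1] = " + termVectors(docs).get(1));
    }
}
```

Here `postings(docs).get("good")` gives `{1=[3, 5], 2=[3]}`, while `termVectors(docs).get(1)` gives `{a=[2], day=[4], good=[3, 5], is=[1], it=[0], night=[6]}`. The first shape suits a query, which starts from a term and needs the matching documents; the second suits code that already holds one document and wants all of its terms, as a highlighter does.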
2010/5/31 Andrzej Bialecki <ab@getopt.org>:
> On 2010-05-31 10:54, Uwe Schindler wrote:
>> No.
>
> See also LUCENE-2048 (nice round number ;) ).
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
