lucene-java-user mailing list archives

From Saurabh Gokhale <saurabhgokh...@gmail.com>
Subject MoreLikeThis and TermVector relationship
Date Tue, 25 Oct 2011 04:23:48 GMT
Hi,

In my project, I want to show the user documents that are similar to the
documents the user has searched for.

*As per Lucid Solr reference guide...*
For best results, use stored TermVectors in the schema.xml for fields
specified for similarity. For example: <field name="cat" ...
termVectors="true" />
If termVectors are not stored, MoreLikeThis will generate terms from stored
fields

Since I am using Lucene and not Solr, I will ask my questions from the Lucene
point of view:

1. What is the difference between the two index statements below? As per my
understanding, the first one does not store a separate TermVector and the
second one does.

new Field("title", data.getTitle(), Field.Store.NO,
Field.Index.ANALYZED)
new Field("title", data.getTitle(), Field.Store.NO,
Field.Index.ANALYZED, Field.TermVector.YES)

If that is the case, how does it affect MoreLikeThis searching?

2. Also, how much difference does it make to the match results whether I
enable TermVectors or not?
I found two interesting things:
A. The Lucene index size almost tripled (for my data) when I enabled
TermVectors.
B. When I ran MoreLikeThis against the index that had TermVectors and against
the index that did not have TermVectors explicitly enabled, both sets of
MoreLikeThis results were exactly the same. So what is the advantage of
TermVectors?

Thanks

Saurabh
