lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Weighted cosine similarity calculation using Lucene
Date Fri, 20 Apr 2012 11:44:13 GMT
Maybe I'm missing something here, but why not just boost the
terms in the fields at query time?

Best
Erick

On Fri, Apr 20, 2012 at 4:20 AM, Kasun Perera <kasunp@opensource.lk> wrote:
> I have documents that are marked up with Taxonomy and Ontology terms
> separately.
> When I calculate the document similarity, I want to give higher weights to
> those Taxonomy terms and Ontology terms.
>
>
> When I index the document, I have defined the Document content, Taxonomy
> and Ontology terms as Fields for each document like this in my program.
>
>
> Field ontologyTerm = new Field("fiboterms", fiboTermList[curDocNo],
>     Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES);
>
> Field taxonomyTerm = new Field("taxoterms", taxoTermList[curDocNo],
>     Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES);
>
> Field document = new Field(docNames[curDocNo], strRdElt,
>     Field.TermVector.YES);
>
>
>
> I’m using the Lucene index's TermFreqVector functions to calculate TF-IDF
> values, and then calculating the cosine similarity between two documents
> from those values.
>
>
> To give weights to the Ontology and Taxonomy terms when calculating the
> cosine similarity, I could programmatically multiply the Taxonomy and
> Ontology term frequencies by a defined weight factor before calculating
> the TF-IDF scores. Will this give higher weight to the Taxonomy and
> Ontology terms in the document similarity calculation?
>
>
> Are there Lucene functions that can be used to give higher weights to
> certain fields when calculating TF-IDF values using TermFreqVector? Can I
> just use the setBoost() function for this purpose, and if so, how?
>
> --
> Regards
>
> Kasun Perera
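
The weighting idea raised in the quoted message (scale the Taxonomy and
Ontology term values by a weight factor, then take the cosine) can be
sketched outside Lucene as follows. This is a minimal, hypothetical
illustration in plain Java (9 or later), not Lucene API: the class
WeightedCosine, its methods, and the sample TF-IDF numbers are assumptions
made for the example, and the weight is applied to per-field TF-IDF values
rather than to raw term frequencies. Only the field names "taxoterms" and
"fiboterms" come from the message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class WeightedCosine {

    // Scale each field's TF-IDF values by that field's weight and merge
    // everything into one sparse vector. Keys are prefixed with the field
    // name, so the same term in two fields occupies two dimensions.
    static Map<String, Double> weightedVector(
            Map<String, Map<String, Double>> tfidfByField,
            Map<String, Double> fieldWeights) {
        Map<String, Double> vector = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> field : tfidfByField.entrySet()) {
            // Fields without an explicit weight default to 1.0.
            double weight = fieldWeights.getOrDefault(field.getKey(), 1.0);
            for (Map.Entry<String, Double> term : field.getValue().entrySet()) {
                vector.put(field.getKey() + ":" + term.getKey(),
                           weight * term.getValue());
            }
        }
        return vector;
    }

    // Cosine similarity of two sparse vectors; 0.0 if either is all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up TF-IDF values: the documents share a taxonomy term
        // but differ in their content terms.
        Map<String, Map<String, Double>> doc1 = Map.of(
                "content", Map.of("a", 1.0),
                "taxoterms", Map.of("x", 1.0));
        Map<String, Map<String, Double>> doc2 = Map.of(
                "content", Map.of("b", 1.0),
                "taxoterms", Map.of("x", 1.0));

        Map<String, Double> noWeights = Map.of();
        Map<String, Double> boosted = Map.of("taxoterms", 2.0, "fiboterms", 2.0);

        System.out.println("unweighted: " + cosine(
                weightedVector(doc1, noWeights), weightedVector(doc2, noWeights)));
        System.out.println("weighted:   " + cosine(
                weightedVector(doc1, boosted), weightedVector(doc2, boosted)));
    }
}
```

In this sketch, raising the weight on a field raises the similarity of
documents that share terms in that field and lowers it for documents whose
terms in that field differ, because the weighted dimensions contribute more
to both the dot product and the vector norms.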


