lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Vector Space Model: New Similarity Implementation Issues
Date Thu, 28 Feb 2008 12:05:15 GMT
I'm not sure I understand what you are asking, but I will give it a
shot. See below.


On Feb 26, 2008, at 3:45 PM, Dharmalingam wrote:

>
> Hi List,
>
> I am pretty new to Lucene. Certainly, it is very exciting. I need to
> implement a new Similarity class based on the Term Vector Space Model given
> in http://www.miislita.com/term-vector/term-vector-3.html
>
> Although that model is similar to Lucene’s model
> (http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html),
> I am having a hard time extending the Similarity class to calculate that
> model.
>
> In that model, “tf” is multiplied by “idf” for all terms in the index, but
> in Lucene “tf” is calculated only for terms in the given Query. Because of
> that, the norm calculation should also include “idf” for all terms.
> Lucene calculates the norm, during indexing, by “just” counting the number
> of terms per document. In the web formula (at miislita.com), a document norm
> is calculated after multiplying “tf” and “idf”.

Are you wondering if there is a way to score all documents regardless  
of whether the document has the term or not?  I don't quite get your  
statement: "In that model, “tf” is multiplied with Idf for all terms  
in the index, but in Lucene “tf” is calculated only for terms in the  
given Query."

Isn't the result for those documents that don't have query terms just
going to be 0, or am I not fully understanding?  I briefly skimmed the
paper you cite and it doesn't seem that different; it's just
describing Salton's VSM, right?
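
To make the "score is just 0" point concrete, here is a minimal sketch in
plain Java with no Lucene dependency. It is not Lucene's scoring code and
not necessarily the exact formula from the cited article: it assumes
idf = log(D / df) with a natural log, and the class name VsmSketch, the
helper names, and the three-document toy corpus are all made up for
illustration. Each document norm is taken over its tf*idf weights, which is
the behaviour the original question asks for.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tf*idf cosine scorer; an illustration, not Lucene code. */
public class VsmSketch {

    /** Raw term frequencies for one tokenized text. */
    public static Map<String, Integer> termCounts(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        return counts;
    }

    /** Assumed idf form: log(D / df), computed over every term in the collection. */
    public static Map<String, Double> inverseDocFreq(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String t : termCounts(doc).keySet()) {
                df.merge(t, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((double) docs.size() / e.getValue()));
        }
        return idf;
    }

    /** tf*idf weight vector; terms absent from the collection get weight 0. */
    public static Map<String, Double> weights(List<String> tokens, Map<String, Double> idf) {
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : termCounts(tokens).entrySet()) {
            w.put(e.getKey(), e.getValue() * idf.getOrDefault(e.getKey(), 0.0));
        }
        return w;
    }

    /** Euclidean norm over the tf*idf weights (so idf is part of the norm). */
    public static double norm(Map<String, Double> w) {
        double sum = 0.0;
        for (double x : w.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity; only terms shared with the query add to the dot product. */
    public static double cosine(Map<String, Double> query, Map<String, Double> doc) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : query.entrySet()) {
            dot += e.getValue() * doc.getOrDefault(e.getKey(), 0.0);
        }
        double denom = norm(query) * norm(doc);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    public static void main(String[] args) {
        // Toy corpus, invented for this example.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("lucene", "scoring", "model"),
                Arrays.asList("vector", "space", "model"),
                Arrays.asList("apache", "mailing", "list"));
        Map<String, Double> idf = inverseDocFreq(docs);
        Map<String, Double> query = weights(Arrays.asList("vector", "model"), idf);
        for (int i = 0; i < docs.size(); i++) {
            double score = cosine(query, weights(docs.get(i), idf));
            System.out.printf("doc %d: %.4f%n", i, score);
        }
    }
}
```

The third document shares no term with the query, so its dot product, and
therefore its score, is 0 no matter how its norm is computed. The norm only
changes the relative ranking of documents that do match.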

>
>
> FYI: I could implement “idf” according to the miislita.com formula, but not
> the “tf” and “norm”.
>
> Could you please comment on how I can implement a new Similarity class that
> will fit into Lucene’s architecture, but still implement the vector space
> model given at miislita.com?

In the end, you may need to implement some lower-level Query classes,
but I still don't fully understand what you are trying to do, so I
wouldn't head down that path just yet.

--------------------------
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





