lucene-java-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: tf/idf similarity with modified document similarity
Date Sat, 08 Mar 2014 01:27:54 GMT
Do you expect to have relatively large or relatively small result sets? For 
the former, are you willing to accept slow performance? Your logic will have 
to scan all of the matching documents, fetching and checking their term 
frequencies to count up df for each query term. Perhaps at least some of 
that information is already available as a by-product of the query matching process.
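A minimal, Lucene-free sketch of the scan described above, assuming the term frequencies of the matching documents are already in memory (in a real index they would have to be read for every matching document, which is exactly the cost being warned about). The class and method names are hypothetical. The tf and idf formulas mirror the classic default similarity (`sqrt(freq)` and `1 + ln(N/(df+1))`), but with N and df taken from the result set instead of the whole index; norms, coord and queryNorm are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of result-set idf scoring (not a Lucene API).
 * Each matching document is a map from term to its frequency in that document.
 * Query terms are assumed to be distinct.
 */
public class ResultSetTfIdf {

    /** Pass 1: scan every matching document and count how many contain each query term. */
    static Map<String, Integer> resultSetDocFreqs(List<Map<String, Integer>> matches, List<String> queryTerms) {
        Map<String, Integer> df = new HashMap<>();
        for (String term : queryTerms) {
            df.put(term, 0);
        }
        for (Map<String, Integer> doc : matches) {
            for (String term : queryTerms) {
                if (doc.getOrDefault(term, 0) > 0) {
                    df.merge(term, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    /** Classic-style idf, but with numDocs = |R| and docFreq = df_R(t). */
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    /** Pass 2: simplified classic score, sum over query terms of sqrt(freq) * idf^2. */
    static double[] score(List<Map<String, Integer>> matches, List<String> queryTerms) {
        Map<String, Integer> df = resultSetDocFreqs(matches, queryTerms);
        double[] scores = new double[matches.size()];
        for (int i = 0; i < matches.size(); i++) {
            Map<String, Integer> doc = matches.get(i);
            double s = 0.0;
            for (String term : queryTerms) {
                int freq = doc.getOrDefault(term, 0);
                if (freq > 0) {
                    double w = idf(df.get(term), matches.size());
                    s += Math.sqrt(freq) * w * w;
                }
            }
            scores[i] = s;
        }
        return scores;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> matches = new ArrayList<>();
        matches.add(Map.of("lucene", 2, "similarity", 1));
        matches.add(Map.of("lucene", 1));
        matches.add(Map.of("lucene", 3));
        List<String> query = List.of("lucene", "similarity");
        double[] scores = score(matches, query);
        for (int i = 0; i < scores.length; i++) {
            System.out.printf("doc %d: %.4f%n", i, scores[i]);
        }
    }
}
```

The first pass touches every matching document once per query term, so its cost grows with the result set size; that is why small result sets are the comfortable case.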

Still, that is a reasonable feature to want and it has been requested 
before. Worth a Jira.

-- Jack Krupansky

-----Original Message----- 
From: Christian Reuschling
Sent: Thursday, March 6, 2014 1:34 PM
To: java-user@lucene.apache.org
Subject: tf/idf similarity with modified document similarity

Hello,

What is the best way to score documents like the default similarity does, 
but with the document frequency calculated per query against the set of 
matching result documents, rather than statically against the whole corpus?

I haven't found a good, performant solution yet.

Thank you!

Christian
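Stated as formulas (a sketch, assuming the classic tf-idf form used by the default similarity; R, df_R and score_R are notation introduced here for illustration, not Lucene API), the request swaps corpus-wide statistics for statistics over the result set R of the current query:

```latex
\begin{aligned}
\mathrm{idf}(t)   &= 1 + \ln\frac{N}{\mathrm{df}(t) + 1}
  && N = \text{documents in the index},\ \mathrm{df}(t) = \text{those containing } t \\
\mathrm{idf}_R(t) &= 1 + \ln\frac{|R|}{\mathrm{df}_R(t) + 1}
  && \mathrm{df}_R(t) = \bigl|\{\, d \in R : \mathrm{freq}(t,d) > 0 \,\}\bigr| \\
\mathrm{score}_R(q,d) &= \sum_{t \in q} \sqrt{\mathrm{freq}(t,d)}\;\mathrm{idf}_R(t)^2
  && \text{(norms, coord and queryNorm omitted)}
\end{aligned}
```

One consequence worth noting: for a pure conjunction (all terms required), every matching document contains every query term, so df_R(t) = |R| for all t and idf_R gives every term the same weight. Result-set idf only changes relative term weights for queries with optional clauses.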

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org 
