lucene-general mailing list archives

From lionel duboeuf <lionel.dubo...@boozter.com>
Subject Re: index per-user basis and document frequency
Date Tue, 16 Jun 2009 08:51:17 GMT
Ted Dunning wrote:
> I don't think that this would be such a great idea.
>
> Better to use a custom similarity
> <http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html> data
> structure.  Before you do that, though, you might try just using the
> overall corpus statistics and not worry about this per user indexing with
> specialized statistics.  If users' are no more different from each other
> than sub-corpora in a normal retrieval system then you are liable to get
> much better results using corpus wide stats than with user level stats.
>
> On Mon, Jun 15, 2009 at 2:06 PM, Lionel Duboeuf
> <lionel.duboeuf@boozter.com> wrote:
>   
OK, but even if I modify the similarity measure, I will still face the
polysemy problem: e.g. the term "car" in English means something different
from the term "car" in French.
Also, what is the best approach to calculate numDocs for a given user
easily and quickly?
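
To make the second question concrete, here is a rough sketch of the kind of
per-user statistics I have in mind. It is plain Python over a toy in-memory
structure, not Lucene code: the class PerUserStats and its methods are
hypothetical names, and the idf formula only assumes the usual
1 + ln(numDocs / (docFreq + 1)) shape.

```python
import math
from collections import defaultdict


class PerUserStats:
    """Toy per-user corpus statistics (hypothetical helper, not a Lucene API)."""

    def __init__(self):
        # user -> number of documents indexed for that user
        self.num_docs = defaultdict(int)
        # user -> term -> number of that user's documents containing the term
        self.doc_freq = defaultdict(lambda: defaultdict(int))

    def add_document(self, user, terms):
        """Update the counters at indexing time, once per document."""
        self.num_docs[user] += 1
        for term in set(terms):  # count each term at most once per document
            self.doc_freq[user][term] += 1

    def idf(self, user, term):
        """Per-user idf, assuming the 1 + ln(numDocs / (docFreq + 1)) form."""
        n = self.num_docs.get(user, 0)
        if n == 0:
            return 0.0  # unknown user: no statistics available
        df = self.doc_freq.get(user, {}).get(term, 0)
        return 1.0 + math.log(n / (df + 1))
```

With counters like these maintained at indexing time, numDocs for a user
would be a simple lookup instead of a query over the whole index, but I do
not know whether that is a reasonable way to do it.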

Thanks for your answer.

lionel




