Message-ID: <4A375D05.2000601@boozter.com>
Date: Tue, 16 Jun 2009 10:51:17 +0200
From: lionel duboeuf
To: general@lucene.apache.org
CC: boozter-tbm@boozter.com
Subject: Re: index per-user basis and document frequency
References: <4A36B7CC.3000108@boozter.com>

Ted Dunning wrote:
> I don't think that this would be such a great idea.
>
> Better to use a custom similarity data structure. Before you do that,
> though, you might try just using the overall corpus statistics and not
> worry about this per-user indexing with specialized statistics. If users
> are no more different from each other than sub-corpora in a normal
> retrieval system, then you are liable to get much better results using
> corpus-wide stats than with user-level stats.
>
> On Mon, Jun 15, 2009 at 2:06 PM, Lionel Duboeuf wrote:

OK, but even if I modify the similarity measure, I will still face a
polysemy problem: for example, the term "car" in English does not mean
the same thing as the term "car" in French.

Also, what is the best approach to calculate numDocs for a given user
easily (and quickly)?

Thanks for your answer.

lionel
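
To make the numDocs question concrete, here is a minimal sketch of one possible approach, not something proposed in the thread: keep per-user counters up to date at indexing time, so that numDocs and docFreq for a user are simple map lookups. The class `PerUserStats` and its methods are hypothetical names, not part of Lucene. The idf formula, 1 + ln(numDocs / (docFreq + 1)), is assumed here to follow the one used by Lucene's DefaultSimilarity. Inside Lucene itself, if every document carried a field holding its owner's id (an assumed field, say "user"), `IndexReader.docFreq(new Term("user", userId))` should give a comparable per-user document count.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): in-memory per-user statistics. */
class PerUserStats {
    // user -> number of documents recorded for that user
    private final Map<String, Integer> docsPerUser = new HashMap<>();
    // user -> (term -> number of that user's documents containing the term)
    private final Map<String, Map<String, Integer>> dfPerUser = new HashMap<>();

    /** Record one document for a user; call this whenever a document is indexed. */
    void addDocument(String user, Collection<String> terms) {
        docsPerUser.merge(user, 1, Integer::sum);
        Map<String, Integer> df = dfPerUser.computeIfAbsent(user, u -> new HashMap<>());
        // Count each term once per document, however often it occurs in it.
        for (String term : new HashSet<>(terms)) {
            df.merge(term, 1, Integer::sum);
        }
    }

    /** Number of documents recorded for this user (0 for an unknown user). */
    int numDocs(String user) {
        return docsPerUser.getOrDefault(user, 0);
    }

    /** Number of this user's documents that contain the term. */
    int docFreq(String user, String term) {
        Map<String, Integer> df = dfPerUser.get(user);
        return df == null ? 0 : df.getOrDefault(term, 0);
    }

    /**
     * idf = 1 + ln(numDocs / (docFreq + 1)), using one user's counts only.
     * Assumes the user has at least one document.
     */
    double idf(String user, String term) {
        return 1.0 + Math.log(numDocs(user) / (double) (docFreq(user, term) + 1));
    }

    public static void main(String[] args) {
        PerUserStats stats = new PerUserStats();
        stats.addDocument("alice", Arrays.asList("car", "road", "car"));
        stats.addDocument("alice", Arrays.asList("train"));
        stats.addDocument("bob", Arrays.asList("car"));
        System.out.println("numDocs(alice) = " + stats.numDocs("alice"));
        System.out.println("idf(alice, car) = " + stats.idf("alice", "car"));
    }
}
```

Both lookups are constant time, at the cost of keeping the counters in sync with the index when documents are added (deletions would need a matching decrement, which this sketch omits). Whether per-user statistics actually rank better than corpus-wide ones is the open question raised in the quoted reply.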