Date: Fri, 31 Aug 2012 08:41:55 +0800
Subject: Re: Solr4 distributed IDF
From: Eric Wu
To: solr-user@lucene.apache.org

Hi Walter,

Thank you for your help. I think you are right: the most important issue here is that "the most selective terms are rare". So I probably still need to implement distributed IDF to get better results.

On Fri, Aug 31, 2012 at 8:36 AM, Walter Underwood wrote:
> That is true if you randomly distribute the documents. If they are
> distributed according to topic, there can be some big anomalies.
>
> Also, the DFs for rare terms will have bigger errors. There is some
> statistical theorem about this, but I can't remember it right now. Thanks
> to Zipf, most of your terms are rare. Also, the most selective terms are
> rare.
>
> wunder
>
> On Aug 30, 2012, at 5:25 PM, Lance Norskog wrote:
>
> > The math for "confidence values" in probability theory shows that
> > distributed DF does not matter after not very many documents. If you
> > have 10s of thousands of documents in each shard, don't worry.
> >
> > On Thu, Aug 30, 2012 at 1:19 PM, Steven A Rowe wrote:
> >> Hi Ke,
> >>
> >> Have you seen ?
> >>
> >> Steve
> >>
> >> -----Original Message-----
> >> From: Eric Wu [mailto:eirikrwu@gmail.com]
> >> Sent: Thursday, August 30, 2012 3:05 AM
> >> To: solr-user@lucene.apache.org
> >> Subject: Solr4 distributed IDF
> >>
> >> Hi there,
> >>
> >> Is there an issue ticket for the distributed IDF feature in
> >> Solr 4? Or are there perhaps already some patches I can use? Thank you
> >> very much.
> >>
> >> --
> >> Ke Wu,
> >> Best Regards
> >
>

--
Ke Wu,
Best Regards
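The disagreement above (Lance: per-shard DF is good enough once shards are large; Walter: DFs for rare terms still carry big errors) can be checked with a small simulation. This is an illustrative sketch, not Solr or Lucene code. It assumes Lucene's classic TF-IDF formula idf = 1 + ln(N / (df + 1)), assigns documents to shards uniformly at random (Walter's "randomly distribute" case), and compares each shard's local IDF with a "distributed" IDF computed from DF and N summed across all shards. The helper names `shard_stats`, `global_idf` and `max_local_error` are hypothetical.

```python
import math
import random


def idf(doc_freq, num_docs):
    # Assumed classic Lucene-style IDF: 1 + ln(N / (df + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def shard_stats(doc_has_term, num_shards, seed=0):
    """Assign each document to a shard uniformly at random.

    doc_has_term: one bool per document, True if it contains the term.
    Returns a list of (df, N) pairs, one per shard.
    """
    rng = random.Random(seed)
    df = [0] * num_shards
    n = [0] * num_shards
    for has_term in doc_has_term:
        s = rng.randrange(num_shards)
        n[s] += 1
        if has_term:
            df[s] += 1
    return list(zip(df, n))


def global_idf(stats):
    # "Distributed IDF": sum DF and N over all shards, then compute one IDF
    total_df = sum(df for df, _ in stats)
    total_n = sum(n for _, n in stats)
    return idf(total_df, total_n)


def max_local_error(stats):
    # Largest relative gap between a shard's local IDF and the global IDF
    # (empty shards are skipped because their IDF is undefined)
    g = global_idf(stats)
    return max(abs(idf(df, n) - g) / g for df, n in stats if n > 0)


if __name__ == "__main__":
    num_docs, num_shards = 40_000, 4
    for label, term_df in [("common", 4_000), ("rare", 8)]:
        docs = [i < term_df for i in range(num_docs)]
        stats = shard_stats(docs, num_shards)
        print(label, stats,
              "global idf:", round(global_idf(stats), 3),
              "max local error:", round(max_local_error(stats), 3))
```

With roughly 10,000 documents per shard, the common term's local IDFs typically land close to the global value, which matches Lance's point. The rare term's per-shard DF can only take small integer values such as 0, 1, 2 or 3, so its local IDF swings by a noticeably larger fraction. That is the effect Walter describes for rare, highly selective terms, and summing the statistics across shards removes it.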