From: Grant Ingersoll
To: java-user@lucene.apache.org
Subject: Re: Group by in Lucene ?
Date: Mon, 5 Nov 2007 07:01:23 -0500

Solr has an issue outstanding right now that implements something that
may be close to what you want. They are calling it Field Collapsing.
See https://issues.apache.org/jira/browse/SOLR-236

-Grant

On Nov 5, 2007, at 12:57 AM, Marcus Herou wrote:

> Hi.
>
> I have a situation where I'm searching amongst some 100K feeds and
> only want one result per site in return. I have developed a really
> simple method of grouping which just scrolls through the
> resultset(hitset) until a maxNum docs of feeds from a set of unique
> sites is populated. Since I don't wanna reinvent the wheel, I want to
> know if Lucene has something like this built. I as well will use Solr
> soon and then my own homecooked recipe will not work so I really need
> a standard way of doing this.
>
> I know Nutch has something like it called depupField which default is
> set to 2.
>
> Anyone?
>
> Kindly
>
> //Marcus
>
> --
> Marcus Herou Solution Architect & Core Java developer Tailsweep AB
> +46702561312
> marcus.herou@tailsweep.com
> http://www.tailsweep.com

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Boot Camp Training:
ApacheCon Atlanta, Nov. 12, 2007. Sign up now! http://www.apachecon.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
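
For readers of the archive, the "scroll through the hits until maxNum
docs from unique sites is populated" approach described in the quoted
message could look roughly like the sketch below. It is not Marcus's
actual code and it does not use the Lucene or Solr APIs: the Hit class,
its docId and site fields, and the firstHitPerSite method are all
hypothetical names made up for illustration. It assumes the hits are
already in rank order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SiteCollapse {

    /** Hypothetical stand-in for a search hit: a document id plus the site it belongs to. */
    static final class Hit {
        final String docId;
        final String site;

        Hit(String docId, String site) {
            this.docId = docId;
            this.site = site;
        }
    }

    /**
     * Walks the hits in rank order and keeps only the first hit seen for each site,
     * stopping as soon as maxNum hits have been collected.
     */
    static List<Hit> firstHitPerSite(List<Hit> rankedHits, int maxNum) {
        List<Hit> collapsed = new ArrayList<>();
        Set<String> seenSites = new HashSet<>();
        for (Hit hit : rankedHits) {
            if (collapsed.size() >= maxNum) {
                break; // enough unique sites collected, no need to scan further
            }
            // Set.add returns false when the site was already seen, so later
            // hits from the same site are skipped.
            if (seenSites.add(hit.site)) {
                collapsed.add(hit);
            }
        }
        return collapsed;
    }

    public static void main(String[] args) {
        // Made-up sample data: three hits from a.example, one each from two other sites.
        List<Hit> ranked = Arrays.asList(
                new Hit("a1", "a.example"),
                new Hit("a2", "a.example"),
                new Hit("b1", "b.example"),
                new Hit("a3", "a.example"),
                new Hit("c1", "c.example"));
        for (Hit hit : firstHitPerSite(ranked, 2)) {
            System.out.println(hit.docId + " " + hit.site);
        }
        // prints:
        // a1 a.example
        // b1 b.example
    }
}
```

One limitation of this kind of post-filtering is that it may have to scan
a long way down the result list when a few sites dominate the top hits,
which is part of why a built-in mechanism such as the Field Collapsing
work in SOLR-236 is attractive.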