From: Anshum
Date: Wed, 3 Feb 2010 07:27:16 +0530
Subject: Re: Limiting search result for web search engine
To: java-user@lucene.apache.org

Hi Mike,

Not really through queries, but you can do this by writing a custom collector. You'd need some supporting data structure, such as a hash set, to mark the domains that have already occurred in your result set.

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The distinction is yours to draw............

On Wed, Feb 3, 2010 at 6:56 AM, Mike Polzin wrote:

> I am working on building a web search engine and I would like to build a
> results page similar to what Google does. The functionality I am looking to
> include is what I refer to as "rolling up" sites, meaning that even if a
> particular site (defined by its base URL) has many relevant hits on various
> pages for the search's keywords, that site is only shown once in the results
> listing, with a link to the most relevant hit on that site. What I do not
> want is to have one site dominate a search results page.
>
> Does it make sense to just do the search, get the hits list and then
> programmatically remove the results which, although they meet the search
> criteria, are not as relevant? Is there a way to do this through queries?
>
> Thanks in advance!
>
> Mike
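
The rolling-up idea discussed in this thread can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not Lucene code and not necessarily what either poster had in mind: `DomainRollup`, `Hit`, `rollUp` and `hostOf` are hypothetical names, the hits are assumed to arrive already sorted by descending score, and a "site" is assumed to mean the host part of the URL. It filters the ranked list after the search instead of doing the work inside a custom collector, but it uses the same kind of supporting structure: a hash set of the hosts already seen.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Lucene): keeps at most one hit per host.
class DomainRollup {

    // Illustrative hit type: a page URL plus its relevance score.
    static final class Hit {
        final String url;
        final float score;

        Hit(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    // Assumes rankedHits is sorted by descending score, so the first hit
    // seen for a host is also the most relevant one for that host.
    static List<Hit> rollUp(List<Hit> rankedHits) {
        Set<String> seenHosts = new HashSet<>();
        List<Hit> rolledUp = new ArrayList<>();
        for (Hit hit : rankedHits) {
            // Set.add returns false when the host was already recorded.
            if (seenHosts.add(hostOf(hit.url))) {
                rolledUp.add(hit);
            }
        }
        return rolledUp;
    }

    // Treats the lower-cased host as the "site"; falls back to the whole
    // string when the URL has no host component. URI.create throws
    // IllegalArgumentException for strings that are not valid URIs.
    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        return host == null ? url : host.toLowerCase(Locale.ROOT);
    }
}
```

One consequence of filtering after the search is that a page of ten results may need more than ten raw hits, so the caller would have to fetch extra hits and stop once enough distinct hosts have been collected.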