From: "Dan Segel"
To: core-user@hadoop.apache.org
Date: Thu, 5 Jun 2008 09:12:31 -0400
Subject: Gigablast.com search engine, 10billion pages!!!
Message-ID: <499802fc0806050612u1e711e21u466887582767dd7d@mail.gmail.com>

Our ultimate goal is to replicate the gigablast.com search engine. They claim to have fewer than 500 servers holding 10 billion pages that are indexed, spidered, and updated on a routine basis.

I am looking at indexing 500 million pages per node, with 20 nodes in total. Each node will have 2 quad-core processors, 4 TB of disk (RAID 5), and 32 GB of RAM.

I believe this can be done. However, how many searches per second do you think would be realistic in this setup? We are ultimately aiming for roughly 25 searches per second, spread across the 20 nodes.

I could really use some advice on this one.

Thanks,

D. Segel
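
As a rough sanity check on the figures in the message, here is a minimal back-of-the-envelope sketch. It only does arithmetic on the stated numbers and makes no claim about how Gigablast or Hadoop actually behaves. The function name `capacity_budget` is hypothetical, and the sketch assumes that the 4 TB is usable space after RAID 5, that units are decimal, and that every query is sent to all 20 nodes (a document-partitioned index), so each node sees the full query rate.

```python
def capacity_budget(
    nodes=20,
    pages_per_node=500_000_000,
    disk_bytes_per_node=4 * 10**12,  # assumption: 4 TB usable, decimal units
    ram_bytes_per_node=32 * 10**9,   # assumption: decimal GB
    cores_per_node=8,                # 2 quad-core processors
    target_qps=25,
):
    """Return the per-page and per-query budgets implied by the stated hardware."""
    return {
        "total_pages": nodes * pages_per_node,
        "disk_bytes_per_page": disk_bytes_per_node // pages_per_node,
        "ram_bytes_per_page": ram_bytes_per_node // pages_per_node,
        # Assumption: each query fans out to every node, so every node
        # must sustain the full target rate rather than 1/20 of it.
        "qps_per_node": target_qps,
        # Core time available per query on one node, in milliseconds.
        "core_ms_per_query": 1000 * cores_per_node / target_qps,
    }


if __name__ == "__main__":
    for name, value in capacity_budget().items():
        print(f"{name}: {value}")
```

Under those assumptions the stated hardware leaves about 8,000 bytes of disk and 64 bytes of RAM per page, and about 320 ms of core time per query on each node. If queries were instead routed to only a subset of nodes, the per-node rate would be lower and the per-query budget correspondingly larger.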