hadoop-common-dev mailing list archives

From Stephen Watt <sw...@us.ibm.com>
Subject Re: 5 billion pages indexed and searchable.
Date Wed, 11 Nov 2009 16:28:12 GMT
Hi Dan

I might be stating the obvious here, but have you looked at Nutch? Nutch 
uses Hadoop and is able to crawl, index and search (using Lucene). We've 
been using it for a while and it works well.

Kind regards
Steve Watt



From: Dan Segel <dansegel@gmail.com>
To: common-dev@hadoop.apache.org
Date: 11/11/2009 07:58 AM
Subject: 5 billion pages indexed and searchable.



I am looking to develop a search engine that can handle 25 searches per
second and have 5+ billion pages indexed. I intend to use the hardware
below, connected with fiber (of course). Do you think this is overkill,
or am I falling way short?

I plan on buying 32 servers (actually contained inside 2 blade servers),
each configured as follows:
http://www.eztradelive.com/product.php?productid=157&cat=45&page=1
> Dual 2.0GHz Quad Core CPU
> 2 x 300GB 2.5" SAS HDD (RAID)
> 16GB DDR2 RAM

+ 60 TB of storage at RAID 10 with fiber connection.
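
As a rough way to frame the "overkill or way short" question, the figures
above can be turned into per-page and per-server budgets. This is only a
back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes),
treats the 60 TB as usable capacity (the message does not say whether that
is before or after RAID 10), and splits pages evenly across servers. The
function name `budget` and the constant names are made up for illustration.

```python
# Figures taken from the message; everything else is an assumption.
PAGES = 5_000_000_000        # "5+ billion pages"
QPS = 25                     # target searches per second
SERVERS = 32
CORES_PER_SERVER = 2 * 4     # dual quad-core CPUs
RAM_GB_PER_SERVER = 16
SHARED_STORAGE_TB = 60       # assumed usable, decimal terabytes


def budget():
    """Return simple per-page and per-server budgets for the proposed cluster."""
    total_ram_bytes = SERVERS * RAM_GB_PER_SERVER * 10**9
    return {
        # Pages each server must cover if the index is sharded evenly.
        "pages_per_server": PAGES // SERVERS,
        # Shared storage available per page (raw content plus index).
        "shared_bytes_per_page": SHARED_STORAGE_TB * 10**12 // PAGES,
        # Cluster-wide RAM available per indexed page.
        "ram_bytes_per_page": total_ram_bytes / PAGES,
        "total_cores": SERVERS * CORES_PER_SERVER,
        # If every query fans out to all shards, each server sees the full rate.
        "queries_per_server_per_second": QPS,
    }


if __name__ == "__main__":
    for name, value in budget().items():
        print(f"{name}: {value}")
```

Under these assumptions each server covers roughly 156 million pages, with
about 12 KB of shared storage and about 100 bytes of RAM per page. Whether
that is enough depends on average page size, index format and how much of
the index must be held in memory, none of which is stated here.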


Dan


