hadoop-common-user mailing list archives

From "Dan Segel" <danse...@gmail.com>
Subject Gigablast.com search engine, 10billion pages!!!
Date Thu, 05 Jun 2008 13:12:31 GMT
Our ultimate goal is essentially to replicate the gigablast.com search engine.
They claim to index, spider, and routinely update 10 billion pages on fewer
than 500 servers. I am looking at indexing 500 million pages per node across a
total of 20 nodes. Each node will have two quad-core processors, 4 TB of
storage (RAID 5), and 32 GB of RAM. I believe this can be done, but how many
searches per second do you think would be realistic in this instance? We are
aiming for roughly 25 searches per second, spread across the 20 nodes. I could
really use some advice on this one.
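For what it's worth, the numbers in the message imply a rough back-of-envelope
budget per node. This is only a sketch of that arithmetic using the figures
stated above; it treats the 4 TB as usable capacity (RAID 5 overhead would
reduce the raw figure) and assumes queries are spread evenly:

```python
# Back-of-envelope sizing from the figures in the message above.
pages_per_node = 500_000_000
nodes = 20
total_pages = pages_per_node * nodes          # 10 billion, matching the gigablast claim

storage_per_node_bytes = 4 * 10**12           # 4 TB, assumed usable after RAID 5
bytes_per_page = storage_per_node_bytes / pages_per_node   # storage budget per indexed page

target_qps = 25
qps_per_node = target_qps / nodes             # per-node query load if evenly spread

print(total_pages, bytes_per_page, qps_per_node)
```

Under these assumptions each node has only about 8 KB of disk per page (index
plus any cached content) and must sustain a little over one query per second,
so disk seek latency, not CPU, is likely the constraint to budget for.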

D. Segel
