lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Max size of index? How do search engines avoid this?
Date Mon, 18 May 2009 10:17:51 GMT

>techniques used by big search engines to search among such huge data.

Two keywords here - partitioning and replication.

Partitioning is breaking the content down into shards and assigning shards to servers. These
can then be queried in parallel to make search response times independent of the data volumes
being searched. I seem to remember a quote that a single Google search currently gets spread
across ~1,000 servers in parallel.

Replication is about handling user volumes - take each shard and assign it to many replica
servers then load balance requests across them to spread the load. This also gives you redundancy
and helps in recovery from machine failure.


You may want to take a look at Solr to help you with this.

Cheers
Mark





----- Original Message ----
From: raistlink <elamas@gmail.com>
To: java-user@lucene.apache.org
Sent: Monday, 18 May, 2009 10:42:02
Subject: Max size of index? How do search engines avoid this?


Hi,
I think I've read that there is a limit for de index, may be 2Gb for fat
machines. If this is right I ask you for good resources (webs or books)
about programming search engines to know about the techniques used by big
search engines to search among such huge data.

Thanks
-- 
View this message in context: http://www.nabble.com/Max-size-of-index--How-do-search-engines-avoid-this--tp23594241p23594241.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message