lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kay Kay <kaykay.uni...@gmail.com>
Subject Re: Is Lucene a good choice for PB scale mailbox search?
Date Tue, 24 Nov 2009 06:56:50 GMT
fulin tang wrote:
> We are going to add full-text search for our mailbox service .
>
> The problem is we have more than 1 PB mails there , and obviously we
> don't want to add another PB storage for search service , so we hope
> the index data will be small enough for storage while the search keeps
> fast .
>
> The lucky is that every user just search with mails of their own , so
> we can split the data into a lot of indexes instead of keeping them in
> a big one .
>   
If it is going to be sharded by the 'To' or 'Cc' list - then potentially 
the mail information is going to be duplicated proportional to the 
number of people in an email thread. Selecting some other dimension like 
time, for sharding  might be useful to begin with.
> So, after all these concerns ,  the question is , is lucene a good
> choice for this ? or which is the right way to do this ? Does anyone
> have done this  before ?
>   

With PB of storage - check out solr sharding / katta for prior work in 
this arena.

> All opinions and comments are welcome !
>
> fulin
>
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message