lucene-solr-user mailing list archives

From Tom Burton-West <tburtonw...@gmail.com>
Subject Re: Limit of Index size per machine..
Date Thu, 06 Aug 2009 20:19:07 GMT

Hello,

I think you are confusing the size of the data you want to index with the
size of the index.  For our indexes (large full text documents) the Solr
index is about 1/3 of the size of the documents being indexed.  For 3 TB of
data you might have an index of 1 TB or less.  This depends on many factors
in your index configuration, including whether you store fields.
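
As a rough illustration of that arithmetic, here is a minimal sketch that
applies the roughly 1/3 ratio we see to the 100 GB/day and 30-day retention
figures from the quoted message. The function name and the use of
1 TB = 1000 GB are assumptions made for the example, and the ratio is only
what we observe for our own full-text documents; log data may behave
differently.

```python
def estimate_index_size_gb(raw_data_gb, docs_to_index_divisor=3.0):
    """Estimate index size from raw data size.

    docs_to_index_divisor=3.0 reflects the ~1/3 ratio observed for our
    large full-text documents; it is an assumption for any other data set.
    """
    return raw_data_gb / docs_to_index_divisor


# Figures from the quoted message: 100 GB of logs per day, kept for 30 days.
daily_log_gb = 100
retention_days = 30

raw_gb = daily_log_gb * retention_days      # about 3 TB of raw data
index_gb = estimate_index_size_gb(raw_gb)   # about 1 TB of index, or less

print(f"raw data: {raw_gb} GB, estimated index: {index_gb:.0f} GB")
```

The point is that the sizing should start from the estimated index size,
not from the raw data size.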

What kind of performance do you need for indexing time and for search
response time?

We are trying to optimize search response time and have been running tests
on a 225GB Solr index with 32GB of RAM. About 95% of our test queries
return in less than a second. However, the slowest 1% of queries take
between 5 and 10 seconds.

On the other hand it takes almost a week to index about 670GB of full text
documents.

We will be scaling up to 3 million documents which will be about 2 TB of
text and 0.75 TB index size.  We plan to distribute the index across 5
machines.
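
For a sense of scale, the sketch below works out the per-machine index size
for that plan and the index-to-RAM ratio of our current test setup. The
helper names are made up for the example, 1 TB is taken as 1000 GB, and an
even split across machines is assumed.

```python
def per_machine_index_gb(total_index_gb, machines):
    """Index size each machine holds if the index is split evenly."""
    return total_index_gb / machines


def index_to_ram_ratio(index_gb, ram_gb):
    """How many times larger the index is than the available RAM."""
    return index_gb / ram_gb


# Planned setup: 0.75 TB index spread over 5 machines.
planned_per_machine = per_machine_index_gb(750, 5)

# Current test setup: 225 GB index on a machine with 32 GB of RAM.
test_ratio = index_to_ram_ratio(225, 32)

print(f"planned index per machine: {planned_per_machine:.0f} GB")
print(f"test index is about {test_ratio:.1f}x the size of RAM")
```

In other words, our test index is already several times larger than RAM, so
an index does not have to fit entirely in memory to give the response times
described above.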

More information on our setup and results is available at:
http://www.hathitrust.org/blogs/large-scale-search

Tom
> > The expected processed log file size per day: 100 GB
> > We are expecting to retain these indexes for 30 days
> > (100*30 ~ 3 TB).
>
> >>>That means we need approximately 3000 GB (Index Size)/24 GB (RAM) = 125
> servers.
>
> It would be very hard to convince my org to go for 125 servers for log
> management of 3 Terabytes of indexes.
>
> Has any one used, solr for processing and handling of the indexes of the
> order of 3 TB ? If so how many servers were used for indexing alone.
>
> Thanks,
> sS

-- 
View this message in context: http://www.nabble.com/Limit-of-Index-size-per-machine..-tp24833163p24853662.html
Sent from the Solr - User mailing list archive at Nabble.com.

