lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <ian....@gmail.com>
Subject Re: Sharding Techniques
Date Wed, 11 May 2011 08:59:10 GMT
Ganesh


Nobody is saying that sharding is never a good idea - it just doesn't
seem to be applicable in the case being discussed.  On my indexes I
care much more about speed of searching rather than speed of indexing.
 The latter typically happens in the background in the dead of night
and within reason I don't really care how long it takes.  Your
application and requirements will be different and you may come to
different conclusions.


--
Ian.


On Wed, May 11, 2011 at 6:09 AM, Ganesh <emailgane@yahoo.co.in> wrote:
>
> We also use similar kind of technique, breaking indexes in to smaller and search using
ParallelMultiSearcher. We have to do incremental indexing and the records older than 6 months
or 1 year (based on ageout setting) should be deleted. Having multiple small indexes is really
fast in terms of indexing.
>
> Since you guys mentioned about keeping single large index. Search time woule be faster
but the indexing and index optimization will take more time. How you are handling it in case
of incremental indexing. If we keep the indexes size to 100+ GB then each file size (fdt,
fdx etc) would in GB's. Small addition or deletion to the file will not cause more IO as it
has to skip those bytes and write it at the end of file.
>
> Regards
> Ganesh
>
> ----- Original Message -----
> From: "Burton-West, Tom" <tburtonw@umich.edu>
> To: <java-user@lucene.apache.org>
> Sent: Tuesday, May 10, 2011 9:46 PM
> Subject: RE: Sharding Techniques
>
>
> Hi Samar,
>
>>>Normal queries go fine under 500 ms but when people start searching
>>>"anything" some queries take up to > 100 seconds. Don't you think
>>>distributing smaller indexes on different machines would reduce the average
>>>.search time. (Although I have a feeling that search time for smaller queries
>>>may be slightly increased)
>
> What are the characteristics of your slow queries?  Can you give examples?   Are the
slow queries always slow or only under heavy load?   What the bottleneck is and whether splitting
into smaller indexes would help depends on just what your bottleneck is. It's not clear that
your index is large enough that the size of the index is causing your bottleneck.
>
> We run indexes of about 350GB with average response times under 200ms and 99th percentile
reponse times of under 2 seconds. (We have a very low qps rate however).
>
>
> Tom
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message