lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <>
Subject Re: Sharding Techniques
Date Mon, 09 May 2011 13:06:33 GMT
30Gb isn't that big by lucene standards.  Have you considered or tried
just having one large index?  If necessary you could restrict searches
to particular "indexes", or groups thereof, by a field in the combined
index, preferably used as a filter.  If the slow searches have to
search across 63 separate indexes it is perhaps not surprising that
they are slow.  What do you do about sharing or caching
searcher/reader instances?  There are lots of useful tips on

40 fields isn't that many - should be fine.

On sharding/scaling/etc,
looks well worth a read.


On Mon, May 9, 2011 at 12:56 PM, Samarendra Pratap <> wrote:
> Hi list,
>  We have an index directory of 30 GB which is divided into 3 subdirectories
> (idx1, idx2, idx3) which are again divided into 21 sub-subdirectories
> (idx1-1, idx1-2, ...., idx2-1, ...., idx3-1, ...., idx3-21).
> We are running with java 1.6, lucene 2.9 (going to upgrade to 3.1 very
> soon), linux (fedora core - kernel 2.6.17-13.1), reiserfs.
> We have almost 40 fields in each index (is it a bad to have so many
> fields?). most of them are id based fields.
> We are using 8 servers for search, and each of which receives approximately
> 3000/hour queries in peak hour and search time of more than 1 second is
> considered bad (is it really bad?) as per the business requirement.
> Since past few months we are experiencing issues (load and search time) on
> our search servers, due to which I am looking for sharding techniques. Can
> someone guide or give me pointers where i can read more and test?
> Keeping parts of indexes on different servers search on all of them and then
> merging the results - what could be the best approach?
> Let me tell you that most queries use only 6-7 indexes and 4 - 5 fields (to
> search for) but some queries (searching all the data) require all the
> indexes and are primary cause of the performance degradation.
> Any suggestions/ideas are greatly appreciated. And further more will
> sharding (or similar thing) really reduce search time? (load is a less
> severe issue when compared to search time)
> --
> Regards,
> Samar

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message