lucene-java-user mailing list archives

From "Ganesh" <>
Subject Re: Sharding Techniques
Date Tue, 10 May 2011 04:29:06 GMT
We are using a similar technique to yours. We keep smaller indexes and use ParallelMultiSearcher
to search across them. Keeping the indexes small is good because indexing and index optimization
are faster. There will be a small delay when searching across the indexes.
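The pattern ParallelMultiSearcher follows can be sketched in plain Java: search every sub-index concurrently, then merge each index's top hits into one ranked list. This is a minimal, self-contained model, not Lucene's API. `Shard`, `Hit` and `searchAll` are hypothetical names, and the toy shards return hits that are already scored. The sketch also assumes scores are directly comparable across shards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy scatter-gather search over several sub-indexes (hypothetical model, not Lucene's API). */
class ShardedSearch {

    /** One scored hit; docId is local to the shard it came from. */
    static final class Hit {
        final int shard;
        final int docId;
        final float score;

        Hit(int shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return "shard" + shard + ":doc" + docId + "@" + score;
        }
    }

    /** Hypothetical sub-index that returns its own top-k hits for a query. */
    interface Shard {
        List<Hit> search(String query, int k);
    }

    /** Searches all shards concurrently, then merges their results into a global top-k. */
    static List<Hit> searchAll(List<Shard> shards, String query, int k) {
        if (shards.isEmpty()) {
            return new ArrayList<>();
        }
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            // Scatter: one task per shard, all running in parallel.
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (Shard shard : shards) {
                futures.add(pool.submit(() -> shard.search(query, k)));
            }
            // Gather: a min-heap keeps the k best hits seen so far (weakest at the head).
            PriorityQueue<Hit> best = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
            for (Future<List<Hit>> future : futures) {
                for (Hit hit : future.get()) {
                    best.offer(hit);
                    if (best.size() > k) {
                        best.poll(); // evict the lowest-scoring hit
                    }
                }
            }
            List<Hit> merged = new ArrayList<>(best);
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Two toy shards returning fixed, already-scored hits.
        Shard a = (query, k) -> Arrays.asList(new Hit(0, 1, 0.9f), new Hit(0, 2, 0.4f));
        Shard b = (query, k) -> Arrays.asList(new Hit(1, 7, 0.7f), new Hit(1, 8, 0.2f));
        System.out.println(searchAll(Arrays.asList(a, b), "title:lucene", 3));
    }
}
```

Each shard only needs to return its own top k. The global top k must come from the union of those per-shard lists, so the merge step does work proportional to shards × k rather than to index size. For brevity the sketch creates a thread pool per query. With many sub-indexes, a long-lived pool shared across requests is probably the better choice.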

1. What is your search time?
2. Is your index optimized?

I have a question: if we keep the index size at 30 GB, then each file (.fdt, .fdx, etc.) would be
several GB in size. Won't a small addition or deletion to such a file cause more I/O, since it has
to skip past those bytes and write at the end of the file?



----- Original Message ----- 
From: "Samarendra Pratap" <>
To: <>
Sent: Monday, May 09, 2011 5:26 PM
Subject: Sharding Techniques

> Hi list,
> We have an index directory of 30 GB, which is divided into 3 subdirectories
> (idx1, idx2, idx3), which are in turn divided into 21 sub-subdirectories
> (idx1-1, idx1-2, ..., idx2-1, ..., idx3-1, ..., idx3-21).
> We are running Java 1.6 and Lucene 2.9 (going to upgrade to 3.1 very
> soon) on Linux (Fedora Core, kernel 2.6.17-13.1) with ReiserFS.
> We have almost 40 fields in each index (is it bad to have so many
> fields?). Most of them are ID-based fields.
> We are using 8 servers for search, each of which receives approximately
> 3,000 queries per hour at peak, and a search time of more than 1 second is
> considered bad (is it really bad?) according to the business requirements.
> For the past few months we have been experiencing issues (load and search
> time) on our search servers, so I am looking into sharding techniques. Can
> someone guide me or give me pointers on where I can read more and test?
> Keeping parts of the indexes on different servers, searching all of them, and
> then merging the results: what would be the best approach?
> Note that most queries use only 6-7 indexes and search 4-5 fields, but
> some queries (searching all the data) require all the indexes and are the
> primary cause of the performance degradation.
> Any suggestions or ideas are greatly appreciated. Furthermore, will
> sharding (or something similar) really reduce search time? (Load is a less
> severe issue than search time.)
> -- 
> Regards,
> Samar
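The figures in the question support a quick back-of-the-envelope estimate. The Java sketch below uses hypothetical helper names. It does three things:

- converts the peak rate to queries per second;
- applies Little's law (average queries in flight = arrival rate × time per query) at the 1-second threshold;
- estimates the average sub-index size.

The size estimate assumes the 30 GB is split evenly across all 3 × 21 sub-indexes, which the message does not state.

```java
/**
 * Back-of-the-envelope numbers for the setup described in the message.
 * Inputs come from the message; the even split across sub-indexes is an assumption.
 */
class LoadEstimate {

    /** Converts a per-hour query count into queries per second. */
    static double queriesPerSecond(double queriesPerHour) {
        return queriesPerHour / 3600.0;
    }

    /** Little's law: average number of queries in flight = arrival rate x mean time per query. */
    static double averageInFlight(double arrivalsPerSecond, double secondsPerQuery) {
        return arrivalsPerSecond * secondsPerQuery;
    }

    /** Average sub-index size, assuming the total is split evenly (an assumption). */
    static double averageShardSizeGb(double totalGb, int shardCount) {
        return totalGb / shardCount;
    }

    public static void main(String[] args) {
        int servers = 8;
        double perServerQps = queriesPerSecond(3000);         // peak load per server
        double clusterQps = perServerQps * servers;           // whole search tier
        double inFlight = averageInFlight(perServerQps, 1.0); // at the 1 s threshold
        double shardGb = averageShardSizeGb(30.0, 3 * 21);    // idx1-1 .. idx3-21

        System.out.printf("per-server QPS: %.2f%n", perServerQps);
        System.out.printf("cluster QPS: %.2f%n", clusterQps);
        System.out.printf("avg queries in flight per server at 1 s/query: %.2f%n", inFlight);
        System.out.printf("avg sub-index size: %.2f GB%n", shardGb);
    }
}
```

With these inputs, the sketch reports:

- about 0.83 queries per second per server (about 6.67 across all eight servers);
- an average of under one query in flight per server, even at 1 second per query;
- roughly 0.48 GB per sub-index.

Averages hide bursts. Still, the numbers are consistent with the message's point that per-query latency, especially for the all-index queries, matters more than raw load.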
