lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Collins <>
Subject Re: Optimizing indexes with mulitiple processors?
Date Fri, 10 Jun 2005 16:33:11 GMT
How many documents did you try to index?  I am using a relatively large
minMergeDoc that causes me to run out of memory when I make such a change. (I
am using 1/2 gb of heap btw).  I believe changing it in the outputstream object
means that a lot of in memory only objects use that size too...I assume that
the real bang for the buck is for the FSOutputStream and FSInputStream. 
Unravelling that case drops me into array copy issues that I have to debug.

I dont know I would of used truss in this regard, this only points out what
size hit the kernel not what went over the wire.  I would suggest using
ethereal to ensure thats how its ending up on the wire.  As for what goes over
the wire, thats something the cifs / nfs client negotiates with the server.  I
believe NetApp for instance supports upto 32k on nfs and almost 64k with cifs. 



--- "Peter A. Friend" <> wrote:

> On Jun 9, 2005, at 11:52 PM, Chris Collins wrote:
> > In that case I have a different performance issue, that is that  
> > FSInputStream
> > and FSOutputStream inherit the buffer size of 1k from OS and IS   
> > This would be
> > useful to increase to reduce the amount of RPC's to the filer when  
> > doing merges
> > ..... assuming that reads and writes are sequential (CIFS supports  
> > a 64k block
> > and NFS supports upto I think 32k).  I haven't spent much time on  
> > this so far
> > so its not like I know its hard todo.  From preliminary experiments  
> > its obvious
> > that changing the OS buffersize is not the thing todo.
> >
> > If anyone has successfully increased the FSOutputStream and  
> > FSInputStream
> > buffers and got it not to blow up on array copies I would love to  
> > know the
> > short cut.
> I just started up with Lucene, and I have been looking at the NFS  
> issues. Since the OS doesn't report the block size in use by the  
> Netapp, EMC, whatever, you need to tweak it manually. I found this in  
> src/java/org/apache/lucene/store/
> /** Abstract class for output to a file in a Directory.  A random- 
> access output
> * stream.  Used for all Lucene index output operations.
> * @see Directory
> * @see InputStream
> */
> public abstract class OutputStream {
>    static final int BUFFER_SIZE = 1024;
> I changed that value to 8k, and based on the truss output from an  
> index run, it is working. Haven't gotten much beyond that to see if  
> it causes problems elsewhere. The value also needs to be altered on  
> the read end of things. Ideally, this will be made settable via a  
> system property.
> Peter
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message