lucene-dev mailing list archives

From: Doug Cutting <cutt...@apache.org>
Subject: Re: Uneffective writeBytes and readBytes [FIX]
Date: Thu, 08 Sep 2005 21:17:57 GMT
In general I don't disagree with this sort of optimization, but I think 
a good fix is a bit more complicated than what you posted.

Lukas Zapletal wrote:
> And here come the fixes:
> 
> OutputStream:
> 
> 	/**
> 	 * Writes an array of bytes.
> 	 * 
> 	 * @param b
> 	 *            the bytes to write
> 	 * @param length
> 	 *            the number of bytes to write
> 	 * @see InputStream#readBytes(byte[],int,int)
> 	 */
> 	public final void writeBytes(byte[] b, int length) 
> 		throws IOException { 
> 
> //		for (int i = 0; i < length; i++) writeByte(b[i]);
> 
> 		if (bufferPosition > 0) // flush buffer
> 			flush();
> 
> 		if (length < BUFFER_SIZE) {
> 			flushBuffer(b, length);
> 		} else {
> 			int pos = 0;
> 			int size;
> 			while (pos < length) {
> 				if (length - pos < BUFFER_SIZE) {
> 					size = length - pos;
> 				} else {
> 					size = BUFFER_SIZE;
> 				}
> 				System.arraycopy(b, pos, buffer, 0, size);
> 				pos += size;
> 				flushBuffer(buffer, size);
> 				bufferStart += size;
> 			}
> 		}
> 	}

This forces a flush() each time a byte array of any size is written. 
That could be much slower when lots of small byte arrays are written, 
since flush() invokes a system call.  What would be best is, if there is 
room in the buffer, to simply use System.arraycopy to append the new 
data to the buffer, with no flush.  If the new data is larger than a 
buffer, then the buffer should be flushed and the new data written 
directly, without ever copying it into the buffer.  If the new data is 
smaller than a buffer but larger than the room available in the current 
buffer, then it should be used to fill the current buffer, which should 
then be flushed, and the remainder copied into the buffer.  Does that 
sound right?
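
Something like the following is what I have in mind.  This is only a 
sketch against the fields and helpers used in the code above (buffer, 
bufferPosition, bufferStart, BUFFER_SIZE, flush(), flushBuffer()), and 
it assumes flush() advances bufferStart and resets bufferPosition the 
way the existing writeByte() path does:

	public final void writeBytes(byte[] b, int length)
			throws IOException {
		int room = BUFFER_SIZE - bufferPosition; // space left in the buffer
		if (length <= room) {
			// fits entirely: append to the buffer, no flush
			System.arraycopy(b, 0, buffer, bufferPosition, length);
			bufferPosition += length;
		} else if (length >= BUFFER_SIZE) {
			// at least a whole buffer: flush what we have, then
			// write the new data directly, never copying it
			if (bufferPosition > 0)
				flush();
			flushBuffer(b, length);
			bufferStart += length;
		} else {
			// too big for the room left but smaller than a buffer:
			// top up the current buffer, flush it, keep the rest
			System.arraycopy(b, 0, buffer, bufferPosition, room);
			bufferPosition += room;
			flush();
			System.arraycopy(b, room, buffer, 0, length - room);
			bufferPosition = length - room;
		}
	}

That way small writes cost only an arraycopy, and large writes avoid the 
extra copy through the buffer altogether.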

> InputStream:
> 
> 	public final void readBytes(byte[] b, int offset, int len)
> 			throws IOException {
> //		if (len < BUFFER_SIZE) { // not required
> //			for (int i = 0; i < len; i++)
> //				// read byte-by-byte
> //				b[i + offset] = (byte) readByte();
> //		} else { // read all-at-once
> 			long start = getFilePointer();
> 			seekInternal(start);
> 			readInternal(b, offset, len);
> 
> 			bufferStart = start + len;
> 			bufferPosition = 0; 
> 			bufferLength = 0;
>  //		}
> 	}

Again, this could be much slower when lots of small arrays are read, 
since each call forces seek and read system calls.  However, it could 
be optimized to use System.arraycopy in the case where the desired data 
already resides entirely in the current buffer.
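
Roughly something like this, again just a sketch against the same 
fields (buffer, bufferStart, bufferPosition, bufferLength) and assuming 
getFilePointer() returns bufferStart + bufferPosition:

	public final void readBytes(byte[] b, int offset, int len)
			throws IOException {
		if (len <= bufferLength - bufferPosition) {
			// fast path: the requested bytes are already buffered
			System.arraycopy(buffer, bufferPosition, b, offset, len);
			bufferPosition += len;
		} else {
			// otherwise bypass the buffer and read directly, as in
			// the code posted above
			long start = getFilePointer();
			seekInternal(start);
			readInternal(b, offset, len);
			bufferStart = start + len;
			bufferPosition = 0;
			bufferLength = 0;
		}
	}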

> There is a significant time improvement for writing and a slight one
> for reading.  I also recommend setting the buffer to 8 or 16 kilobytes.

In certain cases Lucene allocates many stream buffers, so making these 
larger can greatly increase the amount of memory used.  Also, the 
filesystem should already optimize sequential reads, so the primary 
improvement from a larger buffer is simply fewer system calls.  In my 
experience, a buffer of 1k or so is usually large enough that system 
call overheads are minimal.

Doug

