lucene-dev mailing list archives

From "Tate Avery" <>
Subject FileChannel implementation of Directory
Date Fri, 17 Oct 2003 19:03:25 GMT

I was reading a posting from Doug Cutting (circa 2001) that stated the following:

"Multi-CPU and/or multi-disk systems can provide greater parallelism and hence query throughput.
However Lucene's FSDirectory serializes reads to a given file (since it only has a single
file descriptor per file) which limits i/o parallelism. Someone with a large disk array would
be better served by a Directory implementation that uses Java 1.4's new i/o classes. In particular,
the FileChannel class supports reads that do not move the file pointer, so that multiple reads
on the same file can be in progress at the same time."

I attempted to implement this suggestion.  But, I did not have great success.

Basically, I copied the existing FSDirectory (from 1.3-rc1) and modified the FCInputStream
inner class.  I changed it to get a FileChannel (channel) in the constructor and to clone
properly.  But, mainly, I changed "readInternal" to look like this:

	protected void readInternal(byte[] b, int offset, int len)
		throws IOException
	{
		// wrap the target array and do an absolute (positional) read
		channel.read(ByteBuffer.wrap(b, offset, len), getFilePointer());
	}

In other words, wrap the byte array, let the channel do the reading, and get the current file
pointer from the super class.
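
To make the idea concrete outside of Lucene, here is a minimal standalone sketch of that kind of positional read. It is not the actual FCInputStream code: the class name PositionalReadSketch and the helper readFully are made up for illustration. The one thing it adds is a loop, on the assumption that a single FileChannel.read call may return fewer bytes than requested.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Arrays;

public class PositionalReadSketch {

    // Hypothetical helper: fills b[offset .. offset+len) from the channel,
    // starting at filePos, without touching the channel's own position.
    static void readFully(FileChannel channel, byte[] b, int offset, int len, long filePos)
            throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(b, offset, len);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, filePos);   // absolute read
            if (n < 0) {
                throw new EOFException("read past end of file");
            }
            filePos += n;                         // advance our own pointer only
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("fc-sketch", ".bin");
        try {
            FileOutputStream out = new FileOutputStream(tmp);
            try {
                out.write(new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
            } finally {
                out.close();
            }
            RandomAccessFile raf = new RandomAccessFile(tmp, "r");
            try {
                FileChannel ch = raf.getChannel();
                byte[] dest = new byte[4];
                readFully(ch, dest, 0, 4, 3L);
                System.out.println(Arrays.toString(dest)); // [3, 4, 5, 6]
                System.out.println(ch.position());         // 0
            } finally {
                raf.close();
            }
        } finally {
            tmp.delete();
        }
    }
}
```

The channel position stays at 0 after the read, which is the property that should let clones share one channel without synchronizing on it.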

It works fine...  the same queries return the same results, etc.  But, the new Directory implementation
consistently falls a few ms short of the old one (over sustained trials with various amounts
of concurrency) re: overall response time.  The old one usually wins out for both 'querying' and loading (i.e. Hits.doc(i)).

According to the FileChannel API, absolute reads should be able to occur concurrently.  However,
the existing FSDirectory serializes access to the underlying files.  So, I figured that FSDirectory
would be faster with a single search thread... but FileChannelDirectory would win with multiple
threads.  Apparently, not so (given my implementation :-).  I also tested on a regular IDE
HD and a SCSI.  Both tests, however, were Win2k based.
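
For anyone who wants to try the concurrency part in isolation, the following is a rough sketch of a harness with Lucene taken out of the picture. It is not the test I actually ran; the class name ConcurrentReadSketch, the method timeReads, and the file size, chunk size and thread counts are all arbitrary choices made for illustration. Several threads issue absolute reads at random offsets against one shared FileChannel, and the elapsed time is reported per thread count.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class ConcurrentReadSketch {

    /** Returns {elapsed milliseconds, total bytes read}. The file must hold at least chunk bytes. */
    static long[] timeReads(final FileChannel channel, int threads,
                            final int readsPerThread, final int chunk) throws Exception {
        final long size = channel.size();
        final AtomicLong totalBytes = new AtomicLong();
        final IOException[] failure = new IOException[1];
        Thread[] workers = new Thread[threads];
        long start = System.currentTimeMillis();
        for (int t = 0; t < threads; t++) {
            final long seed = t;
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    Random rnd = new Random(seed);
                    ByteBuffer buf = ByteBuffer.allocate(chunk);
                    try {
                        for (int i = 0; i < readsPerThread; i++) {
                            // random offset that leaves room for a full chunk
                            long pos = (long) (rnd.nextDouble() * (size - chunk));
                            buf.clear();
                            while (buf.hasRemaining()) {
                                // absolute read: no shared file pointer is moved
                                int n = channel.read(buf, pos + buf.position());
                                if (n < 0) throw new IOException("unexpected end of file");
                            }
                            totalBytes.addAndGet(chunk);
                        }
                    } catch (IOException e) {
                        synchronized (failure) { failure[0] = e; }
                    }
                }
            });
            workers[t].start();
        }
        for (int t = 0; t < threads; t++) workers[t].join();
        long elapsed = System.currentTimeMillis() - start;
        if (failure[0] != null) throw failure[0];
        return new long[] { elapsed, totalBytes.get() };
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("fc-bench", ".bin");
        try {
            FileOutputStream out = new FileOutputStream(tmp);
            try { out.write(new byte[1 << 20]); } finally { out.close(); }
            RandomAccessFile raf = new RandomAccessFile(tmp, "r");
            try {
                FileChannel ch = raf.getChannel();
                for (int threads = 1; threads <= 8; threads *= 2) {
                    long[] r = timeReads(ch, threads, 2000, 4096);
                    System.out.println(threads + " thread(s): " + r[0] + " ms, " + r[1] + " bytes");
                }
            } finally { raf.close(); }
        } finally { tmp.delete(); }
    }
}
```

A 1 MB temp file like this one will almost certainly be served from the OS cache, so the numbers say more about contention inside the channel than about disk parallelism; pointing it at a file larger than memory would be closer to the real index case.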

Does anyone know why I might not be seeing a performance increase for multiple concurrent threads
using my "FileChannelDirectory" ?

Any ideas would be appreciated.

Thank you,
