Message-ID: <3D2C8DB6.8060705@lucene.com>
Date: Wed, 10 Jul 2002 12:40:38 -0700
From: Doug Cutting
To: Lucene Users List <lucene-user@jakarta.apache.org>
Subject: Re: Crash / Recovery Scenario
References: <200207091148.31931.karl@gan.no>
Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm

Karl Øie wrote:
> A better solution would be to hack the FSDirectory to store each file it
> would store in a file-directory as a serialized byte array in a BLOB of an
> SQL table. This would increase performance because the whole Directory
> doesn't have to change each time, and it doesn't have to read the whole
> directory into memory. I also suspect Lucene sorts its records into these
> different files for increased performance (like: I KNOW that record will
> be in segment "xxx" if it is there at all).
> I have looked at the source for the RAMDirectory and the FSDirectory and
> they could both be altered to store their internal buffers in a BLOB, but
> I haven't managed to do this successfully. The problem I have been
> pounding on is the lucene InputStream's seek() function. This really
> requires the underlying impl to be either a file or an array in memory.
> For a BLOB this would mean that the blob has to be fetched, then
> read/seeked/written, then stored back again. (Is this correct? And if so,
> is there a way to know WHEN it is required to fetch/store the array?)

A BLOB can be randomly accessed:

http://java.sun.com/j2se/1.4/docs/api/java/sql/Blob.html#getBytes(long,%20int)

A good driver should page BLOBs over the connection. A great driver might
even have a separate thread doing read-aheads. (Dream on.)

It looks like the leading JDBC driver for MySQL (mm) does not page BLOBs,
but rather always reads the entire BLOB. Sigh. On the bright side, the JDBC
driver for PostgreSQL does page BLOBs over the connection.

So it should be easy to implement a Lucene InputStream based on a BLOB. The
Directory should be a simple table of BLOBs.

Lucene rarely seeks on writable streams. In other words, nearly all files
are written sequentially. With a quick scan, I can see only one place where
Lucene seeks on an OutputStream: in TermInfosWriter it overwrites the first
four bytes once, just before the file is closed. So to implement a Lucene
OutputStream you could cache the value of Blob.setBinaryStream(int), and
only create a new underlying output stream when seek() is called.

Doug
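For what it's worth, here is a rough sketch of both sides: a read stream that pages chunks via Blob.getBytes() (1-based positions, per JDBC), and a write stream that buffers in memory, tolerates the one backward seek() Lucene performs, and stores everything with a single setBytes() on close. This is only an illustration, not Lucene's actual store API -- the class and method names (BlobInputStream, BlobOutputStream, readByte, seek) merely mirror the shape of Lucene's InputStream/OutputStream contract, and a real implementation would subclass those and handle errors properly. The fromBytes() helper just wraps a byte[] in an in-memory SerialBlob for demonstration.

```java
import java.sql.Blob;
import java.sql.SQLException;
import java.util.Arrays;
import javax.sql.rowset.serial.SerialBlob;

// Hypothetical read stream over a BLOB, paging via Blob.getBytes().
// Note: JDBC BLOB positions are 1-based.
class BlobInputStream {
    private static final int BUFFER_SIZE = 1024;
    private final Blob blob;
    private final long length;
    private byte[] buffer = new byte[0];
    private long bufferStart = 0;  // stream offset of buffer[0]
    private long pointer = 0;      // current read position

    BlobInputStream(Blob blob) throws SQLException {
        this.blob = blob;
        this.length = blob.length();
    }

    // Convenience for demos: wraps a byte[] in an in-memory SerialBlob.
    static BlobInputStream fromBytes(byte[] data) {
        try {
            return new BlobInputStream(new SerialBlob(data));
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    void seek(long pos) { pointer = pos; }  // cheap: no I/O until the next read
    long getFilePointer() { return pointer; }
    long length() { return length; }

    byte readByte() {
        if (pointer < bufferStart || pointer >= bufferStart + buffer.length) {
            try {  // page the next chunk over the connection
                int toRead = (int) Math.min(BUFFER_SIZE, length - pointer);
                buffer = blob.getBytes(pointer + 1, toRead);
                bufferStart = pointer;
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
        }
        return buffer[(int) (pointer++ - bufferStart)];
    }
}

// Hypothetical write stream: buffers all output in memory, supports the one
// backward seek() Lucene performs (TermInfosWriter patching its header), and
// stores the result with a single Blob.setBytes() call on close.
class BlobOutputStream {
    private final Blob blob;
    private byte[] buffer = new byte[128];
    private int pos = 0;  // current write position
    private int end = 0;  // highest position written so far

    BlobOutputStream(Blob blob) { this.blob = blob; }

    void writeByte(byte b) {
        if (pos == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        buffer[pos++] = b;
        if (pos > end) end = pos;
    }

    void seek(long p) { pos = (int) p; }  // only ever used to rewrite the header

    void close() throws SQLException {
        blob.setBytes(1, Arrays.copyOf(buffer, end));
    }
}
```

With a driver that pages BLOBs (PostgreSQL), each refill in readByte() pulls only BUFFER_SIZE bytes over the connection; with one that doesn't (mm for MySQL), the whole BLOB comes across regardless, as noted above.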