lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <>
Subject Re: SQLDirectory
Date Thu, 05 Feb 2004 21:51:26 GMT
Philippe Laflamme wrote:
> I've worked on an implementation for Postgres. I used the Large Object API
> provided by the Postgres JDBC driver. It works fine but I doubt it is very
> scalable because the number of open connections during indexing can become
> very high.
> Lucene opens many different files when writing to an index. This results in
> opening one PG connection per open file. While working on a small index (30
> 000 files), I saw the number of open connections become quite high (approx
> 150). If you don't have a lot of RAM, this is problematic.

A connection per file sounds very heavyweight.

The way I would try to implement Directory with SQL is to have a single
table of buffers per index, e.g., with columns ID, BLOCK_NUMBER and
DATA.  The contents of a file are the appended DATA columns with the
same ID, ordered by the BLOCK_NUMBER field.  This would be indexed by ID
and BLOCK_NUMBER, together a unique key.

The BLOCK_NUMBER field indicates which part of the file the row
concerns.  Thus the DATA of BLOCK_NUMBER=0 might hold the first 1024
bytes, the DATA of BLOCK_NUMBER=1 might hold the next 1024 bytes, and so
on.  This would permit efficient random access.

You'll need another table with NAME, ID, and MODIFIED_DATE, with a
single entry per file.  The length of a file can be computed with a
query that finds the length of DATA in the last BLOCK_NUMBER with an ID.

I would initially cache a single connection to the database and
serialize requests over it.  A pool of connections might be more
efficient when multiple threads are searching, but I would benchmark
that before investing much in such an implementation.

Has anyone yet implemented an SQL Directory this way?


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message