lucene-java-user mailing list archives

From Anthony Vito <v...@mnis.com>
Subject SQLDirectory implementation
Date Fri, 16 Apr 2004 22:22:36 GMT
  I noticed some talk about SQLDirectory a month or so ago (I just joined
the list :) ). I have a JDBC implementation that stores the "files" in a
couple of tables and stores the data for the files as blocks (BLOBs) of
a certain size (16k by default). It also has an LRU cache for the
blocks, which makes the performance quite acceptable.
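
A minimal sketch of what such an LRU block cache could look like, assuming
blocks are keyed by a file ID plus a block number. The class name BlockCache
and its methods are hypothetical and are not taken from the implementation
described here; the sketch relies only on the access-order mode of
java.util.LinkedHashMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache of data blocks, keyed by "fileId:blockNumber". */
final class BlockCache {
    private final Map<String, byte[]> blocks;

    BlockCache(final int capacity) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                // Called after each put; drops the least recently used block when full.
                return size() > capacity;
            }
        };
    }

    private static String key(long fileId, long blockNumber) {
        return fileId + ":" + blockNumber;
    }

    /** Returns the cached block, or null on a miss (the caller would then hit the database). */
    synchronized byte[] get(long fileId, long blockNumber) {
        return blocks.get(key(fileId, blockNumber));
    }

    synchronized void put(long fileId, long blockNumber, byte[] data) {
        blocks.put(key(fileId, blockNumber), data);
    }

    synchronized int size() {
        return blocks.size();
    }
}
```

With the 16k default block size, a capacity of 1024 blocks would bound the
cache at roughly 16 MB of block data.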

Quoting Doug:
--------------------------
The way I would try to implement Directory with SQL is to have a single
table of buffers per index, e.g., with columns ID, BLOCK_NUMBER and
DATA.  The contents of a file are the appended DATA columns with the
same ID, ordered by the BLOCK_NUMBER field.  This would be indexed by ID
and BLOCK_NUMBER, together a unique key.

The BLOCK_NUMBER field indicates which part of the file the row
concerns.  Thus the DATA of BLOCK_NUMBER=0 might hold the first 1024
bytes, the DATA of BLOCK_NUMBER=1 might hold the next 1024 bytes, and so
on.  This would permit efficient random access.

You'll need another table with NAME, ID, and MODIFIED_DATE, with a
single entry per file.  The length of a file can be computed with a
query that finds the length of DATA in the last BLOCK_NUMBER with an ID.

I would initially cache a single connection to the database and
serialize requests over it.  A pool of connections might be more
efficient when multiple threads are searching, but I would benchmark
that before investing much in such an implementation.

Has anyone yet implemented an SQL Directory this way?
--------------------------------
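
The block addressing in the quoted scheme comes down to simple integer
arithmetic. The sketch below assumes every block except the last one is
completely full; the class and method names are hypothetical.

```java
/** Hypothetical helper for mapping file positions to fixed-size blocks. */
final class BlockMath {
    private BlockMath() {}

    /** BLOCK_NUMBER of the row that holds the byte at the given file position. */
    static long blockNumber(long position, int blockSize) {
        return position / blockSize;
    }

    /** Offset of that byte inside the block's DATA column. */
    static int offsetInBlock(long position, int blockSize) {
        return (int) (position % blockSize);
    }

    /**
     * File length derived from the last block alone, as the quoted text suggests:
     * all earlier blocks are assumed full, so only the last DATA length is needed.
     */
    static long fileLength(long lastBlockNumber, int lastBlockDataLength, int blockSize) {
        return lastBlockNumber * blockSize + lastBlockDataLength;
    }
}
```

For example, with 1024-byte blocks the byte at position 2500 lives in block 2
at offset 452, and a file whose last block is number 2 with 453 bytes of DATA
is 2501 bytes long.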

So to answer the question: pretty much, with just a few minor
differences.

I have one table that stores each file as a row, with a name, a length,
and a directory name, so I can have more than one index stored in the
same two tables. The other table stores an ID from the first table, a
sequence number (BLOCK_NUMBER), and the DATA for that BLOCK. My current
code creates new prepared statements for each DB access, so a
statement-pooling connection is a must. (This could probably be worked
around.)
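
A rough sketch of how the two tables and one possible workaround might look,
assuming a single cached connection with requests serialized over it. All
table, column, and class names are made up for illustration, and the column
types (BLOB in particular) would need adjusting for the database in use. The
idea is simply to prepare the statement once and reuse it, rather than
relying on a statement-pooling connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical block reader that prepares its statement once and reuses it. */
final class BlockStore implements AutoCloseable {
    // One row per file; dir_name lets several indexes share the same two tables.
    static final String CREATE_FILES =
        "CREATE TABLE lucene_file (id BIGINT PRIMARY KEY, dir_name VARCHAR(255) NOT NULL, "
        + "name VARCHAR(255) NOT NULL, length BIGINT NOT NULL, UNIQUE (dir_name, name))";

    // One row per block; (file_id, block_number) is the unique key.
    static final String CREATE_BLOCKS =
        "CREATE TABLE lucene_block (file_id BIGINT NOT NULL, block_number BIGINT NOT NULL, "
        + "data BLOB NOT NULL, PRIMARY KEY (file_id, block_number))";

    static final String SELECT_BLOCK =
        "SELECT data FROM lucene_block WHERE file_id = ? AND block_number = ?";

    private final PreparedStatement selectBlock;

    BlockStore(Connection connection) throws SQLException {
        // Prepared once here instead of on every access.
        this.selectBlock = connection.prepareStatement(SELECT_BLOCK);
    }

    /** Returns the block's bytes, or null if no such row exists. Serialized per store. */
    synchronized byte[] readBlock(long fileId, long blockNumber) throws SQLException {
        selectBlock.setLong(1, fileId);
        selectBlock.setLong(2, blockNumber);
        try (ResultSet rs = selectBlock.executeQuery()) {
            return rs.next() ? rs.getBytes(1) : null;
        }
    }

    @Override
    public void close() throws SQLException {
        selectBlock.close();
    }
}
```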

I actually prefixed all the file names with MySQL, even though it's pure
JDBC and should work with any driver or database. I'll go clean that up
this weekend and put up a site with the code and the API docs. I'd be
interested to hear what people have to say, and to see the results of
any better tests people have cooked up.

-vito



