lucene-java-user mailing list archives

From Dror Matalon <d...@zapatec.com>
Subject Re: SQLDirectory
Date Fri, 06 Feb 2004 21:01:30 GMT
On Fri, Feb 06, 2004 at 11:22:07AM -0500, Philippe Laflamme wrote:
> > A connection per file sounds very heavyweight.
> 
> Indeed it is. Using Postgres' LargeObjects to represent a file has its
> limitations: every time Lucene requires a stream on a file, a connection is
> required (and cannot be shared). Implementing it this way was quick, but not
> at all optimal.

But large objects have much better read/write performance than
regular text fields.

> 
> Your suggestion is quite interesting. It would not require the usage of
> Blobs, which are not very portable. It could be implemented using standard SQL
> types and would make an elegant SQLDirectory (and not an RDBMS-specific
> Directory).

I suspect you're going to get lousy performance compared to using
regular files. Why is it that you want to save the index files in a db?
It's not like you'll have any additional meta data or functionality. The
only advantage that I can think of is that you can have control of
read/write locking across machines. In other words, you can have one
machine doing the writing and one or more machines doing the
reading/searching.

> 
> Has anyone tried something like this? I could modify the PgSQLDirectory I've
> created, but hearing about previous experience would be very helpful...
> 
> Regards,
> Phil
> 
> > -----Original Message-----
> > From: Doug Cutting [mailto:cutting@apache.org]
> > Sent: February 5, 2004 16:51
> > To: Lucene Users List
> > Subject: Re: SQLDirectory
> >
> >
> > Philippe Laflamme wrote:
> > > I've worked on an implementation for Postgres. I used the Large Object API
> > > provided by the Postgres JDBC driver. It works fine but I doubt it is very
> > > scalable because the number of open connections during indexing can become
> > > very high.
> > >
> > > Lucene opens many different files when writing to an index. This results in
> > > opening one PG connection per open file. While working on a small index
> > > (30 000 files), I saw the number of open connections become quite high
> > > (approx 150). If you don't have a lot of RAM, this is problematic.
> >
> > A connection per file sounds very heavyweight.
> >
> > The way I would try to implement Directory with SQL is to have a single
> > table of buffers per index, e.g., with columns ID, BLOCK_NUMBER and
> > DATA.  The contents of a file are the appended DATA columns with the
> > same ID, ordered by the BLOCK_NUMBER field.  This would be indexed by ID
> > and BLOCK_NUMBER, together a unique key.
> >
> > The BLOCK_NUMBER field indicates which part of the file the row
> > concerns.  Thus the DATA of BLOCK_NUMBER=0 might hold the first 1024
> > bytes, the DATA of BLOCK_NUMBER=1 might hold the next 1024 bytes, and so
> > on.  This would permit efficient random access.
> >
> > You'll need another table with NAME, ID, and MODIFIED_DATE, with a
> > single entry per file.  The length of a file can be computed with a
> > query that finds the length of DATA in the last BLOCK_NUMBER with an ID.
> >
> > I would initially cache a single connection to the database and
> > serialize requests over it.  A pool of connections might be more
> > efficient when multiple threads are searching, but I would benchmark
> > that before investing much in such an implementation.
> >
> > Has anyone yet implemented an SQL Directory this way?
> >
> > Doug
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
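
For reference, the block-table layout Doug describes in the quoted message
could be sketched roughly as follows. This is only an illustration under
stated assumptions, not anyone's actual implementation: SQLite (from the
Python standard library) stands in for the real RDBMS, the table names
"files" and "blocks" and the class SQLDirectorySketch are made up for the
example, and the length calculation assumes every block except the last is
exactly 1024 bytes. A single cached connection is shared, with requests
serialized through a lock.

```python
import sqlite3
import threading

BLOCK_SIZE = 1024  # block size used in the example from the quoted message


class SQLDirectorySketch:
    """Hypothetical sketch; SQLite stands in for the real RDBMS."""

    def __init__(self, conn):
        self.conn = conn              # one cached connection
        self.lock = threading.Lock()  # serialize all requests over it
        with self.lock:
            # One row per file: NAME, ID, MODIFIED_DATE.
            conn.execute(
                "CREATE TABLE IF NOT EXISTS files ("
                "id INTEGER PRIMARY KEY, name TEXT UNIQUE, modified_date REAL)"
            )
            # One row per block; (ID, BLOCK_NUMBER) together form a unique key.
            conn.execute(
                "CREATE TABLE IF NOT EXISTS blocks ("
                "id INTEGER, block_number INTEGER, data BLOB, "
                "PRIMARY KEY (id, block_number))"
            )

    def write_file(self, name, content, modified):
        """Store content as consecutive BLOCK_SIZE-byte rows."""
        with self.lock:
            cur = self.conn.execute(
                "INSERT INTO files (name, modified_date) VALUES (?, ?)",
                (name, modified),
            )
            file_id = cur.lastrowid
            for n, start in enumerate(range(0, len(content), BLOCK_SIZE)):
                self.conn.execute(
                    "INSERT INTO blocks VALUES (?, ?, ?)",
                    (file_id, n, content[start:start + BLOCK_SIZE]),
                )
            self.conn.commit()
            return file_id

    def file_length(self, file_id):
        """Length from the last block, assuming earlier blocks are full."""
        with self.lock:
            row = self.conn.execute(
                "SELECT block_number, LENGTH(data) FROM blocks "
                "WHERE id = ? ORDER BY block_number DESC LIMIT 1",
                (file_id,),
            ).fetchone()
        return 0 if row is None else row[0] * BLOCK_SIZE + row[1]

    def read(self, file_id, offset, length):
        """Random access: fetch only the blocks that cover the range."""
        if length <= 0:
            return b""
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        with self.lock:
            rows = self.conn.execute(
                "SELECT data FROM blocks WHERE id = ? "
                "AND block_number BETWEEN ? AND ? ORDER BY block_number",
                (file_id, first, last),
            ).fetchall()
        joined = b"".join(r[0] for r in rows)
        skip = offset - first * BLOCK_SIZE
        return joined[skip:skip + length]
```

A read at an arbitrary offset touches only the rows whose BLOCK_NUMBER
covers the requested range, which is what makes random access cheap in
this layout. Whether a pool of connections beats the single locked
connection would need benchmarking, as the quoted message says.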

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
