lucene-java-user mailing list archives

From "Philippe Laflamme" <>
Subject RE: SQLDirectory
Date Fri, 06 Feb 2004 21:25:53 GMT
> > > A connection per file sounds very heavyweight.
> >
> > Indeed it is. Using Postgres' LargeObjects to represent a file has its
> > limitations: every time Lucene requires a stream on a file, a connection
> > is required (and cannot be shared). Implementing it this way was quick,
> > but not at all optimal.
> But large objects have much better read/write performance than using
> regular text fields.

Maybe, but they require opening a lot of connections to the database (at
least with Postgres' implementation). I don't know much about how many files
Lucene generates during indexing, but the number seems large. So if the
number of concurrently open files grows as the index grows, opening a
connection per file is not an option.

> >
> > Your suggestion is quite interesting. It would not require the use of
> > Blobs, which are not very portable. It could be implemented using
> > standard SQL types and would make an elegant SQLDirectory (and not an
> > RDBMS-specific Directory).
> I suspect you're going to get lousy performance compared to using
> regular files.

Yes, and that was to be expected: it's doubtful that a large object would be
faster than a regular file.

Postgres uses a regular file per large object, so with the additional
overhead of the JDBC driver, I was expecting slower performance. I did not
expect the number of connections to become so high, though.
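
For illustration, here is a minimal sketch of how a Directory built on
standard SQL types might address file contents without large objects. It
assumes each index file is split into fixed-size blocks stored as rows keyed
by (file name, block number). The class name BlockStore, its methods, and the
block layout are hypothetical and not part of Lucene or of any existing
SQLDirectory. The in-memory map only stands in for a table, so that the
addressing logic can be run without a database.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only (hypothetical, not Lucene API): index "files" split into
 * fixed-size blocks keyed by (file name, block number). The maps stand in
 * for a table with columns such as (file_name, block_no, data); with JDBC,
 * every lookup could go through one shared connection instead of one
 * connection per open file.
 */
public class BlockStore {
    private final int blockSize;
    private final Map<String, byte[]> rows = new HashMap<>();   // row key -> block data
    private final Map<String, Long> lengths = new HashMap<>();  // file name -> file length

    public BlockStore(int blockSize) {
        this.blockSize = blockSize;
    }

    // Row key for one block of one file; the block number follows the last '#'.
    private static String key(String file, long blockNo) {
        return file + "#" + blockNo;
    }

    /** Writes src into the file starting at byte offset pos, creating blocks as needed. */
    public void write(String file, long pos, byte[] src) {
        for (int i = 0; i < src.length; i++) {
            long p = pos + i;
            byte[] block = rows.computeIfAbsent(key(file, p / blockSize),
                                                k -> new byte[blockSize]);
            block[(int) (p % blockSize)] = src[i];
        }
        lengths.merge(file, pos + src.length, Long::max);
    }

    /** Reads len bytes starting at byte offset pos; never-written gaps read as zero. */
    public byte[] read(String file, long pos, int len) {
        if (pos < 0 || len < 0 || pos + len > length(file)) {
            throw new IllegalArgumentException("read outside file bounds");
        }
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            long p = pos + i;
            byte[] block = rows.get(key(file, p / blockSize));
            out[i] = (block == null) ? 0 : block[(int) (p % blockSize)];
        }
        return out;
    }

    /** Length of the file in bytes, or 0 if nothing has been written to it. */
    public long length(String file) {
        return lengths.getOrDefault(file, 0L);
    }
}
```

Because every file maps onto rows of the same table, the number of open
files would no longer dictate the number of connections. Whether this
performs acceptably against a real database is untested here.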

> Why is it that you want to save the index files in a db?
> It's not like you'll have any additional meta data or functionality. The
> only advantage that I can think of is that you can have control of
> read/write locking across machines. In other words, you can have one
> machine doing the writing and one or more machines doing the
> reading/searching.

I'm not looking for blazing indexing performance; I'm more interested in the
searching side of things. Making an index available on several different
hosts is trivial using a database, but not so easy using a file system.
Also, using database replication makes distributing an index a breeze... To
me, it's more a matter of creating a scalable design.

Any thoughts on that would be appreciated...

