lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Snyder" <psny...@postbulletin.com>
Subject RE: Advantage of putting lucene index in RDBMS
Date Thu, 05 Oct 2006 13:57:00 GMT
Re-reading Aleksei's post, I have to ask, is it really not
possible/practical to index the database metadata (such as date, area and
schema/table/primary-key info) as Lucene document fields?  I am having
difficulty conceiving a scenario when this would not be a practical option.

-----Original Message-----
From: Vladimir Olenin [mailto:VOlenin@cihi.ca] 
Sent: Thursday, October 05, 2006 8:41 AM
To: java-user@lucene.apache.org
Subject: RE: Advantage of putting lucene index in RDBMS

As one of the people who asked about placing indeces into RDBMS, I was
primarily interested in just storing index in the RDBMS (basically, storing
the structures described on this page
http://lucene.apache.org/java/docs/fileformats.html in the relational DB).
The main reason is NOT to be able to perform some magic with joining Lucene
and 'pure DB query' results (which, actually, IS useful in some
curcumstances, but don't really see a problem of doing it in Java after
quering DB and Lucene), but rather avoid the cost of reindexing and
associated problems in complex enterprise environments.

There is a wide range of applications like YellowPages, eshops, etc, which
already store indexed data from which their site is built. Each DB update
can be a very complex process (pulling data from different sources, building
views, restructuring DB for faster read access, etc).
If all that is required is to provide 'smarter' and more efficent search
within already existing DB, it only makes sense to try to 'plug' Lucene
engine to _already available_ structures.

Yet another advantage of storing index in the DB is its 'managability'
and 'debugabiliy' (nice word!). Through there is Luke, etc, administrators
in big companies do not want to learn many new tools and having smth already
familiar to deal with can sometimes be a good argument in favor of product
adoption. (BTW, Compass, as Aleksei mentioned, can be the answer to this
prayer - meant to check it out long time ago, but haven't got around to it
yet. Also, it seems like the project is half-dead. I wonder if it's true...)

Again, not really sure whether that's feasible, because Lucene does store
quite a bit of other info together with index (and inverted index), but
sometimes that's an option.

Vlad

PS: not really sure what Aleksei had in mind. From my point of view the only
way to 'perform both DB and Lucene query' on the DB side is to either
'reimplement' Lucene engine in the DB (eg, rewrite it in PL/SQL,
etc) OR perform Java calls from DB (eg, through Java Stored Procedures in
case of Oracle).

-----Original Message-----
From: Paul Snyder [mailto:psnyder@postbulletin.com]
Sent: Thursday, October 05, 2006 9:19 AM
To: java-user@lucene.apache.org
Subject: RE: Advantage of putting lucene index in RDBMS

Aleksei, can you point me to a document detailing this procedure with
examples?  If not, would you consider creating one?  I am particularly
interested in what prerequisite steps are needed to perform a Lucene query
within SQL (if I understand correctly what you are doing).

-----Original Message-----
From: Aleksei Valikov [mailto:valikov@gmx.net]
Sent: Thursday, October 05, 2006 3:39 AM
To: java-user@lucene.apache.org
Subject: Re: Advantage of putting lucene index in RDBMS

Hi.

> I have been reading the lists for couple of week now, and I noticed 
> people asking about placing their indexes into a RDBMS. What is the 
> advantage of that?
> 
> So far lucene was able to solve all my problems, but I am curious how 
> else people are using it (especially with RDBMS).

Having an index in the DB makes it possible to join full-text queries
against this index with some other structured queries against other table.

Here's a practical example. We have a data management system for managing
geographic metadata. Documents that we manage (geometadata) have spatial
extents (bounding boxes), temporal extents (time periods) and a lot of
textual information.

Currently, complex queries like "contains 'water*', from 1998 till 2001, in
area (5, 45, 15, 55)" can't be processed in the relational DB only or in
Lucene index only. We have to make a relatively expensive union/join outside
the Lucene and the RDB. Having Lucene index within the DB and Lucene query
(re)formulatable in SQL would allow us performing the join inside the DB
which is much more performant.

Bye.
/lexi


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message