db-derby-dev mailing list archives

From "Rick Hillegas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-590) How to integrate Derby with Lucene API?
Date Mon, 21 Oct 2013 14:38:43 GMT

    [ https://issues.apache.org/jira/browse/DERBY-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800696#comment-13800696 ]

Rick Hillegas commented on DERBY-590:

Thanks again for working on this, Andrew. I noticed that lucene_titles.sql invokes a procedure
called LuceneSupport.indexDatabase(). I can't find that procedure in lucene_demo.diff. Where
should I look for that procedure?

Here's my crude interpretation of what the code is doing: The tool makes it possible to do
full-text search on data which is stored in the text columns of Derby tables. The tables must
have unique Derby indexes. Lucene itself relies on indexes which it builds and stores outside
Derby in the file system. Over time, the Lucene indexes drift out of sync with the text data.
The application periodically asks Derby to update specific Lucene indexes, bringing them back
into sync with the text data. 
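
To make the drift-and-resync behavior described above concrete, here is a toy sketch. It does not use Derby or Lucene at all: a dict stands in for the Derby table, a second structure stands in for the external index, and every class and method name is hypothetical, invented for illustration rather than taken from lucene_demo.diff. It also assumes that the unique key is what ties an indexed document back to its row, which is a guess at why the unique index is required.

```python
# Toy model only. Nothing here is Derby or Lucene API; all names are made up.
from collections import defaultdict


class ToyTextTable:
    """Stands in for a Derby table with a unique key and one text column."""

    def __init__(self):
        self.rows = {}  # unique key -> text

    def upsert(self, key, text):
        self.rows[key] = text


class ToyExternalIndex:
    """Stands in for a full-text index stored outside the database."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> keys of rows containing it
        self.indexed_text = {}            # key -> text as of the last sync

    def update_document(self, table, key):
        """Re-index one row, identified by its unique key."""
        old = self.indexed_text.pop(key, None)
        if old is not None:
            # Drop the postings that came from the stale copy of the text.
            for term in old.lower().split():
                self.postings[term].discard(key)
        new = table.rows.get(key)
        if new is not None:
            for term in new.lower().split():
                self.postings[term].add(key)
            self.indexed_text[key] = new

    def query(self, term):
        return sorted(self.postings.get(term.lower(), set()))


table = ToyTextTable()
index = ToyExternalIndex()

table.upsert(1, "Moby Dick")
index.update_document(table, 1)

table.upsert(1, "White Whale")   # the table changes; the index is now stale
stale = index.query("moby")      # the stale index still reports row 1

index.update_document(table, 1)  # the periodic resync requested by the application
fresh = index.query("whale")
```

Between the second upsert and the resync, a search answers from the old text. That window is the drift which the periodic update call is meant to close.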

Loading the tool via syscs_register_tool() creates the following schema objects:

a) LuceneSupport.indexTable() - This procedure indexes a text column in a Derby table.

b) LuceneSupport.luceneUpdateDocument() - This procedure updates a Lucene index which was
created by the previous procedure, bringing the Lucene index back into sync with the text data.

c) LuceneSupport.luceneQuery() - This is a table function for running a full-text search against
a Derby column.

As is, this sounds like a very useful piece of functionality. We could make this production-ready
incrementally and document it at the end of that effort. At a minimum, we would want to:

i) Quibble a bit about the API, the names of schema objects, and where the code goes.

ii) Add comments to the code.

iii) Think about edge cases. For example, what happens if the Lucene indexes become corrupt
or are deleted? How do we keep track of which columns are indexed? What happens when Derby
is recovered from a backup or the database is recreated?

iv) Write tests.

Some follow-on efforts might also make sense:

1) We could consider moving the Lucene indexes inside the database.

2) Maybe we could add triggers on the indexed columns so that the Lucene indexes remain in
sync with the Derby data. Don't know how much of a performance drag that would be. Maybe this
could be an optional feature of creating a Lucene index.

3) Replace the procedure calls with explicit CREATE FULLTEXT (and maybe UPDATE FULLTEXT) statements.
This would be an opportunity to think about how we could load and unload optional Derby statements.


> How to integrate Derby with Lucene API?
> ---------------------------------------
>                 Key: DERBY-590
>                 URL: https://issues.apache.org/jira/browse/DERBY-590
>             Project: Derby
>          Issue Type: Improvement
>          Components: Documentation, SQL
>            Reporter: Abhijeet Mahesh
>              Labels: derby_triage10_11
>         Attachments: lucene_demo.diff
> In order to use derby with lucene API what should be the steps to be taken? 

This message was sent by Atlassian JIRA
