db-derby-user mailing list archives

From Geoffrey Hendrey <geoff_hend...@yahoo.com>
Subject Re: Lucene integration
Date Wed, 25 Mar 2009 16:48:24 GMT

Will you and other Derby folks be at JavaOne?

Sent from my iPhone

On Mar 25, 2009, at 8:49 AM, Rick Hillegas <Richard.Hillegas@Sun.COM> wrote:

Geoff Hendrey wrote:
Interesting. I am reviewing the JavaDocs.

I suspect that transactional integrity might not be an important requirement. That will probably
raise the hackles of database experts like the Derby team, but I would prefer non-transactional
support for Lucene-Derby integration.

Thanks, Geoff. I think that a successful feature needs a real user like you.

Please consider the following things I would like:

  1. ability to search an entire *row* as a document, not a
     column-as-document model

OK. This makes sense to me. Each column would be a Lucene field, if I am understanding the
terms correctly.
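
A minimal sketch of the row-as-document mapping discussed here, written in Python for brevity. It uses a plain dictionary to stand in for a Lucene document; the function name `row_to_document` and the `_table` and `_pk` bookkeeping fields are hypothetical and are not part of Derby or Lucene.

```python
def row_to_document(table, primary_key, row):
    """Map one table row to a single search document, one field per column.

    `row` is a dict of column name -> value. The returned dict stands in
    for a Lucene document; `_table` and `_pk` are hypothetical fields that
    let a search hit be traced back to the row it came from.
    """
    doc = {"_table": table, "_pk": str(primary_key)}
    for column, value in row.items():
        # Assumption: NULL columns are skipped rather than indexed as empty text.
        if value is not None:
            doc[column] = str(value)
    return doc
```

Under these assumptions, a search over any field of the document identifies the whole row rather than a single column value.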

  2. ability to "look back in time" and see old versions of rows

This is interesting. What would you like the key to be? A two part invention composed of the
table's primary key plus a timestamp? Something else?
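
One way to picture the two-part key is the sketch below, in Python for brevity. It keeps every recorded version of a row under its primary key, ordered by timestamp, and answers "what did this row look like at time T?". The class `RowHistory` and its methods are hypothetical illustrations, not a Derby or Lucene API, and the sketch simply stores whatever it is handed; it makes no attempt to deal with changes that are later rolled back.

```python
import bisect
from datetime import datetime, timezone


class RowHistory:
    """Hypothetical store of row versions keyed by (primary key, timestamp)."""

    def __init__(self):
        # primary key -> (sorted list of timestamps, rows in the same order)
        self._versions = {}

    def record(self, primary_key, changed_at, row):
        """Remember the state of a row as of `changed_at`."""
        stamps, rows = self._versions.setdefault(primary_key, ([], []))
        i = bisect.bisect_right(stamps, changed_at)
        stamps.insert(i, changed_at)
        rows.insert(i, dict(row))

    def as_of(self, primary_key, when):
        """Return the latest version recorded at or before `when`, or None."""
        stamps, rows = self._versions.get(primary_key, ([], []))
        i = bisect.bisect_right(stamps, when)
        return rows[i - 1] if i else None
```

A usage example, with made-up data:

```python
history = RowHistory()
history.record(42, datetime(2009, 3, 1, tzinfo=timezone.utc), {"salary": 40000})
history.record(42, datetime(2009, 3, 20, tzinfo=timezone.utc), {"salary": 50000})
old = history.as_of(42, datetime(2009, 3, 10, tzinfo=timezone.utc))
```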

What about aborted changes? Would you be happy with a solution which recorded index entries
for changes which were subsequently discarded because, say, an INSERT integrity violation
rolled back a transaction to the savepoint laid down before the INSERT ran?

Thanks,
-Rick

  3. very high performance
  4. high quality search results (from my experience you must combine
     FuzzyLikeThisQuery with SnowballAnalyzer).

In my view, the Lucene integration is more like a system that indexes a constant stream of
information that enters the database. Transactions are nice-to-have, but not really needed
to achieve the equivalent of a "search engine for the database". When I execute a Lucene search,
all I need to get back are row IDs that I can use to lazy-retrieve the row if the user wants
to drill down on a particular search result. With a web search engine like Google, it is possible
that the page may no longer exist when the user clicks on the search result (it happens from
time to time).
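
The lazy-retrieval idea can be sketched as follows, in Python for brevity. The search step is assumed to return only row IDs; the row itself is fetched when the user drills down, and a hit whose row has since disappeared is reported as missing, much like a dead link in a web search result. The names `drill_down` and `fetch_row` are hypothetical.

```python
def drill_down(row_id, fetch_row):
    """Fetch the row behind a search hit only when the user asks for it.

    `fetch_row` is any callable that returns the row for an ID, or None
    when the row no longer exists. A None result means the index entry
    is stale, which this non-transactional design tolerates.
    """
    return fetch_row(row_id)


# Example with an in-memory dict standing in for the table (made-up data).
table = {1: {"name": "Ann"}}
hits = [1, 2]  # row 2 was deleted after it was indexed
results = [drill_down(row_id, table.get) for row_id in hits]
```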

This is why I don't think we really *need* transactional integrity on the Lucene search.
-geoff
“XML? Too much like HTML. It'll never work on the Web!”
-anonymous


*From:* Rick Hillegas <Richard.Hillegas@Sun.COM>
*To:* Derby Discussion <derby-user@db.apache.org>
*Sent:* Tuesday, March 24, 2009 11:55:38 AM
*Subject:* Re: Lucene integration

Hi Geoffrey,

I'm hoping to have some time to look at Lucene integration after we put 10.5 to bed. In the
meantime, I was wondering if you have any experience with implementations of Lucene Directory
which place the Lucene indexes inside a relational database? According to the following link,
people have been disappointed with the performance of this approach (don't know what that
means)--at first blush, however, the approach seems like an attractive way to keep the Lucene
indexes transactionally consistent with the original character data:

http://wiki.apache.org/lucene-java/LuceneFAQ#head-e55d8e6971f9f01daaf3e14ce1d2f34485adba6e

Thanks,
-Rick

Rick Hillegas wrote:
> Hi Geoffrey,
>
> I'm on the road right now but I'd like to make some suggestions after I gather my thoughts
> and get over my jet lag. I think that it is definitely possible to hook into the query processing
> layer in order to fork the tuple stream so that a listener process can populate the Lucene
> indexes. I think that scraping the replication log stream would raise a lot of issues around
> when work is really committed vs. when savepoints are rolled back, and I would recommend against
> that approach.
>
> Regards,
> -Rick
>
> Geoffrey Hendrey wrote:
>> Ok, well on to plan B then. Is there some stage in the preparation of inserts, updates,
>> and deletes at which the logical identity of a row is established? That could be a good place
>> to provide a Lucene hook, or a more general interceptor.
>>
>>
>> On Mar 18, 2009, at 6:55 AM, Jørgen Løland <Jorgen.Loland@Sun.COM> wrote:
>>
>> Geoff Hendrey wrote:
>> I've been following Knut's pointers and reading the docs on the classes that marshal
>> themselves over the wire via their writeObject method.
>> So, question about this:
>> "Type=update, Table=employee, Page=4321, Index=4, field 3=50000"
>> Do the page and index, collectively, constitute a "row ID"?
>> If it is always a constant, then these three fields are sufficient to permanently
>> identify the row, and we can use that information to constitute a document ID in Lucene.
>>
>> It's constant until the record is moved to another page (which means "no", really).
>>
>> >



