db-derby-user mailing list archives

From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: Lucene integration
Date Wed, 25 Mar 2009 17:08:18 GMT
Geoffrey Hendrey wrote:
> Will you and other Derby folks be at JavaOne?
>   
Hi Geoff,

I'll be there along with the other Java DB folks. Don't know about other 
community members.

Regards,
-Rick
> Sent from my iPhone
>
> On Mar 25, 2009, at 8:49 AM, Rick Hillegas <Richard.Hillegas@Sun.COM> wrote:
>
> Geoff Hendrey wrote:
> Interesting. I am reviewing the JavaDocs.
>
> I am concerned that transactional integrity might not be an important requirement. That
> probably will raise the hackles of database experts like the Derby team, but I would prefer
> non-transactional support for Lucene-Derby integration.
> Thanks, Geoff. I think that a successful feature needs a real user like you.
>
> Please consider the following things I would like:
>
>   1. ability to search an entire *row* as a document, not a
>      column-as-document model
>
> OK. This makes sense to me. Each column would be a Lucene field, if I am understanding
> the terms correctly.
>
>   2. ability to "look back in time" and see old versions of rows
>
> This is interesting. What would you like the key to be? A two part invention composed
> of the table's primary key plus a timestamp? Something else?
>
> What about aborted changes? Would you be happy with a solution which recorded index entries
> for changes which were subsequently discarded because, say, an INSERT integrity violation
> rolled back a transaction to the savepoint laid down before the INSERT ran?
>
> Thanks,
> -Rick
>
>   3. very high performance
>   4. high quality search results (from my experience you must combine
>      FuzzyLikeThisQuery with SnowballAnalyzer).
>
> In my view, the Lucene integration is more like a system that indexes a constant stream
> of information that enters the database. Transactions are nice-to-have, but not really needed
> to achieve the equivalent of a "search engine for the database". When I execute a Lucene search,
> all I need to get back are row IDs that I can use to lazy-retrieve the row if the user wants
> to drill down on a particular search result. With a web search engine like Google, it is possible
> that the page may no longer exist when the user clicks on the search result (it happens from
> time to time).
>
> This is why I don't think we really *need* transactional integrity on the Lucene search.
> -geoff
> “XML? Too much like HTML. It'll never work on the Web!”
> -anonymous
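
The row-as-document model and the two-part key discussed in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `VersionedRowKey` and `RowDocument`, the `asDocumentId` format, and the use of a plain map instead of real Lucene fields are all hypothetical and do not come from the thread, from Derby, or from Lucene.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical key: the table's primary key plus the time this row version was written. */
final class VersionedRowKey {
    final String primaryKey;
    final Instant writtenAt;

    VersionedRowKey(String primaryKey, Instant writtenAt) {
        this.primaryKey = primaryKey;
        this.writtenAt = writtenAt;
    }

    /** Renders the two-part key as one string that could serve as a document ID. */
    String asDocumentId() {
        return primaryKey + "@" + writtenAt.toEpochMilli();
    }
}

/** Hypothetical row-as-document: one named text field per column of the row. */
final class RowDocument {
    final VersionedRowKey key;
    final Map<String, String> fields = new LinkedHashMap<>();

    RowDocument(VersionedRowKey key) {
        this.key = key;
    }

    /** Adds one column as a field; the value is stored in its string form. */
    RowDocument field(String column, Object value) {
        fields.put(column, String.valueOf(value));
        return this;
    }
}
```

A search would then only need to return `asDocumentId()` values, and the caller could fetch the row lazily by primary key when the user drills down. Because each row version gets its own key, older versions remain addressable, which is one possible reading of the "look back in time" request.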
>
>
> *From:* Rick Hillegas <Richard.Hillegas@Sun.COM>
> *To:* Derby Discussion <derby-user@db.apache.org>
> *Sent:* Tuesday, March 24, 2009 11:55:38 AM
> *Subject:* Re: Lucene integration
>
> Hi Geoffrey,
>
> I'm hoping to have some time to look at Lucene integration after we put 10.5 to bed.
> In the meantime, I was wondering if you have any experience with implementations of Lucene
> Directory which place the Lucene indexes inside a relational database? According to the following
> link, people have been disappointed with the performance of this approach (don't know what
> that means)--at first blush, however, the approach seems like an attractive way to keep the
> Lucene indexes transactionally consistent with the original character data:
>
> http://wiki.apache.org/lucene-java/LuceneFAQ#head-e55d8e6971f9f01daaf3e14ce1d2f34485adba6e
>
> Thanks,
> -Rick
>
> Rick Hillegas wrote:
>   
>> Hi Geoffrey,
>>
>> I'm on the road right now but I'd like to make some suggestions after I gather my
>> thoughts and get over my jet lag. I think that it is definitely possible to hook into the
>> query processing layer in order to fork the tuple stream so that a listener process can populate
>> the Lucene indexes. I think that scraping the replication log stream would raise a lot of
>> issues around when work is really committed vs. when savepoints are rolled back, and I would
>> recommend against that approach.
>>
>> Regards,
>> -Rick
>>
>> Geoffrey Hendrey wrote:
>>     
>>> Ok, well on to plan B then. Is there some stage in the preparation of inserts,
>>> updates, and deletes at which the logical identity of a row is established? That could be
>>> a good place to provide a Lucene hook, or a more general interceptor.
>>>
>>>
>>> On Mar 18, 2009, at 6:55 AM, Jørgen Løland <Jorgen.Loland@Sun.COM> wrote:
>>>
>>> Geoff hendrey wrote:
>>> I've been following Knut's pointers and reading the docs on the classes that marshal
>>> themselves over the wire via their writeObject method.
>>> So, question about this:
>>> "Type=update, Table=employee, Page=4321, Index=4, field 3=50000"
>>> Do the page and index, collectively, constitute a "row ID"?
>>> If it is always constant, then these three fields are sufficient to permanently
>>> identify the row, and we can use that information to constitute a document ID in Lucene.
>>>
>>> It's constant until the record is moved to another page (which means "no", really).
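
To illustrate why a page-plus-index pair makes a fragile document ID, here is a toy Java sketch. It assumes a simplified in-memory store; `RowLocationDemo`, `ToyStore`, `physicalId`, and the `"page:slot"` format are hypothetical names invented for this example and do not reflect Derby's actual storage classes.

```java
import java.util.HashMap;
import java.util.Map;

final class RowLocationDemo {
    /** Hypothetical physical address of a record: page number plus slot within the page. */
    static String physicalId(int page, int slot) {
        return page + ":" + slot;
    }

    /** Toy store that maps a physical address to the primary key of the row stored there. */
    static final class ToyStore {
        private final Map<String, String> rowsByLocation = new HashMap<>();

        /** Places a row at an address, dropping its previous address if it had one. */
        void place(String primaryKey, int page, int slot) {
            rowsByLocation.values().remove(primaryKey);
            rowsByLocation.put(physicalId(page, slot), primaryKey);
        }

        /** Returns the primary key stored at the address, or null if nothing is there. */
        String lookup(int page, int slot) {
            return rowsByLocation.get(physicalId(page, slot));
        }
    }
}
```

If a document ID is saved as `physicalId(4321, 4)` and the record later moves to another page, `lookup(4321, 4)` no longer finds the row, whereas an ID derived from the primary key would still identify it.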
>>>

