db-derby-user mailing list archives

From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: Lucene integration
Date Wed, 25 Mar 2009 15:49:52 GMT
Geoff hendrey wrote:
> Interesting. I am reviewing the JavaDocs.
> I am concerned that transactional integrity might not be an important 
> requirement. That probably will raise the hackles of database experts 
> like the Derby team, but I would prefer a non-transactional support 
> for Lucene-derby integration.
Thanks, Geoff. I think that a successful feature needs a real user like you.
> Please consider the following things I would like:
>    1. ability to search an entire *row* as a document, not a
>       column-as-document model
OK. This makes sense to me. Each column would be a Lucene field, if I am 
understanding the terms correctly.
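A minimal sketch of that row-as-document mapping, using only plain Java so that it does not depend on any particular Lucene version. The class RowDocument and its methods are hypothetical names made up for illustration; they are not Derby or Lucene APIs. The substring match stands in for real analysis and scoring.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not a Derby or Lucene class): one table row becomes
// one search document, and each column becomes one named field.
final class RowDocument {
    // Assumption: the row is identified by some stable key, e.g. its primary key.
    private final String rowKey;
    private final Map<String, String> fields = new LinkedHashMap<>();

    RowDocument(String rowKey) {
        this.rowKey = rowKey;
    }

    // Store a column value as text; SQL NULL is stored as an empty string.
    RowDocument addColumn(String columnName, Object value) {
        fields.put(columnName, value == null ? "" : value.toString());
        return this;
    }

    String rowKey() {
        return rowKey;
    }

    Map<String, String> fields() {
        return fields;
    }

    // Naive stand-in for a search: case-insensitive substring match
    // against every column of the row.
    boolean matches(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return fields.values().stream()
                .anyMatch(v -> v.toLowerCase(Locale.ROOT).contains(needle));
    }
}
```

A search over such documents would return only rowKey values, which the caller could then use to fetch the row lazily.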
>    2. ability to "look back in time" and see old versions of rows
This is interesting. What would you like the key to be? A two part 
invention composed of the table's primary key plus a timestamp? 
Something else?
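To make the two-part key concrete, here is a rough sketch in plain Java. VersionedRowKey, latestAsOf and the "key@millis" document ID format are all hypothetical and exist only for illustration; nothing like this is in Derby today. It assumes the primary key can be rendered as a string and that each change carries a timestamp.

```java
import java.time.Instant;
import java.util.NavigableSet;

// Hypothetical two-part key: the table's primary key plus the time of the change.
final class VersionedRowKey implements Comparable<VersionedRowKey> {
    final String primaryKey;
    final Instant changedAt;

    VersionedRowKey(String primaryKey, Instant changedAt) {
        this.primaryKey = primaryKey;
        this.changedAt = changedAt;
    }

    // Order by primary key first, then by timestamp, so that all versions
    // of one row sit next to each other in a sorted set.
    @Override
    public int compareTo(VersionedRowKey other) {
        int byKey = primaryKey.compareTo(other.primaryKey);
        return byKey != 0 ? byKey : changedAt.compareTo(other.changedAt);
    }

    // Assumed document ID format, for illustration only.
    String toDocumentId() {
        return primaryKey + "@" + changedAt.toEpochMilli();
    }

    // "Look back in time": the newest version of the row at or before asOf,
    // or null if the row had no version yet at that moment.
    static VersionedRowKey latestAsOf(NavigableSet<VersionedRowKey> versions,
                                      String primaryKey, Instant asOf) {
        VersionedRowKey candidate = versions.floor(new VersionedRowKey(primaryKey, asOf));
        return (candidate != null && candidate.primaryKey.equals(primaryKey))
                ? candidate
                : null;
    }
}
```

With a key like this, every update adds a new document instead of replacing the old one, which is what makes the old versions searchable.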

What about aborted changes? Would you be happy with a solution which 
recorded index entries for changes which were subsequently discarded 
because, say, an INSERT integrity violation rolled back a transaction to 
the savepoint laid down before the INSERT ran?

>    3. very high performance
>    4. high quality search results (from my experience you must combine
>       FuzzyLikeThisQuery with SnowballAnalyzer).
> In my view, the lucene integration is more like a system that indexes 
> a constant stream of information that enters the database. 
> Transactions are nice-to-have, but not really needed to achieve the 
> equivalent of a "search engine for the database". When I execute a 
> Lucene search, all I need to get back are row id's that I can use to 
> lazy-retrieve the row if the user wants to drill down on a particular 
> search result. With a web search engine like Google, it is possible 
> that the page may no longer exist when the user clicks on the search 
> result (it happens from time to time).
> This is why I don't think we really *need* transactional integrity on 
> the lucene search.
> -geoff
> “XML? Too much like HTML. It'll never work on the Web!”
> -anonymous
> *From:* Rick Hillegas <Richard.Hillegas@Sun.COM>
> *To:* Derby Discussion <derby-user@db.apache.org>
> *Sent:* Tuesday, March 24, 2009 11:55:38 AM
> *Subject:* Re: Lucene integration
> Hi Geoffrey,
> I'm hoping to have some time to look at Lucene integration after we 
> put 10.5 to bed. In the meantime, I was wondering if you have any 
> experience with implementations of Lucene Directory which place the 
> Lucene indexes inside a relational database? According to the 
> following link, people have been disappointed with the performance of 
> this approach (don't know what that means)--at first blush, however, 
> the approach seems like an attractive way to keep the Lucene indexes 
> transactionally consistent with the original character data:
> http://wiki.apache.org/lucene-java/LuceneFAQ#head-e55d8e6971f9f01daaf3e14ce1d2f34485adba6e
> Thanks,
> -Rick
> Rick Hillegas wrote:
> > Hi Geoffrey,
> >
> > I'm on the road right now but I'd like to make some suggestions
> > after I gather my thoughts and get over my jet lag. I think that it is
> > definitely possible to hook into the query processing layer in order
> > to fork the tuple stream so that a listener process can populate the
> > Lucene indexes. I think that scraping the replication log stream would
> > raise a lot of issues around when work is really committed vs. when
> > savepoints are rolled back, and I would recommend against that approach.
> >
> > Regards,
> > -Rick
> >
> > Geoffrey Hendrey wrote:
> >> Ok, well on to plan B then. Is there some stage in the preparation
> >> of inserts, updates, and deletes at which the logical identity of a
> >> row is established? That could be a good place to provide a Lucene
> >> hook, or a more general interceptor.
> >>
> >>
> >> On Mar 18, 2009, at 6:55 AM, Jørgen Løland <Jorgen.Loland@Sun.COM> wrote:
> >>
> >> Geoff hendrey wrote:
> >> I've been following Knut's pointers and reading the docs on the
> >> classes that marshal themselves over the wire via their writeObject
> >> method.
> >> So, question about this:
> >> "Type=update, Table=employee, Page=4321, Index=4, field 3=50000"
> >> Do the page and index, collectively, constitute a "row ID"?
> >> If it is always a constant, then these three fields are sufficient
> >> to permanently identify the row, and we can use that information to
> >> constitute a document ID in Lucene.
> >>
> >> It's constant until the record is moved to another page (which
> >> means "no", really).
> >>
> >> 
> >
