lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Lucene Seaches VS. Relational database Queries
Date Thu, 13 Apr 2006 22:14:22 GMT
On 4/13/06, Ananth T. Sarathy <ananth.t.sarathy@gmail.com> wrote:
>
> No we do have drop downs selects that would allow for the substitution,
> but
> we also have a free text fields to allow the user to search. That solution
> would I think work for the DB query replacement, but you would need a
> regular non underscored field to allow for free text.
>
>
Well, as I say, you've solved that problem already. Somewhere, somehow, you
have to decide what to do with the "free text" data. Somewhere, somehow,
you've got to decide whether "stunt director trainee" means "stunt director"
+ trainee, stunt + "director trainee", or stunt + director + trainee. Or
else you can't form your SQL in the first place. And the query doesn't
produce reasonable results if you *do* form the query.

If you can form your SQL with distinct "Title = 'blah'" clauses, you can
substitute underscores for spaces in the terms. If you can do that, you can
ask Lucene to find the terms you indexed with underscores. And if you can't
form your SQL queries in the first place, the question is irrelevant.

All that said, perhaps a better question is "why is your SQL slow?".
Relational databases are really good at this sort of thing. Many smart
people have put many, many developer years into making relational databases
deal with joins efficiently. Assuming you have the proper indexes etc.

As much as I've been impressed with Lucene, I have to ask whether it's
relevant to your problem. I have no clue what database you're using, how
it's set up, or whether the examples you've given are simplified enough that
I don't understand what the *real* problem is. But if your issue isn't
really dealing with a full text search, your relational DB should be able to
handle it, given the proper wherewithal. Have you done "explain plan" or its
equivalent in your DB? Have you tried adding indexes to avoid full table
scans? In short, have you fully convinced yourself that your RDB can't
handle the problem?

I'm *extremely* leery of introducing another "moving part" into a product
without fully exhausting the current parts. It's *never* a good thing to add
a new step into the process unless you can convince yourself that it solves
more problems than it introduces. You've already alluded to keeping the DB
and the Lucene indexes in synch. I *guarantee* that there will be other
issues that rise up and bite you. *Count* on whatever you think you'll spend
in introducing Lucene into your mix (say effort X) costing you *at least* 2X
more time/energy than you think. I'd actually give it a multiplier closer to
4X.

This is NOT a slam on lucene. But developers often miss the bigger picture.
What processes are you going to put in place to keep the Lucene part of the
product up to date? How much is it going to cost your company to
troubleshoot the Lucene portion? How many company resources are going to be
spent answering customer complaints? What is the ongoing maintenance
requirement?

I like Lucene. I've just persuaded my company to use it in our next product.
I've been incredibly impressed with it's architeture and implementation. But
it's a text search engine, and shouldn't be confused with a RDB. *Assuming*
that the RDB is an integral part of your product, I'd spend a lot of time
making that do what I needed before I'd introduce another moving part.

All for what it's worth, from an old "C" programmer <G>..

Best
Erick

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message