lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <>
Subject Re: Searching API: QueryParser vs Programatic queries
Date Mon, 22 May 2006 20:11:10 GMT
There's a long scree that I'm leaving at the bottom because I put effort
into it and I like to rant. But here's, perhaps, an approach.

Maybe I'm mis-interpreting what you're trying to do. I'm assuming that you
have several search fields (I'm not exactly sure what "driven by meta-data"
means in this case, but what the heck).

It seems to me that you can always do something like:

BooleanQuery bq;
QueryParser qp1 = new QueryParser("field1", "<your query fragment here>",
Query q1 = qp1.parse("search term or clause);

QueryParser qp2 = new QueryParser("field2", "<your query fragment here>",
Query q2 = qp2.parse("search term or clause);


and eventually submit the query you've build up in bq.

You can arbitrarily build these up. In other words, your q1, q2, q3, etc can
be the same field for the first N clauses, and another field for the second
M clauses. Or you could build up the <query fragment> to consist of all the
terms for a particular field.

As I said, I have no clue whether this is possible in your application. If
not, see below <G>.

********************Scree starts here***********************************

I've had similar arguments with myself. But I'm getting less forgiving with
myself when I reinvent wheels, and firmly slap my own wrists.

Pretend you are talking to your boss/technical lead/coworker. I'm assuming
you actually want to get a product out the door. Your manager asks: "How can
you justify spending the time to create, debug and maintain code that has
already been written for you for the sake of cleanliness at the expense of
the other things you could be contributing instead"?

There are some very good answers to this, but most of the ones I've tried to
use involve a lot of hand-waving on the order of "If we ever extend the
application......", or "It would be cleaner".  At which point the
conversation *should* go something like this....

Manager: "let me get this straight. You can spend 10 minutes right now
implementing the pass-to-the-query-parser solution and an unknown amount
(but probably way more than your initial estimate)
implementing/debugging/testing a 'cleaner' solution. Is that right?"

You: "Yes but....."

Manager: "Furthermore, the functionality you want to add is *already* built
into the 'use-the-parser' solution, right?"

You: "Yes, but...."

Manager: "And the amount of time you'll spend debugging this, not to mention
the amount of *other* people's time you'll spend identifying any bugs and
figuring out that it's in this new code will only increase as the longer any
bugs to undetected, right?

You: "Yes, but..."

Manager: "Do it the use-the-parser way. We can always implement it the other
way if we have time. It doesn't cost us *any* time to implement the 'use the
query parser' way, whereas your way has a measurable cost now, an unknown
cost in the future and no measurable gain. Add a big comment if you want
about how I forced you to do this ugly thing.....".

Of course there are good reasons to take the time now *if* it will save
time/effort in the future. But this sure doesn't seem like one of those
situations to me. Not to mention that it'll be MUCH simpler for the next
person looking at it to understand. Here are several things off the top of
my head that'll become maintenance issues for a custom solution, that are
*all* taken care of by the use-the-parser solution

1> How are you going to handle stop words?
2> Will you ever want to change analyzers to, say, keep URLs together? Or
maybe break them up?
3> What happens if you want to use the RegularExpressionAnalyzer to, say,
remove all punctuation or other user-entered junk?
4> Will you remember all the ins-and-outs of this code in even 1 month? What
about the next poor joker who has to figure it out?

None of this is to say that your suggestion that there be utility classes
that allow this sort of thing doesn't have merit. But I have to wonder
whether it would be effort well spent for you at this time, in this project

As you can see, this is one of my hot-button issues <G>. If you want to
really see me go off the deep end, just *mention* premature


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message