lucene-dev mailing list archives

From Brian Goetz <br...@quiotix.com>
Subject Re: Status of proximity in query language
Date Mon, 18 Feb 2002 23:21:57 GMT
> These are situations where the end user who is using this syntax has to know
> the limitations and options.

Right, but that's no excuse for creating more of these situations,
especially one as egregious as introducing an infix operator that
_looks_ like it should work with arbitrary operands but doesn't.
That's like offering a desk calculator with a + button that only adds
even numbers.  

Let's not lose sight of something: the query parser is a peripheral
element of Lucene; it converts the text representation of queries into
the internal representation.  No one _has_ to use it.  It's supposed to
be a convenient first-order approximation that is good enough for most
applications.

> In my user documentation

We can't assume every end user will have access to good documentation,
or any for that matter.  The Yahoo search engine has a doc page, but
few users ever look at it.

Having NEAR as an infix operator is simply confusing.  Let's not add
confusing features.

> For Doug's case
>   ((a AND b) OR (c AND d)) NEAR20 ((e AND f) OR (g AND h))
> I understand that this is a difficult case to process, but I also think it
> is somewhat of an unpractical case in reality.

OK, what about combinations like:
  Foo* NEAR Bar
The way this is processed internally, it's basically the same (I think).
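
To illustrate why: a prefix term is typically expanded against the index
vocabulary into a disjunction of concrete terms, so the left operand of
NEAR ends up being a boolean OR after all. The sketch below is not
Lucene's code; the class name, the helper methods, and the tiny
vocabulary are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixExpansionSketch {
    // Hypothetical helper: collect every vocabulary term that starts with the prefix.
    static List<String> expand(String prefix, List<String> vocabulary) {
        List<String> matches = new ArrayList<>();
        for (String term : vocabulary) {
            if (term.startsWith(prefix)) {
                matches.add(term);
            }
        }
        return matches;
    }

    // Render the expanded terms as the disjunction they effectively stand for.
    static String asDisjunction(List<String> terms) {
        return "(" + String.join(" OR ", terms) + ")";
    }

    public static void main(String[] args) {
        // Made-up vocabulary standing in for the terms of an index.
        List<String> vocabulary = Arrays.asList("bar", "foo", "food", "fool", "baz");
        String left = asDisjunction(expand("foo", vocabulary));
        // The NEAR operand is now a boolean expression, not a single term.
        System.out.println(left + " NEAR bar");
    }
}
```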

> What about putting a constraint on the NEAR operator to only be limited to
> Term Queries (at least at first).

Let's find a better solution.
 
> I think this is how most users will use this type of search anyway. I agree
> that it is difficult to solve the general case, but for a limited case, I
> think this would be valuable to users.

It IS valuable.  But let's add it in a way such that it's not confusing.

Since the slop is tied to the PhraseQuery mechanism, let's think about
syntax that operates only on that.

Ideas:
  "foo bar"(3)
  "foo bar"[3]
  "foo bar"~3

The last one makes some sense, as the ~ already indicates fuzzy, and slop
is a similar concept to fuzzy (searching for an approximate match).

I can make the last one work pretty easily, too.
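
As a rough sketch of what the "foo bar"~3 form could carry: a quoted
phrase followed by an optional ~N suffix, where N is the slop. This is
only an illustration, not the actual query parser change; the class
names and the default slop of 0 are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseSlopSketch {
    // Hypothetical holder for the parsed result: the phrase terms plus the slop.
    static final class SloppyPhrase {
        final List<String> terms;
        final int slop;

        SloppyPhrase(List<String> terms, int slop) {
            this.terms = terms;
            this.slop = slop;
        }
    }

    // A double-quoted phrase, optionally followed by ~ and a non-negative integer.
    private static final Pattern PHRASE =
            Pattern.compile("\"([^\"]+)\"(?:~(\\d+))?");

    static SloppyPhrase parse(String input) {
        Matcher m = PHRASE.matcher(input.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not a quoted phrase: " + input);
        }
        List<String> terms = Arrays.asList(m.group(1).trim().split("\\s+"));
        // Assumption: a phrase without a ~N suffix means an exact phrase (slop 0).
        int slop = (m.group(2) == null) ? 0 : Integer.parseInt(m.group(2));
        return new SloppyPhrase(terms, slop);
    }

    public static void main(String[] args) {
        SloppyPhrase p = parse("\"foo bar\"~3");
        System.out.println(p.terms + " slop=" + p.slop);
    }
}
```

Because the suffix attaches only to a quoted phrase, there is no way to
write it against an arbitrary boolean operand, which is the point of
tying the syntax to the phrase mechanism.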


