lucene-dev mailing list archives

From Peter Carlson <carl...@bookandhammer.com>
Subject Re: Status of proximity in query language
Date Mon, 18 Feb 2002 23:10:19 GMT
I know that there are cases that are difficult to match with a single
generic syntax. However, doesn't the queryParser already run into issues
like this with other operators, such as the fuzzy operator, and with not
being able to use a wildcard as a prefix (e.g. *bar)?

These are situations where the end user of this syntax has to know the
limitations and options.

In my user documentation, I state the limitation that NEAR may only be used
between 2 Term Queries (single words). There are still some issues with this
(e.g. foo near10 bar near20 xml) that should be resolved.

For Doug's case
  ((a AND b) OR (c AND d)) NEAR20 ((e AND f) OR (g AND h))
I understand that this is a difficult case to process, but I also think it
is somewhat impractical in reality.

What about constraining the NEAR operator to Term Queries only (at least at
first)?
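A minimal Java sketch of that constraint, assuming the parser can fall back on the `"a b"~n` phrase-slop syntax (which later releases of Lucene's query parser accept). `NearRewriter` and `rewrite` are hypothetical names, and the regular expression stands in for real grammar support. Anything other than two plain words joined by a single NEARn, including wildcards, phrases, and chains like foo near10 bar near20 xml, is rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical rewriter: turns "term NEARn term" into a sloppy phrase string. */
public class NearRewriter {
    // Exactly two bare words joined by NEAR plus a distance, e.g. "foo NEAR10 bar".
    // Case-insensitive so that "foo near10 bar" is accepted as well.
    private static final Pattern NEAR =
        Pattern.compile("^\\s*(\\w+)\\s+NEAR(\\d+)\\s+(\\w+)\\s*$", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the equivalent sloppy phrase, or throws if the query is not exactly
     * two single words around one NEARn (wildcards, phrases, chains are rejected).
     */
    public static String rewrite(String query) {
        Matcher m = NEAR.matcher(query);
        if (!m.matches()) {
            throw new IllegalArgumentException(
                "NEAR is only supported between two single words: " + query);
        }
        String left = m.group(1);
        int slop = Integer.parseInt(m.group(2));
        String right = m.group(3);
        return "\"" + left + " " + right + "\"~" + slop;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("foo NEAR10 bar")); // prints "foo bar"~10
        try {
            rewrite("foo* NEAR10 bar");                // wildcard operand
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The mapping is only approximate: Lucene's phrase slop is not a symmetric "within n words", since matching the words in reversed order costs extra slop.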

I think this is how most users will use this type of search anyway. I agree
that it is difficult to solve the general case, but for a limited case, I
think this would be valuable to users.

I guess the other option is to come up with a new syntax like near(a, b)
that addresses the current issues. But this really has many of the same
problems, since someone could type near("a b",c*).

I guess in the end, I feel like the slop factor is one of the unique
features of Lucene, and I would like to allow people to easily use it.

What are your thoughts?

--Peter


On 2/18/02 12:39 PM, "Brian Goetz" <brian@quiotix.com> wrote:

>> Thanks for the feedback on why the NEAR operator was not yet incorporated. I
>> didn't understand all the reasons for not using the NEAR operator. For my
>> purposes, I am fine with these limitations and describe them in the search
>> documentation.
> 
> Right, but we have to be extra careful with the syntax for the query
> parser, as it is exposed to users who don't even know what Lucene is,
> let alone having read the docs.  We're designing for typical users here,
> not programmers. 
> 
>> However, as a potential solution, what do people think about a multi-level
>> slop functionality? That is having the slop really be an array of distances
>> between terms.
>> 
>> So "foo bar" NEAR3 "unga bunga" would be translated into
>> 
>> Foo within 1 of bar within 3 of unga within 1 of bunga.
>> 
>> This would become a single phrase query, but the "slop" would be variable
>> between each word.
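The multi-level slop idea can be sketched as a per-gap check over token positions. This is a hypothetical Java illustration, not Lucene's PhraseQuery: `MultiSlopMatcher`, `gapsFor`, and `matches` are made-up names, and the sketch assumes the terms must appear in the given order, with distance measured as the difference in token positions (so "within 1" means adjacent).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical matcher for a phrase whose allowed distance varies per gap. */
public class MultiSlopMatcher {

    /**
     * Per-gap limits for `phrase1 NEARn phrase2`: words inside a phrase must be
     * 1 apart; the last left word and the first right word may be up to n apart.
     */
    public static int[] gapsFor(int leftWords, int near, int rightWords) {
        int[] gaps = new int[leftWords + rightWords - 1];
        Arrays.fill(gaps, 1);
        gaps[leftWords - 1] = near; // the gap joining the two phrases
        return gaps;
    }

    /**
     * positions[i] holds the token positions of term i in one document.
     * Returns true if there is an in-order choice p0 < p1 < ... with
     * 0 < p[i+1] - p[i] <= maxGap[i] for every i.
     */
    public static boolean matches(int[][] positions, int[] maxGap) {
        if (positions.length != maxGap.length + 1) {
            throw new IllegalArgumentException("need one gap between each pair of terms");
        }
        // Every occurrence of the first term is a possible starting point.
        List<Integer> reachable = new ArrayList<>();
        for (int p : positions[0]) reachable.add(p);

        // Keep only occurrences of term i that some reachable term i-1 can reach.
        for (int i = 1; i < positions.length && !reachable.isEmpty(); i++) {
            List<Integer> next = new ArrayList<>();
            for (int p : positions[i]) {
                for (int q : reachable) {
                    int d = p - q;
                    if (d > 0 && d <= maxGap[i - 1]) {
                        next.add(p);
                        break;
                    }
                }
            }
            reachable = next;
        }
        return !reachable.isEmpty();
    }

    public static void main(String[] args) {
        // "foo bar" NEAR3 "unga bunga": foo within 1 of bar within 3 of unga within 1 of bunga
        int[] gaps = gapsFor(2, 3, 2);
        int[][] doc = { {0}, {1}, {3}, {4} }; // text: foo bar x unga bunga
        System.out.println(Arrays.toString(gaps) + " " + matches(doc, gaps)); // [1, 3, 1] true
    }
}
```

Unlike Lucene's sloppy phrase matching, this sketch never accepts the terms out of order.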
> 
> That's OK when everything is a term.  What about
> Foo* NEAR Bar
> 
> We'd need to elevate the concept of 'slop' up the query hierarchy so you
> could apply slop to arbitrary queries.  Doug, is that practical?
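For a non-term operand such as Foo*, one brute-force way to apply slop is to expand the prefix against the terms of a document and test proximity for each expansion. The Java sketch below assumes a hypothetical in-memory map from lowercased terms to token positions for a single document; `PrefixNear` and `prefixNear` are illustrative names, not Lucene API, and a real implementation would enumerate terms from the index instead.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: proximity between a prefix (wildcard) operand and a term. */
public class PrefixNear {

    /**
     * termPositions maps each term in one document to its token positions.
     * Returns true if any term starting with `prefix` occurs within
     * `maxDistance` positions of `other`, in either order.
     */
    public static boolean prefixNear(Map<String, int[]> termPositions,
                                     String prefix, String other, int maxDistance) {
        int[] otherPositions = termPositions.get(other);
        if (otherPositions == null) return false;
        for (Map.Entry<String, int[]> e : termPositions.entrySet()) {
            if (!e.getKey().startsWith(prefix)) continue; // expand Foo* one term at a time
            for (int p : e.getValue()) {
                for (int q : otherPositions) {
                    // p != q keeps `other` from matching itself when it also has the prefix
                    if (p != q && Math.abs(p - q) <= maxDistance) return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, int[]> doc = new TreeMap<>();
        doc.put("foobar", new int[] {0});
        doc.put("baz", new int[] {1});
        doc.put("bar", new int[] {2});
        System.out.println(prefixNear(doc, "foo", "bar", 2)); // true: foobar(0), bar(2)
        System.out.println(prefixNear(doc, "foo", "bar", 1)); // false
    }
}
```

The cost of this approach grows with the number of terms the prefix expands to.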
> 
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
> 
> 

