Date: Mon, 18 Feb 2002 15:45:05 -0800
Subject: Re: Status of proximity in query language
From: Peter Carlson
To: Lucene Developers List
In-Reply-To: <20020218152157.A15514@lx.quiotix.com>

Sounds great.

Let's hook something up to the phrase query, but I would suggest a different
character, so that the same operator is not used for two different concepts,
which would be confusing.

Some thoughts:

"foo bar"#3
"foo bar"!3
"foo bar"n3
"foo bar"$3

Really, I would just suggest excluding what is currently used, plus:

  ? (used in URLs)
  & (used in URLs)
  > (would have to be encoded for XML)
  < (would have to be encoded for XML)
  % (can be used to escape characters)

--Peter

On 2/18/02 3:21 PM, "Brian Goetz" wrote:

>> These are situations where the end user who is using this syntax has to know
>> the limitations and options.
>
> Right, but that's no excuse for creating more of these situations,
> especially one as egregious as introducing an infix operator that
> _looks_ like it should work with arbitrary operands but doesn't.
> That's like offering a desk calculator with a + button that only adds
> even numbers.
>
> Let's not lose sight of something: the query parser is a peripheral
> element of Lucene; it converts the text representation of queries into
> the internal representation. No one _has_ to use it. It's supposed to be
> a convenient first-order approximation that is good enough for most
> applications.
>
>> In my user documentation
>
> We can't assume every end user will have access to good documentation,
> or any for that matter. The Yahoo search engine has a doc page, but
> few users ever look at it.
>
> Having NEAR as an infix operator is simply confusing. Let's not add
> confusing features.
>
>> For Doug's case
>> ((a AND b) OR (c AND d)) NEAR20 ((e AND f) OR (g AND h))
>> I understand that this is a difficult case to process, but I also think it
>> is a somewhat impractical case in reality.
>
> OK, what about combinations like:
> Foo* NEAR Bar
> The way this is processed internally, it's basically the same (I think).
>
>> What about putting a constraint on the NEAR operator to only be limited to
>> Term Queries (at least at first).
>
> Let's find a better solution.
>
>> I think this is how most users will use this type of search anyway. I agree
>> that it is difficult to solve the general case, but for a limited case, I
>> think this would be valuable to users.
>
> It IS valuable. But let's add it in a way that is not confusing.
>
> Since the slop is tied to the phrasequery mechanism, let's think about
> syntax that operates only on that.
>
> Ideas:
> "foo bar"(3)
> "foo bar"[3]
> "foo bar"~3
>
> The latter makes some sense as the ~ already indicates fuzzy, and slop
> is a similar concept to fuzzy (searching for an approximate match).
>
> I can make the latter work pretty easily, too.
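
For readers following the syntax discussion, here is a rough standalone sketch of how a phrase-with-slop suffix such as "foo bar"~3 could be split into its terms and a slop value. It is not Lucene's query parser and is not taken from this thread: the function name parse_sloppy_phrase, the AVOID set, and the choice of Python are all assumptions made for illustration. The AVOID set holds only the five characters listed above as worth excluding; it does not try to enumerate the characters already used by the query syntax.

```python
import re

# Hypothetical set: the five characters suggested above as ones to avoid
# (URL metacharacters, XML metacharacters, and the escape character).
AVOID = set("?&><%")


def parse_sloppy_phrase(text, op="~"):
    """Split a string like '"foo bar"~3' into (terms, slop).

    `op` is the suffix character under discussion. When no suffix is
    present, the slop defaults to 0 (an exact phrase).
    """
    if op in AVOID:
        raise ValueError(f"operator {op!r} clashes with URL/XML syntax")
    # A quoted phrase, optionally followed by the operator and an integer.
    pattern = re.compile(r'^"([^"]*)"(?:' + re.escape(op) + r'(\d+))?$')
    match = pattern.match(text.strip())
    if match is None:
        raise ValueError(f"not a phrase query: {text!r}")
    terms = match.group(1).split()
    slop = int(match.group(2)) if match.group(2) else 0
    return terms, slop


if __name__ == "__main__":
    print(parse_sloppy_phrase('"foo bar"~3'))
    print(parse_sloppy_phrase('"foo bar"#3', op="#"))
```

Because the suffix attaches only to a quoted phrase, an input such as Foo* NEAR Bar is rejected outright instead of being half-supported, which is the property argued for above.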