lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Victor Hadianto" <vic...@hadianto.net>
Subject Re: Dash Confusion in QueryParser - Bug? Feature?
Date Wed, 15 Oct 2003 00:38:01 GMT
----- Original Message ----- 
From: "Erik Hatcher" <erik@ehatchersolutions.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Wednesday, October 15, 2003 10:01 AM
Subject: Re: Dash Confusion in QueryParser - Bug? Feature?


> On Saturday, October 11, 2003, at 09:44  AM, Michael Giles wrote:
> > He is probably using the StandardAnalyzer.  I was about to write the
> > exact same email (but using Wal-Mart as an example on this page -
> > http://www.benchmark.com/cgi-bin/suid/~bcmlp/
> > newsletter.cgi?mode=show&year=2003&date=2003-10-07). I index and
> > search with the same analyzer (Standard), but when I search for
> > Wal-Mart, I don't find a match.  I DO find a match if I search for
> > "Wal-Mart" or Wal Mart (no hyphen).  This seems like a bug.
>
> Sorry for the delay.  I've been meaning to reply to this.
>
> When you index using StandardAnalyzer, you are indexing it to two terms
> "wal" and "mart" (without the quotes).  QueryParser does its own
> (weird?) stuff to
> strings passed to it.  Here's how it breaks down:
>
>      String[] queries = {"Wal-Mart", "\"Wal-Mart\"", "Wal Mart"};
>      for (int i = 0; i < queries.length; i++) {
>        String query = queries[i];
>        Query q = QueryParser.parse(query, "contents", new
> StandardAnalyzer());
>        System.out.println(query + " = " + q);
>      }
>
> Wal-Mart = contents:wal -contents:mart
> "Wal-Mart" = contents:"wal mart"
> Wal Mart = contents:wal contents:mart
>
> Notice all three are completely different queries.  The Wal-Mart one is
> excluding "mart" making it miss documents you expect.  The second one
> is a phrase query, which is basically what you're after.  The third one
> is matching any documents with "wal" or "mart" in them regardless of
> whether they are side-by-side.
>
> Is this a bug?  Nah... just the nature of the QueryParser beast.  It
> would be a non-backwards-compatible change to change how QueryParser
> deals with a dash. That is the main issue here with it interpreting it
> as a NOT operator.  But it seems logical to me that it shouldn't do so
> when its mashed against a word like this and leave it to the analyzer
> to deal with.

I believe this is the same problem that I had the other day. If you search
the mailing list for "t-shirt" you should get some threads discussing this
problem.

In fact why don't give it here:

http://nagoya.apache.org/eyebrowse/BrowseList?listName=lucene-user@jakarta.apache.org&by=thread&from=317960


Cheers,

victor


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message