lucene-java-user mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: Searching for words begining with "or"
Date Fri, 19 Jul 2013 12:05:31 GMT
If Jack's recommendation for keeping stopwords will work in your use case, this constructor
should do the trick:

// VERSION stands for the Lucene Version constant your application already uses.
Analyzer analyzer = new StandardAnalyzer(VERSION, CharArraySet.EMPTY_SET);

________________________________________
From: Jack Krupansky [jack@basetechnology.com]
Sent: Friday, July 19, 2013 12:59 AM
To: java-user@lucene.apache.org
Subject: Re: Searching for words begining with "or"

Just so you know, the presence of a wildcard in a term means that the term
will not be analyzed. So, state:OR* should fail since "OR" will not be in
the index - because it would index as "or" (lowercase). Hmmm... why does
"or" seem familiar...?

Ah.... yeah... right!... The standard analyzer includes the standard stop
filter, which defaults to using this set of stopwords:

final List<String> stopWords = Arrays.asList(
  "a", "an", "and", "are", "as", "at", "be", "but", "by",
  "for", "if", "in", "into", "is", "it",
  "no", "not", "of", "on", "or", "such",
  "that", "the", "their", "then", "there", "these",
  "they", "this", "to", "was", "will", "with"
);

And... "or" is on that list! So, the standard analyzer is removing "or" from
the index! That's why the query can't find it.

Unless you really want these stop words removed, construct your own analyzer
that does not do stop word removal.
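
To make the effect concrete, here is a rough plain-Java sketch of what happens. It is not Lucene code: the class StopwordDemo and the methods analyze and prefixMatches are made-up names, and the tokenizing is a crude stand-in for the real analysis chain. It only assumes what is described above: indexed text is lowercased and stop words are dropped, while the wildcard term is matched as given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class; a simplified model, not Lucene's implementation.
class StopwordDemo {
    // The same words as the default stop word list quoted above.
    static final Set<String> DEFAULT_STOP_WORDS = new HashSet<String>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by",
        "for", "if", "in", "into", "is", "it",
        "no", "not", "of", "on", "or", "such",
        "that", "the", "their", "then", "there", "these",
        "they", "this", "to", "was", "will", "with"));

    // Crude stand-in for index-time analysis: split on anything that is not
    // a letter or digit, lowercase each token, drop tokens in the stop set.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Prefix lookup against the indexed terms; the prefix is used exactly
    // as given, with no lowercasing or other analysis.
    static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> withStops = analyze("OR", DEFAULT_STOP_WORDS);
        List<String> noStops = analyze("OR", Collections.<String>emptySet());
        System.out.println("with stop words: " + withStops);    // []
        System.out.println("without stop words: " + noStops);   // [or]
        System.out.println("or* with stop words: " + prefixMatches(withStops, "or"));  // false
        System.out.println("or* without stop words: " + prefixMatches(noStops, "or")); // true
        System.out.println("OR* without stop words: " + prefixMatches(noStops, "OR")); // false
    }
}
```

With the default list, the state value "OR" never reaches the index at all, so no prefix can match it. With an empty stop set it is indexed as "or", and only the lowercase prefix finds it.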

-- Jack Krupansky

-----Original Message-----
From: ABlaise
Sent: Friday, July 19, 2013 12:07 AM
To: java-user@lucene.apache.org
Subject: Re: Searching for words begining with "or"

My query works fine until I add the last part:
(city:or* OR state:or*).
I tried the first solution I was given, but writing \OR and \AND
doesn't seem to be the answer. The query is actually built correctly; the parser has no
problem with either OR or \OR, since the parsed query looks like this:
+(+(areaType:city areaType:neighborhood areaType:county)
+areaName:portland*) +(city:or* state:or*).
It looks like a valid query to me. The search just can't seem to find the
'OR' *in* the index... it's as if it doesn't exist. I know this because
if I remove that last, failing part of the query, the search finds (among
others) the right document, with the state written in it... It's as if the
search can't 'see' the 'or' in the index...

As for upper/lower case, I am using a StandardAnalyzer both to index and to
search. I feed it the states in upper case and it doesn't seem to
change them. Still, I tried putting them in lower case, but that didn't change
anything...

Thanks in advance for your future answers and for the help you already
provided me with.



--
View this message in context:
http://lucene.472066.n3.nabble.com/Searching-for-words-begining-with-or-tp4079018p4079035.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org