lucene-java-user mailing list archives

From Jon Pipitone <jpipit...@mshri.on.ca>
Subject Re: '-' character not interpreted correctly in field names
Date Wed, 14 May 2003 18:27:49 GMT


Biswas, Goutam_Kumar wrote:
>>>>>but I haven't been able to find a solution in the 
>>>>>mailing list archives, or by writing my own analyzer:
>>>>
> 
> I just tried modifying your Analyzer code a little bit, and it worked
> for me. 
> 
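> import java.io.Reader;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.CharTokenizer;
> import org.apache.lucene.analysis.TokenStream;
> 
> public class MyAnalyzer extends Analyzer {
> 	public TokenStream tokenStream(String fieldName, Reader reader) {
> 		return new CharTokenizer(reader) {
> 			protected boolean isTokenChar(char c) {
> 				// Accept every character, so nothing is discarded.
> 				//return Character.isLetter(c) || c == '-';
> 				return true;
> 			}
> 		};
> 	}
> }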
> 
> So this analyzer won't throw away any characters whatsoever. Not sure that's
> what you want though!
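A rough way to see what "won't throw away any characters" means: the stdlib-only Java sketch below imitates the splitting loop of a CharTokenizer subclass, where a run of characters accepted by isTokenChar becomes one token and any rejected character ends the token and is dropped. The class CharSplitDemo and its split method are hypothetical, not Lucene code; the sketch leaves out CharTokenizer's buffering, token-length limit and normalization, and uses Java 8 lambdas where Lucene 1.3 code would use an anonymous subclass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical imitation of a CharTokenizer subclass (not Lucene code):
// consecutive characters accepted by isTokenChar form one token; any
// rejected character closes the current token and is discarded.
class CharSplitDemo {
    static List<String> split(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar.test(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the final token
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "foo-bar:foo baz";
        // Goutam's version: every character, even ':' and ' ', is kept.
        System.out.println(split(input, c -> true));
        // prints [foo-bar:foo baz]
        // The commented-out rule: letters and '-' only.
        System.out.println(split(input, c -> Character.isLetter(c) || c == '-'));
        // prints [foo-bar, foo, baz]
    }
}
```

With either predicate the text "foo-bar" survives tokenization, which fits Eric's observation that the problem is not the analyzer: field names in the query string never reach it.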

Goutam,

That's very strange!  I'm using Lucene 1.3rc1, and when I try parsing 
the query "foo-bar:foo" with your modifications, I still get "body:foo 
-bar:foo".

jp

> 
> -----Original Message-----
> From: Eric Isakson [mailto:Eric.Isakson@sas.com]
> Sent: Tuesday, May 13, 2003 1:50 AM
> To: Lucene Users List
> Subject: RE: '-' character not interpreted correctly in field names
> 
> 
> I just looked at the QueryParser.jj code: your field names never get
> processed by the analyzer. It does look like the query parser will honor
> escapes, though. I haven't tried this, but try a query like "foo\-bar:foo"
> and have a look at the QueryParser.jj file to see how it handles field
> names when parsing your query.
> 
> Eric
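If Eric's escape suggestion works, a query string can be built by backslash-escaping the field name before appending ":term". The helper below is hypothetical (no such method is assumed to exist in the 1.3-era API), and its character set is an assumption based on the documented query syntax; check it against the escaped-character rule in your copy of QueryParser.jj. Whether 1.3rc1 then strips the backslash from the field name has not been verified here.

```java
// Hypothetical helper, not part of the Lucene 1.3 API: backslash-escape
// characters that have meaning in query syntax so they can appear inside
// a field name, as Eric suggests with "foo\-bar:foo".
class QueryEscape {
    // Assumed set of special characters; verify against QueryParser.jj.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?";

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\'); // prefix the special character with '\'
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Builds the query text foo\-bar:foo (a single backslash at run time).
        System.out.println(escape("foo-bar") + ":foo");
        // prints foo\-bar:foo
    }
}
```

Note that in Java source the same query must be written as the literal "foo\\-bar:foo", since the backslash itself needs escaping in a string literal.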
> 
> -----Original Message-----
> From: Jon Pipitone [mailto:jpipitone@mshri.on.ca] 
> Sent: Monday, May 12, 2003 4:03 PM
> To: Lucene Users List
> Subject: Re: '-' character not interpreted correctly in field names
> 
> 
> Hi Otis, Terry,
> 
>>>>You can write a custom Analyzer that does not remove dashes from
>>>>tokens, and use it for both indexing and searching.
>>>>
>>>>This is a frequent question and answer on this list.
> 
> Sorry for the noise, but I haven't been able to find a solution in the 
> mailing list archives, or by writing my own analyzer:
> 
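> 	import java.io.Reader;
> 	import org.apache.lucene.analysis.Analyzer;
> 	import org.apache.lucene.analysis.CharTokenizer;
> 	import org.apache.lucene.analysis.TokenStream;
> 
> 	public class MyAnalyzer extends Analyzer {
> 		public TokenStream tokenStream(String fieldName, Reader reader) {
> 			// Keep letters and '-' together in a single token.
> 			return new CharTokenizer(reader) {
> 				protected boolean isTokenChar(char c) {
> 					return Character.isLetter(c) || c == '-';
> 				}
> 			};
> 		}
> 	}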
> 
> I parse a query like this:
> 
> 	String queryString = "foo-bar:foo";
> 	// QueryParser.parse returns a Query, not a String
> 	Query query = QueryParser.parse(queryString, "body", new MyAnalyzer());
> 	System.out.println(query.toString());
> 
> With the output:
> 	body:foo -bar:foo
> 
> But I would expect the output:
> 	 foo-bar:foo
> 
> If I print out the tokens that MyAnalyzer produces I do get "foo-bar" 
> and then "foo".
> 
> Any pointers on what I'm doing wrong?
> 
> jp
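The output above can be reproduced with a toy model in which '-' is purely an operator and never part of a word: the query string is cut into clauses first, so by the time any term text would be analyzed, "bar" has already been taken as the field name. This is a simplified illustration, not a port of QueryParser.jj; the class ToyQuerySplit and its rules (a space or '-' ends a clause, '-' prohibits the next clause, ':' marks the preceding word as a field name) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (NOT QueryParser.jj) of a grammar in which '-' is only an
// operator, never part of a word. Clauses are formed before any analyzer
// could run, which reproduces the output Jon reports.
class ToyQuerySplit {
    static String parse(String query, String defaultField) {
        List<String> clauses = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        String field = null;        // set when a word is followed by ':'
        boolean prohibited = false; // set by a '-' operator
        // Run one step past the end so the last clause is closed.
        for (int i = 0; i <= query.length(); i++) {
            char c = i < query.length() ? query.charAt(i) : ' ';
            if (c == ':') {
                field = word.toString(); // the word before ':' is a field name
                word.setLength(0);
            } else if (c == ' ' || c == '-') {
                if (word.length() > 0) { // close the current clause
                    String f = (field != null) ? field : defaultField;
                    clauses.add((prohibited ? "-" : "") + f + ":" + word);
                    word.setLength(0);
                    field = null;
                    prohibited = false;
                }
                if (c == '-') {
                    prohibited = true;   // '-' prohibits the next clause
                }
            } else {
                word.append(c);
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // Same query string and default field as in Jon's message.
        System.out.println(parse("foo-bar:foo", "body"));
        // prints body:foo -bar:foo
    }
}
```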
> 
> 
> 
> 
>>>>--- Jon Pipitone <jpipitone@mshri.on.ca> wrote:
>>>>
>>>>>Hi all,
>>>>>
>>>>>>I believe that the tokenizer treats a dash as a token separator.
>>>>>>Hence, the only way, as I recall, to eliminate this behavior is
>>>>>>to modify QueryParser.jj so it doesn't do this.  However, doing
>>>>>>this can cause some other problems, like hyphenated words at a
>>>>>>line break and the like.
>>>>>
>>>>>I've recently started using Lucene and I'm running into the same
>>>>>issue with the query parser.  I'd like to use queries that contain
>>>>>dashes in the field name, but as far as I can tell the current
>>>>>query grammar treats field names as terms, and so, as Terry notes, a
>>>>>dash becomes a token separator.
>>>>>
>>>>>Terry suggests modifying QueryParser.jj, presumably by creating a
>>>>>separate non-terminal for field names.
>>>>>
>>>>>Has anyone done any work on this already?  Is modifying
>>>>>QueryParser.jj the best approach?
>>>>>
>>>>>Thanks,
>>>>>jp
>>>>
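The separate non-terminal for field names that Jon suspects Terry has in mind could look roughly like the rule below, written as a Java regular expression rather than JavaCC: a field name may contain '-', and '-' acts as the prohibit operator only at the start of a clause. The FIELD pattern (a letter or underscore, then letters, digits, '_' or '-') and the FieldClause class are assumptions for illustration, not Lucene's grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a dedicated field-name rule (hypothetical, not Lucene grammar):
// '-' is an ordinary character inside FIELD, and only a leading '+' or '-'
// is treated as an operator.
class FieldClause {
    // [+-]? operator, then FIELD ':' TERM; FIELD is an assumed rule.
    private static final Pattern CLAUSE =
        Pattern.compile("([+-]?)([A-Za-z_][A-Za-z0-9_-]*):(\\S+)");

    // Returns {operator, field, term}, or null if the clause has no field.
    static String[] parse(String clause) {
        Matcher m = CLAUSE.matcher(clause);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] a = parse("foo-bar:foo"); // field "foo-bar", no operator
        String[] b = parse("-bar:foo");    // prohibited clause on field "bar"
        System.out.println("op='" + a[0] + "' field=" + a[1] + " term=" + a[2]);
        // prints op='' field=foo-bar term=foo
        System.out.println("op='" + b[0] + "' field=" + b[1] + " term=" + b[2]);
        // prints op='-' field=bar term=foo
    }
}
```

A rule like this only changes how field names are read; whether dashes inside ordinary terms survive is still up to the analyzer used for indexing and searching.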
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org


