lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jon Pipitone <jpipit...@mshri.on.ca>
Subject '-' character not interpreted correctly in field names
Date Mon, 12 May 2003 15:23:02 GMT
Hi all,

 > I believe that the tokenizer treats a dash as a token separator.
 > Hence, the only way, as I recall, to eliminate this behavior is
 > to modify QueryParser.jj so it doesn't do this.  However, doing
 > this can cause some other problems, like hyphenated words at a
 > line break and the like.

I've recently started using lucene and I'm running into the same issue 
with the query parser.  I'd like to use queries that contain dashes in 
the field name, but as far as I can tell it seems that the current query 
grammar treats field names as terms, and so, as Terry notes, a dash 
becomes a token seperator.

Terry suggests modifying the QueryParser.jj -- I would suspect by 
creating a seperate non-terminal for field names.

Has anyone done any work on this already?  Is modifying QueryParser.jj 
the best approach?

Thanks,
jp

> From: Terry Steichen <terry@net-frame.com>
> Subject: '-' character not interpreted correctly in field names
> Date: Mon, 3 Feb 2003 09:19:58 -0500
> Content-Type: text/plain;
> 	charset="iso-8859-1"
> 
> 
> I believe that the tokenizer treats a dash as a token separator.  Hence, the
> only way, as I recall, to eliminate this behavior is to modify
> QueryParser.jj so it doesn't do this.  However, doing this can cause some
> other problems, like hyphenated words at a line break and the like.
> 
> (Of course, if you do make such a change, you'll have to go back and reindex
> after such a change.)
> 
> I've run into this problem myself and I've 'punted' -  on certain fields,
> when I index, I replace the dash with an underscore.  This isn't a real good
> solution, and it does require me to keep remembering in which fields I have
> to do this substitution in the search.  But, for the moment it works.  I'll
> probably go back and make some kind of change later, when I have more time.
> 
> HTH,
> 
> Terry
> 
> ----- Original Message -----
> From: "hermit" <hermit@freestart.hu>
> To: <lucene-user@jakarta.apache.org>
> Sent: Monday, February 03, 2003 2:39 AM
> Subject: '-' character not interpreted correctly in field names
> 
> 
>> Hello!
>>
>> I have a problem, a big one. I have successfully indexed 600 MB of XML
>> data, but the search can't give any results if the field contains any
>> '-' characters .
>> For example: compound@cgx-code:[2 - 5] must match at least two results
>> based on my XML data but it gives nothing.
>>
>> Can you advice me a simple solution? Or is it a bug?
>>
>>     The Hermit



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message