lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Terry Steichen" <te...@net-frame.com>
Subject Re: '-' character not interpreted correctly in field names
Date Mon, 03 Feb 2003 14:19:58 GMT
I believe that the tokenizer treats a dash as a token separator.  Hence, the
only way, as I recall, to eliminate this behavior is to modify
QueryParser.jj so it doesn't do this.  However, doing this can cause some
other problems, like hyphenated words at a line break and the like.

(Of course, if you do make such a change, you'll have to go back and reindex
after such a change.)

I've run into this problem myself and I've 'punted' -  on certain fields,
when I index, I replace the dash with an underscore.  This isn't a real good
solution, and it does require me to keep remembering in which fields I have
to do this substitution in the search.  But, for the moment it works.  I'll
probably go back and make some kind of change later, when I have more time.

HTH,

Terry

----- Original Message -----
From: "hermit" <hermit@freestart.hu>
To: <lucene-user@jakarta.apache.org>
Sent: Monday, February 03, 2003 2:39 AM
Subject: '-' character not interpreted correctly in field names


> Hello!
>
> I have a problem, a big one. I have successfully indexed 600 MB of XML
> data, but the search can't give any results if the field contains any
> '-' characters .
> For example: compound@cgx-code:[2 - 5] must match at least two results
> based on my XML data but it gives nothing.
>
> Can you advice me a simple solution? Or is it a bug?
>
>     The Hermit
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message