lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: '-' character not interpreted correctly in field names
Date Mon, 12 May 2003 15:41:41 GMT
You can write a custom Analyzer that does not remove dashes from
tokens, and use it for both indexing and searching.

This is a frequent question and answer on this list.

Otis

--- Jon Pipitone <jpipitone@mshri.on.ca> wrote:
> Hi all,
> 
>  > I believe that the tokenizer treats a dash as a token separator.
>  > Hence, the only way, as I recall, to eliminate this behavior is
>  > to modify QueryParser.jj so it doesn't do this.  However, doing
>  > this can cause some other problems, like hyphenated words at a
>  > line break and the like.
> 
> I've recently started using lucene and I'm running into the same
> issue 
> with the query parser.  I'd like to use queries that contain dashes
> in 
> the field name, but as far as I can tell it seems that the current
> query 
> grammar treats field names as terms, and so, as Terry notes, a dash 
> becomes a token seperator.
> 
> Terry suggests modifying the QueryParser.jj -- I would suspect by 
> creating a seperate non-terminal for field names.
> 
> Has anyone done any work on this already?  Is modifying
> QueryParser.jj 
> the best approach?
> 
> Thanks,
> jp
> 
> > From: Terry Steichen <terry@net-frame.com>
> > Subject: '-' character not interpreted correctly in field names
> > Date: Mon, 3 Feb 2003 09:19:58 -0500
> > Content-Type: text/plain;
> > 	charset="iso-8859-1"
> > 
> > 
> > I believe that the tokenizer treats a dash as a token separator. 
> Hence, the
> > only way, as I recall, to eliminate this behavior is to modify
> > QueryParser.jj so it doesn't do this.  However, doing this can
> cause some
> > other problems, like hyphenated words at a line break and the like.
> > 
> > (Of course, if you do make such a change, you'll have to go back
> and reindex
> > after such a change.)
> > 
> > I've run into this problem myself and I've 'punted' -  on certain
> fields,
> > when I index, I replace the dash with an underscore.  This isn't a
> real good
> > solution, and it does require me to keep remembering in which
> fields I have
> > to do this substitution in the search.  But, for the moment it
> works.  I'll
> > probably go back and make some kind of change later, when I have
> more time.
> > 
> > HTH,
> > 
> > Terry
> > 
> > ----- Original Message -----
> > From: "hermit" <hermit@freestart.hu>
> > To: <lucene-user@jakarta.apache.org>
> > Sent: Monday, February 03, 2003 2:39 AM
> > Subject: '-' character not interpreted correctly in field names
> > 
> > 
> >> Hello!
> >>
> >> I have a problem, a big one. I have successfully indexed 600 MB of
> XML
> >> data, but the search can't give any results if the field contains
> any
> >> '-' characters .
> >> For example: compound@cgx-code:[2 - 5] must match at least two
> results
> >> based on my XML data but it gives nothing.
> >>
> >> Can you advice me a simple solution? Or is it a bug?
> >>
> >>     The Hermit
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


__________________________________
Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo.
http://search.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message