lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Terry Steichen" <te...@net-frame.com>
Subject Re: '-' character not interpreted correctly in field names
Date Mon, 12 May 2003 15:46:06 GMT
Otis,

Since this does indeed comes up so often, maybe we should include an example
of such an Analyzer with the basic package?

Regards,

Terry

----- Original Message -----
From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, May 12, 2003 11:41 AM
Subject: Re: '-' character not interpreted correctly in field names


> You can write a custom Analyzer that does not remove dashes from
> tokens, and use it for both indexing and searching.
>
> This is a frequent question and answer on this list.
>
> Otis
>
> --- Jon Pipitone <jpipitone@mshri.on.ca> wrote:
> > Hi all,
> >
> >  > I believe that the tokenizer treats a dash as a token separator.
> >  > Hence, the only way, as I recall, to eliminate this behavior is
> >  > to modify QueryParser.jj so it doesn't do this.  However, doing
> >  > this can cause some other problems, like hyphenated words at a
> >  > line break and the like.
> >
> > I've recently started using lucene and I'm running into the same
> > issue
> > with the query parser.  I'd like to use queries that contain dashes
> > in
> > the field name, but as far as I can tell it seems that the current
> > query
> > grammar treats field names as terms, and so, as Terry notes, a dash
> > becomes a token seperator.
> >
> > Terry suggests modifying the QueryParser.jj -- I would suspect by
> > creating a seperate non-terminal for field names.
> >
> > Has anyone done any work on this already?  Is modifying
> > QueryParser.jj
> > the best approach?
> >
> > Thanks,
> > jp
> >
> > > From: Terry Steichen <terry@net-frame.com>
> > > Subject: '-' character not interpreted correctly in field names
> > > Date: Mon, 3 Feb 2003 09:19:58 -0500
> > > Content-Type: text/plain;
> > > charset="iso-8859-1"
> > >
> > >
> > > I believe that the tokenizer treats a dash as a token separator.
> > Hence, the
> > > only way, as I recall, to eliminate this behavior is to modify
> > > QueryParser.jj so it doesn't do this.  However, doing this can
> > cause some
> > > other problems, like hyphenated words at a line break and the like.
> > >
> > > (Of course, if you do make such a change, you'll have to go back
> > and reindex
> > > after such a change.)
> > >
> > > I've run into this problem myself and I've 'punted' -  on certain
> > fields,
> > > when I index, I replace the dash with an underscore.  This isn't a
> > real good
> > > solution, and it does require me to keep remembering in which
> > fields I have
> > > to do this substitution in the search.  But, for the moment it
> > works.  I'll
> > > probably go back and make some kind of change later, when I have
> > more time.
> > >
> > > HTH,
> > >
> > > Terry
> > >
> > > ----- Original Message -----
> > > From: "hermit" <hermit@freestart.hu>
> > > To: <lucene-user@jakarta.apache.org>
> > > Sent: Monday, February 03, 2003 2:39 AM
> > > Subject: '-' character not interpreted correctly in field names
> > >
> > >
> > >> Hello!
> > >>
> > >> I have a problem, a big one. I have successfully indexed 600 MB of
> > XML
> > >> data, but the search can't give any results if the field contains
> > any
> > >> '-' characters .
> > >> For example: compound@cgx-code:[2 - 5] must match at least two
> > results
> > >> based on my XML data but it gives nothing.
> > >>
> > >> Can you advice me a simple solution? Or is it a bug?
> > >>
> > >>     The Hermit
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
>
>
> __________________________________
> Do you Yahoo!?
> The New Yahoo! Search - Faster. Easier. Bingo.
> http://search.yahoo.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message