lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: '-' character not interpreted correctly in field names
Date Mon, 12 May 2003 15:49:08 GMT
Nah, but we could include it in the FAQ or Lucene Sandbox.  If you have
something ready to go, please email.

Otis

--- Terry Steichen <terry@net-frame.com> wrote:
> Otis,
> 
> Since this does indeed comes up so often, maybe we should include an
> example
> of such an Analyzer with the basic package?
> 
> Regards,
> 
> Terry
> 
> ----- Original Message -----
> From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Monday, May 12, 2003 11:41 AM
> Subject: Re: '-' character not interpreted correctly in field names
> 
> 
> > You can write a custom Analyzer that does not remove dashes from
> > tokens, and use it for both indexing and searching.
> >
> > This is a frequent question and answer on this list.
> >
> > Otis
> >
> > --- Jon Pipitone <jpipitone@mshri.on.ca> wrote:
> > > Hi all,
> > >
> > >  > I believe that the tokenizer treats a dash as a token
> separator.
> > >  > Hence, the only way, as I recall, to eliminate this behavior
> is
> > >  > to modify QueryParser.jj so it doesn't do this.  However,
> doing
> > >  > this can cause some other problems, like hyphenated words at a
> > >  > line break and the like.
> > >
> > > I've recently started using lucene and I'm running into the same
> > > issue
> > > with the query parser.  I'd like to use queries that contain
> dashes
> > > in
> > > the field name, but as far as I can tell it seems that the
> current
> > > query
> > > grammar treats field names as terms, and so, as Terry notes, a
> dash
> > > becomes a token seperator.
> > >
> > > Terry suggests modifying the QueryParser.jj -- I would suspect by
> > > creating a seperate non-terminal for field names.
> > >
> > > Has anyone done any work on this already?  Is modifying
> > > QueryParser.jj
> > > the best approach?
> > >
> > > Thanks,
> > > jp
> > >
> > > > From: Terry Steichen <terry@net-frame.com>
> > > > Subject: '-' character not interpreted correctly in field names
> > > > Date: Mon, 3 Feb 2003 09:19:58 -0500
> > > > Content-Type: text/plain;
> > > > charset="iso-8859-1"
> > > >
> > > >
> > > > I believe that the tokenizer treats a dash as a token
> separator.
> > > Hence, the
> > > > only way, as I recall, to eliminate this behavior is to modify
> > > > QueryParser.jj so it doesn't do this.  However, doing this can
> > > cause some
> > > > other problems, like hyphenated words at a line break and the
> like.
> > > >
> > > > (Of course, if you do make such a change, you'll have to go
> back
> > > and reindex
> > > > after such a change.)
> > > >
> > > > I've run into this problem myself and I've 'punted' -  on
> certain
> > > fields,
> > > > when I index, I replace the dash with an underscore.  This
> isn't a
> > > real good
> > > > solution, and it does require me to keep remembering in which
> > > fields I have
> > > > to do this substitution in the search.  But, for the moment it
> > > works.  I'll
> > > > probably go back and make some kind of change later, when I
> have
> > > more time.
> > > >
> > > > HTH,
> > > >
> > > > Terry
> > > >
> > > > ----- Original Message -----
> > > > From: "hermit" <hermit@freestart.hu>
> > > > To: <lucene-user@jakarta.apache.org>
> > > > Sent: Monday, February 03, 2003 2:39 AM
> > > > Subject: '-' character not interpreted correctly in field names
> > > >
> > > >
> > > >> Hello!
> > > >>
> > > >> I have a problem, a big one. I have successfully indexed 600
> MB of
> > > XML
> > > >> data, but the search can't give any results if the field
> contains
> > > any
> > > >> '-' characters .
> > > >> For example: compound@cgx-code:[2 - 5] must match at least two
> > > results
> > > >> based on my XML data but it gives nothing.
> > > >>
> > > >> Can you advice me a simple solution? Or is it a bug?
> > > >>
> > > >>     The Hermit
> > >
> > >
> > >
> > >
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail:
> lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org
> > >
> >
> >
> > __________________________________
> > Do you Yahoo!?
> > The New Yahoo! Search - Faster. Easier. Bingo.
> > http://search.yahoo.com
> >
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org
> >
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 


__________________________________
Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo.
http://search.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message