Message-ID: <3EBFBC56.4000906@mshri.on.ca>
Date: Mon, 12 May 2003 11:23:02 -0400
From: Jon Pipitone
Organization: Samuel Lunenfeld Research Institute
To: lucene-user@jakarta.apache.org
Subject: '-' character not interpreted correctly in field names

Hi all,

> I believe that the tokenizer treats a dash as a token separator.
> Hence, the only way, as I recall, to eliminate this behavior is
> to modify QueryParser.jj so it doesn't do this. However, doing
> this can cause some other problems, like hyphenated words at a
> line break and the like.

I've recently started using Lucene and I'm running into the same issue
with the query parser. I'd like to use queries that contain dashes in
the field name, but as far as I can tell the current query grammar
treats field names as terms, and so, as Terry notes, a dash becomes a
token separator.

Terry suggests modifying QueryParser.jj -- I suspect by creating a
separate non-terminal for field names. Has anyone done any work on this
already? Is modifying QueryParser.jj the best approach?

Thanks,
jp

> From: Terry Steichen
> Subject: '-' character not interpreted correctly in field names
> Date: Mon, 3 Feb 2003 09:19:58 -0500
>
> I believe that the tokenizer treats a dash as a token separator. Hence, the
> only way, as I recall, to eliminate this behavior is to modify
> QueryParser.jj so it doesn't do this. However, doing this can cause some
> other problems, like hyphenated words at a line break and the like.
>
> (Of course, if you do make such a change, you'll have to go back and reindex
> after such a change.)
>
> I've run into this problem myself and I've 'punted' - on certain fields,
> when I index, I replace the dash with an underscore. This isn't a real good
> solution, and it does require me to keep remembering in which fields I have
> to do this substitution in the search. But, for the moment it works. I'll
> probably go back and make some kind of change later, when I have more time.
>
> HTH,
>
> Terry
>
> ----- Original Message -----
> From: "hermit"
> To:
> Sent: Monday, February 03, 2003 2:39 AM
> Subject: '-' character not interpreted correctly in field names
>
>
>> Hello!
>>
>> I have a problem, a big one.
>> I have successfully indexed 600 MB of XML
>> data, but the search can't give any results if the field contains any
>> '-' characters.
>> For example: compound@cgx-code:[2 - 5] must match at least two results
>> based on my XML data but it gives nothing.
>>
>> Can you advise me on a simple solution? Or is it a bug?
>>
>> The Hermit

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
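
The underscore workaround Terry describes in the quoted message could be kept in one place, so that the same substitution is applied to field names at index time and at search time. The following is only a minimal sketch of that idea in plain Java. It does not use or modify Lucene. The class name FieldNameDemo and the methods safeFieldName and safeQuery are hypothetical names chosen for illustration, and safeQuery assumes the query is a single "field:clause" string with the field name before the first colon.

```java
import java.util.Objects;

// Hypothetical helper, not part of Lucene: centralizes the
// "replace '-' with '_' in field names" workaround from the thread.
final class FieldNameDemo {

    // Replace every dash in a field name with an underscore.
    // Use the result as the field name when adding fields to the index.
    static String safeFieldName(String fieldName) {
        Objects.requireNonNull(fieldName, "fieldName");
        return fieldName.replace('-', '_');
    }

    // Rewrite only the field part of a "field:clause" query string.
    // Assumption: the field name is everything before the first ':'.
    // Text after the colon (for example a range such as [2 - 5]) is untouched.
    static String safeQuery(String query) {
        Objects.requireNonNull(query, "query");
        int colon = query.indexOf(':');
        if (colon < 0) {
            return query; // no explicit field, nothing to rewrite
        }
        return safeFieldName(query.substring(0, colon)) + query.substring(colon);
    }

    public static void main(String[] args) {
        // Field name as it would be written into the index.
        System.out.println(safeFieldName("compound@cgx-code"));
        // The query from the original report, with only the field renamed.
        System.out.println(safeQuery("compound@cgx-code:[2 - 5]"));
    }
}
```

As Terry notes, the index has to be rebuilt once field names change. The substitution is also lossy: two distinct fields such as "a-b" and "a_b" would collide after renaming, so this only suits schemas where that cannot happen.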