From: Mike Klaas
Subject: Re: thoughts/suggestions for analyzing/tokenizing class names
Date: Mon, 17 Dec 2007 10:52:48 -0800
To: java-user@lucene.apache.org

Either index them as a series of tokens:

    org
    org.apache
    org.apache.lucene
    org.apache.lucene.document
    org.apache.lucene.document.Document

or index them as a single token and use prefix queries (this is what I do for reverse domain names):

    classname:(org.apache org.apache.*)

Note that "classname:org.apache*" would probably be wrong, since you might not want to match org.apache-fake.lucene.document.

regards,
-Mike

On 17-Dec-07, at 9:39 AM, Beyer,Nathan wrote:

> Good point.
>
> I don't want the sub-package names on their own to match.
>
> Text (class name)
> - "org.apache.lucene.document.Document"
> Queries that would match
> - "org.apache", "org.apache.lucene.document"
> Queries that DO NOT match
> - "apache", "lucene", "document"
>
> -Nathan
>
> -----Original Message-----
> From: Mike Klaas [mailto:mike.klaas@gmail.com]
> Sent: Monday, December 17, 2007 11:29 AM
> To: java-user@lucene.apache.org
> Subject: Re: thoughts/suggestions for analyzing/tokenizing class names
>
> On 15-Dec-07, at 3:14 PM, Beyer,Nathan wrote:
>
>> I have a few fields that use package names and class names and I've
>> been looking for some suggestions for analyzing these fields.
>>
>> A few examples:
>>
>> Text (class name)
>> - "org.apache.lucene.document.Document"
>> Queries that would match
>> - "org.apache", "org.apache.lucene.document"
>>
>> Text (class name + method signature)
>> - "org.apache.lucene.document.Document#add(Fieldable)"
>> Queries that would match
>> - "org.apache.lucene", "org.apache.lucene.document.Document#add"
>>
>> Any thoughts on how to approach tokenizing these types of texts?
>
> Perhaps it would help to include some examples of queries you _don't_
> want to match. For all the examples above, simply tokenizing
> alphanumeric components would suffice.
>
> -Mike
>
> ----------------------------------------------------------------------
> CONFIDENTIALITY NOTICE This message and any included attachments
> are from Cerner Corporation and are intended only for the
> addressee. The information contained in this message is
> confidential and may constitute inside or non-public information
> under international, federal, or state securities laws.
> Unauthorized forwarding, printing, copying, distribution, or use of
> such information is strictly prohibited and may be unlawful.
> If you are not the addressee, please promptly delete this message and
> notify the sender of the delivery error by e-mail or you may call
> Cerner's corporate offices in Kansas City, Missouri, U.S.A at (+1)
> (816)221-1024.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
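A minimal plain-Java sketch of both suggestions in the reply, with no Lucene dependency: `prefixTokens` produces the series of tokens from the first suggestion, and `matchesPrefixQuery` mimics, in plain string terms, what `classname:(org.apache org.apache.*)` would match against a single untokenized term. The class and method names are hypothetical. Treating `#` and `(` as extra boundaries, so that the method-signature example from the original question also works, is an assumption beyond what the reply proposes. In a real index the tokens would still have to come from a custom analyzer or be added as separate values of the field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: a plain-Java sketch of the two indexing
// strategies from the thread. It does not use any Lucene API.
public class ClassNamePrefixes {

    // Assumption: '#' and '(' are boundaries in addition to '.', so a
    // method signature also yields a "...Document#add" token.
    private static final String SEPARATORS = ".#(";

    // Approach 1: every prefix that ends just before a separator,
    // followed by the complete name.
    static List<String> prefixTokens(String name) {
        List<String> tokens = new ArrayList<>();
        // Start at 1 so a leading separator never produces an empty token.
        for (int i = 1; i < name.length(); i++) {
            if (SEPARATORS.indexOf(name.charAt(i)) >= 0) {
                tokens.add(name.substring(0, i));
            }
        }
        tokens.add(name);
        return tokens;
    }

    // Approach 1, query side: a query matches only if it equals one of the
    // indexed tokens, so "apache" or "lucene" on their own never match.
    static boolean matchesTokens(String indexedName, String query) {
        return prefixTokens(indexedName).contains(query);
    }

    // Approach 2: the name is one term, and classname:(q q.*) matches that
    // term either exactly or as a prefix that ends at a "." boundary.
    static boolean matchesPrefixQuery(String term, String q) {
        return term.equals(q) || term.startsWith(q + ".");
    }

    // The naive classname:q* form: a bare string prefix with no boundary check.
    static boolean matchesNaivePrefix(String term, String q) {
        return term.startsWith(q);
    }

    public static void main(String[] args) {
        String cls = "org.apache.lucene.document.Document";
        String method = "org.apache.lucene.document.Document#add(Fieldable)";

        System.out.println(prefixTokens(cls));
        System.out.println(prefixTokens(method));

        String[] queries = {"org.apache", "org.apache.lucene.document", "apache", "lucene", "document"};
        for (String q : queries) {
            System.out.println(q + " -> tokens: " + matchesTokens(cls, q)
                    + ", prefix query: " + matchesPrefixQuery(cls, q));
        }

        // The "." boundary is what keeps a look-alike package from matching.
        String fake = "org.apache-fake.lucene.document";
        System.out.println("org.apache* on fake: " + matchesNaivePrefix(fake, "org.apache"));
        System.out.println("org.apache.* on fake: " + matchesPrefixQuery(fake, "org.apache"));
    }
}
```

Running `main` shows that "apache", "lucene" and "document" match under neither approach, while the naive `org.apache*` form also accepts `org.apache-fake.lucene.document`. The trade-off is roughly index size versus query cost: the first approach stores several terms per name but answers queries with exact term lookups, while the second stores one term and expands the prefix at query time.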