From: Wolfgang Hoschek
Subject: Re: contrib: keywordTokenStream
Date: Tue, 3 May 2005 22:14:27 -0700
To: java-dev@lucene.apache.org

On May 3, 2005, at 5:26 PM,
Erik Hatcher wrote:

> Wolfgang,
>
> I've now added this.

Thanks :-)

> I'm not seeing how this could be generally useful. I'm curious how
> you are using it and why it is better suited for what you're doing
> than any other analyzer.
>
> "keyword tokenizer" is a bit overloaded terminology-wise, though -
> look in the contrib/analyzers/src/java area to see what I mean.
>
> Erik

The difference between this and the KeywordTokenizer from
contrib/analyzers is that it

- can operate on multiple keywords rather than just a single one, so
  it's slightly more general.
- takes a collection (typically of String values) as input rather than
  a Reader. I can see the java.io.Reader scalability rationale used
  throughout the analysis APIs, but for many use cases (including my
  own) Strings are a lot handier and more efficient to deal with, since
  the string values are small anyway.

So it's a convenient way to add terms (keywords if you like) that have
been parsed/massaged into string(s) by some existing external means
(e.g. grouped regex scanning of legacy formatted text files into
various fields, etc.) into an index "as is", without any further
transforming analysis. Most folks could write such a (non-essential)
utility themselves, but it's handy in the same way that the
Field.Keyword convenience infrastructure is.

> "keyword tokenizer" is a bit overloaded terminology-wise, though

If you come up with a better name, feel free to rename it.

Wolfgang.
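P.S. To make the idea concrete, here is a minimal, self-contained sketch of what "one token per keyword, taken as is from a collection" could look like. It is an illustration only, not the contributed code: Token and KeywordStream below are hypothetical stand-ins written for this example and deliberately do not use Lucene's own TokenStream/Token classes. The one-character gap between offsets is likewise just an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

public class KeywordStreamSketch {

    /** Hypothetical stand-in for an analysis token: term text plus character offsets. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical stream that emits each keyword unchanged, one token per element. */
    static final class KeywordStream {
        private final Iterator<?> iter;
        private int start = 0;

        KeywordStream(Collection<?> keywords) {
            this.iter = keywords.iterator();
        }

        /** Returns the next token, or null once all keywords have been emitted. */
        Token next() {
            if (!iter.hasNext()) {
                return null;
            }
            Object obj = iter.next();
            if (obj == null) {
                throw new IllegalArgumentException("keyword must not be null");
            }
            String term = obj.toString();
            Token token = new Token(term, start, start + term.length());
            start += term.length() + 1; // assumed: one-character gap between keywords
            return token;
        }
    }

    /** Drains a KeywordStream and collects the term texts in order. */
    static List<String> terms(Collection<?> keywords) {
        List<String> out = new ArrayList<>();
        KeywordStream stream = new KeywordStream(keywords);
        for (Token t = stream.next(); t != null; t = stream.next()) {
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        // Multi-word and punctuated values come through untouched: no splitting, no lowercasing.
        System.out.println(terms(Arrays.asList("New York", "java.io.Reader", "2005-05-03")));
    }
}
```

The point of the sketch is only that already-parsed strings go in and identical terms come out, with no Reader and no further analysis in between.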