Message-ID: <501BD40C.1010206@leegoddard.net>
Date: Fri, 03 Aug 2012 15:37:16 +0200
From: Lee Goddard
Reply-To: lee@leegoddard.net
To: Robert Muir
CC: general@lucene.apache.org
Subject: Re: Custom Analyzer Strategy?
References: <50191006.4020205@leegoddard.net>

On 03/08/2012 14:56, Robert Muir wrote:
> On Wed, Aug 1, 2012 at 7:16 AM, Lee Goddard wrote:
>> New to Lucene development, though I have been an indexing user for some
>> years, I find a need to develop an analyzer that reads a bespoke-format
>> (binary) file. I was wondering:
>
> Hello: usually you would not process such a binary file with an
> analyzer, you would parse the binary file into the Fields you care
> about and then add them to your Document.
>
> The analyzer is separate from that "parsing", it's the way you specify
> text preprocessing at both index and query time like lowercasing,
> stemming, etc.
>
>> * Are there tutorials on analyzer development, or (ideally) an example
>> custom simple analyzer?
> Start with http://lucene.apache.org/core/3_6_1/api/core/org/apache/lucene/analysis/package-summary.html#package_description
>
>> * Is it possible to send the output of one analyzer to another, and if so,
>> is it possible to have that chain defined in the configuration of Lucene (or
>> Solr...), or would it need to be hard-coded?
>
> you can configure your analysis chain declaratively in Solr in a
> configuration file.

Thanks very much, Robert. And now I see the package summary JavaDoc you
pointed to, I feel quite silly.

Cheers
Lee
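
For anyone reading this thread later, here is a rough sketch of the "parse first, analyze later" split Robert describes. It uses only the Java standard library and no Lucene classes. The record layout (a 4-byte id followed by length-prefixed UTF-8 title and body), the class name BinaryRecordParser, and the field names are all invented for illustration; the thread never describes the actual binary format. The resulting map only stands in for a Lucene Document: each entry is the kind of value one would turn into a Field, and the analyzer would then see the text fields only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class BinaryRecordParser {

    // Hypothetical layout (an assumption, not the poster's real format):
    // [int32 id][int32 titleLen][title bytes][int32 bodyLen][body bytes]
    public static Map<String, String> parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", Integer.toString(buf.getInt()));
        fields.put("title", readString(buf));
        fields.put("body", readString(buf));
        return fields;
    }

    // Reads one length-prefixed UTF-8 string from the buffer.
    private static String readString(ByteBuffer buf) {
        int length = buf.getInt();
        byte[] bytes = new byte[length];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Builds a sample record in the hypothetical layout, for demonstration.
    public static byte[] sample(int id, String title, String body) {
        byte[] t = title.getBytes(StandardCharsets.UTF_8);
        byte[] b = body.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(12 + t.length + b.length);
        buf.putInt(id).putInt(t.length).put(t).putInt(b.length).put(b);
        return buf.array();
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse(sample(7, "Hello", "Some Body Text"));
        // Each entry here would become one Field on the Document;
        // lowercasing, stemming, etc. are left to the analyzer.
        for (Map.Entry<String, String> e : fields.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

The parser deliberately does no lowercasing or tokenizing: that work belongs to the analysis chain, which is applied at both index and query time.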