From: Ahmet Arslan
To: "java-user@lucene.apache.org"
Subject: Re: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2
Date: Tue, 11 Nov 2014 10:26:05 -0800
Message-ID: <1415730365.96295.YahooMailNeo@web124701.mail.ne1.yahoo.com>
In-Reply-To: <00f801cffdd0$74a5edd0$5df1c970$@dsl.pipex.com>

Hi,

With that analyser, searches for the same word with different capitalisation could return different results.

Ahmet

On Tuesday, November 11, 2014 6:57 PM, Martin O'Shea wrote:

In the end I edited the code of the StandardAnalyzer and the SnowballAnalyzer
to disable the calls to the LowerCaseFilter. This seems to work.

-----Original Message-----
From: Ahmet Arslan [mailto:iorixxx@yahoo.com.INVALID]
Sent: 10 Nov 2014 15 19
To: java-user@lucene.apache.org
Subject: Re: How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2

Hi,

Regarding Uwe's warning, "NOTE: SnowballFilter expects lowercased text." [1]

[1] https://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/snowball/SnowballFilter.html

On Monday, November 10, 2014 4:43 PM, Uwe Schindler wrote:

Hi,

> Uwe
>
> Thanks for the reply. Given that SnowballAnalyzer is made up of a
> series of filters, I was thinking about something like this where I
> 'pipe' output from one filter to the next:
>
> standardTokenizer = new StandardTokenizer(...);
> standardFilter = new StandardFilter(standardTokenizer,...);
> stopFilter = new StopFilter(standardFilter,...);
> snowballFilter = new SnowballFilter(stopFilter,...);
>
> But ignore LowerCaseFilter. Does this make sense?

Exactly. Create a clone of SnowballAnalyzer (from the Lucene source package)
in your own package and remove LowerCaseFilter. But be aware that Snowball
may need lowercased terms to stem correctly! I don't know this filter well;
I just want to make you aware. The same applies to the stop filter, but that
one lets you handle it: you should make the stop filter case-insensitive
(there is a boolean for this):

StopFilter(boolean enablePositionIncrements, TokenStream input, Set stopWords, boolean ignoreCase)

Uwe

> Martin O'Shea.
> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: 10 Nov 2014 14 06
> To: java-user@lucene.apache.org
> Subject: RE: How to disable LowerCaseFilter when using
> SnowballAnalyzer in Lucene 3.0.2
>
> Hi,
>
> In general, you cannot change Analyzers; they are "examples" and can
> be seen as "best practice". If you want to modify them, write your own
> Analyzer subclass which uses the Tokenizers and TokenFilters you want.
> You can, for example, clone the source code of the original
> and remove LowerCaseFilter. Analyzers are very simple; there is no
> logic in them, just some "configuration" (which Tokenizer and
> which TokenFilters). In later Lucene 3 and Lucene 4, this is very
> simple: you just need to override createComponents in the Analyzer
> class and add your "configuration" there.
>
> If you use Apache Solr or Elasticsearch you can create your analyzers
> by XML or JSON configuration.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Martin O'Shea [mailto:m.oshea@dsl.pipex.com]
> > Sent: Monday, November 10, 2014 2:54 PM
> > To: java-user@lucene.apache.org
> > Subject: How to disable LowerCaseFilter when using SnowballAnalyzer
> > in Lucene 3.0.2
> >
> > I realise that 3.0.2 is an old version of Lucene but if I have Java
> > code as follows:
> >
> > int nGramLength = 3;
> > Set<String> stopWords = new HashSet<String>();
> > stopWords.add("the");
> > stopWords.add("and");
> > ...
> >
> > SnowballAnalyzer snowballAnalyzer = new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);
> > ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(snowballAnalyzer, nGramLength);
> >
> > which will generate the frequency of ngrams from a particular
> > string of text without stop words, how can I disable the
> > LowerCaseFilter which forms part of the SnowballAnalyzer? I want to
> > preserve the case of the ngrams generated so that I can perform
> > various counts according to the presence / absence of upper case
> > characters in the ngrams.
> >
> > I am something of a Lucene newbie. And I should add that upgrading
> > the version of Lucene is not an option here.
> >
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
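
For readers of the archive, the behaviour discussed in this thread can be sketched in plain Java, without any Lucene classes. This is only an illustration, not the Lucene API and not the code Martin ended up with: the class CasePreservingNgrams and all of its methods are hypothetical names. It assumes a simple split on non-alphanumeric characters, does no stemming, and builds n-grams of one fixed size only (ShingleAnalyzerWrapper behaves differently). It shows a stop-word check that ignores case while the surviving tokens keep their capitalisation, which is the idea behind the ignoreCase flag Uwe mentions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Lucene.
class CasePreservingNgrams {

    // Split on anything that is not a letter or digit; case is left untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String t : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Drop stop words. The comparison is done in lower case, so "The" and
    // "the" are both removed, but surviving tokens keep their original case.
    // The stop-word set is assumed to contain lower-cased entries.
    static List<String> removeStopWords(List<String> tokens, Set<String> lowerCasedStopWords) {
        List<String> kept = new ArrayList<String>();
        for (String t : tokens) {
            if (!lowerCasedStopWords.contains(t.toLowerCase(Locale.ROOT))) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Build word n-grams of exactly n tokens, joined by a single space.
    static List<String> ngrams(List<String> tokens, int n) {
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            grams.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return grams;
    }

    // True if the n-gram contains at least one upper-case character.
    static boolean hasUpperCase(String gram) {
        for (int i = 0; i < gram.length(); i++) {
            if (Character.isUpperCase(gram.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Count occurrences; keys are case-sensitive, in first-seen order.
    static Map<String, Integer> frequencies(List<String> grams) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String g : grams) {
            Integer old = counts.get(g);
            counts.put(g, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<String>(Arrays.asList("the", "and"));
        List<String> tokens = removeStopWords(tokenize("The Quick fox and the quick Fox"), stopWords);
        for (Map.Entry<String, Integer> e : frequencies(ngrams(tokens, 2)).entrySet()) {
            System.out.println(e.getKey() + " count=" + e.getValue()
                    + " hasUpperCase=" + hasUpperCase(e.getKey()));
        }
    }
}
```

Because the keys are case-sensitive, "Quick fox" and "quick Fox" are counted as different n-grams. That is what Martin wants for his counts, and it is also the effect Ahmet warns about for searching.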