Subject: Re: hit highlighting in lucene ?
From: Joel Halbert
To: java-user@lucene.apache.org
Date: Thu, 21 May 2009 14:12:33 +0100

> If I index English pages
> with the same indexer, it will not take care of stemming and stop word
> removal?

Correct.

> Can't we have a single indexer that handles non-English and English in
> equally good ways?

You can have a single indexer. But if you wanted to use one Analyzer for English documents (with stemming/stop words) and another Analyzer for documents in other languages, then you would need to know, at the point of both *indexing* and *querying*, what language your indexed document and your query were in. This assumes that when a query is in English you only want to query English-language docs, and vice versa.

You would also have to mark up your documents with a language identifier (e.g. 0=English, 1=Other Languages) so that when you query you have a conditional on the language.

I've not had to deal with multi-language documents though, so I'm sure others will be better placed to offer their experience.

-----Original Message-----
From: KK
Reply-To: java-user@lucene.apache.org
To: java-user@lucene.apache.org
Subject: Re: hit highlighting in lucene ?
Date: Thu, 21 May 2009 18:31:44 +0530

Initially I was using StandardAnalyzer, but I switched to SimpleAnalyzer, which I guess does no more than tokenizing and, I think, does no stemming. I don't/can't do stemming because I have no stemmer for the languages I'm indexing. For indexing and querying I'm using the same SimpleAnalyzer.
So, as you say, I can go for the standard highlighter API which I mentioned in my last mail, and this will handle highlighting for any language. I should start using this one, right?

One more thing. I have a single indexer and searcher that I'm using for indexing pages in many different non-English languages and, as I mentioned earlier, I'm using SimpleAnalyzer. Does that mean that if I index English pages with the same indexer, it will not take care of stemming and stop word removal? But I don't want to have multiple indexers that are specific to languages. Can't we have a single indexer that handles non-English and English in equally good ways? Or any other ideas on the same?

Thanks,
KK.

On Thu, May 21, 2009 at 6:18 PM, Joel Halbert wrote:

> The highlighter should be language independent, so long as you are
> consistent with your use of Analyzer between
> indexing/query/highlighting.
>
> As for the most appropriate Analyzer to use for your local language,
> this is a separate question - especially if you are using stop word and
> stemming filters.
>
> The StandardAnalyzer is designed for English since it uses the
> StopFilter (English words only).
>
>
> -----Original Message-----
> From: KK
> Reply-To: java-user@lucene.apache.org
> To: java-user@lucene.apache.org
> Subject: hit highlighting in lucene ?
> Date: Thu, 21 May 2009 17:51:13 +0530
>
> Hi All,
> I was looking for various ways of implementing hit highlighting in Lucene
> and found some standard classes that do support highlighting, like this:
> lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/highlight/package-summary.html
>
> But what I believe is that this is only for English; or does it support
> other languages? I actually wanted to support highlighting for some
> non-English languages which I'm able to index and fetch using utf-8
> encoding.
> So this means that if I want to have highlighting then I have to
> get the utf-8 query, look for it in the result and add appropriate tags
> wherever required; it essentially boils down to implementing the standard
> highlighter. I think the standard highlighter also supports other
> languages.
> Correct me if I'm wrong.
>
> Due to my requirement constraints I'm using just SimpleAnalyzer and we don't
> have tokenizers for these regional languages. Any other ideas for doing the
> same would be helpful as well.
>
> Thanks,
> KK.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
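
Joel's suggestion in this thread (one indexer, a language identifier on each document, and the same language-specific analysis at both index and query time) could look roughly like the sketch below. It is plain Java with no Lucene dependency: simpleTokens and englishTokens are hypothetical stand-ins for SimpleAnalyzer and for an English analyzer with stop words and stemming. The Lang enum, the stop list and the plural-stripping "stemmer" are toy assumptions for illustration, not Lucene's behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class LanguageRouting {

    /** Hypothetical language identifier stored with each document (cf. 0=English, 1=Other). */
    enum Lang { ENGLISH, OTHER }

    // Tiny illustrative stop list (assumption); a real English analyzer uses a fuller one.
    private static final Set<String> ENGLISH_STOPS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the", "to"));

    /** Stand-in for a simple analyzer: split on anything that is not a letter
     *  or combining mark, then lowercase. No stop words, no stemming. */
    static List<String> simpleTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{M}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Stand-in for an English analyzer: simple tokens, minus stop words,
     *  with a toy "stemmer" that only strips a trailing 's' from longer words. */
    static List<String> englishTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : simpleTokens(text)) {
            if (ENGLISH_STOPS.contains(t)) {
                continue;
            }
            boolean plural = t.length() > 3 && t.endsWith("s");
            out.add(plural ? t.substring(0, t.length() - 1) : t);
        }
        return out;
    }

    /** The routing step: the language identifier picks the analysis chain. */
    static List<String> analyze(Lang lang, String text) {
        return lang == Lang.ENGLISH ? englishTokens(text) : simpleTokens(text);
    }

    /** Analyzes document and query with the SAME chain, then checks that
     *  every query term occurs in the document. Empty queries match nothing. */
    static boolean matches(Lang lang, String document, String query) {
        List<String> queryTerms = analyze(lang, query);
        return !queryTerms.isEmpty() && analyze(lang, document).containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String doc = "Highlighting the hits in Lucene";
        System.out.println(analyze(Lang.ENGLISH, doc));
        System.out.println(analyze(Lang.OTHER, doc));
        System.out.println(matches(Lang.ENGLISH, doc, "hit"));
        System.out.println(matches(Lang.OTHER, doc, "hit"));
    }
}
```

The same analyze(lang, ...) call is applied to a document when it is indexed and to the query text when searching. With the English chain a query for "hit" finds the document containing "hits"; with the simple chain it does not, which is the trade-off KK asks about when indexing English pages with the simple analyzer only.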