From: "Eric Isakson"
To: "Lucene Users List"
Date: Tue, 27 Apr 2004 18:42:13 -0400
Subject: RE: languages supported by lucene 1.2.1 in eclipse help system

I'm assuming what you have is an Eclipse plugin that makes use of the Eclipse help system.
If you are relying on the Lucene Eclipse plugin, you may want to look at the help system anyway, since it gives you an example of an Eclipse plugin that uses the Lucene plugin.

The Eclipse help system uses Lucene, but it has its own Analyzer class that uses BreakIterator to identify tokens for languages other than English and German. The Lucene Eclipse plugin just exports the Lucene jar and the HTML parser, so any plugin that depends on the Lucene plugin (like the help system) has those jars on its classpath.

For English, the help system uses PorterStemFilter with a StopAnalyzer and a stop-word list. For German, it uses the GermanAnalyzer supplied in the Lucene jar.

In the latest CVS at :pserver:anonymous@dev.eclipse.org:/home/eclipse, see the project in org.eclipse.help.base/src/org/eclipse/help/internal/search. In older Eclipse versions, see the R2_1_maintenance branch of org.eclipse.help/src/org/eclipse/help/internal/search. The class DefaultAnalyzer is the analyzer implementation for languages other than English and German, and WordTokenStream is where BreakIterator breaks the content from the reader into individual tokens.

The default Eclipse help system declares its analyzer extensions in the org.eclipse.help.base plugin. Look at the extension point schema at http://dev.eclipse.org/viewcvs/index.cgi/~checkout~/org.eclipse.help.base/schema/luceneAnalyzer.exsd?rev=HEAD&content-type=text/plain for how to declare your own analyzer extensions. Beware, though: I read that this affects all help searches in that language, not just the ones for your plugin.
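To make the BreakIterator approach concrete, here is a minimal plain-Java sketch of locale-aware word tokenization with an optional stop list. It is a hypothetical illustration, not the Eclipse WordTokenStream or DefaultAnalyzer source: the class name BreakIteratorTokenizer, the lowercasing, and the rule of keeping only segments that start with a letter or digit are all assumptions. It returns plain strings rather than Lucene Token objects, so it runs without the Lucene jar.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of BreakIterator-based tokenization, in the spirit of
 * (but not copied from) the Eclipse help system's WordTokenStream.
 */
public class BreakIteratorTokenizer {

    private final Locale locale;
    private final Set<String> stopWords;

    public BreakIteratorTokenizer(Locale locale, Set<String> stopWords) {
        this.locale = locale;
        this.stopWords = stopWords;
    }

    /** Splits text at locale-aware word boundaries, lowercases, and drops punctuation and stop words. */
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            // The word iterator also reports whitespace and punctuation runs as
            // segments; keep only segments that begin with a letter or digit.
            if (!Character.isLetterOrDigit(segment.codePointAt(0))) {
                continue;
            }
            String token = segment.toLowerCase(locale);
            // Assumption: stop words are matched after lowercasing.
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> frenchStops = new HashSet<>(Arrays.asList("le", "la", "de"));
        BreakIteratorTokenizer french = new BreakIteratorTokenizer(Locale.FRENCH, frenchStops);
        System.out.println(french.tokenize("Le système d'aide de la plate-forme"));
    }
}
```

Turning this into a real Lucene Analyzer would mean emitting these strings as tokens from a TokenStream; that glue is left out because the TokenStream API differs between Lucene versions.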
Also, since WordTokenStream is in a package with "internal" in its path, you aren't supposed to use that class from other plugins. If you wanted your own analyzer based on that class and a stop list, you would first have to talk the Eclipse help developers into moving it out of the internal package.

Most of this has been around for a while, so it is probably the same or very similar in previous Eclipse versions. You may need to poke around at the extension point schema in your Eclipse plugins directory to verify that the extension point works the same way in your version of Eclipse; I haven't used it in versions prior to 3.0M8.

Hope this is useful to you,
Eric

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Saturday, April 24, 2004 10:18 AM
To: Lucene Users List
Subject: Re: languages supported by lucene 1.2.1 in eclipse help system

That's no myth :)

Core Lucene (even the current version) does not include classes that know how to analyze/tokenize text in languages other than English, Russian, and German. However, take a look at the Lucene Sandbox: the Snowball contribution and the other contributed analyzers cover a few more languages, including the CJK group of languages.

Otis

--- Jason Elliott wrote:
> We have a plugin in our Eclipse project named org.apache.lucene_1.2.1.
> It works quite well in that help system.
>
> I've been notified that this particular version of the Lucene search
> analyzer searches well in German and English (GE), but not so well in
> the rest of the languages on this planet.
>
> I have several questions:
> 1. If it does not search very "well" in French, Italian and Japanese
> (FIJ), what does that really mean to a user conducting searches?
> a. If this is a myth and the searches work the same in EFIG-J, please
> let me know that.
> b. If this is not a myth, are there plugins that enable the search
> to work well in FIJ?
>
> Thanks
> jason

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org