lucene-java-user mailing list archives

From Morus Walter <morus.wal...@tanto.de>
Subject Re: different analyzer all produce the same index?
Date Mon, 04 Oct 2004 13:43:12 GMT
sergiu gordea writes:
> Daan Hoogland wrote:
> 
> >Hi all,
> >
> >I try to create different indices using different Analyzer-classes. I 
> >tried standard, german, russian, and cjk. They all produce exactly the 
> >same index file (md5-wise). There are over 280 pages so I expected at 
> >least some differences.
> >
> >  
> >
> Take a look in the lucene source code... Maybe you will find the answer ...
> I asume that all the pages you indexed were written in English, 
> therefore is normal that german, russian and cjk analyzers to
> create identic indexex, but htey should be different  than english one 
> (StandardAnalyzer)
> 
The German analyzer definitely won't leave English text as it is, since it
does algorithmic stemming.
E.g. your text becomes
tak a look in the luc sourc cod mayb you will find the answ i asum tha all the pag you indexed
wer writt in english therefor is normal tha germa russia and cjk analyx to crea identic indexex
but htey should be diff tha english one standardanalyx
while the standard analyzer does not stem at all and gives
take a look in the lucene source code maybe you will find the answer i asume that all the
pages you indexed were written in english therefore is normal that german russian and cjk
analyzers to create identic indexex but htey should be different than english one standardanalyzer

I'd rather suspect some problem with the indexing code.
So my advice is to check what each analyzer actually produces.
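
A rough sketch of that check, in plain Java without Lucene on the classpath: run the same text through two analysis chains and compare the token lists. Everything here is a stand-in. The class AnalyzerCheck, plainTokens, stemmedTokens and toyStem are hypothetical names, and the suffix stripper is a toy, not the German stemmer's real rule set, so its output will not match the stems shown above exactly. With real Lucene you would take the tokens from the analyzer's tokenStream(fieldName, reader) instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalyzerCheck {

    // Stand-in for a non-stemming analyzer: lowercase, split on non-letters.
    public static List<String> plainTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Toy suffix stripper (assumption: illustrative only, NOT Lucene's
    // German stemmer). Strips the first matching suffix if at least
    // three characters remain.
    public static String toyStem(String token) {
        String[] suffixes = {"ers", "er", "en", "es", "e", "s"};
        for (String suffix : suffixes) {
            if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Stand-in for a stemming analyzer: same tokenizer, then the toy stemmer.
    public static List<String> stemmedTokens(String text) {
        List<String> stemmed = new ArrayList<>();
        for (String t : plainTokens(text)) {
            stemmed.add(toyStem(t));
        }
        return stemmed;
    }

    public static void main(String[] args) {
        String text = "Take a look in the lucene source code";
        List<String> plain = plainTokens(text);
        List<String> stemmed = stemmedTokens(text);
        System.out.println("plain:   " + plain);
        System.out.println("stemmed: " + stemmed);
        // If this prints true for realistic input, the analyzers are
        // probably not the ones you think you configured.
        System.out.println("identical: " + plain.equals(stemmed));
    }
}
```

If the token lists differ but the index files are still byte-identical, the analyzer is most likely never reaching the indexing code, e.g. the same analyzer instance is passed to every writer.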

Morus
