lucene-java-user mailing list archives

From Carsten Schnober <schno...@ids-mannheim.de>
Subject Re: TokenStreamComponents in Lucene 4.0
Date Mon, 19 Nov 2012 16:48:13 GMT
On 19.11.2012 17:44, Carsten Schnober wrote:

Hi again,
just a little update:

> However, after switching to Lucene 4 and TokenStreamComponents, I'm
> getting a strange behaviour: only the first document in the collection
> is tokenized properly. The others do appear in the index, but
> un-tokenized, although I have tried not to change anything in the logic.
> The Analyzer now has this createComponents() method calling the custom
> TokenStreamComponents class with my custom Tokenizer:
> 
> @Override
> protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
>   final Tokenizer source = new KoraTokenizer(reader);
>   final TokenStreamComponents tokenstream = new KoraTokenStreamComponents(source);
>   try {
>     source.close();
>   } catch (IOException e) {
>     jlog.error(e.getLocalizedMessage());
>     e.printStackTrace();
>   }
>   return tokenstream;
> }

When using the packaged Analyzer.TokenStreamComponents class instead of
my custom KoraTokenStreamComponents class, the behaviour does not seem
to change:

-  final TokenStreamComponents tokenstream = new KoraTokenStreamComponents(source);
+  final TokenStreamComponents tokenstream = new TokenStreamComponents(source);
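For reference, a plausible explanation consistent with both variants behaving the same: in Lucene 4.0, `createComponents()` is only called when the analyzer's `ReuseStrategy` has no cached components (with the default strategy, once per thread). Every later document is passed in through `TokenStreamComponents.setReader()`, which hands the new `Reader` to the source `Tokenizer`. A tokenizer that reads its input or builds its state in the constructor would therefore only ever see the first document, whichever components class wraps it. Note also that `Tokenizer.close()` closes the current `Reader`, and closing the stream is normally left to the consumer rather than done inside `createComponents()`. The sketch below is a plain-Java model of that reuse pattern, not Lucene code. `ReuseModel`, `ModelTokenizer`, `EagerTokenizer`, `LazyTokenizer` and `ModelAnalyzer` are hypothetical stand-ins, and whether `KoraTokenizer` actually does its work in the constructor is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java model of Lucene 4.0's component reuse.
// None of these classes are Lucene API; they only mimic the call pattern.
class ReuseModel {

    // Stand-in for a Tokenizer: holds the Reader of the current document.
    abstract static class ModelTokenizer {
        protected Reader input;

        ModelTokenizer(Reader input) { this.input = input; }

        // Mirrors Tokenizer.setReader(): used for every document after the first.
        void setReader(Reader input) { this.input = input; }

        abstract List<String> tokens();
    }

    // Problematic pattern (assumed, not confirmed for KoraTokenizer):
    // the input is consumed once, in the constructor.
    static class EagerTokenizer extends ModelTokenizer {
        private final List<String> firstDocTokens;

        EagerTokenizer(Reader input) {
            super(input);
            this.firstDocTokens = split(input);
        }

        @Override
        List<String> tokens() { return firstDocTokens; } // never looks at a later Reader
    }

    // Reuse-safe pattern: reads whichever Reader is current when tokens are requested.
    static class LazyTokenizer extends ModelTokenizer {
        LazyTokenizer(Reader input) { super(input); }

        @Override
        List<String> tokens() { return split(input); }
    }

    // Stand-in for Analyzer.tokenStream(): create once, then only swap the Reader.
    // The model collapses TokenStreamComponents and its source Tokenizer into one object.
    static class ModelAnalyzer {
        private final boolean eager;
        private ModelTokenizer components; // one cached slot, like one thread's reuse slot
        int createCalls = 0;

        ModelAnalyzer(boolean eager) { this.eager = eager; }

        List<String> analyze(String text) {
            Reader reader = new StringReader(text);
            if (components == null) {
                createCalls++; // corresponds to createComponents()
                components = eager ? new EagerTokenizer(reader) : new LazyTokenizer(reader);
            } else {
                components.setReader(reader); // corresponds to TokenStreamComponents.setReader()
            }
            return components.tokens();
        }
    }

    // Splits everything remaining in the Reader on whitespace.
    static List<String> split(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String text = sb.toString().trim();
        return text.isEmpty() ? new ArrayList<String>() : Arrays.asList(text.split("\\s+"));
    }
}
```

With `EagerTokenizer`, analyzing `"a b"` and then `"c d"` yields `[a, b]` both times, while `LazyTokenizer` yields `[a, b]` and then `[c, d]`; in both cases `createCalls` stays at 1. In actual Lucene 4.0 code, the corresponding change would be to read from the tokenizer's protected `input` field in `reset()`/`incrementToken()` instead of in the constructor.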

Best,
Carsten


-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schnober@ids-mannheim.de
Next Generation Corpus Analysis Platform
