lucene-java-user mailing list archives

From Konstantyn Smirnov <inject...@yahoo.com>
Subject Confusion with Analyzer.tokenStream() re-use in 4.1
Date Wed, 27 Feb 2013 17:25:21 GMT
Dear all,

I'm using the following test-code: 

import org.apache.lucene.analysis.*
import org.apache.lucene.analysis.core.SimpleAnalyzer
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute
import org.apache.lucene.document.*
import org.apache.lucene.util.Version

Document doc = new Document()
Analyzer a = new SimpleAnalyzer( Version.LUCENE_41 )

TokenStream inputTS = a.tokenStream( 'name1', new StringReader( 'aaa bbb ccc' ) )
Field f = new TextField( 'name1', inputTS )
doc.add f

TokenStream ts = doc.getField( 'name1' ).tokenStreamValue()
ts.reset()

String sb = ''
while( ts.incrementToken() ) sb += ts.getAttribute( CharTermAttribute ) + '|'
assert 'aaa|bbb|ccc|' == sb

inputTS = a.tokenStream( 'name2', new StringReader( 'xxx zzz' ) )
f = new TextField( 'name2', inputTS )
doc.add f

ts = doc.getField( 'name2' ).tokenStreamValue()
ts.reset()

sb = ''
while( ts.incrementToken() ) sb += ts.getAttribute( CharTermAttribute ) + '|'
assert 'xxx|zzz|' == sb // << FAILS! -> sb == '' and ts.incrementToken() == false

The first added field lets me read its tokenStreamValue() tokens; all
subsequent fields return nothing, unless I re-instantiate the analyzer.

Another strange thing: just before the new field is added to the document,
the token stream is still filled with tokens.
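The behaviour looks to me as if the analyzer hands back one cached stream
object that gets re-targeted on every tokenStream() call. Here is a toy model
of that (plain Java, NOT Lucene's actual classes -- `ToyAnalyzer` and
`ToyTokenStream` are made-up names just to illustrate my guess):

```java
import java.util.Arrays;
import java.util.Iterator;

// Toy stand-in for a reusable token stream: an iterator over whitespace tokens.
class ToyTokenStream {
    private Iterator<String> tokens;
    void setInput(String text) { tokens = Arrays.asList(text.split(" ")).iterator(); }
    boolean incrementToken()   { return tokens.hasNext(); }
    String term()              { return tokens.next(); } // advances in this toy
}

// Toy stand-in for an analyzer that caches ONE stream and re-initialises
// the same instance on every tokenStream() call.
class ToyAnalyzer {
    private final ToyTokenStream cached = new ToyTokenStream();
    ToyTokenStream tokenStream(String field, String text) {
        cached.setInput(text); // re-targets the shared instance
        return cached;
    }
}

public class ReuseDemo {
    public static void main(String[] args) {
        ToyAnalyzer a = new ToyAnalyzer();

        ToyTokenStream first  = a.tokenStream("name1", "aaa bbb ccc");
        // The second request silently redirects the SAME cached object:
        ToyTokenStream second = a.tokenStream("name2", "xxx zzz");

        System.out.println(first == second); // true: one shared instance
        StringBuilder sb = new StringBuilder();
        while (first.incrementToken()) sb.append(first.term()).append('|');
        System.out.println(sb);              // xxx|zzz| -- not aaa|bbb|ccc|
    }
}
```

If Lucene's analyzers really do this kind of per-thread reuse, it would
explain why the stream stored in the first field is empty (or re-targeted) by
the time I try to read it after requesting the second stream.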

What am I doing wrong? 

TIA

