lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: [update] adding field with constructor demanding tokenStream fails - Field(name, tokenStream, termVector) BUG
Date Sun, 24 Jul 2011 19:20:55 GMT
The problem with your code is simple: you cannot consume a TokenStream twice (it behaves like
an iterator). Once you consume it with the System.out.println() loop, it can no longer be
consumed by the indexer. The same happens when you add the same TokenStream to several Fields
to index.
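
To illustrate the iterator comparison, here is a minimal sketch that uses only the Java standard library. It is an analogy, not Lucene code: a plain java.util.Iterator stands in for the TokenStream, and the class name SingleUseStreamDemo and the helper consume() are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Analogy only: a plain Iterator stands in for a Lucene TokenStream.
class SingleUseStreamDemo {

    // Hypothetical helper: drains the iterator and returns everything it saw.
    static List<String> consume(Iterator<String> tokens) {
        List<String> seen = new ArrayList<>();
        while (tokens.hasNext()) {
            seen.add(tokens.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("welcome", "professional", "enthusiast");

        Iterator<String> stream = source.iterator();
        // First consumer (the println loop in the post) sees every token.
        System.out.println("loop 1: " + consume(stream));
        // Second consumer (the indexer) gets nothing: the iterator is exhausted.
        System.out.println("loop 2: " + consume(stream));

        // Obtaining a fresh iterator for each consumer avoids the problem.
        System.out.println("fresh:  " + consume(source.iterator()));
    }
}
```

In the posted code, the equivalent of the "fresh" line would be to request a new stream from the analyzer for the Field instead of reusing the one that was already printed.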

 

Still, I don't understand the whole problem; it looks like an XY problem:
http://www.perlmonks.org/index.pl?node_id=542341

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: andrzej_gadek [mailto:gadek.a@gmail.com] 
Sent: Sunday, July 24, 2011 9:11 PM
To: dev@lucene.apache.org
Subject: [update] adding field with constructor demanding tokenStream fails - Field(name,
tokenStream, termVector) BUG

 

update ->

 

I have found out that the problem comes from these constructors:

 

new Field(name, tokenStream)
new Field(name, tokenStream, termVector)

 

how?

 

[code]

StandardAnalyzer stAnalyzer = new StandardAnalyzer(Version.LUCENE_30);
TokenStream stStream = stAnalyzer.tokenStream("analizedContent", new StringReader(handler.toString()));

TermAttribute term = stStream.addAttribute(TermAttribute.class);
System.out.println("loop 1");
while (stStream.incrementToken()) {
    System.out.print(term.term() + ":");
}
System.out.println();

stStream.reset();

Field field = new Field("analizedContent", stStream, TermVector.YES);
field.setTokenStream(stStream);

TokenStream tSV = field.tokenStreamValue();
TermAttribute term2 = stStream.addAttribute(TermAttribute.class);

System.out.println("loop 2");
while (tSV.incrementToken()) {
    System.out.print(term2.term() + ":");
}
System.out.println();

System.out.println("field.readerValue(): " + field.toString());
System.out.println("field.readerValue(): " + field.readerValue());
System.out.println("fieldfield.stringValue(): " + field.stringValue());

[/code]

 

and here is what I get in the console:

[example]


loop 1
welcome:q&a:professional:enthusiast:programmers:check:out:faq:stack:exchange:log:careers:chat:meta:about:faq:stack:overflow:questions:tags:users:badges:unanswered:ask:question:what:difference:between:getpath:getabsolutepath:getca...
/*and many more ;-)*/


field.readerValue(): indexed,tokenized,termVector<analizedContent:>
loop 2

field.readerValue(): null
fieldfield.stringValue(): null
[/example]

 

My comment: after the new Field is created, we lose the value of the TokenStream that was
passed in. So the problem occurs when you want to index pre-analyzed text, for example with
an analyzer other than the default one (in my case a Polish one).

 

My conclusion: something is not working right!

Probably nobody will help me, so I'm going to find some alternative way to do this. For example,
I will make a String from the analyzed text and then use the default analyzer to parse it.
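
A rough sketch of that workaround, using only the Java standard library: the tokens are drained once into a list, and the list is then joined into a String that could be handed to another analyzer. This is an illustration under assumptions, not Lucene code. A plain Iterator stands in for the TokenStream, and the names TokenBufferDemo, drain() and toText() are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Analogy only: a plain Iterator stands in for the analyzed TokenStream.
class TokenBufferDemo {

    // Hypothetical helper: reads the single-use stream once into a reusable list.
    static List<String> drain(Iterator<String> tokens) {
        List<String> buffer = new ArrayList<>();
        while (tokens.hasNext()) {
            buffer.add(tokens.next());
        }
        return buffer;
    }

    // Hypothetical helper: rebuilds plain text from the buffered tokens.
    static String toText(List<String> tokens) {
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        Iterator<String> analyzed = Arrays.asList("welcome", "professional", "enthusiast").iterator();

        // The buffered list can be read as many times as needed.
        List<String> buffer = drain(analyzed);
        System.out.println("tokens: " + buffer);
        System.out.println("text:   " + toText(buffer));
    }
}
```

Note that joining tokens into a String keeps only the token text; if the String is analyzed again, the second analyzer may change the tokens further.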

 

Andrew

 

