lucene-dev mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Lucene Indexer Encoding problem
Date Tue, 21 Oct 2008 23:13:33 GMT

: //BUT WHEN I GET TEXT LIKE THAT TO ADD TO THE INDEX
: textData = stripper.getText(document);

have you looked at the String in textData to make sure it's what you 
expect?
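
One way to check the String without trusting the console or an editor's encoding is to print its code points in plain ASCII. The following is only a sketch: the class and method names are made up for illustration, and the hard-coded sample stands in for whatever stripper.getText(document) actually returns.

```java
public class TextDump {

    // Render each code point as U+XXXX so the output is plain ASCII and
    // cannot be distorted by the console or editor encoding.
    static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    // Count code points that fall in the basic Cyrillic Unicode block.
    static long countCyrillic(String s) {
        return s.codePoints()
                .filter(cp -> Character.UnicodeBlock.of(cp) == Character.UnicodeBlock.CYRILLIC)
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical sample: Russian for "hello", written with escapes so the
        // source file encoding does not matter. Replace with the real textData.
        String textData = "\u041f\u0440\u0438\u0432\u0435\u0442";
        System.out.println(toCodePoints(textData));
        System.out.println("Cyrillic code points: " + countCyrillic(textData));
    }
}
```

If the extracted text is correct, Russian letters should show up in the range U+0400 to U+04FF. Runs of U+003F (question marks) or U+FFFD would suggest the text was already damaged before it reached the indexer.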

: This code above properly saves extracted text to the txt file, which I don't
: really need. What I want is to get text and add it to the Index right away.
: When I open index files in notepad I can see garbage instead of russian
: characters. 

you can't "open index files in notepad" ... they are binary files, not 
text files.  if you want to see what actual terms are indexed, you can use 
tools like Luke to inspect that data.
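
To illustrate why a text editor shows garbage even when the data is fine, here is a small sketch. It assumes the bytes on disk are UTF-8 (an assumption about the storage format, not something stated above) and uses ISO-8859-1 as a stand-in for whatever single-byte code page the editor guesses. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class BinaryAsText {

    // Encode as UTF-8, then decode the same bytes with a single-byte charset,
    // roughly what happens when an editor guesses the wrong encoding.
    static String misread(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the wrong guess: recover the original bytes and decode them as UTF-8.
    static String reread(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample: Russian for "hello", written with escapes.
        String original = "\u041f\u0440\u0438\u0432\u0435\u0442";
        String garbled = misread(original);

        // Each Cyrillic letter takes two bytes in UTF-8, so the misread
        // string is twice as long and looks like garbage.
        System.out.println("original length: " + original.length());
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("round trip ok:   " + reread(garbled).equals(original));
    }
}
```

The bytes are identical in both cases and only the interpretation differs, so garbage in an editor says nothing about whether the indexed terms are correct.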

BTW...

http://people.apache.org/~hossman/#java-dev
Please Use "java-user@lucene" Not "java-dev@lucene"

Your question is better suited for the java-user@lucene mailing list ...
not the java-dev@lucene list.  java-dev is for discussing development of
the internals of the Lucene Java library ... it is *not* the appropriate
place to ask questions about how to use the Lucene Java library when
developing your own applications.  Please resend your message to
the java-user mailing list, where you are likely to get more/better
responses since that list also has a larger number of subscribers.





-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

