lucene-java-user mailing list archives

From "Zhang, Lisheng" <>
Subject Can PDFBox or POI handle multi-byte characters with different encodings?
Date Fri, 10 Feb 2006 18:45:57 GMT

Currently we are using PDFBox to process PDF files and
POI to process DOC/XLS files before sending the extracted
strings to Lucene for indexing.

Does anyone know whether PDFBox or POI can process multi-byte
characters, such as Japanese, in the various encodings (whatever
is specified in the PDF or DOC file)?

Thanks very much for any help, Lisheng
