poi-user mailing list archives

From Norman M <mnoman15...@yahoo.com>
Subject Is POI really using streaming to parse files?
Date Fri, 09 Nov 2012 22:47:31 GMT
I am using Apache Tika to extract text from PPT/PPTX files.

Tika uses Apache POI to extract the text.

I compared the processing time and memory usage of POI against Aspose (www.aspose.com).

The processing time and memory requirement for Tika (i.e. POI) are almost double those of Aspose.

Is POI really using streaming to parse files? Why does it take so much more memory than Aspose,
which I thought reads the whole file into memory?

I found this thread: http://lucene.472066.n3.nabble.com/Large-xls-files-always-loaded-into-memory-td646710.html
where the Tika founder claims that POI does not stream input files. That thread is quite
old; is that still the case?

Any response will be appreciated.

Thanks,