poi-user mailing list archives

From Dave Fisher <dave2w...@comcast.net>
Subject Re: streaming detection of OLE?
Date Tue, 16 Apr 2019 19:38:55 GMT
Hi -

Well, it's early POI stuff. Maybe a patch is possible for the narrow use case the Tika user
has.

I assume that all you need is the first block or two to confirm this looks like an OLE document.
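
A minimal sketch of that kind of check, in plain Java with no POI classes. The class and method names (OleHeaderSniffer, looksLikeOle2) are made up for illustration and are not part of POI or Tika. It reads only the first 8 bytes of the stream and compares them with the OLE2 (Compound File Binary) signature:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not a POI or Tika class.
final class OleHeaderSniffer {

    // OLE2 / Compound File Binary signature: the first 8 bytes of the file.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    private OleHeaderSniffer() {
    }

    /**
     * Consumes at most 8 bytes from the stream and reports whether they
     * match the OLE2 signature. Returns false if the stream ends early.
     */
    static boolean looksLikeOle2(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_SIGNATURE.length];
        int read = 0;
        // A single read() call may return fewer bytes than requested,
        // so loop until the buffer is full or the stream ends.
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false;
            }
            read += n;
        }
        return Arrays.equals(head, OLE2_SIGNATURE);
    }
}
```

This only confirms that the stream starts like an OLE document. It does not read the directory entry names that the detection described below relies on, so it would not by itself tell a Word file from an Excel file.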

Regards,
Dave

> On Apr 16, 2019, at 12:29 PM, Tim Allison <tallison@apache.org> wrote:
> 
> Thank you, Dave!  The reading examples use POIFSReader, which I had hoped
> was truly streaming, but it creates a POIFS, which requires a read/skip of
> the entire stream IIUC, and then iterates. Or am I missing something?
> 
> I didn’t try POIFSReader by specifying a subdoc to process, but it looks
> like it opens a POIFS first no matter how you register a listener.
> 
> On Tue, Apr 16, 2019 at 3:20 PM Dave Fisher <dave2wave@comcast.net> wrote:
> 
>> Hi Tim,
>> 
>> Maybe the answer is using HPSF -
>> 
>> https://poi.apache.org/components/hpsf/how-to.html
>> 
>> Regards,
>> Dave
>> 
>>> On Apr 16, 2019, at 11:47 AM, Tim Allison <tallison@apache.org> wrote:
>>> 
>>> All,
>>> In Tika, when we do file type detection of OLE files
>>> (POIFSContainerDetector), we spool the file to disk, open a POIFS and
>>> make a decision based on document/directory names.  A user on
>>> TIKA-2849 does not want to copy the full file from a slow network
>>> drive for detection.  When I tried using a BoundedInputStream with
>>> POIFS, not surprisingly, I got EOF exceptions.
>>> Question: is there any way to do detection in a streaming mode for
>>> OLE files?  Or, is this the best we can do?  Thank you!
>>> 
>>>      Best,
>>> 
>>>                    Tim
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>>> For additional commands, e-mail: user-help@poi.apache.org
>>> 

