poi-user mailing list archives

From: Tim Allison <talli...@apache.org>
Subject: Re: streaming detection of OLE?
Date: Tue, 16 Apr 2019 21:45:42 GMT
>Maybe a patch is possible for the narrow use case the Tika user has

Yes. I will take a closer look at HPSF and POIFS. This definitely belongs in POI.
Thank you!
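
A minimal sketch of the "first block or two" check described in the quoted message below, assuming that confirming the OLE2 container signature is enough for the narrow use case. The class and method names (Ole2HeaderSniffer, looksLikeOle2) are hypothetical and not part of POI or Tika; the sketch uses only the JDK. It reads at most 8 bytes, so it does not say which kind of OLE document this is, since that decision depends on the document/directory names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not a POI or Tika class.
public class Ole2HeaderSniffer {

    // OLE2 (Compound File Binary) signature: the first 8 bytes of the file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    /** Reads at most 8 bytes from the stream and compares them to the signature. */
    public static boolean looksLikeOle2(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int total = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (total < head.length) {
            int n = in.read(head, total, head.length - total);
            if (n < 0) {
                return false; // stream is shorter than the signature
            }
            total += n;
        }
        return Arrays.equals(head, OLE2_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = new byte[512];
        System.arraycopy(OLE2_MAGIC, 0, sample, 0, OLE2_MAGIC.length);
        System.out.println(looksLikeOle2(new ByteArrayInputStream(sample)));
        System.out.println(looksLikeOle2(new ByteArrayInputStream(new byte[] {'P', 'K'})));
    }
}
```

The caller would need to rewind or re-open the stream afterwards (for example with mark/reset on a buffered stream) before handing it to a real parser.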

On Tue, Apr 16, 2019 at 3:45 PM Dave Fisher <dave2wave@comcast.net> wrote:

> Hi -
>
> Well it’s early POI stuff. Maybe a patch is possible for the narrow use
> case the Tika user has.
>
> I assume that all you need is the first block or two to confirm this looks
> like an OLE document.
>
> Regards,
> Dave
>
> > On Apr 16, 2019, at 12:29 PM, Tim Allison <tallison@apache.org> wrote:
> >
> > Thank you, Dave!  The reading examples use POIFSReader, which I had hoped
> > was truly streaming, but it creates a POIFS, which requires a read/skip of
> > the entire stream IIUC, and then iterates...Or, am I missing something?
> >
> > I didn’t try POIFSReader by specifying a subdoc to process, but it looks
> > like it opens a POIFS first no matter how you register a listener.
> >
> > On Tue, Apr 16, 2019 at 3:20 PM Dave Fisher <dave2wave@comcast.net> wrote:
> >
> >> Hi Tim,
> >>
> >> Maybe the answer is using HPSF -
> >>
> >> https://poi.apache.org/components/hpsf/how-to.html
> >>
> >> Regards,
> >> Dave
> >>
> >>> On Apr 16, 2019, at 11:47 AM, Tim Allison <tallison@apache.org> wrote:
> >>>
> >>> All,
> >>> In Tika, when we do file type detection of OLE files
> >>> (POIFSContainerDetector), we spool the file to disk, open a POIFS and
> >>> make a decision based on document/directory names.  A user on
> >>> TIKA-2849 does not want to copy the full file from a slow network
> >>> drive for detection.  When I tried using a BoundedInputStream with
> >>> POIFS, not surprisingly, I got EOF exceptions.
> >>> Question: is there any way to do detection in a streaming mode for
> >>> OLE files?  Or, is this the best we can do?  Thank you!
> >>>
> >>>      Best,
> >>>
> >>>                    Tim
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> >>> For additional commands, e-mail: user-help@poi.apache.org
> >>>
> >>
>

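One possible reason a bounded prefix of the stream leads to EOF exceptions: in the OLE2 (Compound File Binary) layout, the 512-byte header only points at the first directory sector, which can sit anywhere in the file, and following the directory chain also needs FAT sectors. The sketch below, using only the JDK, computes where the directory starts from the header alone. The class and method names are hypothetical, and the field offsets (0x1E for the sector shift, 0x30 for the first directory sector) are taken from the published format rather than from POI code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not a POI or Tika class.
public class Ole2DirectoryLocator {

    /** Returns the byte offset of the first directory sector, given the 512-byte header. */
    public static long firstDirectorySectorOffset(byte[] header) {
        if (header.length < 512) {
            throw new IllegalArgumentException("need the full 512-byte header");
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        // Sector shift: 9 means 512-byte sectors, 12 means 4096-byte sectors.
        int sectorShift = buf.getShort(0x1E) & 0xFFFF;
        // Sector number where the directory chain begins (unsigned 32-bit).
        long firstDirSector = buf.getInt(0x30) & 0xFFFFFFFFL;
        // Sector 0 starts right after the header area, hence the +1.
        return (firstDirSector + 1) << sectorShift;
    }

    public static void main(String[] args) {
        byte[] header = new byte[512];
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        buf.putShort(0x1E, (short) 9); // 512-byte sectors
        buf.putInt(0x30, 1000);        // directory starts at sector 1000
        System.out.println(firstDirectorySectorOffset(header));
    }
}
```

If the returned offset lies beyond the bound placed on the stream, the directory names used for detection cannot be reached without reading or skipping further.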