poi-dev mailing list archives

From Mike Hugo <m...@piragua.com>
Subject Re: MS OneNote
Date Fri, 26 Jul 2013 17:25:24 GMT
Thanks Nick!

On Jul 26, 2013, at 11:46 AM, Nick Burch <apache@gagravarr.org> wrote:

> On Fri, 26 Jul 2013, Mike Hugo wrote:
>> I'm looking into basic support (text extraction) for MS OneNote.  I found
>> this bug https://issues.apache.org/bugzilla/show_bug.cgi?id=50750 that has
>> some sample files attached.  Does anyone have any pointers as to where I
>> should get started?
>
> Use POIFSLister to work out whether they have a single POIFS/OLE2 stream
> or multiple. If there are lots, assume it's like Outlook (HSMF) and use
> POIFSDump to look at the parts. If there is one, use POIFSViewer and the
> docs to work out whether it's streams of records (e.g. HSSF), nested
> records (HSLF, DDF), or streams (HWPF).
>
> Once you know that, write something that does basic processing of the
> file structure. Then add some .dev. tools to print the structure (look at
> Visio, Outlook etc. for an idea of how we've done that). Use your own dev
> tool to explore the structure further. Finally, flesh out the
> implementation to cover all the key bits, and write lots of unit tests!
>
> Nick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
> For additional commands, e-mail: dev-help@poi.apache.org
>
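
The POIFS tools mentioned above only apply to files in the OLE2 container
format, so a quick sanity check on the sample files may be worth doing first.
The sketch below is an illustration, not part of POI: the class name
Ole2Sniffer and the method looksLikeOle2 are hypothetical, and the only fact
it relies on is the standard 8-byte OLE2 signature (D0 CF 11 E0 A1 B1 1A E1).
It uses only the JDK (11 or later) and says nothing about what the OneNote
samples actually contain.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not a POI class: checks for the OLE2 file signature.
public class Ole2Sniffer {
    // The 8-byte signature that opens every OLE2 compound document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns true if the given leading bytes start with the OLE2 signature. */
    public static boolean looksLikeOle2(byte[] header) {
        if (header == null || header.length < OLE2_MAGIC.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(header, OLE2_MAGIC.length);
        return Arrays.equals(prefix, OLE2_MAGIC);
    }

    /** Reads only the first 8 bytes of the file and checks them. */
    public static boolean looksLikeOle2(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikeOle2(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    public static void main(String[] args) throws IOException {
        // Each argument is treated as a path to a sample file.
        for (String arg : args) {
            boolean ole2 = looksLikeOle2(Path.of(arg));
            System.out.println(arg + ": " + (ole2 ? "OLE2" : "not OLE2"));
        }
    }
}
```

If a sample passes this check, POIFSLister and the other tools should be able
to open it; if it does not, the file is in some other container format and the
POIFS-based steps would not apply as described.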
