poi-dev mailing list archives

From Nick Burch <apa...@gagravarr.org>
Subject Re: MS OneNote
Date Fri, 26 Jul 2013 16:45:52 GMT
On Fri, 26 Jul 2013, Mike Hugo wrote:
> I'm looking into basic support (text extraction) for MS OneNote.  I found
> this bug https://issues.apache.org/bugzilla/show_bug.cgi?id=50750 that has
> some sample files attached.  Does anyone have any pointers as to where I
> should get started?

Use POIFSLister to work out whether the files contain a single POIFS/OLE2
stream or several. If there are lots, assume the format is like Outlook
(HSMF) and use POIFSDump to look at the parts. If there is only one, use
POIFSViewer and any available docs to work out whether it's streams of
records (e.g. HSSF), nested records (HSLF, DDF), or streams (HWPF).
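
Before running the POIFS tools above, it can help to confirm that a sample
file is an OLE2 container at all. The following is a minimal sketch using
only the JDK, not POI itself; the class and method names (Ole2Sniffer,
hasOle2Signature, isOle2File) are hypothetical and chosen for illustration.
It relies only on the well-known 8-byte OLE2 signature at the start of the
file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of POI: checks for the OLE2 file signature.
final class Ole2Sniffer {
    // First 8 bytes of every OLE2 compound document.
    private static final int[] OLE2_MAGIC =
        {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // Returns true if the given bytes start with the OLE2 signature.
    static boolean hasOle2Signature(byte[] header) {
        if (header == null || header.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if ((header[i] & 0xFF) != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first 8 bytes of the file and checks them.
    static boolean isOle2File(String path) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            byte[] header = new byte[OLE2_MAGIC.length];
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                read += n;
            }
            return read == header.length && hasOle2Signature(header);
        }
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            String verdict = isOle2File(path) ? "OLE2 container" : "not OLE2";
            System.out.println(path + ": " + verdict);
        }
    }
}
```

If a sample file fails this check, the POIFS tools will not be able to open
it, and the format needs to be investigated by other means.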

Once you know that, write something that does basic processing of the
file structure. Then add some .dev. tools to print the structure (look at
Visio, Outlook etc. for an idea of how we've done that). Use your own dev
tool to explore the structure further. Finally, flesh out the
implementation to cover all the key bits, and write lots of unit tests!

Nick
