poi-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nick Burch <n...@torchbox.com>
Subject Re: Exchange/PST/Mail parsing (fwd)
Date Tue, 10 Jul 2007 14:28:42 GMT
On Mon, 9 Jul 2007, Travis wrote:
>> Looking at the code, a few thoughts / suggestions:
>> * Can you ever have more than one message in a file? If so, it might be
>>    good to make the top level class handle getting one or more model level
>>    messages out
> Not in the MSG file format.  If there is ever a case of an MSG file 
> inside an MSG file it would be due to "attachments"

OK, I wasn't sure how it all fitted together

> ...which is simply a case of creating an input stream to read the data 
> from the attachment, just like you would for a text document, HTML file, 
> or zip.  I'm not sure if creating a special method in the library itself 
> for pulling this directly into a model is the best route to go...what 
> are your thoughts?

I guess it depends on if the attachment will be wrapped in its own poifs 
or not. If it is, don't worry too much, just have the api allow you to get 
all the MAPIMessages of attachments. If not, probably just a new 
MAPIMessage constructor to take the new thing.

> * I'd be inclined to move quite a bit of the MAPIMessage logic under
>>    /model/. Have a top level class that works directly on the POIFS, then
>>    have that give you one (or more?) model level classes that actually get
>>    (and in future set) all the nice strings
> I was actually thinking about refactoring the "Chunks" object out into 
> subclasses of StringChunk.  (I basically just wrote a bunch of unittests 
> and then did the simplest thing that would pass the tests).

That might be handy, I'm not sure. Whatever you do though, I'd be inclined 
to have MAPIMessage just deal in Chunks, and stuff (1+ classes) under 
model that deal with the user facing stuff. The class would let you get 
the subject as a string, and would call down to MAPIMessage for the chunk 

(In general, we try to split the low level from the user facing stuff, so 
it's easier to understand, and also so we can hide the user from some of 
the more crazy aspects of the file formats)

>> * The Chunks.getInstance().<Chunk> ideom doesn't quite seem the best fit
>>    to me. I might be inclined to go for a hslf style types + static method
>>    (RecordTypes + Record.createRecordForType), or a hssf style factory
>>    (RecordFactory). Do either of those ways look like they might work?
> Yes, It is however required to be a new instance of the chunk, due to 
> having the data value stored in it (this isn't that important right now, 
> but will be extremely important when we start writing the files out to 
> disk).

You'll have the factory / static method create the new chunk instance. The 
idea is to only have the one factory/static method, only one definition of 
all the different kinds of chunks, then many instances. Take a look at the 
code suggested for two different ways of achieving it, and see if either 
of them look like your style

>> * Once the code has settled, it'd be great to get something on the site.
>>    It's all done with forrest and src/documentation/content/xdocs/
> I'll look into forrest, I've never used it before.

If you want to just tweak an existing setup, it's pretty nice. You'll 
probably want forrest 0.5.x though, as I don't think our build environment 
works with forrest 0.6


To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org

View raw message