Oh rats. Thunderbird ate the indenting. The two examples should be:
multipart/alternative
text/plain
multipart/related
text/html
image/gif
image/gif
application/msword
and
multipart/related
text/html
image/gif
application/msword
the indenting indicates nesting. A message isn't just a bodypart
followed by attachments, it has structure like a file system. Something
which escapes most mail readers. Sigh.
John Haxby wrote:
> lude wrote:
>>> You also mentioned indexing each bodypart ("attachment") separately.
>>> Why? ....
>>> To my mind, there is no use case where it makes sense to search a
>>> particular bodypart
>>
>> I will give you the use case:
>>
>> [snip]
>> 3.) The result list would show this:
>> 1. mail-1 'subject'
>> 'Abstract of the message-text'
>> 2. mail-2 'subject'
>> Attachment with name 'filename.doc' contains 'Abstract of
>> file-content'
>>
>> Another Use-Case would be an extended search, which allows to select if
>> "attached files"
>> should be searched (yes or no).
>
> That's a good use case. File it as a bug and close it WONTFIX :-) The
> problem that you have is trying to determine whether something is
> going to be inline or an attachment. I'll give you a real-life example
> that caught out some old code the other day. We had a message with
> this structure:
>
> multipart/alternative
> text/plain
> multipart/related
> text/html
> image/gif
> image/gif
> application/msword
>
> Is there an attached file in there? Think before you read on.
>
>
>
>
>
>
> The answer should be "no". Are you surprised that at least one client
> decided that there was? What we have is three representations of the
> same document: plain text, html (with two pictures) and MS Word. The
> original, the Word document obviously has the best fidelity and comes
> last. The one client I'm thinking of (and I've lost track of which one
> it was) correctly suppressed the display of the text/plain
> alternative, displayed the HTML with its pictures in-line and then
> mistakenly displayed the Word document as an attachment.
>
> This is a fictional example, but it could exist:
>
> multipart/related
> text/html
> image/gif
> application/msword
>
> The gif image (and let's assume it can be indexed sensibly) is
> "obviously" a picture in the HTML bodypart. What's the word document?
> It's referenced from the HTML as a link just like the picture is. Is
> it an attachment? What's the difference between the word document
> referenced as a link within the multipart/related (by content-id) and
> a link to an external document (by http URL)? From a user perspective
> both are the same, but is one an attachment and the other not? I'm
> being unfair, this is not only an unrealistic problem but there isn't
> a right or a wrong answer. The word document isn't an attachment
> because it doesn't (or shouldn't) appear in the list of attachments
> and it's not in-line because you have to click on something to see it.
>
> So yes, I agree, your use-cases are good; I'm just not sure how you're
> going to identify an attachment :-)
>
> I do like the idea, though, of when you do a search for "xyzzy" that
> you get the abstract of the bodypart that contains "xyzzy" rather than
> the abstract (or subject) of the entire message and I'm going to think
> about that one some more. The problem that immediately springs to mind
> though is that a message can have an arbitrary number of bodyparts so
> if I have BODY-1, BODY-2, ..., BODY-N (where N is unknown) how hard is
> it for me to construct the search? I think I probably should construct
> the search that way because the score depends upon the size of the
> document and it seems to make sense that the document is the bodypart,
> not the entire message, but it seems more complex than is useful for
> mail messages.
>
> jch
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
|