chemistry-dev mailing list archives

From Florian Müller <f...@apache.org>
Subject Re: MIME types preventing addition of content
Date Sat, 30 Aug 2014 11:45:19 GMT

Hi Tim,

You are supposed to set the MIME type. Repositories should accept it
even if it doesn't make sense. (For example, a zero-byte document is
never a valid Word file.)

Some repositories handle content with no MIME type or the MIME type
"application/octet-stream" differently and try to determine the correct
MIME type. Alfresco does it, the SAP Document Service does it, and
probably others too. But you cannot rely on that.
(@Jay: What does FileNet do?)
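
On the client side, that advice could look roughly like this: always send an explicit MIME type, and use "application/octet-stream" only when detection gives you nothing. This is a minimal sketch in plain Java with no CMIS calls. The class and method names are hypothetical and not part of OpenCMIS or Tika.

```java
/** Hypothetical helper; not part of OpenCMIS or Tika. */
final class MimeTypeChooser {

    /** Generic binary type, used only when nothing better is known. */
    static final String FALLBACK = "application/octet-stream";

    private MimeTypeChooser() {
    }

    /**
     * Returns the detected MIME type (trimmed), or the fallback if the
     * detected value is null or blank.
     */
    static String choose(String detected) {
        if (detected == null) {
            return FALLBACK;
        }
        String trimmed = detected.trim();
        return trimmed.isEmpty() ? FALLBACK : trimmed;
    }

    public static void main(String[] args) {
        // A detected type is passed through unchanged.
        System.out.println(choose("application/msword"));
        // No detection result: fall back to the generic binary type.
        System.out.println(choose(null));
    }
}
```

The returned value is what you would pass as the MIME type when you create the ContentStream (in OpenCMIS, for example, the mimetype argument of ObjectFactory.createContentStream). Whether a repository then re-detects the type for "application/octet-stream" is up to the repository.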

SharePoint does it completely differently (and is not spec compliant in
this regard). It ignores your MIME type. Instead it determines the MIME
type when you access the document based on the file extension. That is,
if you change the name (and extension) after you've uploaded the
document, you get a different MIME type. The worst part is that if
SharePoint doesn't know the file extension, it doesn't return any MIME type.
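
To make the consequence concrete, here is a toy model of an extension-based lookup. It is only an illustration of the behaviour described above, not SharePoint's actual code. The class name and the tiny extension table are made up for the example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Toy model of extension-based MIME type lookup; not SharePoint's actual code. */
final class ExtensionMimeLookup {

    // Hypothetical, deliberately tiny extension table.
    private static final Map<String, String> KNOWN = Map.of(
            "doc", "application/msword",
            "pdf", "application/pdf",
            "txt", "text/plain");

    private ExtensionMimeLookup() {
    }

    /**
     * Derives the MIME type from the current file name only. The MIME type
     * the client sent at upload time plays no role. Returns an empty
     * Optional if there is no extension or the extension is unknown.
     */
    static Optional<String> mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(KNOWN.get(ext));
    }

    public static void main(String[] args) {
        // Same content, three different names, three different answers.
        System.out.println(mimeTypeFor("report.doc"));
        System.out.println(mimeTypeFor("report.txt"));
        System.out.println(mimeTypeFor("report.xyz"));
    }
}
```

For client code this means the MIME type read back from such a repository can differ from the one you uploaded, or be missing entirely, so it is safer to handle the empty case explicitly.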

I hope that helps.


- Florian

> Update - the problem I was having was against FileNet P8.  I tried the
> same code against Alfresco 4.2f, and I also got strange behaviour
> (although different).
> 
> With Alfresco, if you explicitly set the mime type when you add the
> document, the mime type gets set OK, but the content doesn't seem to
> be present (length of 0 in workbench).  Also with Alfresco, if you set
> the mime type to 'application/octet-stream' it seems to override this
> with the actual correct mime type, and import the document successfully
> (content and all).
> 
> So I'm a bit confused as to how all this is supposed to work.  Should we
> not be setting the mime type ourselves?  Why do the repository
> implementations seem to want to interfere with this?
> 
> Again, this is mostly using an 'old' Word format document (.doc) and the
> application/msword mime type, which our customers' repositories are
> full of.
> 
> Tim
> 
> On Fri, Aug 29, 2014 at 5:09 PM, Tim Webster <tim.webster@gmail.com
> <mailto:tim.webster@gmail.com>> wrote:
> 
>     Hi,
> 
>     I'm deriving the mime type of documents (using Apache Tika) and then
>     adding them to my CMIS repository using a Chemistry Java client.
>     For some reason, certain mime types seem to prevent the content
>     from being added.
> 
>     The document gets added, but the mime type and content are empty.
> 
>     The biggest offender seems to be application/msword, and there seem
>     to be others.
> 
>     I've used the same code for the past couple of years to do this, and
>     the only thing I've changed is the value of the mime type in the
>     ContentStream. Previously, I used to just set everything to
>     'application/octet-stream'.  If I switch back to that, it works
>     fine.  The Tika libraries are doing their job just fine, and
>     returning the correct mime type.
> 
>     If I add the same document through the workbench, the document gets
>     added, and the mime type and content are totally fine.
> 
>     I've attached a screenshot of my CMIS workbench so you can see the
>     effect.
> 
>     Anyone have any ideas?