chemistry-dev mailing list archives

From Mark Streit <mcs...@gmail.com>
Subject MIME Types for BMP and XML files - diff between Chemistry Client API and Apache Tika
Date Mon, 23 Jan 2017 17:20:04 GMT
Chemistry folks,

Wondering if anyone has stumbled across an issue similar to the one we are
seeing. We have a custom product solution built around Apache Chemistry
0.11.0; we use the following dependencies in our app:

<dependency>
    <groupId>org.apache.chemistry.opencmis</groupId>
    <artifactId>chemistry-opencmis-client-api</artifactId>
    <version>0.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.chemistry.opencmis</groupId>
    <artifactId>chemistry-opencmis-client-bindings</artifactId>
    <version>0.11.0</version>
</dependency>
<dependency>
    <groupId>org.apache.chemistry.opencmis</groupId>
    <artifactId>chemistry-opencmis-client-impl</artifactId>
    <version>0.11.0</version>
</dependency>


During an upload (creation) of a CMIS document, we set the property
cmis:contentStreamMimeType based on the value returned by Apache Tika
and its Detector interface.

Two cases are presenting problems for us: files with the extensions .bmp
and .xml. In each case, Tika returns a value that does not match the one
in the class org.apache.chemistry.opencmis.commons.impl.MimeTypes.

For a file *widget.bmp*:

   - The MIME type returned by Tika is "image/x-ms-bmp", and our application
     code successfully creates the cmis:document object, setting
     cmis:contentStreamMimeType to "image/x-ms-bmp".
   - If you create the content using the Chemistry Workbench, the content
     is created successfully as well, but cmis:contentStreamMimeType is
     set to "image/bmp".

Likewise for a file *another_widget.xml*:

   - The MIME type returned by Tika is "application/xml", and our application
     code successfully creates the cmis:document object, setting
     cmis:contentStreamMimeType to "application/xml".
   - If you create the content using the Chemistry Workbench, the content
     is created successfully as well, but cmis:contentStreamMimeType is
     set to "text/xml".


Based on what we can determine, the class
org.apache.chemistry.opencmis.commons.impl.MimeTypes includes the
following lines:

   MIME2EXT.put("text/xml", "xml");

   MIME2EXT.put("image/bmp", "bmp");
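
The Workbench results are consistent with a plain extension-based lookup
against a table like the one above, whereas a content-sniffing detector
such as Tika can land on a different alias for the same format. As an
illustration only, here is a stdlib-only Java stand-in holding just the two
entries quoted above, inverted to map extension to MIME type. It assumes,
without confirmation, that the Workbench picks the MIME type from the file
extension; the class and method names are hypothetical, and this is not the
actual MimeTypes implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ExtensionMimeLookup {
    // Hypothetical, trimmed-down table containing only the two mappings
    // quoted from MimeTypes, inverted to go from extension to MIME type.
    private static final Map<String, String> EXT2MIME = new HashMap<>();
    static {
        EXT2MIME.put("xml", "text/xml");
        EXT2MIME.put("bmp", "image/bmp");
    }

    // Looks up a MIME type from the file name's extension (case-insensitive),
    // falling back to application/octet-stream when the extension is missing
    // or not in the table.
    static String mimeForFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "application/octet-stream";
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ENGLISH);
        return EXT2MIME.getOrDefault(ext, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println(mimeForFileName("widget.bmp"));         // image/bmp
        System.out.println(mimeForFileName("another_widget.xml")); // text/xml
    }
}
```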


The reason this can matter, at least in our case, is that our backend
CMIS service implementation is Alfresco Enterprise 5.1, and apparently
its "transformation" service, which automatically generates PDF
renditions of uploaded files, will not generate the PDF rendition when
the MIME values returned by Apache Tika are specified.

However, if the values from
org.apache.chemistry.opencmis.commons.impl.MimeTypes are used instead
to set cmis:contentStreamMimeType, the PDF rendition is created
successfully in both cases.
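
One possible workaround, sketched here under the assumption that only these
two aliases matter: normalize Tika's result before it is used to set
cmis:contentStreamMimeType, preferring the values that the MimeTypes class
uses. The class name, method name, and alias table are hypothetical; whether
Alfresco's transformers need other overrides is not known from these two
cases.

```java
import java.util.Locale;
import java.util.Map;

public class MimeNormalizer {
    // Hypothetical alias table: the MIME types Tika returned for the two
    // files above, mapped to the values used by the OpenCMIS MimeTypes class.
    private static final Map<String, String> ALIASES = Map.of(
            "image/x-ms-bmp", "image/bmp",
            "application/xml", "text/xml");

    // Returns the preferred MIME type for a detected value. The lookup is
    // case-insensitive; values without an alias entry are returned unchanged,
    // and null falls back to application/octet-stream.
    static String normalize(String detected) {
        if (detected == null) {
            return "application/octet-stream";
        }
        String key = detected.trim().toLowerCase(Locale.ENGLISH);
        return ALIASES.getOrDefault(key, detected);
    }

    public static void main(String[] args) {
        System.out.println(normalize("image/x-ms-bmp"));  // image/bmp
        System.out.println(normalize("application/xml")); // text/xml
        System.out.println(normalize("application/pdf")); // application/pdf
    }
}
```

Keeping the override table explicit makes each substitution visible in one
place, and it can be extended if other formats turn out to need the same
treatment.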

Obviously this points more to the internal transformation services
provided by Alfresco (I believe ImageMagick for BMP and PDFBox for XML
files)... but the broader question is about the DIFFERENCE between
what Apache Tika returns and what the CMIS Client API returns. It
seems Tika's values may cause downstream issues, depending on what is
being done with the content stream of the cmis:document instance.


Note that our only reason for using Apache Tika was that we saw it
mentioned in the Manning book on CMIS and Chemistry:
https://www.manning.com/books/cmis-and-apache-chemistry-in-action
(an extremely helpful book BTW)


Thanks,


Mark Streit
