jackrabbit-users mailing list archives

From hsp <piccina...@ibest.com.br>
Subject Re: jackrabbit 2.6.0 Full Text Search
Date Thu, 04 Jul 2013 12:46:49 GMT
I see this in org.apache.jackrabbit.core.query.lucene.NodeIndexer:
    /**
     * Returns <code>true</code> if the provided type is among the types
     * supported by the Tika parser we are using.
     *
     * @param type  the type to check.
     * @return whether the type is supported by the Tika parser we are using.
     */
    protected boolean isSupportedMediaType(final String type) {
        if (supportedMediaTypes == null) {
            supportedMediaTypes = parser.getSupportedTypes(null);
        }
        return supportedMediaTypes.contains(MediaType.parse(type));
    }

At runtime, supportedMediaTypes gets loaded with:
application/x-tar,
application/x-bzip,
application/x-bzip2,
image/x-icon,
image/vnd.wap.wbmp,
image/vnd.adobe.photoshop,
application/x-cpio,
image/x-xcf,
application/zip,
image/x-ms-bmp,
image/jpeg,
image/png,
application/x-gtar,
application/x-archive,
image/gif,
application/x-gzip

This way the MIME types I actually have (txt, office, pdf) will never be extracted...

But where is the configuration for this? I ask because the default
tika-config.xml is:
<properties>

  <detectors>

    <detector class="org.apache.tika.detect.DefaultDetector"/>

  </detectors>

  <parsers>

    <parser class="org.apache.tika.parser.DefaultParser"/>

    <parser class="org.apache.tika.parser.EmptyParser">
      
      <mime>application/x-archive</mime>
      <mime>application/x-bzip</mime>
      <mime>application/x-bzip2</mime>
      <mime>application/x-cpio</mime>
      <mime>application/x-gtar</mime>
      <mime>application/x-gzip</mime>
      <mime>application/x-tar</mime>
      <mime>application/zip</mime>
      
      <mime>image/bmp</mime>
      <mime>image/gif</mime>
      <mime>image/jpeg</mime>
      <mime>image/png</mime>
      <mime>image/vnd.wap.wbmp</mime>
      <mime>image/x-icon</mime>
      <mime>image/x-psd</mime>
      <mime>image/x-xcf</mime>
    </parser>

  </parsers>

</properties>
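
As a quick cross-check, the sixteen types listed above can be compared with the <mime> entries under EmptyParser in this config. Below is a minimal sketch in plain Java, with no Tika dependency. The class name MimeListCheck is hypothetical, and the alias table is my assumption: I am assuming Tika canonicalises image/bmp to image/x-ms-bmp and image/x-psd to image/vnd.adobe.photoshop.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Jackrabbit or Tika.
class MimeListCheck {

    // Types observed in supportedMediaTypes, copied from the list above.
    static final Set<String> OBSERVED = Set.of(
        "application/x-tar", "application/x-bzip", "application/x-bzip2",
        "image/x-icon", "image/vnd.wap.wbmp", "image/vnd.adobe.photoshop",
        "application/x-cpio", "image/x-xcf", "application/zip",
        "image/x-ms-bmp", "image/jpeg", "image/png",
        "application/x-gtar", "application/x-archive", "image/gif",
        "application/x-gzip");

    // <mime> entries listed under EmptyParser in the default tika-config.xml.
    static final List<String> EMPTY_PARSER_MIMES = List.of(
        "application/x-archive", "application/x-bzip", "application/x-bzip2",
        "application/x-cpio", "application/x-gtar", "application/x-gzip",
        "application/x-tar", "application/zip",
        "image/bmp", "image/gif", "image/jpeg", "image/png",
        "image/vnd.wap.wbmp", "image/x-icon", "image/x-psd", "image/x-xcf");

    // ASSUMPTION: alias -> canonical name, as Tika would resolve them.
    static final Map<String, String> ALIASES = Map.of(
        "image/bmp", "image/x-ms-bmp",
        "image/x-psd", "image/vnd.adobe.photoshop");

    // Config entries with the assumed aliases replaced by canonical names.
    static Set<String> canonicalConfigTypes() {
        Set<String> out = new TreeSet<>();
        for (String mime : EMPTY_PARSER_MIMES) {
            out.add(ALIASES.getOrDefault(mime, mime));
        }
        return out;
    }

    // Observed types that the EmptyParser entries do not account for.
    static Set<String> observedOnly() {
        Set<String> diff = new TreeSet<>(OBSERVED);
        diff.removeAll(canonicalConfigTypes());
        return diff;
    }

    public static void main(String[] args) {
        System.out.println("Observed but not under EmptyParser: " + observedOnly());
    }
}
```

If the printed set is empty, every observed type is already accounted for by the EmptyParser entries, which would suggest that DefaultParser is contributing no types of its own here. One possible cause, and this is only a guess, is that the Tika parser implementations are not on the classpath.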

I feel I am almost there.
The documentation is a bit lacking on this...

Best Regards
Helio



--
View this message in context: http://jackrabbit.510166.n4.nabble.com/jackrabbit-2-6-0-Full-Text-Search-tp4658832p4659000.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
