lucene-solr-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: Need help on Solr Cell usage with specific Tika parser
Date Mon, 14 Jun 2010 16:27:35 GMT
Hi Olivier,

Are you setting the mime type explicitly via the stream.type parameter?
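
For illustration, here is a minimal sketch of what setting it could look like, assuming the handler is mapped at /update/extract as in your message. The base URL, the document id, and the helper name build_extract_url are hypothetical; stream.type and literal.id are ordinary request parameters, and the "id" field is an assumption about your schema.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, mime_type):
    """Build an /update/extract URL that names the MIME type explicitly.

    base_url and doc_id are placeholders; adjust them to your installation.
    """
    params = {
        "literal.id": doc_id,      # assumes the schema's unique key field is "id"
        "stream.type": mime_type,  # explicit MIME type, so no detection is needed
        "commit": "true",
    }
    # urlencode percent-encodes the "/" in the MIME type for safe use in a query string
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


url = build_extract_url("http://localhost:8983/solr", "seq1", "biosequence/document")
print(url)
```

The file itself would then be POSTed to that URL as the request body. If the request succeeds with stream.type set but fails without it, the problem is most likely in type detection rather than in the parser registration.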

-- Ken

On Jun 14, 2010, at 9:14am, olivier sallou wrote:

> Hi,
> I use Solr Cell to send specific content files. I developed a dedicated
> Parser for specific mime types.
> However, I cannot get Solr to accept my new mime types.
>
> In solrconfig, in the update/extract request handler, I specified
> <str name="tika.config">./tika-config.xml</str>, where tika-config.xml is in
> the conf directory (same as solrconfig).
>
> In tika-config I added my mimetypes:
>
> <parser name="parse-readseq"
>         class="org.irisa.genouest.tools.readseq.ReadSeqParser">
>   <mime>biosequence/document</mime>
>   <mime>biosequence/embl</mime>
>   <mime>biosequence/genbank</mime>
> </parser>
>
> I am unsure about this line:
>  <mimeTypeRepository resource="./tika-mimetypes.xml" magic="false"/>
>
> I do not know whether the path to the Tika mimetypes file should be absolute
> or relative, or whether this file needs to be redefined at all if "magic" is
> not used.
>
>
> When I run my update/extract, I get an error saying that
> "biosequence/document" does not match any known parser.
>
> Thanks
>
> Olivier

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
