lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Re : Using SolrJ with Tika
Date Fri, 04 Sep 2009 00:01:44 GMT
See https://issues.apache.org/jira/browse/SOLR-1411
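
Separately from that issue, here is a minimal sketch of the pure-Java
route asked about in the original question quoted below: posting the raw
file to the extracting handler with the JDK's HttpURLConnection instead
of curl. The class and method names (SolrExtractClient, buildExtractUrl,
postFile) are made up for illustration. The /update/extract path and the
literal.id, extractOnly and commit parameters are assumptions about how
the handler is configured; check your solrconfig.xml and Solr version,
since the thread below refers to the option as extract.only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of SolrJ or Tika.
final class SolrExtractClient {

    // Builds the request URL. The handler path and the parameter names
    // are assumptions; adjust them to match your Solr configuration.
    static String buildExtractUrl(String solrBase, String docId, boolean extractOnly) {
        try {
            StringBuilder url = new StringBuilder(solrBase).append("/update/extract?");
            url.append("literal.id=").append(URLEncoder.encode(docId, "UTF-8"));
            // Either ask for the extracted content back, or index and commit.
            url.append(extractOnly ? "&extractOnly=true" : "&commit=true");
            return url.toString();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Sends the file as the raw POST body and returns the response body as text.
    static String postFile(String url, Path file, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(file, out);
        }
        int status = conn.getResponseCode();
        InputStream body = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (body == null) {
            return "";
        }
        try (InputStream in = body) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

A call would look like postFile(buildExtractUrl("http://localhost:8983/solr",
"doc1", true), Paths.get("report.pdf"), "application/pdf"), where the base
URL, id and file name are placeholders. With extractOnly set, the response
body should carry the extracted content rather than indexing the document.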

On Sep 3, 2009, at 6:47 AM, Angel Ice wrote:

> Hi
>
> This is the solution I was testing.
> I had some difficulties with AutoDetectParser, but I think it's the
> solution I will use in the end.
>
>
> Thanks for the advice anyway :)
>
> Regards,
>
> Laurent
>
>
>
>
> ________________________________
> From: Abdullah Shaikh <abdullah.shaikh@viithiisys.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, 3 September 2009, 14:31:10
> Subject: Re: Using SolrJ with Tika
>
> Hi Laurent,
>
> I am not sure if this is what you need, but you can extract the
> content from the uploaded document (MS docs, PDF, etc.) using Tika
> and then send it to Solr for indexing.
>
> String CONTENT = extract the content using TIKA (you can use
> AutoDetectParser)
>
> and then,
>
> SolrInputDocument doc = new SolrInputDocument();
> doc.addField("DOC_CONTENT", CONTENT);
>
> solrServer.add(doc);
> solrServer.commit();
>
>
> On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice <lbil_fr@yahoo.fr> wrote:
>
>> Hi everybody.
>>
>> I hope this is the right place for questions; if not, sorry.
>>
>> I'm trying to index rich documents (PDF, MS docs, etc.) in Solr/Lucene.
>> I have seen a few examples explaining how to use Tika to solve this, but
>> most of them use curl to send documents to Solr, or an HTML POST with a
>> file input.
>> But I'd like to do it entirely in Java.
>> Is there a way to use SolrJ to index the documents with Solr's
>> ExtractingRequestHandler, or at least to get the extracted XML back
>> (with the extract.only option)?
>>
>> Many thanks.
>>
>> Laurent.
>>
>>
>>
>>
>
>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

