lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Problem with Solr 3.6.1 extracting ODT content using SolrCell's ExtractingRequestHandler
Date Mon, 26 Nov 2012 02:26:31 GMT
Did you commit after you added the document but before you tried the search?

Best
Erick


On Fri, Nov 23, 2012 at 6:25 PM, Brett Melbourne <bmelbourne@halogensoftware.com> wrote:

> Hi all,
>
> I am encountering a problem where Solr 3.6.1 is not able to extract the
> text content from ODT (OpenDocument Text) files submitted to the
> ExtractingRequestHandler. I can reproduce this issue against the example
> schema running with Jetty.
>
> Executing a simple index request (based on the example in the wiki):
> curl "http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&fmap.content=attr_content&commit=true" \
>   -F "myfile=@testfile.odt"
> returns no errors, and does not generate any exceptions in the log/console.
>
> A query for doc1 returns an empty attr_content field:
> <arr name="attr_content"> <str></str> </arr>
>
> Oddly enough, executing an "extractOnly=true" request against the
> ExtractingRequestHandler with the same ODT file correctly returns the text
> of the file.
>
> I am wondering:
>
> * Is this a known issue? (I couldn't find any mention of this
>   particular issue anywhere.)
>
> * Are there any workarounds or does anyone have any suggestions?
>
> Thanks,
>
> Brett.
>
>
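
The checks discussed in this thread (an explicit commit, a query for the indexed document, and an extractOnly run for comparison) could be scripted roughly as follows. This is only a sketch, assuming the single-core example Solr on the default Jetty port; the helper names (query_url, extract_only_url, commit, query_doc, extract_only) are made up for illustration and are not part of Solr.

```shell
#!/bin/sh
# Assumption: single-core example Solr on the default Jetty port.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the select URL that returns only the id and the mapped content field.
# $1 is the document id (e.g. doc1).
query_url() {
  printf '%s/select?q=id:%s&fl=id,attr_content' "$SOLR_URL" "$1"
}

# Build the extract-only URL, which returns the extracted text without indexing it.
extract_only_url() {
  printf '%s/update/extract?extractOnly=true' "$SOLR_URL"
}

# Send an explicit commit so that previously added documents become searchable.
commit() {
  curl -s "$SOLR_URL/update" \
    -H 'Content-Type: text/xml; charset=utf-8' \
    --data-binary '<commit/>'
}

# Fetch the indexed document by id.
query_doc() {
  curl -s "$(query_url "$1")"
}

# Run a file through extraction only, for comparison with the indexed field.
# $1 is the path to the file (e.g. testfile.odt).
extract_only() {
  curl -s "$(extract_only_url)" -F "myfile=@$1"
}
```

With Solr running, `commit && query_doc doc1` shows what was actually indexed, and `extract_only testfile.odt` shows what the extraction step returns for the same file. If the second has text and the first does not, the difference lies in the indexing path rather than in the extraction itself.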
