lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1902) Tika no longer properly extracts content in Solr
Date Thu, 01 Jul 2010 14:31:51 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884295#action_12884295 ]

Grant Ingersoll commented on SOLR-1902:
---------------------------------------

I'm not seeing this.  I just tried trunk and it works for me.  Brad, can you produce a test
case?  What happens if you run with extractOnly?  Does it return the content?  I tried both
that and indexing using trunk and the example per the wiki docs and it all appears to work
for me.
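
For anyone trying to reproduce this, the extractOnly check mentioned above can be
scripted. What follows is only a rough sketch, not the test case asked for: it assumes
the stock example server at http://localhost:8983/solr with the extract handler mapped
to /update/extract, and the helper names are made up for illustration. The extractOnly
and extractFormat request parameters are the ones described in the Solr Cell wiki docs.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the Solr example server with the extract handler at its usual path.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/update/extract"


def build_extract_only_url(base_url=SOLR_EXTRACT_URL, extract_format="text"):
    """Build the URL for an extractOnly request (Tika output is returned, nothing is indexed)."""
    params = {"extractOnly": "true", "extractFormat": extract_format}
    return base_url + "?" + urlencode(params)


def extract_only(path, content_type="application/pdf", base_url=SOLR_EXTRACT_URL):
    """POST the file at `path` as the raw request body and return Solr's response as text."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = Request(
        build_extract_only_url(base_url),
        data=body,
        headers={"Content-Type": content_type},
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling extract_only("some.pdf") against a running instance and looking at the returned
body should show whether any text comes back. A response that contains only an empty
document would be consistent with the EmptyParser behaviour described in the issue.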

> Tika no longer properly extracts content in Solr
> ------------------------------------------------
>
>                 Key: SOLR-1902
>                 URL: https://issues.apache.org/jira/browse/SOLR-1902
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>             Fix For: 4.0
>
>
> See http://www.lucidimagination.com/search/document/2ca3fe953038a54f/problem_with_pdf_upgrading_cell#22360c8261801f24
> It appears that since the upgrade to Tika 0.7, Tika is now selecting an EmptyParser when
> uploading docs, which then outputs an empty XHTML representation.  Still, it's strange that
> the tests pass.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

