lucene-dev mailing list archives

From "Brad Greenlee (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1902) Tika no longer properly extracts content in Solr
Date Fri, 04 Jun 2010 19:42:54 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875725#action_12875725 ]

Brad Greenlee commented on SOLR-1902:
-------------------------------------

I am still seeing this issue. It works if I downgrade Tika to 0.6, but neither the 0.8-SNAPSHOT
included in the current Solr trunk nor a snapshot from the Tika trunk works for me. I'm running
Java 1.6.0_20 on OS X 10.6.3. I posted about the issue to the solr-user mailing list: http://lucene.472066.n3.nabble.com/TikaEntityProcessor-not-working-td856965.html

> Tika no longer properly extracts content in Solr
> ------------------------------------------------
>
>                 Key: SOLR-1902
>                 URL: https://issues.apache.org/jira/browse/SOLR-1902
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>             Fix For: 4.0
>
>
> See http://www.lucidimagination.com/search/document/2ca3fe953038a54f/problem_with_pdf_upgrading_cell#22360c8261801f24
> It appears that since the upgrade to Tika 0.7, Tika is now selecting an EmptyParser when
> uploading docs, which then outputs an empty XHTML representation.  Still, it's strange that
> the tests pass.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

