manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: [TIP] Workaround for Solr bugs when Indexing Solr 1.4.1
Date Wed, 30 Mar 2011 16:00:50 GMT
It would be great if this information went at least into the FAQ, and
even better if we added a page to the site documentation.  I'm
thinking maybe a whole page titled "Integrating with Solr", which
would walk you through the process and the pitfalls.  What do you
think?

Karl

On Wed, Mar 30, 2011 at 11:39 AM, Erlend Garåsen
<e.f.garasen@usit.uio.no> wrote:
>
> Solr 1.4.1 has several bugs that make it difficult to deploy MCF on an
> application server such as Resin. I have struggled a lot with some of these
> bugs and decided to share my experiences in case others have the same
> problems.
>
> First, I found that I had to upgrade Tika to version 0.8 in order to
> extract the content of MS Office documents and other formats. Solr 1.4.1
> ships with Tika 0.4, which does not work:
> https://issues.apache.org/jira/browse/SOLR-1902
>
> Here you have basically two options:
> 1. Install the following branch:
> http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.4/
> 2. Install the latest version from trunk (not recommended for production
> use).
>
> Next, I found that dates were not parsed correctly. ExtractingRequestHandler
> lets you specify additional date formats, as in the following example:
> <lst name="date.formats">
>  <str>yyyy-MM-dd</str>
>  <str>dd.MM.yyyy</str>
> </lst>
>
> This will cause a lazy loading error due to the following bug:
> https://issues.apache.org/jira/browse/SOLR-1756
>
> You have the following workarounds:
> 1. Install the branch mentioned above and then apply the following patch:
> https://issues.apache.org/jira/secure/attachment/12434831/SOLR-1756.patch
> 2. Install the latest version from trunk.
>
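> For reference, the date.formats list sits directly inside the
> ExtractingRequestHandler declaration in solrconfig.xml, next to the
> defaults list. The sketch below is modeled on the example configuration
> shipped with Solr 1.4; the handler path, the field mappings and the
> startup="lazy" attribute are assumptions to check against your own
> solrconfig.xml. Because a lazily loaded handler is only initialized when
> the first extraction request arrives, a failing date.formats setting is
> likely to surface at that point, reported as a lazy loading error.
>
> ```xml
> <requestHandler name="/update/extract"
>     class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
>     startup="lazy">
>   <lst name="defaults">
>     <!-- Assumed mapping: the extracted body text goes into the "text" field. -->
>     <str name="fmap.content">text</str>
>     <!-- Unknown metadata fields get this prefix (assumes an ignored_* dynamic field). -->
>     <str name="uprefix">ignored_</str>
>   </lst>
>   <!-- Extra date formats; per the report above, this setting triggers SOLR-1756 in 1.4.1. -->
>   <lst name="date.formats">
>     <str>yyyy-MM-dd</str>
>     <str>dd.MM.yyyy</str>
>   </lst>
> </requestHandler>
> ```
>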
> Remember to rebuild Solr and place the necessary jar files in a separate
> folder that your application server can access (apache-solr-cell*.jar,
> Tika, and Tika's dependencies).
>
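> If you would rather not touch the application server's classpath, the
> solrconfig.xml format in Solr 1.4 also supports <lib> directives, which add
> jars to the core's class loader. A minimal sketch, where both directory
> paths are hypothetical placeholders for wherever you put the Tika 0.8 jars
> and the rebuilt apache-solr-cell jar:
>
> ```xml
> <config>
>   <!-- Hypothetical folder holding Tika 0.8 and its dependencies;
>        every jar in it is added to the core's class loader. -->
>   <lib dir="/opt/solr/extraction-lib" />
>   <!-- With a regex, only files in dir whose names fully match are loaded
>        (hypothetical path to the rebuilt Solr dist folder). -->
>   <lib dir="/opt/solr/dist" regex="apache-solr-cell-.*\.jar" />
>   <!-- ... rest of solrconfig.xml ... -->
> </config>
> ```
>
> Relative dir values are resolved against the core's instance directory.
>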
> Erlend
>
> --
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
>
