manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Indexing Solr with the web crawler
Date Tue, 25 Jan 2011 15:30:13 GMT
Perhaps it is acceptable to use the release version of Solr, plus
specific patches for the ticket or tickets in question?  There should
be a Solr tag for the release - you might be able to svn export from
that tag and pull the release code into your local svn, before
applying the patch, and then committing that also.  That way you have
a reproducible image to work with.  That's often what we needed to do
at MetaCarta.  It's a pain, I know, but that's life in the open-source
world.
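
The export, import, patch, and commit sequence described above could be sketched roughly as follows. This is only an illustration, not a tested recipe: the function name, the tag URL, the local repository URL, and the patch file name are all hypothetical placeholders, and the script only prints the commands rather than running them.

```python
import os
import shlex


def reproducible_release_steps(tag_url, local_repo_url, patch_file,
                               workdir="solr-release"):
    """Return the commands (as argv lists) for the workflow above.

    Nothing is executed here; the caller decides whether to run them.
    All URLs and paths are supplied by the caller (hypothetical values below).
    """
    export_dir = workdir + "-export"
    # 'patch -d' changes directory before reading the patch file,
    # so the patch path is made absolute first.
    patch_path = os.path.abspath(patch_file)
    return [
        # 1. Pull a clean, unversioned copy of the release tag.
        ["svn", "export", tag_url, export_dir],
        # 2. Put the unmodified release code into the local repository.
        ["svn", "import", export_dir, local_repo_url,
         "-m", "Import unmodified release"],
        # 3. Check out a working copy of what was just imported.
        ["svn", "checkout", local_repo_url, workdir],
        # 4. Apply the ticket's patch (assumes paths relative to the
        #    source root, hence -p0; files the patch creates would
        #    still need 'svn add').
        ["patch", "-p0", "-d", workdir, "-i", patch_path],
        # 5. Commit the patched state so the image is reproducible.
        ["svn", "commit", workdir,
         "-m", "Apply " + os.path.basename(patch_file)],
    ]


if __name__ == "__main__":
    steps = reproducible_release_steps(
        tag_url="https://svn.example.org/solr/tags/release-x.y.z",  # hypothetical
        local_repo_url="file:///srv/svn/local/solr-patched",        # hypothetical
        patch_file="SOLR-1902.patch",                               # hypothetical
    )
    for argv in steps:
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the unmodified import and the patch as two separate commits means the local history shows exactly what differs from the official release.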

Karl


On Tue, Jan 25, 2011 at 4:59 AM, Erlend Garåsen <e.f.garasen@usit.uio.no> wrote:
> On 24.01.11 14.48, Karl Wright wrote:
>
>> Thanks for the information.
>> What I'd like to do is wait until your research is done and then post
>> the rough instructions to dev@lucene.apache.org for confirmation that
>> your approach is the preferred one.  I'd also like to know if you
>> check out the latest solr release from the svn tag and just build it,
>> whether you have any of these problems.  I've been building
>> solr/lucene trunk and not using the binary distribution, which may be
>> why I never noticed that this has gone away in the main distribution.
>
> OK, it might take a week or so, but here are some details I just figured
> out:
> - There is a bug in the current Solr release (1.4.1) which makes it
> impossible to extract the content by using the ExtractingRequestHandler. I
> think it is related to this Jira issue:
> https://issues.apache.org/jira/browse/SOLR-1902?page=com.atlassian.jira.plugin.ext.subversion%3Asubversion-commits-tabpanel
> - This issue is now fixed, and if I check out the latest code from trunk,
> content can now be extracted by Tika.
>
> What I need to test is whether I need to place the tika/extracting jars
> manually in a lib folder when I deploy solr.war on Resin by using the latest
> trunk version from SVN. When this is done, I can inform you.
>
> Anyway, I would rather not build a search application for my university on
> the latest version from trunk; I would prefer to use an official release.
> So maybe I will try to apply the changes from trunk to the release
> instead. I can already see that Tika has a newer version in trunk
> compared to the official 1.4.1 release, i.e. tika-core-0.8.jar instead of
> tika-core-0.4.jar.
>
> Erlend
>
> --
> Erlend Garåsen
> Center for Information Technology Services
> University of Oslo
> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
>
