nutch-user mailing list archives

From Arcondo Dasilva <arcondo.dasi...@gmail.com>
Subject Re: Native Hadoop library not loaded and Cannot parse sites contents
Date Fri, 04 Jan 2013 06:38:58 GMT
Hi Lewis,

Thanks for your feedback. I went through the process step by step, and I'm
still getting the error.

My plugins folder looks like this:

[image: Inline image 1]

When I ran the parse job, it gave me this:

[image: Inline image 2]

When I look at the log file, I get this:

[image: Inline image 3]

My nutch-site.xml contains this:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins. In order to use HTTPS please enable
  protocol-httpclient, but be aware of possible intermittent problems with the
  underlying commons-httpclient library.
  </description>
</property>
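
As a quick sanity check on the value above, the following Python sketch tests which plugin directory names the expression matches in full. It assumes Nutch compares each plugin id against the whole expression, which `re.fullmatch` approximates here; the helper name `included_plugins` and the list of candidate names are illustrative only, not a listing of the actual plugins folder.

```python
import re

# The plugin.includes value copied from the nutch-site.xml fragment above.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)"
    "|urlnormalizer-(pass|regex|basic)|scoring-opic"
)


def included_plugins(names, pattern=PLUGIN_INCLUDES):
    """Return the names that the expression matches in full.

    Assumption: a plugin is included only when its whole directory
    name matches the expression (not just a prefix of it).
    """
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Illustrative directory names (hypothetical input, not the real folder listing).
candidates = [
    "parse-html",
    "parse-tika",
    "lib-nekohtml",
    "protocol-httpclient",
    "nutch-extensionpoints",
]

for name in candidates:
    status = "included" if included_plugins([name]) else "not matched"
    print(f"{name}: {status}")
```

Under that assumption, parse-html and parse-tika match by name, while lib-nekohtml does not. Whether Nutch still loads lib-nekohtml as a dependency of parse-html is a separate question that this sketch does not model.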


Am I missing something else?

Thanks for your precious help.

Arcondo.



On Thu, Jan 3, 2013 at 11:20 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Arcondo,
>
> The nekohtml jar should be version 0.9.5, and should reside in
> build/plugins/lib-nekohtml once you build Nutch from source.
> Once you use the default 'runtime' target, the corresponding plugins
> folders should be copied into runtime/local/plugins.
> Can you check that the jar is copied to this directory before attempting to
> parse the URLs in your segment(s), if using 1.x?
> I'm also assuming that you have parse-html included in the plugin.includes
> property within nutch-site.xml before building the source.
>
> Lewis
>
> On Thu, Jan 3, 2013 at 9:11 PM, Arcondo Dasilva
> <arcondo.dasilva@gmail.com> wrote:
>
> > Thanks for the explanation. I'm more of a functional guy with no solid
> > background in Java.
> > Could you give some details on how to enforce it manually?
> >
> > Thanks in advance, Arcondo
> >
> >
> >
> > On Thu, Jan 3, 2013 at 2:49 PM, Lewis John Mcgibbney <
> > lewis.mcgibbney@gmail.com> wrote:
> >
> > > the jar is not on the classpath
> >
>
>
>
> --
> *Lewis*
>
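
The check Lewis describes in the quoted reply (confirming that the jar was copied into runtime/local/plugins) can be sketched in Python as follows. This is a minimal sketch: the function name `find_plugin_jars` is hypothetical, and the paths assume the script is run from the top of a Nutch 1.x source checkout after building the 'runtime' target.

```python
from pathlib import Path


def find_plugin_jars(plugins_dir, plugin_name="lib-nekohtml"):
    """List the jar file names found in one plugin directory.

    Returns an empty list when the plugin directory does not exist,
    which would suggest that the plugin was never copied there.
    """
    plugin_dir = Path(plugins_dir) / plugin_name
    if not plugin_dir.is_dir():
        return []
    return sorted(p.name for p in plugin_dir.glob("*.jar"))


if __name__ == "__main__":
    # Assumed location, taken from the quoted reply; adjust it to your checkout.
    jars = find_plugin_jars("runtime/local/plugins")
    if jars:
        print("jars found:", ", ".join(jars))
    else:
        print("no jars found under runtime/local/plugins/lib-nekohtml")
```

An empty result only shows that the file is absent from that folder. It does not by itself explain why the build did not copy it.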
