Well, if performance is low, it's likely not a Hadoop issue. Hadoop tuning is
only required once you start pushing it to its limits.
I would indeed check the Nutch wiki. There are settings there, such as the
number of fetcher threads and queues, that are very important.
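
As a rough sketch of the kind of thing I mean, assuming the fetcher property
names from nutch-default.xml in the 1.x releases (check your own copy, and
treat the values as placeholders rather than recommendations), overrides in
conf/nutch-site.xml would look something like this:

```xml
<!-- conf/nutch-site.xml: illustrative overrides only, values are placeholders -->
<configuration>
  <!-- Total fetcher threads per fetch task (assumed property name) -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>50</value>
  </property>
  <!-- Threads allowed to work on a single queue at once (assumed property name) -->
  <property>
    <name>fetcher.threads.per.queue</name>
    <value>1</value>
  </property>
  <!-- How URLs are grouped into queues, e.g. by host (assumed property name) -->
  <property>
    <name>fetcher.queue.mode</name>
    <value>byHost</value>
  </property>
</configuration>
```

If most of your URLs sit on a handful of hosts, the per-queue limit and
politeness delay will cap throughput no matter how many nodes you add.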
> This is overwhelmingly weighted towards Hadoop configuration.
>
> There are some guidance notes on the Nutch wiki for performance issues
> so you may wish to give them a try first.
>
> On Thu, Dec 15, 2011 at 4:22 PM, Bai Shen <baishen.lists@gmail.com> wrote:
> > So I have Nutch running on a hadoop cluster with three data nodes. The
> > machines are all pretty beefy, but Nutch isn't performing any faster than
> > when I was running in pseudo mode on one machine.
> >
> > How do I set up Nutch in order to take full advantage of the cluster?
> >
> > Thanks.