hadoop-common-user mailing list archives

From David Milne <d.n.mi...@gmail.com>
Subject Re: Problems with HOD and HDFS
Date Mon, 14 Jun 2010 04:21:45 GMT
Ok, thanks Jeff.

This is pretty surprising though. I would have thought many people
would be in my position, where they have to use Hadoop on a
general-purpose cluster and need it to play nice with a resource manager.
What do other people do in this position, if they don't use HOD?
Deprecated normally means there is a better alternative.

- Dave

On Mon, Jun 14, 2010 at 2:39 PM, Jeff Hammerbacher <hammer@cloudera.com> wrote:
> Hey Dave,
>
> I can't speak for the folks at Yahoo!, but from watching the JIRA, I don't
> think HOD is actively used or developed anywhere these days. You're
> attempting to use a mostly deprecated project, and hence not receiving any
> support on the mailing list.
>
> Thanks,
> Jeff
>
> On Sun, Jun 13, 2010 at 7:33 PM, David Milne <d.n.milne@gmail.com> wrote:
>
>> Anybody? I am completely stuck here. I have no idea who else I can ask
>> or where I can go for more information. Is there somewhere specific
>> where I should be asking about HOD?
>>
>> Thank you,
>> Dave
>>
>> On Thu, Jun 10, 2010 at 2:56 PM, David Milne <d.n.milne@gmail.com> wrote:
>> > Hi there,
>> >
>> > I am trying to get Hadoop on Demand up and running, but am having
>> > problems with the ringmaster not being able to communicate with HDFS.
>> >
>> > The output from the hod allocate command ends with this, with full
>> > verbosity:
>> >
>> > [2010-06-10 14:40:22,650] CRITICAL/50 hadoop:298 - Failed to retrieve
>> > 'hdfs' service address.
>> > [2010-06-10 14:40:22,654] DEBUG/10 hadoop:631 - Cleaning up cluster id
>> > 34029.symphony.cs.waikato.ac.nz, as cluster could not be allocated.
>> > [2010-06-10 14:40:22,655] DEBUG/10 hadoop:635 - Calling rm.stop()
>> > [2010-06-10 14:40:22,665] DEBUG/10 hadoop:637 - Returning from rm.stop()
>> > [2010-06-10 14:40:22,666] CRITICAL/50 hod:401 - Cannot allocate
>> > cluster /home/dmilne/hadoop/cluster
>> > [2010-06-10 14:40:23,090] DEBUG/10 hod:597 - return code: 7
>> >
>> >
>> > I've attached the hodrc file below, but briefly HOD is supposed to
>> > provision an HDFS cluster as well as a Map/Reduce cluster, and seems
>> > to be failing to do so. The ringmaster log looks like this:
>> >
>> > [2010-06-10 14:36:05,144] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
>> > [2010-06-10 14:36:05,145] DEBUG/10 ringMaster:487 - getServiceAddr
>> > service: <hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8>
>> > [2010-06-10 14:36:05,147] DEBUG/10 ringMaster:504 - getServiceAddr
>> > addr hdfs: not found
>> > [2010-06-10 14:36:06,195] DEBUG/10 ringMaster:479 - getServiceAddr name: hdfs
>> > [2010-06-10 14:36:06,197] DEBUG/10 ringMaster:487 - getServiceAddr
>> > service: <hodlib.GridServices.hdfs.Hdfs instance at 0x8f97e8>
>> > [2010-06-10 14:36:06,198] DEBUG/10 ringMaster:504 - getServiceAddr
>> > addr hdfs: not found
>> >
>> > ... and so on, until it gives up
>> >
>> > Any ideas why? One red flag is that when running the allocate command,
>> > some of the variables echoed back look dodgy:
>> >
>> > --gridservice-hdfs.fs_port 0
>> > --gridservice-hdfs.host localhost
>> > --gridservice-hdfs.info_port 0
>> >
>> > These are not what I specified in the hodrc. Are the port numbers just
>> > set to 0 because I am not using an external HDFS, or is this a
>> > problem?
>> >
>> >
>> > The software versions involved are:
>> >  - Hadoop 0.20.2
>> >  - Python 2.5.2 (no Twisted)
>> >  - Java 1.6.0_20
>> >  - Torque 2.4.5
>> >
>> >
>> > The hodrc file looks like this:
>> >
>> > [hod]
>> > stream                          = True
>> > java-home                       = /opt/jdk1.6.0_20
>> > cluster                         = debian5
>> > cluster-factor                  = 1.8
>> > xrs-port-range                  = 32768-65536
>> > debug                           = 3
>> > allocate-wait-time              = 3600
>> > temp-dir                        = /scratch/local/dmilne/hod
>> >
>> > [ringmaster]
>> > register                        = True
>> > stream                          = False
>> > temp-dir                        = /scratch/local/dmilne/hod
>> > log-dir                         = /scratch/local/dmilne/hod/log
>> > http-port-range                 = 8000-9000
>> > idleness-limit                  = 864000
>> > work-dirs                       = /scratch/local/dmilne/hod/1,/scratch/local/dmilne/hod/2
>> > xrs-port-range                  = 32768-65536
>> > debug                           = 4
>> >
>> > [hodring]
>> > stream                          = False
>> > temp-dir                        = /scratch/local/dmilne/hod
>> > log-dir                         = /scratch/local/dmilne/hod/log
>> > register                        = True
>> > java-home                       = /opt/jdk1.6.0_20
>> > http-port-range                 = 8000-9000
>> > xrs-port-range                  = 32768-65536
>> > debug                           = 4
>> >
>> > [resource_manager]
>> > queue                           = express
>> > batch-home                      = /opt/torque-2.4.5
>> > id                              = torque
>> > options                         = l:pmem=3812M,W:X="NACCESSPOLICY:SINGLEJOB"
>> > #env-vars                       = HOD_PYTHON_HOME=/foo/bar/python-2.5.1/bin/python
>> >
>> > [gridservice-mapred]
>> > external                        = False
>> > pkgs                            = /opt/hadoop-0.20.2
>> > tracker_port                    = 8030
>> > info_port                       = 50080
>> >
>> > [gridservice-hdfs]
>> > external                        = False
>> > pkgs                            = /opt/hadoop-0.20.2
>> > fs_port                         = 8020
>> > info_port                       = 50070
>> >
>> > Cheers,
>> > Dave
>> >
>>
>
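
The original message notes that the values echoed back by hod allocate differ from those in the hodrc. A minimal sketch of how that comparison could be made explicit follows. It assumes Python 3 and its standard configparser module (not the Python 2.5.2 that HOD itself runs under), and the helper name diff_section is hypothetical. It only reports the differences; it does not say whether a port of 0 is expected when external = False, which remains the open question in this thread.

```python
import configparser

# Excerpt of the [gridservice-hdfs] section from the hodrc quoted above.
HODRC_TEXT = """
[gridservice-hdfs]
external = False
pkgs = /opt/hadoop-0.20.2
fs_port = 8020
info_port = 50070
"""

# Values echoed back by `hod allocate`, as reported in the original message.
ECHOED = {"fs_port": "0", "host": "localhost", "info_port": "0"}


def diff_section(hodrc_text, section, echoed):
    """Return {option: (configured, echoed)} for options whose values differ.

    Hypothetical helper for illustration; an option missing from the
    hodrc section shows up with a configured value of None.
    """
    parser = configparser.ConfigParser()
    parser.read_string(hodrc_text)
    configured = dict(parser.items(section))
    return {
        key: (configured.get(key), value)
        for key, value in echoed.items()
        if configured.get(key) != value
    }


mismatches = diff_section(HODRC_TEXT, "gridservice-hdfs", ECHOED)
for key, (configured, echoed) in sorted(mismatches.items()):
    print(f"{key}: hodrc={configured!r} echoed={echoed!r}")
```

Only the HDFS section is embedded here to keep the sketch short; the same function could be pointed at any other section of the file.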
