hadoop-common-dev mailing list archives

From Jakob Homan <jho...@yahoo-inc.com>
Subject Re: Contributing to hadoop
Date Wed, 04 Mar 2009 19:41:55 GMT
There is definitely something to be said for developing via TDD as  
Lohit mentioned.

Hadoop has an extensive set of tools for writing unit tests that run
on simulated clusters (see http://www.cloudera.com/blog/2008/12/16/testing-hadoop/
for an excellent tutorial). This will save you time in the long run,
because your tests can be contributed along with the actual patch, and
there's no need to muck about with configuring clusters, manually
starting datanodes, etc.
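
For example, a minimal JUnit test against an in-process mini-cluster
looks something like the sketch below (untested as written here; the
class name TestOnMiniCluster and the /test/file path are just
illustrative):

  import junit.framework.TestCase;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hdfs.MiniDFSCluster;

  public class TestOnMiniCluster extends TestCase {
    public void testSimpleWrite() throws Exception {
      Configuration conf = new Configuration();
      // start an in-process HDFS cluster with 2 simulated datanodes
      MiniDFSCluster cluster = new MiniDFSCluster(conf, 2, true, null);
      try {
        FileSystem fs = cluster.getFileSystem();
        Path p = new Path("/test/file");
        fs.create(p).close();
        assertTrue(fs.exists(p));
      } finally {
        // always shut the mini-cluster down, even if the test fails
        cluster.shutdown();
      }
    }
  }

The whole cluster runs inside the test JVM, so no daemons need to be
started or stopped by hand.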

Actually needing a cluster to test or develop patches against is  
pretty rare and indicative of a problem somewhere else.

-Jakob



On Mar 4, 2009, at 11:08 AM, Raghu Angadi wrote:

> Ajit Ratnaparkhi wrote:
>> Hi,
>> thanks for your help.
>> I tried the above-mentioned script (the one from Raghu), but whenever
>> I execute it, the following message gets displayed:
>> *datanode running as process <process_id>. Stop it first*.
>> I start the single-node cluster with bin/start-dfs.sh first, and then
>> execute the above script to start the second datanode.
>
> Did you try to do what the error message asks you to? Better still,
> you should try to find out where the message is coming from. I realize
> this is not a particularly useful reply for a user, but for a
> developer I hope it is.
>
> I just wrote the example script in the mail editor and did not test
> it; note that the 'export' before the HADOOP_* env variables in the
> script is required. Currently I use a different (a bit less elegant)
> method for starting multiple datanodes. When I switch to this method,
> I will post the script.
>
> Better still, post your script once you get it working.
>
> Raghu.
>
>> I also tried supplying a separate, modified configuration from a
>> separate config directory by executing
>> *bin/hadoop-daemons.sh --config <config-directory-path> start
>> datanode*
>> It still gives the same message as above.
>> Also, earlier in this thread Ramya mentioned DataNodeCluster.java.
>> That would help, but I don't understand how to execute this class.
>> Can you please help with this?
>> thanks,
>> -Ajit.
>> On Thu, Feb 26, 2009 at 6:43 PM, Raghu Angadi <rangadi@yahoo-inc.com> wrote:
>>> You can run multiple datanodes with a small shell script. You need
>>> to override a couple of environment and config variables.
>>>
>>> something like :
>>>
>>> run_datanode () {
>>>       DN=$2
>>>       # export so that hadoop-daemon.sh (a child process) sees these
>>>       export HADOOP_LOG_DIR=logs$DN
>>>       export HADOOP_PID_DIR=$HADOOP_LOG_DIR
>>>       # give each datanode its own storage dir and ports
>>>       bin/hadoop-daemon.sh $1 datanode \
>>>         -Dhadoop.tmp.dir=/some/dir/dfs$DN \
>>>         -Ddfs.datanode.address=0.0.0.0:5001$DN \
>>>         -Ddfs.datanode.http.address=0.0.0.0:5008$DN \
>>>         -Ddfs.datanode.ipc.address=0.0.0.0:5002$DN
>>> }
>>>
>>> You can start a second datanode like: run_datanode start 2
>>>
>>> Pretty useful for testing.
>>>
>>> Raghu.
>>>
>>>
>>> Ajit Ratnaparkhi wrote:
>>>
>>>> Raghu,
>>>>
>>>> Can you please tell me how to run multiple datanodes on one
>>>> machine?
>>>>
>>>> thanks,
>>>> -Ajit.
>>>>
>>>> On Thu, Feb 26, 2009 at 9:23 AM, Pradeep Fernando
>>>> <pradeepfn@gmail.com> wrote:
>>>>
>>>>> Raghu,
>>>>>
>>>>>> I guess you are asking if it would be more convenient if one had
>>>>>> access to a larger cluster for development.
>>>>>
>>>>> Exactly.
>>>>>
>>>>>> I have access to many machines and clusters, but about 99% of my
>>>>>> development happens on a single machine for testing. I would
>>>>>> guess that is true for most of the Hadoop developers.
>>>>>
>>>>> Well, this is the answer I was looking for. :D
>>>>> It seems I have enough resources to contribute to this project.
>>>>> Thanks a lot, Raghu.
>>>>>
>>>>> regards,
>>>>> Pradeep Fernando.
>

