hadoop-general mailing list archives

From Rita Liu <crystaldol...@gmail.com>
Subject Re: Hadoop basics
Date Tue, 17 Aug 2010 22:44:47 GMT
Hi Piyush and Amit:

Thanks so much for your kind suggestions!! I am trying log4j now :))
Here is a beginner-level question -- where should I put the loggers?

Say I start with a MapReduce application (say, WordCount.java), and I
want to trace the code so that I know which methods (from which
classes) are called and what is done while they are being
called, before hadoop finishes executing the application. In order to
write log files to record that information, I have to know where
(i.e. in which files) to put my loggers. However, without knowing
which methods (from which classes) are called, how do I know where to
put the loggers? If I just put my logger inside the main method of
WordCount.java, it probably doesn't make much sense ...

Is there any way to trace the call stack so that I would know where to
put my loggers (with log4j)? Or: there might be a smart way for me to
create a logger so that I would get what I need?
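
A minimal sketch of the call-stack idea, using only the JDK: a helper that captures the current stack and returns the calling frames as strings, which could then be passed to any logger (for example a log4j Logger.debug call). The class name CallTrace and the stand-in map() method are hypothetical and only illustrate the technique; they are not part of hadoop or WordCount.java.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of hadoop: lists the callers of the
// method that invoked it, innermost caller first.
final class CallTrace {

    // Returns at most maxFrames entries of the form "Class.method:line".
    public static List<String> callers(int maxFrames) {
        // A new Throwable records the stack at the point of construction.
        StackTraceElement[] stack = new Throwable().getStackTrace();
        List<String> frames = new ArrayList<String>();
        // stack[0] is callers() itself, so start from index 1.
        for (int i = 1; i < stack.length && frames.size() < maxFrames; i++) {
            StackTraceElement e = stack[i];
            frames.add(e.getClassName() + "." + e.getMethodName()
                    + ":" + e.getLineNumber());
        }
        return frames;
    }

    // Stand-in for a real map() method; in an actual job the returned
    // frames would show which framework classes called into user code.
    public static List<String> map() {
        return callers(5);
    }

    public static void main(String[] args) {
        for (String frame : map()) {
            System.out.println(frame);
        }
    }
}
```

Calling such a helper once from inside a map or reduce method would show which classes sit above it on the stack, and those are candidate places for further loggers. log4j's PatternLayout also offers the %C, %M and %L conversion characters for caller class, method and line, though they are documented as slow.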

Also -- although I know debugging in a distributed system could be a
pain, I wonder if I could just load the whole hadoop project into
Eclipse, say, and trace the code locally without actually running the
application on the cluster. There are many libs and consequent
dependencies in hadoop -- how may I load hadoop so that I could
locally trace it?

If possible, please help me out ... Piyush, Amit, and all the experts
here? Thank you very much!

Best,
Rita :))


On Sun, Aug 15, 2010 at 11:27 PM, amit kumar verma <v.amit@verchaska.com> wrote:
>
>  Hi Rita,
>
> If you have reached the point where you need to use an API like hadoop, forget about debugging
> the code. Your code must be syntactically and logically error-free; for the rest,
> logging is enough. Try log4j only.
>
> Thanks,
> Amit Kumar Verma
> Verchaska Infotech Pvt. Ltd.
>
>
>
> On 08/15/2010 11:10 AM, Rita Liu wrote:
>>
>> Hi Harsh and Piyush! Thank you very much. So it seems like it would be best
>> if I use log4j to trace, and debugging with a debugger is still possible if
>> I set "mapred.job.tracker" to be "local" and "fs.default.name" to be
>> "local", in hadoop-site.xml. Plus: in hadoop-env.sh, I should specify
>> HADOOP_OPTS to be:
>>
>> "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000" (why
>> 8000? also, what does "-agentlib:jdwp=transport=dt_socket" mean?)
>>
>> ... in order to use a debugger. Is my understanding correct? :)
>>
>> If so -- then which debugger do you use? May I know? Thanks a lot! I am also
>> going to try log4j now!
>>
>> Many thanks,
>> -Rita :))
>>
>> On Sat, Aug 14, 2010 at 10:22 PM, Piyush Garg <piyushgarg80@gmail.com> wrote:
>>
>>> Hi Smith,
>>>
>>> Step debugging also works in hadoop, as with other java applications:
>>>
>>> export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"
>>>
>>> 'suspend=y' makes the jvm suspend until the remote debugger is
>>> attached.
>>>
>>> Thanks and Regards
>>> Piyush Garg
>>>
>>>
>>> On Sunday 15 August 2010 10:39 AM, smith jack wrote:
>>>>
>>>> That means you can only trace by log,
>>>> and it is not possible to debug hadoop using step debugging, haha.
>>>> Distributed systems always introduce extra complexity and confusing
>>>> issues.
>>>>
>>>> 2010/8/15 Piyush Garg <piyushgarg80@gmail.com>:
>>>>
>>>>> Hi Rita,
>>>>>
>>>>> You can put log4j logger debug statements in the code. The log4j library is
>>>>> part of the hadoop framework, there is already a log4j.properties file in the
>>>>> hadoop conf directory, and all the output logs are saved in the hadoop logs
>>>>> directory.
>>>>>
>>>>> Thanks and Regards
>>>>> Piyush Garg
>>>>>
>>>>>
>>>>> On Sunday 15 August 2010 10:20 AM, Rita Liu wrote:
>>>>>
>>>>>> Thank you very much, Piyush! :) May I know more about how to use
>>>>>> "traces"?
>>>>>>
>>>>>> And -- yes, please teach me if possible, experts! :)
>>>>>>
>>>>>> Thanks a lot,
>>>>>> -Rita :))
>>>>>>
>>>>>> On Sat, Aug 14, 2010 at 9:42 PM, Piyush Garg <piyushgarg80@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>>> Hi Rita,
>>>>>>>
>>>>>>> I have just started to learn hadoop as well, I know there is a long way
>>>>>>> to go.
>>>>>>> I found some useful links which I am sharing with you.
>>>>>>>
>>>>>>> Hadoop Tutorial - YDN
>>>>>>> <http://developer.yahoo.com/hadoop/tutorial/index.html>
>>>>>>> excellent beginners tutorial and well organized.
>>>>>>>
>>>>>>> Running Hadoop On Ubuntu Linux (Single-Node Cluster) - Michael G. Noll
>>>>>>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29>
>>>>>>>
>>>>>>> Running Hadoop On Ubuntu Linux (Multi-Node Cluster)
>>>>>>> <http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29>
>>>>>>>
>>>>>>> The tutorial on the hadoop wiki
>>>>>>> <http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html>
>>>>>>> is too much for a beginner.
>>>>>>>
>>>>>>> Debugger:
>>>>>>> I do not think you can easily debug using a remote debugger. This
>>>>>>> is natural: since hadoop is not sequential programming, it would be very
>>>>>>> difficult to debug its apps.
>>>>>>> The only way to debug is to use traces.
>>>>>>>
>>>>>>> I think you can learn how to set up a multi-node cluster, but for practice
>>>>>>> sessions you can use a single-node setup.
>>>>>>>
>>>>>>> Let's see what the experts say.
>>>>>>>
>>>>>>> Thanks and Regards
>>>>>>> Piyush Garg
>>>>>>>
>>>>>>>
>>>>>>> On Sunday 15 August 2010 09:07 AM, Rita Liu wrote:
>>>>>>>
>>>>>>>
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>> I am a total beginner, but I am very interested in hadoop. I've already
>>>>>>>> downloaded hadoop 0.19.2 and run it on Ubuntu in single-node mode. Now I want
>>>>>>>> to do two things:
>>>>>>>>
>>>>>>>> 1. Explore how hadoop works internally with one of the example applications
>>>>>>>> hadoop provides
>>>>>>>> 2. Write an application on my own
>>>>>>>>
>>>>>>>> Those two things bring me to the following questions:
>>>>>>>>
>>>>>>>> a. debugger?
>>>>>>>> I am stuck since I don't know how to "explore" hadoop. I used to trace
>>>>>>>> through the code using a debugger, but in this case, I don't know if there
>>>>>>>> is a good debugger to use; or -- maybe a debugger is not necessary for
>>>>>>>> hadoop? If not, then how do you trace through the code to either debug or
>>>>>>>> just gain an understanding about the system? May I know what you,
>>>>>>>> experienced experts, do? :)
>>>>>>>>
>>>>>>>> b. Where to run hadoop?
>>>>>>>> Also -- may I know where you run your hadoop? Do you run on linux, or on VM
>>>>>>>> -- in particular, Cloudera? I heard that Cloudera is good for writing
>>>>>>>> mapreduce applications with hadoop itself as a blackbox; is it true? If my
>>>>>>>> ultimate goal is to understand how hadoop works internally, would it be
>>>>>>>> better if I directly run it on linux?
>>>>>>>>
>>>>>>>> c. Single-node or multi-node?
>>>>>>>> In the beginning (just like my case :p) would it be better to use
>>>>>>>> single-node or multi-node? If the latter is true, should I obtain more
>>>>>>>> machines, or should I use more virtual machines to create more nodes?
>>>>>>>>
>>>>>>>> As a newbie, I am sorry for all those basic (and silly, I know :$)
>>>>>>>> questions. If possible, please help me out? Any suggestion or advice will be
>>>>>>>> greatly appreciated. Thank you very much!
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Rita :)
>>>>>>>>
>>>>>>>> P.S. If my questions are not suitable for this mailing-list, please let me
>>>>>>>> apologize, and then, could you please direct me to other mailing-lists?
>>>>>>>>
>>>>>>>> Sorry, and thanks a lot! :)
