Thanks to you both. You solved my problem so easily. I want to
ask one more question, for reference. I have:

1. Hadoop: The Definitive Guide
2. Hadoop in Action

Are these sufficient, or do I need more material to study
your suggested implementation?


Hey Mayur,

If you are collecting logs from multiple servers, then you can use Flume for that.

If the contents of the logs are in different formats, then you can just use TextInputFormat to read them and write them into any other format you want for processing in a later part of your project.
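As a rough illustration of that idea, here is a small Python sketch (the function name and the RTF-stripping rules are my own, simplified assumptions, not a complete RTF parser) showing how lines from .rtf, .log, and .txt logs could be normalized to one plain-text form, the kind of cleanup a map step might do after TextInputFormat hands it each line:

```python
import re

def normalize_line(line: str) -> str:
    """Reduce .rtf, .log and .txt log lines to the same plain-text form.
    Very simplified: real RTF needs a proper parser."""
    # Drop RTF control words such as \rtf1, \b, \b0 (and a trailing space).
    line = re.sub(r"\\[a-z]+-?\d*\s?", "", line)
    # Drop RTF group braces; harmless for plain .log/.txt lines.
    line = line.replace("{", "").replace("}", "")
    return line.strip()

# Both an RTF-wrapped line and a plain line reduce to the same text.
print(normalize_line(r"{\rtf1 \b ERROR\b0  disk full}"))
print(normalize_line("ERROR disk full"))
```

Once every input line is in one canonical form, the downstream MapReduce logic no longer cares which file format the line came from.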

The first thing you need to learn is how to set up Hadoop.
Then you can try writing sample Hadoop MapReduce jobs that read from a text file, process it, and write the results into another file.
Then you can integrate Flume as your log-collection mechanism.
Once you get a hold of the system, you can decide which paths to follow based on your requirements for storage, compute time, compute capacity, compression, etc.
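The middle step above — read text, process it, write results — can be sketched in Python in the style of Hadoop Streaming, where the mapper and reducer are plain scripts; this is a local simulation of the map → shuffle → reduce flow (the sample log lines are made up), not something that needs a running cluster:

```python
from itertools import groupby

def mapper(lines):
    """Map step: emit (word, 1) for each word, like a streaming mapper
    reading lines that TextInputFormat would feed it."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reducer(pairs):
    """Reduce step: sum the counts per word. On a real cluster Hadoop's
    shuffle delivers the pairs grouped by key; sorted() simulates that."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

# Local simulation of the whole pipeline on a few sample log lines.
logs = ["ERROR disk full", "WARN disk slow", "ERROR disk full"]
print(dict(reducer(mapper(logs))))
```

The same two functions, reading stdin and writing stdout, would run under Hadoop Streaming once your cluster is set up; the point is that the processing logic itself is this small.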


Please read the basics of how Hadoop works.

Then start your hands-on work with MapReduce coding.

The tool that has been made for you is Flume, but don't look at the tool until you complete the above two steps.
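For when you do reach that third step: a Flume agent is configured as a source, a channel, and a sink wired together in a properties file. A minimal sketch (the agent name, file path, and HDFS path here are placeholders, not anything from this thread) that tails a log file into HDFS might look like:

```properties
# One Flume agent: tail a local log file and land the events in HDFS.
agent1.sources  = tail-src
agent1.channels = mem-ch
agent1.sinks    = hdfs-sink

# Source: follow an application log as it grows.
agent1.sources.tail-src.type = exec
agent1.sources.tail-src.command = tail -F /var/log/app.log
agent1.sources.tail-src.channels = mem-ch

# Channel: buffer events in memory between source and sink.
agent1.channels.mem-ch.type = memory

# Sink: write the collected events into HDFS.
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = /flume/logs
agent1.sinks.hdfs-sink.channel = mem-ch
```

Running one such agent per log server, all pointing their sinks at the same HDFS path, is the usual way Flume covers the "collect from 2-3 machines into one location" requirement.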

Good luck, keep us posted.


Jagat Singh

Sent from mobile, short and crisp.

On 06-Feb-2013 8:32 AM, "Mayur Patil" <> wrote:

    I am new to Hadoop. I am doing a project in the cloud in which
    I have to use Hadoop for MapReduce. I am going to collect logs
    from 2-3 machines in different locations. The logs are also in
    different formats, such as .rtf, .log, and .txt. Later, I have
    to convert them to one format and collect them in one location.

    So I am asking: which module of Hadoop do I need to study for
    this implementation? Or do I need to study the whole framework?

    Seeking guidance,

    Thank you !!