Hey Mayur,

If you are collecting logs from multiple servers, you can use Flume for that. If the logs differ in format, you can read them with TextInputFormat and write them out in whatever format suits the later stages of your project.

First, learn how to set up Hadoop. Then try writing a sample Hadoop MapReduce job that reads a text file, processes it, and writes the results to another file. After that, integrate Flume as your log collection mechanism. Once you have a hold on the system, you can decide which path to follow based on your requirements for storage, compute time, compute capacity, compression, etc.
Please read the basics of how Hadoop works.
Then get hands-on with MapReduce coding.
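To see what a first MapReduce job looks like, here is a minimal sketch of the classic word count in plain Python, so you can try the idea without a cluster. With Hadoop Streaming the mapper and reducer would read stdin and write stdout; here they are written as pure functions for clarity, and the sample log lines are made up for illustration.

```python
import itertools

def mapper(lines):
    # Map phase: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reducer(pairs):
    # Hadoop sorts and groups mapper output by key before the reduce phase;
    # we mimic that here by sorting, grouping by word, and summing the counts.
    for word, group in itertools.groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, sum(count for _, count in group))

# Stand-in for lines read from an input file on HDFS.
sample = ["error disk full", "error network down", "info disk ok"]
counts = dict(reducer(mapper(sample)))
print(counts)
```

Once this shape is clear, the real Hadoop job is the same two steps, just with the framework handling input splits, shuffling, and output files for you.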
The tool that has been made for your use case is Flume, but don't look at the tool until you complete the two steps above.
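For when you do get to Flume: an agent is wired together from a source, a channel, and a sink in a properties file. A hedged sketch, assuming a tail-style exec source feeding an HDFS sink; the agent name, log path, and HDFS URL are placeholders, not values from this thread:

```properties
# Hypothetical Flume agent: tail a local log file and push events to HDFS.
# All names and paths here are illustrative only.
agent1.sources = src1
agent1.channels = ch1
agent1.sinks = sink1

agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app.log
agent1.sources.src1.channels = ch1

agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1000

agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/logs
agent1.sinks.sink1.channel = ch1
```

One such agent per log-producing machine, all pointed at the same HDFS directory, gives you the "collect to one location" part; the MapReduce job then handles the format conversion.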
Good luck, keep us posted.
Sent from Mobile, short and crisp.

On 06-Feb-2013 8:32 AM, "Mayur Patil" <firstname.lastname@example.org> wrote:

Hello,
I am new to Hadoop. I am doing a project in the cloud in which I have to use Hadoop for MapReduce. I am going to collect logs from 2-3 machines at different locations. The logs are also in different formats, such as .rtf, .log, and .txt. Later, I have to convert them to one format and collect them in one location. So I am asking: which module of Hadoop do I need to study for this implementation? Or do I need to study the whole framework?

Seeking guidance,
Thank you!!

--
Cheers,