Subject: Re: [Hadoop-Help]About Map-Reduce implementation
From: Nitin Pawar
To: user@hadoop.apache.org
Date: Wed, 6 Feb 2013 03:09:02 +0530

Hey Mayur,

If you are collecting logs from multiple servers, you can use Flume for that.

If the logs differ in format, you can simply use TextInputFormat to read them and write them out in whatever format you need for the later parts of your project.

The first thing you need to learn is how to set up Hadoop. Then you can try writing sample Hadoop MapReduce jobs that read from a text file, process it, and write the results into another file. After that you can integrate Flume as your log-collection mechanism. Once you have a good hold on the system, you can decide which paths to follow based on your requirements for storage, compute time, compute capacity, compression, etc.

On Wed, Feb 6, 2013 at 3:01 AM, Mayur Patil wrote:
> Hello,
>
> I am new to Hadoop. I am doing a project in the cloud in which I
> have to use Hadoop for MapReduce. It is such that I am going
> to collect logs from 2-3 machines in different locations.
> The logs are also in different formats, such as .rtf, .log, .txt.
> Later, I have to collect them and convert them to one format,
> and gather them in one location.
>
> So I am asking: which module of Hadoop do I need to study
> for this implementation? Or should I study the whole
> framework?
>
> Seeking guidance,
>
> Thank you !!
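P.S. For the Flume part, a minimal single-node agent sketch could look like the config below. All names here (agent1, the spool directory, the namenode address) are placeholder assumptions, not values from your setup:

```properties
# Hypothetical Flume NG agent: pick up files dropped into a local
# directory and deliver their lines into one HDFS directory.
agent1.sources  = logdir
agent1.channels = mem
agent1.sinks    = hdfsout

# Spooling-directory source: Flume ingests any file placed here
agent1.sources.logdir.type     = spooldir
agent1.sources.logdir.spoolDir = /var/log/collected
agent1.sources.logdir.channels = mem

# In-memory channel buffers events between source and sink
agent1.channels.mem.type     = memory
agent1.channels.mem.capacity = 10000

# HDFS sink: write events as plain text under one target path
agent1.sinks.hdfsout.type          = hdfs
agent1.sinks.hdfsout.channel       = mem
agent1.sinks.hdfsout.hdfs.path     = hdfs://namenode:8020/logs/incoming
agent1.sinks.hdfsout.hdfs.fileType = DataStream
```

You would start it with the flume-ng agent command pointing at this file; the exact source/sink choices depend on how your three machines expose their logs.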
> --
> Cheers,
> Mayur.

--
Nitin Pawar