From: Panshul Whisper <ouchwhisper@gmail.com>
To: user@hbase.apache.org, user@hadoop.apache.org
Date: Thu, 7 Feb 2013 15:24:53 +0100
Subject: Re: MapReduce to load data in HBase

I am using the MapReduce approach. I was looking into Avro to create my own
custom data types to pass from the Mapper to the Reducer. With Avro I would
need to maintain a schema for every type of JSON file I receive, and since
there will be many different MapReduce jobs running, that means a different
schema for each type.

1. Since the JSON schema may change frequently (almost 3 times every month),
is it advisable to use Avro to create custom data types? Or could I instead
store the Java object in the distributed cache and pass the key to that
object on to the Reducer?

2. Will there be any performance issues with using the distributed cache?
The data volume will be very large, and very high throughput is required.
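Roughly, this is what I have in mind for the Avro route (a minimal sketch,
assuming the new-API avro-mapred classes from org.apache.avro.mapreduce are
on the classpath; the Event record, its fields, and the parsed values are
made-up placeholders):

    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.mapred.AvroValue;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class JsonToAvroMapper
            extends Mapper<LongWritable, Text, Text, AvroValue<GenericRecord>> {

        // Placeholder schema for one type of incoming JSON line.
        static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"string\"},"
          + "{\"name\":\"value\",\"type\":\"long\"}]}");

        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Real code would parse the JSON line here; parsing is elided.
            GenericRecord rec = new GenericData.Record(SCHEMA);
            rec.put("id", "someId");
            rec.put("value", 42L);
            ctx.write(new Text("someId"), new AvroValue<GenericRecord>(rec));
        }
    }

    // Driver wiring, so the map output value is serialized with Avro
    // between the map and reduce phases:
    //   Job job = new Job(conf, "json-to-hbase");
    //   job.setMapOutputKeyClass(Text.class);
    //   org.apache.avro.mapreduce.AvroJob
    //       .setMapOutputValueSchema(job, JsonToAvroMapper.SCHEMA);

The Reducer side then declares AvroValue<GenericRecord> as its input value
type.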
Thanking You,
Regards,


On Thu, Feb 7, 2013 at 2:23 PM, Mohammad Tariq <dontariq@gmail.com> wrote:

> Size is not a problem; a frequently changing schema might be.
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Thu, Feb 7, 2013 at 6:25 PM, Panshul Whisper <ouchwhisper@gmail.com> wrote:
>
> > Hello,
> >
> > Thank you for the replies.
> >
> > I have not used Pig yet; I am looking into it. I wanted to implement
> > both approaches.
> > Are Pig scripts maintainable? The JSON structure I will be receiving
> > will change quite often, almost 3 times a month.
> > I will be processing 24 million JSON files per month.
> > I am getting one big file with almost 3 million JSON documents
> > aggregated, one JSON document per line. I need to process this file
> > and store all the values into HBase.
> >
> > Thanking You,
> >
> >
> > On Thu, Feb 7, 2013 at 12:59 PM, Mohammad Tariq <dontariq@gmail.com>
> > wrote:
> >
> > > Good point sir. If Pig fits into Panshul's requirements then it's a
> > > much better option.
> > >
> > > Warm Regards,
> > > Tariq
> > > https://mtariq.jux.com/
> > > cloudfront.blogspot.com
> > >
> > >
> > > On Thu, Feb 7, 2013 at 5:25 PM, Damien Hardy <dhardy@viadeoteam.com>
> > > wrote:
> > >
> > > > Hello,
> > > > Why not use a Pig script for that?
> > > > Make the JSON file available on HDFS.
> > > > Load with
> > > > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html
> > > > Store with
> > > > http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
> > > >
> > > > http://pig.apache.org/docs/r0.10.0/
> > > >
> > > > Cheers,
> > > >
> > > > --
> > > > Damien
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
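To make the HBase write concrete (the "store all the values into HBase"
step above), here is roughly the reducer I am planning. This is a sketch
against the 0.94-era HBase client API; the table name "events", the column
family "data", and the qualifier prefix "json" are placeholders:

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;

    public class HBaseLoadReducer
            extends TableReducer<Text, Text, ImmutableBytesWritable> {

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            // One row per key; each value goes into its own placeholder
            // qualifier so nothing is overwritten within the row.
            Put put = new Put(Bytes.toBytes(key.toString()));
            int i = 0;
            for (Text v : values) {
                put.add(Bytes.toBytes("data"), Bytes.toBytes("json" + i++),
                        Bytes.toBytes(v.toString()));
            }
            ctx.write(new ImmutableBytesWritable(put.getRow()), put);
        }
    }

    // Driver wiring (this also sets TableOutputFormat and pulls in the
    // HBase configuration):
    //   org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil
    //       .initTableReducerJob("events", HBaseLoadReducer.class, job);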
--
Regards,
Ouch Whisper
010101010101