Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of dontariq@gmail.com designates
 209.85.212.42 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAORpBsioKL09=2pUSve=u=UST5xUZm+EVstmuz3a_DSpYEPeJg@mail.gmail.com>
References: 
 <CAAu_zM9GDVf5L_UpO4oWmc=FGFVu01y=91iYntT6-8tJT-JQoQ@mail.gmail.com>
 <CALte62yvtQNPuRCKyv+LdNEC8a6Tbv929B_JEFtJXuan3BQO9w@mail.gmail.com>
 <CAAu_zM9CD+FXPDMxuU8vnuCnL6E76JWXctM_MsnnJq-VxpBLqQ@mail.gmail.com>
 <CALte62x_UZnF1ABxWM3V4z=_GSmySOpSRhDmkPm2HvPy0PLx-g@mail.gmail.com>
 <CAAu_zM94gj=Yk062+_Pz-f1RpFPA3xm38mV80A+dx0iZ-a+rCw@mail.gmail.com>
 <CAMVC6RNOAYLgbGWEfvACfRb=qnzSYx8Pfquf6XReXdVHn4hK=g@mail.gmail.com>
 <CAAu_zM9YUJC4-TzHs-27HcNcegRG61G87gSerNUjQROH2mZ=8g@mail.gmail.com>
 <CAORpBsioKL09=2pUSve=u=UST5xUZm+EVstmuz3a_DSpYEPeJg@mail.gmail.com>
From: Mohammad Tariq <dontariq@gmail.com>
Date: Thu, 21 Mar 2013 15:32:25 +0530
Message-ID: 
 <CAMVC6RML9BrYUwx8e20-TYQsm0qsPtPYke8i5WBKWBRHx7gysA@mail.gmail.com>
Subject: Re: HBase or Cassandra
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=bcaec54eed56e4e46304d86c7188

--bcaec54eed56e4e46304d86c7188
Content-Type: text/plain; charset=ISO-8859-1

Harsh has a point, and you should consider it. Go for a database only if you
really need random, real-time reads and writes.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Mar 21, 2013 at 3:29 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:

> Oozie is a workflow scheduling and processing engine.
>
> Suppose you keep receiving the same kind of incoming data and want to run a
> bunch of processing steps on it as and when it arrives: Oozie gives you the
> framework for that.
>
>
> On Thu, Mar 21, 2013 at 3:27 PM, oualid ait wafli <
> oualid.aitwafli@gmail.com> wrote:
>
>> Thanks Mohammad,
>> but how can I use Oozie?
>>
>>
>> 2013/3/21 Mohammad Tariq <dontariq@gmail.com>
>>
>>> Hello there,
>>>
>>>   For your use case, HBase seems to be the better choice, and your workflow
>>> looks good to me.
>>>
>>> Just one suggestion (in case you find it useful): since you are going to
>>> run a lot of operations, you might want to schedule the jobs using Oozie.
>>>
>>> Warm Regards,
>>> Tariq
>>> https://mtariq.jux.com/
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Thu, Mar 21, 2013 at 2:27 PM, oualid ait wafli <
>>> oualid.aitwafli@gmail.com> wrote:
>>>
>>>> My data consists of CDR (call detail record) files, and I want to read
>>>> the data from those files using Pig.
>>>>
>>>> First, I will import the data from its sources using Flume, then use Pig
>>>> as an ETL tool and to run MapReduce jobs on HDFS. Now I want to store my
>>>> data, but I have to benchmark HBase against Cassandra.
>>>>
>>>>  My questions:
>>>> - What do you think of my approach to analyzing and processing my data?
>>>> Am I on the right track?
>>>> - Which one is better, HBase or Cassandra?
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>> 2013/3/20 Ted Yu <yuzhihong@gmail.com>
>>>>
>>>>> Can you give us more information about your use case?
>>>>> E.g. the approximate ratio of write to read load, the amount of log
>>>>> data, etc.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Wed, Mar 20, 2013 at 9:22 AM, oualid ait wafli <
>>>>> oualid.aitwafli@gmail.com> wrote:
>>>>>
>>>>>> Yes, I have a data source that contains log files. I want to analyze
>>>>>> those files and store them.
>>>>>> Any ideas?
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> 2013/3/20 Ted Yu <yuzhihong@gmail.com>
>>>>>>
>>>>>>> The answer to the second question would be subjective.
>>>>>>>
>>>>>>> Do you have a specific use case in mind?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli <
>>>>>>> oualid.aitwafli@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Which is better, HBase or Cassandra?
>>>>>>>> What are the criteria for comparing these tools (HBase and Cassandra)?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
>
> --
> Nitin Pawar
>

--bcaec54eed56e4e46304d86c7188--