Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Reply-To: user@hadoop.apache.org
MIME-Version: 1.0
From: Mohammad Tariq
Date: Thu, 21 Mar 2013 15:32:25 +0530
Subject: Re: HBase or Cassandra
To: "user@hadoop.apache.org"
Content-Type: multipart/alternative; boundary=bcaec54eed56e4e46304d86c7188

--bcaec54eed56e4e46304d86c7188
Content-Type: text/plain; charset=ISO-8859-1

Harsh has got a point. You should consider it. Go for a DB only if you
really need random, real-time read/write.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Thu, Mar 21, 2013 at 3:29 PM, Nitin Pawar wrote:

> Oozie is a workflow scheduling and processing engine.
>
> So suppose you have similar kinds of incoming data and you want to run a
> bunch of data processing steps on this data as and when it arrives; Oozie
> will give you the framework for that.
>
>
> On Thu, Mar 21, 2013 at 3:27 PM, oualid ait wafli <oualid.aitwafli@gmail.com> wrote:
>
>> Thanks Mohammad,
>> but how can I use Oozie?
>>
>>
>> 2013/3/21 Mohammad Tariq
>>
>>> Hello there,
>>>
>>> For your use case, HBase seems to be a better choice. And your workflow
>>> looks good to me.
>>>
>>> Just one suggestion (in case you find it useful): since you are going to
>>> do a lot of operations, you might find it useful to schedule the jobs
>>> using Oozie.
>>>
>>> Warm Regards,
>>> Tariq
>>> https://mtariq.jux.com/
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Thu, Mar 21, 2013 at 2:27 PM, oualid ait wafli <oualid.aitwafli@gmail.com> wrote:
>>>
>>>> I have CDR (call detail record) files as my data, and I want to read
>>>> the data from those files using Pig.
>>>>
>>>> First, I will import the data from the sources using Flume, then use
>>>> Pig for ETL and as a tool to run MapReduce jobs into HDFS. So now I
>>>> want to store my data, but I have to do a benchmark between HBase and
>>>> Cassandra.
>>>>
>>>> My questions:
>>>> - What do you think of my approach to analyzing and processing my
>>>>   data? Am I on the right track?
>>>> - Which one is better, HBase or Cassandra?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> 2013/3/20 Ted Yu
>>>>
>>>>> Can you give us more information about your use case?
>>>>> E.g. the approximate ratio of write to read load, the amount of
>>>>> logs, etc.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Wed, Mar 20, 2013 at 9:22 AM, oualid ait wafli <oualid.aitwafli@gmail.com> wrote:
>>>>>
>>>>>> Yes, I have a data source that contains log files. I want to
>>>>>> analyze those files and store them.
>>>>>> Any ideas?
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>> 2013/3/20 Ted Yu
>>>>>>
>>>>>>> The answer to the second question would be subjective.
>>>>>>>
>>>>>>> Do you have a specific use case in mind?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli <oualid.aitwafli@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Which is better, HBase or Cassandra?
>>>>>>>> What are the criteria for comparing these tools (HBase and
>>>>>>>> Cassandra)?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>
>
> --
> Nitin Pawar
>

--bcaec54eed56e4e46304d86c7188
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--bcaec54eed56e4e46304d86c7188--