From: Mohammad Tariq
Date: Thu, 21 Mar 2013 14:50:37 +0530
Subject: Re: HBase or Cassandra
To: "user@hadoop.apache.org"

Hello there,

For your use case, HBase seems to be the better choice, and your workflow looks good to me. Just one suggestion (in case you find it useful): since you are going to run a lot of operations, you might find it useful to schedule the jobs using Oozie.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com

On Thu, Mar 21, 2013 at 2:27 PM, oualid ait wafli <oualid.aitwafli@gmail.com> wrote:

> I have CDR (call detail record) files as my data, and I want to read the
> data from those files using Pig.
>
> First, I will import the data from the sources using Flume, then use Pig as
> an ETL tool and as a tool to run MapReduce jobs into HDFS. Now I want to
> store my data, but I have to do a benchmark between HBase and Cassandra.
>
> My questions:
> - What do you think of my approach to analyzing and processing my data?
>   Am I on the right track?
> - Which one is better, HBase or Cassandra?
>
> Thanks
>
> 2013/3/20 Ted Yu <yuzhihong@gmail.com>
>
>> Can you give us more information about your use case?
>> e.g. approximate ratio between write and read load, amount of logs, etc.
>>
>> Cheers
>>
>> On Wed, Mar 20, 2013 at 9:22 AM, oualid ait wafli <
>> oualid.aitwafli@gmail.com> wrote:
>>
>>> Yes, I have a data source which contains log files. I want to analyze
>>> those files and store them.
>>> Any ideas?
>>> Thanks
>>>
>>> 2013/3/20 Ted Yu <yuzhihong@gmail.com>
>>>
>>>> The answer to the second question would be subjective.
>>>>
>>>> Do you have a specific use case in mind?
>>>>
>>>> Thanks
>>>>
>>>> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli <
>>>> oualid.aitwafli@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Which is better, HBase or Cassandra?
>>>>> What are the criteria for comparing these tools (HBase and Cassandra)?
>>>>>
>>>>> Thanks