hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: HBase or Cassandra
Date Thu, 21 Mar 2013 09:58:35 GMT
If your use case is merely to process these files in batch, then why
use HBase and/or Cassandra? What you've described seems to already
address the need.

I do not know much about Cassandra, so I will refrain from commenting on
it; however, the following may apply to it as well. HBase is useful for
more "realtime" lookup needs, i.e. when you need to do random reads and
writes in realtime, such as looking up a specific customer's records
for a specific date without querying the whole dataset, or editing a
customer's balance without rewriting the whole data. The operational
requirements may also include deletes of specific records and support
for unstructured data R/W. HBase is not something used for processing
data, just for storing and retrieving it and optionally managing the
storage/retrieval at a per-record level.
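
To make the "random read/write" access pattern concrete, here is a
minimal Python sketch. It is a toy in-memory model, not the HBase API:
the class ToyRowStore, its methods, and the "customer#date" row key
layout are all hypothetical names picked for illustration. It only
shows the idea of addressing one record by key instead of rescanning
a whole dataset.

```python
import bisect


class ToyRowStore:
    """Toy in-memory stand-in for a row-keyed store (NOT the HBase API)."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, column, value):
        # Write or overwrite a single cell without touching other rows.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Random read: fetch one row directly by its key.
        return self._rows.get(row_key)

    def delete(self, row_key):
        # Remove one specific record.
        if row_key in self._rows:
            del self._rows[row_key]
            del self._keys[bisect.bisect_left(self._keys, row_key)]

    def scan_prefix(self, prefix):
        # Read only the contiguous range of keys that share a prefix.
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._rows[key]))
        return result


# Hypothetical row key layout: "<customer id>#<date>".
store = ToyRowStore()
store.put("cust42#2013-03-20", "calls", 3)
store.put("cust42#2013-03-21", "calls", 5)
store.put("cust7#2013-03-21", "calls", 1)
store.put("cust42#2013-03-21", "balance", 12.5)  # edit one record in place
one_day = store.get("cust42#2013-03-21")         # one customer, one date
all_days = store.scan_prefix("cust42#")          # one customer, all dates
```

With flat files on HDFS, each of these operations would mean reading
(or rewriting) the files that hold the record.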

If your needs do not involve random reads/writes and your operation is
batch oriented, then neither Cassandra nor HBase would give you the
speed of MR jobs running the logic directly on raw HDFS input files.
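
For contrast, a batch job reads every input record once and
aggregates, with no per-record lookups. The sketch below imitates that
map/reduce shape in plain Python; it is not Pig or Hadoop code, and the
CDR line format "caller,callee,date,duration_seconds" as well as the
function names are assumptions made up for the example.

```python
from collections import defaultdict


def map_cdr(line):
    # Map step: turn one CDR line into a (caller, duration) pair.
    # Assumed format: caller,callee,date,duration_seconds
    caller, _callee, _date, duration = line.strip().split(",")
    return caller, int(duration)


def reduce_durations(pairs):
    # Reduce step: sum the durations for each caller.
    totals = defaultdict(int)
    for caller, duration in pairs:
        totals[caller] += duration
    return dict(totals)


def total_duration_per_caller(lines):
    # One sequential pass over the whole input, skipping blank lines.
    return reduce_durations(map_cdr(line) for line in lines if line.strip())
```

Nothing in this pass needs a random read or write, which is why a
store like HBase adds little for this kind of workload.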

On Thu, Mar 21, 2013 at 2:27 PM, oualid ait wafli
<oualid.aitwafli@gmail.com> wrote:
> I have CDR (call detail record) files as my data, and I want to read the
> data from those files using Pig.
>
> First, I will import the data from the sources using Flume, then use Pig as
> an ETL tool and to run MapReduce jobs on HDFS. Now I want to store my
> data, but I have to run a benchmark comparing HBase and Cassandra.
>
> My questions:
> - What do you think of my approach to analyzing and processing my data? Am
> I on the right track?
> - Which one is better, HBase or Cassandra?
>
>
> Thanks
>
>
>
>
> 2013/3/20 Ted Yu <yuzhihong@gmail.com>
>>
>> Can you give us more information about your use case ?
>> e.g. approximate ratio between write vs. read load, amount of log, etc.
>>
>> Cheers
>>
>> On Wed, Mar 20, 2013 at 9:22 AM, oualid ait wafli
>> <oualid.aitwafli@gmail.com> wrote:
>>>
>>> Yes, I have a data source that contains log files. I want to analyze
>>> those files and store them.
>>> Any ideas?
>>> thanks
>>>
>>>
>>> 2013/3/20 Ted Yu <yuzhihong@gmail.com>
>>>>
>>>> The answer to second question would be subjective.
>>>>
>>>> Do you have specific use case in mind ?
>>>>
>>>> Thanks
>>>>
>>>>
>>>> On Wed, Mar 20, 2013 at 9:07 AM, oualid ait wafli
>>>> <oualid.aitwafli@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Which is better, HBase or Cassandra?
>>>>> What are the criteria for comparing these tools (HBase and Cassandra)?
>>>>>
>>>>> Thanks
>>>>
>>>>
>>>
>>
>



--
Harsh J
