hbase-user mailing list archives

From y_823...@tsmc.com
Subject Re: Hbase as Map/Reduce source
Date Fri, 29 Jan 2010 05:06:51 GMT
What if I want to analyse data that contains updated and deleted records?
In that scenario, HBase is a better M/R source than raw HDFS files, is that
correct?

Fleming Chiu(邱宏明)
707-6128
y_823910@tsmc.com
Meat-free Monday: eat vegetarian, save the Earth (Meat Free Monday Taiwan)




                                                                                         
                                                            
From:    Kay Kay <kaykay.unique@gmail.com>
To:      hbase-user@hadoop.apache.org
cc:      (bcc: Y_823910/TSMC)
Subject: Re: Hbase as Map/Reduce source
Date:    2010/01/29 11:05 AM
Please respond to hbase-user
                                                            
                                                                                         
                                                            
                                                                                         
                                                            




HDFS is a double-edged sword. Being a raw file system, it can be fed
directly to a MapReduce program, although it might be necessary to define
InputSplit-s as appropriate to chop down the input size.
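
To make the idea of chopping down the input concrete, here is a minimal
sketch in plain Java. It is not Hadoop's own split logic; the class name
FileSplitSketch and the method computeSplits are made up for illustration.
It only shows how a file of a given length could be divided into
fixed-size byte ranges, each of which would go to one map task. It ignores
details the real framework has to handle, such as records that straddle a
split boundary.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not part of the Hadoop API.
class FileSplitSketch {

    // Returns {offset, length} pairs that together cover fileLength bytes.
    static List<long[]> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            // The last split may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - offset);
            splits.add(new long[] {offset, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed sizes: a 300 MB file cut into 128 MB pieces.
        long mb = 1024L * 1024L;
        for (long[] s : computeSplits(300 * mb, 128 * mb)) {
            System.out.println("offset=" + s[0] + " length=" + s[1]);
        }
    }
}
```

With the assumed sizes in main, the file is covered by three ranges, the
last one shorter than the other two.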

OTOH, HBase is structured data (well, sort of!) using a file format
on top of HDFS to store the schema, and hence comes with predefined
InputSplit-s that make it easy to get started on a MapReduce program.
From an API simplicity point of view, HBase can get you started
relatively faster because of this (assuming you have your data in HBase).
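
As a rough picture of what "predefined InputSplit-s" means here, the
sketch below derives one row-key range per region from a sorted list of
region start keys. It is plain Java, not the HBase API: the class
RegionSplitSketch and its method are hypothetical, keys are simplified to
strings, and it assumes the common case of one split per region. It
follows the HBase convention that an empty start key marks the first
region and an empty end key marks the last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not part of the HBase API.
class RegionSplitSketch {

    // Input: region start keys in sorted order, the first one empty.
    // Output: one {startKey, endKey} pair per region; end is exclusive,
    // and an empty end key means "up to the end of the table".
    static List<String[]> splitsFromRegionStartKeys(List<String> startKeys) {
        List<String[]> splits = new ArrayList<>();
        for (int i = 0; i < startKeys.size(); i++) {
            String start = startKeys.get(i);
            String end = (i + 1 < startKeys.size()) ? startKeys.get(i + 1) : "";
            splits.add(new String[] {start, end});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed example: a table with three regions.
        List<String> startKeys = Arrays.asList("", "g", "p");
        for (String[] s : splitsFromRegionStartKeys(startKeys)) {
            System.out.println("[" + s[0] + ", " + s[1] + ")");
        }
    }
}
```

Under that assumption, a table with N regions yields N ranges and so N
map tasks, with no split logic to write yourself.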

Refer to http://wiki.apache.org/hadoop/Hbase/MapReduce .

Although the wiki marks the *.mapred.* packages as deprecated, in practice
it is suggested to stick with them for some time, since the newer
*.mapreduce.* packages are not mature enough at this point.

The decision has entirely to do with the kind of data you have and
whether you can identify it by a primary key amenable to your application,
which is all HBase in its rudimentary form needs.

On the other hand, if having a schema and defining a primary key for
your data does not fit your app naturally, you can stick with HDFS
and a custom InputSplit depending on your data. HBase provides a lot
more than HDFS in terms of scanning and row id ordering, but if these
features are not necessary for what you do, then storing data in HDFS
should be just about ok.




On 1/28/10 6:20 PM, Otis Gospodnetic wrote:
> I asked a similar question recently:
>
> http://search-hadoop.com/m?id=843956.53875.qm@web50305.mail.re2.yahoo.com||hbase%20mapreduce%20otis%20TableInputFormat
>
>
> Otis
>
>
>
> ----- Original Message ----
>
>> From: "y_823910@tsmc.com"<y_823910@tsmc.com>
>> To: hbase-user@hadoop.apache.org
>> Sent: Thu, January 28, 2010 8:02:39 PM
>> Subject: Hbase as Map/Reduce source
>>
>> Hi,
>>
>> I want to understand clearly how HBase works as a Map/Reduce source.
>> Basically, if a table has 100 regions, it means 100 map tasks will be
>> started, right?
>> What's the difference between HDFS and HBase as a Map/Reduce source?
>> Thanks
>>
>>
>>
>>
>> Fleming Chiu(邱宏明)
>> 707-6128
>> y_823910@tsmc.com
>> Meat-free Monday: eat vegetarian, save the Earth (Meat Free Monday Taiwan)
>>
>>
>>
>> ---------------------------------------------------------------------------
>>                                                           TSMC PROPERTY
>> This email communication (and any attachments) is proprietary information
>> for the sole use of its intended recipient. Any unauthorized review, use or
>> distribution by anyone other than the intended recipient is strictly
>> prohibited.  If you are not the intended recipient, please notify the sender
>> by replying to this email, and then delete this email and any copies of it
>> immediately. Thank you.
>> ---------------------------------------------------------------------------
>>
>








