hbase-user mailing list archives

From: Chris Tarnas <...@tarnas.org>
Subject: Re: Is there any way to disable WAL while keeping data safety
Date: Fri, 27 May 2011 05:18:52 GMT
Yes, it does handle data merging, and yes, a major compaction would be needed to
guarantee the number of store files is as small as possible.
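(For reference, a minimal sketch of requesting that major compaction from a Java
client, assuming 0.90-era APIs; the table name "mytable" is illustrative, and the
HBase shell equivalent is major_compact 'mytable'.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class MajorCompactExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Asynchronous request: the regionservers run the compactions in the
        // background, merging each region's store files down to one.
        admin.majorCompact("mytable");
      }
    }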

-chris



On May 26, 2011, at 7:00 PM, Weihua JIANG <weihua.jiang@gmail.com> wrote:

> Thanks. It seems quite useful.
> 
> Does bulk load support data merging? That is, there is a table with
> existing data and I want to add more data whose row key range
> overlaps the existing data's row key range, so the new data ends up
> inserted into existing regions.
> 
> If bulk load supports this, it would be the ideal solution for me.
> 
> And do I need to perform a major compaction after the bulk load to
> ensure the number of store files stays small?
> 
> 
> Thanks
> Weihua
> 
> 2011/5/27 Chris Tarnas <cft@email.com>:
>> Your second solution sounds quite similar to the bulk loader. Actually the bulk
>> loader is a bit simpler and bypasses even more of the regionserver's overhead:
>> 
>> http://hbase.apache.org/bulk-loads.html
>> 
>> Using M/R it creates HFiles in HDFS directly, then adds the HFiles to the
>> existing regionservers.
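(A minimal sketch of that flow, assuming 0.90-era APIs; the table name, paths,
and the MyPutMapper parsing logic are illustrative. HFileOutputFormat partitions
the job's output to match the table's current region boundaries, which is what
lets the new HFiles slot into existing regions.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BulkLoadExample {

      // Hypothetical mapper: turns one tab-separated input line into a Put.
      public static class MyPutMapper
          extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
            throws java.io.IOException, InterruptedException {
          byte[] row = Bytes.toBytes(line.toString().split("\t")[0]);
          Put put = new Put(row);
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                  Bytes.toBytes(line.toString()));
          ctx.write(new ImmutableBytesWritable(row), put);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "hfile-prepare");
        job.setJarByClass(BulkLoadExample.class);
        job.setMapperClass(MyPutMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path("/input"));
        FileOutputFormat.setOutputPath(job, new Path("/hfiles"));

        HTable table = new HTable(conf, "mytable");
        // Wires in total-order partitioning and a sorting reducer so each
        // reducer writes HFiles aligned with the table's region boundaries.
        HFileOutputFormat.configureIncrementalLoad(job, table);
        job.waitForCompletion(true);

        // Hand the finished HFiles to the regionservers owning each key range.
        new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/hfiles"), table);
      }
    }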
>> 
>> -chris
>> 
>> 
>> On May 26, 2011, at 12:38 AM, Weihua JIANG wrote:
>> 
>>> Hi all,
>>> 
>>> As I understand it, the WAL ensures data is safe even if a certain RS
>>> or the whole HBase cluster goes down. But it still adds overhead to
>>> every put.
>>> 
>>> I am wondering: is there any way to disable the WAL while keeping the data safe?
>>> 
>>> An ideal solution to me looks like this (sketched below):
>>> 1. clients continually put records with the WAL disabled.
>>> 2. clients call a certain HBase method to ensure all the
>>> previously-put records are safely persisted, after which they can
>>> remove the records on the client side.
>>> 3. on error, clients re-put the maybe-lost records.
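(A minimal sketch of steps 1-2, assuming 0.90-era client APIs; names are
illustrative. Note that HBaseAdmin.flush() only requests a flush, so a robust
client would still need some way to confirm the memstores were actually
persisted before discarding its records.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class NoWalPutExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");

        Put put = new Put(Bytes.toBytes("row1"));
        put.setWriteToWAL(false); // skip the WAL: fast, but lost if the RS dies pre-flush
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
        table.put(put);
        table.flushCommits(); // drain any buffered writes to the regionservers

        // Step 2: ask the regionservers to flush their memstores to HDFS.
        // This is an asynchronous request, not a durability guarantee.
        new HBaseAdmin(conf).flush("mytable");
      }
    }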
>>> 
>>> Or a slightly different solution (a staging sketch follows the steps):
>>> 1. clients continually append records to a sequence file on HDFS.
>>> 2. clients periodically flush the HDFS file and remove the
>>> previously-put records on the client side.
>>> 3. after all records are stored on HDFS, a map-reduce job puts
>>> the records into HBase with the WAL disabled.
>>> 4. before each map-reduce task finishes, a certain HBase method is
>>> called to flush the in-memory data onto HDFS.
>>> 5. on error, the affected map-reduce task is re-executed (equivalent
>>> to replaying the log).
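(A minimal sketch of the staging in steps 1-2, assuming the client rolls to a
new SequenceFile per batch, since closing a file is what makes its blocks
durable; paths and key/value types are illustrative.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class StagingWriterExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // One file per batch; a later map-reduce job reads these files and
        // writes the records into HBase with the WAL disabled (steps 3-4).
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, new Path("/staging/batch-00001.seq"), Text.class, Text.class);
        try {
          writer.append(new Text("rowkey"), new Text("serialized record"));
        } finally {
          // Closing the file completes its last block; only then is it safe
          // for the client to discard its local copy of the batch (step 2).
          writer.close();
        }
      }
    }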
>>> 
>>> Is there any way to do this in HBase? If not, do you have any plan to
>>> support such a usage model in the near future?
>>> 
>>> 
>>> Thanks
>>> Weihua
>> 
>> 
