hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Is there any way to disable WAL while keeping data safety
Date Tue, 31 May 2011 02:08:02 GMT
Xiyun:
Take a look at https://issues.apache.org/jira/browse/HBASE-3871 for parallel
HFile splitting.
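
One way to avoid the repeated splitting described below is to pre-split the target table before generating the HFiles, so each HFile already falls inside an existing region. A rough sketch against the HBaseAdmin API of that era; the table name, column family, and split keys are made-up placeholders, not taken from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Placeholder table and column family.
    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("cf"));

    // Split points chosen to match the expected row-key distribution of the
    // data to be bulk loaded, so the loaded HFiles need no further splitting.
    byte[][] splits = new byte[][] {
        Bytes.toBytes("row-2000000"),
        Bytes.toBytes("row-4000000"),
        Bytes.toBytes("row-6000000"),
        Bytes.toBytes("row-8000000"),
    };
    admin.createTable(desc, splits);
  }
}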

On Mon, May 30, 2011 at 6:31 PM, Gan, Xiyun <ganxiyun@gmail.com> wrote:

> I used BulkLoad to import data. The step of writing HFiles using M/R is
> fast, but the step of loading the HFiles into HBase takes a lot of time. It
> says: HFile at ****** no longer fits inside a single region. Splitting....
> Even worse, sometimes it throws a Region is not online exception.
>
> Thanks
>
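
The slow step described above is the completebulkload pass. A rough sketch of driving it programmatically through LoadIncrementalHFiles, the class that tool wraps; the HDFS path and table name are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class CompleteBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // placeholder table name

    // Moves each HFile under the output directory into the matching region;
    // when an HFile straddles a region boundary it is split first, which is
    // the slow path described in the message above.
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(new Path("/user/hadoop/hfile-output"), table);
  }
}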
> On Fri, May 27, 2011 at 1:18 PM, Chris Tarnas <cft@tarnas.org> wrote:
>
> > Yes, it does deal with data merging, and yes, doing a major compaction
> > would be needed to guarantee the store files are as small as possible.
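
A rough sketch of what requesting that major compaction looks like from the client API; the table name is a placeholder, and majorCompact only queues the compaction on the region servers rather than waiting for it to finish:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MajorCompactAfterLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    // Ask the region servers to merge all store files of the (placeholder)
    // table into one file per store; the request is asynchronous.
    admin.majorCompact("mytable");
  }
}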
> >
> > -chris
> >
> >
> >
> > On May 26, 2011, at 7:00 PM, Weihua JIANG <weihua.jiang@gmail.com> wrote:
> >
> > > Thanks. It seems quite useful.
> > >
> > > Does bulk load support data merging? I.e., there is a table with
> > > existing data and I want to add more data to it. The new data's row
> > > key range is interleaved with the existing data's row key range, so the
> > > final effect is that the new data should be inserted into existing regions.
> > >
> > > If bulk load supports this feature, then it is the ideal solution for me.
> > >
> > > And do I need to perform a major compaction after the bulk load to ensure
> > > the number of store files stays small?
> > >
> > >
> > > Thanks
> > > Weihua
> > >
> > > 2011/5/27 Chris Tarnas <cft@email.com>:
> > >> Your second solution sounds quite similar to the bulk loader. Actually,
> > >> the bulk load is a bit simpler and bypasses even more of the
> > >> region server's overhead:
> > >>
> > >> http://hbase.apache.org/bulk-loads.html
> > >>
> > >> Using M/R, it creates HFiles in HDFS directly, then adds the HFiles to
> > >> the existing region servers.
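
A rough sketch of the HFile-generating M/R job referred to above: HFileOutputFormat.configureIncrementalLoad wires in the partitioner and reducer so that the output HFiles line up with the table's current region boundaries. The mapper, table name, and paths here are illustrative placeholders:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HFileGenJob {

  // Placeholder mapper: turns "rowkey<TAB>value" text lines into KeyValues.
  public static class LineToKeyValueMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("cf"),
          Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hfile-gen");
    job.setJarByClass(HFileGenJob.class);
    job.setMapperClass(LineToKeyValueMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);

    FileInputFormat.addInputPath(job, new Path("/user/hadoop/input"));        // placeholder
    FileOutputFormat.setOutputPath(job, new Path("/user/hadoop/hfile-output")); // placeholder

    // Sets the reducer, TotalOrderPartitioner, and HFileOutputFormat so the
    // generated HFiles match the table's current region boundaries.
    HTable table = new HTable(conf, "mytable");   // placeholder table name
    HFileOutputFormat.configureIncrementalLoad(job, table);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}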
> > >>
> > >> -chris
> > >>
> > >>
> > >> On May 26, 2011, at 12:38 AM, Weihua JIANG wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> As I understand it, the WAL is used to ensure data is safe even if a
> > >>> certain RS or the whole HBase cluster goes down. But it is nevertheless
> > >>> a burden on each put.
> > >>>
> > >>> I am wondering: is there any way to disable the WAL while keeping the
> > >>> data safe?
> > >>>
> > >>> An ideal solution to me looks like this:
> > >>> 1. Clients continually put records with the WAL disabled.
> > >>> 2. Clients call a certain HBase method to ensure all the
> > >>> previously-put records are safely stored persistently; then they can
> > >>> remove the records on the client side.
> > >>> 3. On error, clients re-put the maybe-lost records.
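
A rough sketch of what this first solution could look like with the client API of that era: puts carry setWriteToWAL(false), and the client later asks the region servers to flush their memstores before discarding its own copy. Table name, family, and row keys are placeholders; note that the flush request is asynchronous, so a robust client would also need to confirm the flush completed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class NoWalPutThenFlush {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // placeholder table name
    table.setAutoFlush(false);                    // buffer puts on the client

    // Step 1: put records with the WAL disabled.
    for (int i = 0; i < 100000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
      put.setWriteToWAL(false);
      table.put(put);
    }
    table.flushCommits();                         // drain the client-side write buffer

    // Step 2: ask the region servers to flush their memstores to HFiles on
    // HDFS. The request is asynchronous, so a robust client would also need
    // to verify the flush finished before discarding its own copy (and, per
    // step 3, re-put the possibly lost records on error).
    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.flush("mytable");
  }
}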
> > >>>
> > >>> Or a slightly different solution is:
> > >>> 1. Clients continually write records to HDFS using a sequence file.
> > >>> 2. Clients periodically flush the HDFS file and remove the previously
> > >>> put records on the client side.
> > >>> 3. After all records are stored on HDFS, use a map-reduce job to put
> > >>> the records into HBase with the WAL disabled.
> > >>> 4. Before each map-reduce task finishes, a certain HBase method is
> > >>> called to flush the in-memory data onto HDFS.
> > >>> 5. If there is an error, the affected map-reduce task is re-executed
> > >>> (equivalent to replaying the log).
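
A rough sketch of steps 1-2 of this second solution: staging records in a SequenceFile on HDFS and periodically syncing it to the datanodes before dropping the client-side copy. The path is a placeholder, and the exact sync call available (syncFs vs. hflush) depends on the Hadoop version:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class StageToSequenceFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/user/hadoop/staging/records.seq");   // placeholder

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, path, Text.class, Text.class);
    for (int i = 0; i < 100000; i++) {
      writer.append(new Text("row-" + i), new Text("value-" + i));
      // Step 2: periodically push buffered data to the datanodes so the
      // already-written records can be dropped on the client side.
      if (i % 10000 == 0) {
        writer.syncFs();   // 0.20-append era API; newer Hadoop exposes hflush()
      }
    }
    writer.close();
  }
}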
> > >>>
> > >>> Is there any way to do this in HBase? If not, do you have any plan to
> > >>> support such a usage model in the near future?
> > >>>
> > >>>
> > >>> Thanks
> > >>> Weihua
> > >>
> > >>
> >
>
>
>
> --
> Best wishes
> Gan, Xiyun
>
