hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Rename tables or swap alias
Date Tue, 16 Feb 2016 14:48:53 GMT
Please see http://hbase.apache.org/book.html#ops.snapshots for background
on snapshots.

In Anil's description, table_old is the result of cloning the snapshot
which is taken in step #1. See
http://hbase.apache.org/book.html#ops.snapshots.clone
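
For illustration only, here is a minimal Python sketch that builds the HBase shell commands matching Anil's four steps. The helper name `snapshot_rotation_commands` and the table and snapshot names (`events`, `events_snap`) are hypothetical and not from this thread. `snapshot`, `truncate` and `clone_snapshot` are HBase shell commands; step 3 (writing the new data) is left to the application, so the commands are split into those run before the load and those run after it.

```python
def snapshot_rotation_commands(table, snapshot, old_table="table_old"):
    """Build HBase shell commands for the snapshot-based workflow.

    Returns (before_load, after_load): commands to run before the
    application writes the new data, and commands to run afterwards.
    All names passed in are placeholders chosen by the caller.
    """
    before_load = [
        # Step 1: take a snapshot of the current table contents.
        f"snapshot '{table}', '{snapshot}'",
        # Step 2: empty the table so the new run starts clean.
        f"truncate '{table}'",
    ]
    after_load = [
        # Step 4: materialize the snapshot as a separate table.
        f"clone_snapshot '{snapshot}', '{old_table}'",
    ]
    return before_load, after_load


if __name__ == "__main__":
    before, after = snapshot_rotation_commands("events", "events_snap")
    # The printed lines could be pasted into, or piped to, `hbase shell`.
    print("\n".join(before))
    print("# ... step 3: application writes the new data ...")
    print("\n".join(after))
```

As noted elsewhere in this thread, the sequence is not atomic, and writes arriving between the snapshot and the truncate would not be captured in the snapshot.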

Cheers

On Tue, Feb 16, 2016 at 6:35 AM, Pat Ferrel <pat@occamsmachete.com> wrote:

> I think I can work out the algorithm if I knew precisely what a “snapshot”
> does. From my reading it seems to be a lightweight, fast alias (for lack of
> a better word), since it creates something that refers to the same physical
> data. So if I create a new table with cleaned data, call it table_new, do
> I then drop table_old and “snapshot” table_new into table_old? Is this what
> is suggested?
>
> This leaves me with a small time where there is no table_old, which is the
> time between dropping table_old and creating a snapshot. Is it feasible to
> lock the DB for this time?
>
> > On Feb 15, 2016, at 7:13 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > Keep in mind that if the writes to this table are not paused, there would
> > be some data coming in between steps #1 and #2 which would not be in the
> > snapshot.
> >
> > Cheers
> >
> > On Mon, Feb 15, 2016 at 6:21 PM, Anil Gupta <anilgupta84@gmail.com> wrote:
> >
> >> I don't think there are any atomic operations in HBase to support DDL
> >> across 2 tables.
> >>
> >> But maybe you can use HBase snapshots.
> >> 1. Create an HBase snapshot.
> >> 2. Truncate the table.
> >> 3. Write data to the table.
> >> 4. Create a table from the snapshot taken in step #1 as table_old.
> >>
> >> Now you have two tables: one with the current run's data and the other
> >> with the last run's data.
> >> I think the above process will suffice. But keep in mind that it is not
> >> atomic.
> >>
> >> HTH,
> >> Anil
> >> Sent from my iPhone
> >>
> >>> On Feb 15, 2016, at 4:25 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
> >>>
> >>> Is there any other way to do what I was asking? With Spark it is very
> >>> normal to treat a table as immutable and create another to replace the
> >>> old one.
> >>>
> >>> Can you lock two tables and rename them in 2 actions then unlock in a
> >>> very short period of time?
> >>>
> >>> Or an alias for table names?
> >>>
> >>> I didn’t see these in any docs or by Googling; any help is appreciated.
> >>> Writing all this data back to the original table would be a huge load
> >>> on a table that is being written to by external processes and is
> >>> therefore under heavy load to begin with.
> >>>
> >>>> On Feb 14, 2016, at 5:03 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>
> >>>> There is currently no native support for renaming two tables in one
> >>>> atomic action.
> >>>>
> >>>> FYI
> >>>>
> >>>>> On Sun, Feb 14, 2016 at 4:18 PM, Pat Ferrel <pat@occamsmachete.com> wrote:
> >>>>>
> >>>>> I use Spark to take an old table, clean it up to create an RDD of cleaned
> >>>>> data. What I’d like to do is write all of the data to a new table in HBase,
> >>>>> then rename the table to the old name. If possible it could be done by
> >>>>> changing an alias to point to the new table as long as all external code
> >>>>> uses the alias, or by a 2 table rename operation. But I don’t see how to do
> >>>>> this for HBase. I am dealing with a lot of data so don’t want to do table
> >>>>> modifications with deletes and upserts, this would be incredibly slow.
> >>>>> Furthermore I don’t want to disable the table for more than a tiny span of
> >>>>> time.
> >>>>>
> >>>>> Is it possible to have 2 tables and rename both in an atomic action, or
> >>>>> change some alias to point to the new table in an atomic action? If not,
> >>>>> what is the quickest way to achieve this to minimize the time disabled?
> >>>
> >>
>
>
