hbase-dev mailing list archives

From Matteo Bertozzi <theo.berto...@gmail.com>
Subject Re: hbase hdfs snapshots
Date Sat, 11 Jul 2015 00:09:05 GMT
@Vladimir there is no hfile link creation on snapshot. we create 1 manifest
per region

Matteo

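For scale, here is a rough back-of-the-envelope sketch of the file counts discussed in this thread. It uses the figures quoted below (60k regions, 4 column families, 1200+ regionservers). K, the average number of store files per store, is not given anywhere in the thread, so the value used here is purely an assumption for illustration, and the function names are hypothetical.

```python
# Figures quoted in this thread.
REGIONS = 60_000
FAMILIES = 4
REGION_SERVERS = 1_200  # the thread says "1200+", so this is a lower bound


def per_file_links(avg_store_files):
    """Files created if every store file got its own link (the speculated case)."""
    return REGIONS * FAMILIES * avg_store_files


def per_region_manifests():
    """Files created with one manifest per region (as described in the reply above)."""
    return REGIONS


# ASSUMPTION: K = 3 is illustrative only; the thread does not state K.
K = 3
links = per_file_links(K)
manifests = per_region_manifests()
regions_per_server = REGIONS / REGION_SERVERS

print(links, manifests, regions_per_server)
```

Under these assumptions the per-file case would mean hundreds of thousands of small files, while the manifest layout stays at one file per region, and each regionserver handles on the order of 50 regions.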

On Fri, Jul 10, 2015 at 5:06 PM, Vladimir Rodionov <vladrodionov@gmail.com>
wrote:

> Being not very familiar with snapshot code, I could speculate only on where
> most time is spent ...
>
> In creating 60K x 4 x K (K is average # of store files per region) small
> HFileLinks? This can be a very large # of files.
>
> -Vlad
>
>
>
> On Fri, Jul 10, 2015 at 4:57 PM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> wrote:
>
> > the total time taken by a snapshot should be bounded by the slowest
> > machine.
> > we send a notification to each RS and each RS executes the snapshot
> > operation for each region.
> > can you track down what is slow in your case?
> >
> > clone has to create a reference for each file, and that is a master
> > operation, and these calls may all go away if we change the layout in a
> > proper way instead of doing what is proposed in HBASE-13991.
> > Most of the time should be spent on the enableTable phase of the clone.
> >
> >
> > On Fri, Jul 10, 2015 at 4:36 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> > > Hi Rahul,
> > >
> > > Have you identified what takes those 30 minutes? Is the table balanced
> > > correctly across the servers? From the logs, are you able to identify
> > > what takes that much time?
> > >
> > > JM
> > >
> > > 2015-07-10 18:46 GMT-04:00 rahul gidwani <rahul.gidwani@gmail.com>:
> > >
> > > > Hi Matteo,
> > > >
> > > > We do SKIP_FLUSH.  We have 1200+ regionservers with a single table with
> > > > 60k regions and 4 column families.  It takes around 30 minutes to
> > > > snapshot this table using manifests compared to just seconds doing this
> > > > with hdfs.  Cloning this table takes considerably longer.
> > > >
> > > > For cases where someone would want to run Map/Reduce over snapshots this
> > > > could be much faster as we could take an hdfs snapshot and bypass the
> > > > clone.
> > > >
> > > > rahul
> > > >
> > > >
> > > > On Thu, Jul 9, 2015 at 12:20 PM, Matteo Bertozzi <theo.bertozzi@gmail.com>
> > > > wrote:
> > > >
> > > > > On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani <rahul.gidwani@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Even with manifests (Snapshot V2) for our larger tables it can take
> > > > > > hours to Snapshot and Clone a table.
> > > > > >
> > > > >
> > > > > on snapshot time the only thing that can take hours, is "flush".
> > > > > if you don't need that (which is what you get with hdfs snapshots) you
> > > > > can specify SKIP_FLUSH => true
> > > > >
> > > > >
> > > > > Matteo
> > > > >
> > > > >
> > > > > On Thu, Jul 9, 2015 at 12:12 PM, rahul gidwani <rahul.gidwani@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > HBase snapshots are a very useful feature, but they were implemented
> > > > > > back before there was the ability to snapshot via HDFS.
> > > > > >
> > > > > > Newer versions of Hadoop support HDFS snapshots.  I was wondering if
> > > > > > the community would be interested in something like a Snapshot V3
> > > > > > where we use HDFS to take these snapshots.
> > > > > >
> > > > > > Even with manifests (Snapshot V2) for our larger tables it can take
> > > > > > hours to Snapshot and Clone a table.
> > > > > >
> > > > > > Would this feature be of use to anyone?
> > > > > >
> > > > > > thanks
> > > > > > rahul
> > > > > >
> > > > >
> > > >
> > >
> >
>
