hbase-dev mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: A better way to migrate the whole cluster?
Date Fri, 15 Aug 2014 17:57:45 GMT
Well, 1.8TB in 24h is 1.8/24/3600 TB/s ~ 21MB/s. That seems pretty slow to me. :)

My bet is still on scanner caching, which defaults to 1 in 0.94 (and hence each mapper does
an RPC for every single row, making this a latency problem rather than a bandwidth problem).

As stated in the other email, try adding these two options to the CopyTable command:


-Dhbase.client.scanner.caching=100
-Dmapred.map.tasks.speculative.execution=false
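
For example, the full invocation might look something like this (a sketch that reuses
the peer address from the earlier command and a placeholder table name; the -D options
go after the class name, before the CopyTable arguments):

./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
  -Dhbase.client.scanner.caching=100 \
  -Dmapred.map.tasks.speculative.execution=false \
  --peer.adr=hbase://cluster_name table_name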


-- Lars



________________________________
 From: Esteban Gutierrez <esteban@cloudera.com>
To: "dev@hbase.apache.org" <dev@hbase.apache.org> 
Cc: lars hofhansl <larsh@apache.org> 
Sent: Friday, August 15, 2014 10:11 AM
Subject: Re: A better way to migrate the whole cluster?
 

1.8TB in a day is not terribly slow if that number comes from the CopyTable
counters and you are moving data across data centers over public networks;
that works out to about 20MB/sec. Also, CopyTable doesn't compress anything on
the wire, so the network overhead can be substantial. If you use anything like
snappy for block compression and/or fast_diff for block encoding on the
HFiles, then taking snapshots and exporting them with the ExportSnapshot tool
should be the way to go.
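
A rough sketch of that flow, assuming snapshots are enabled on the 0.94.11 source
cluster (in 0.94 that is the hbase.snapshot.enabled setting) and using placeholder
table, snapshot, and NameNode names:

# on the source cluster: take a snapshot of the table
echo "snapshot 'table_name', 'table_name-snap'" | ./bin/hbase shell

# copy the snapshot's HFiles and metadata to the destination cluster's HBase root dir
./bin/hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot table_name-snap \
  -copy-to hdfs://dest-namenode:8020/hbase \
  -mappers 16

# on the destination cluster: materialize a table from the exported snapshot
echo "clone_snapshot 'table_name-snap', 'table_name'" | ./bin/hbase shell

Since ExportSnapshot copies the raw HFiles between HDFS clusters, the existing
block compression and encoding are preserved on the wire.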

cheers,
esteban.



--
Cloudera, Inc.




On Thu, Aug 14, 2014 at 11:24 PM, tobe <tobeg3oogle@gmail.com> wrote:

> Thanks @lars.
>
> We're using HBase 0.94.11 and followed the instructions to run `./bin/hbase
> org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=hbase://cluster_name
> table_name`. We have a namespace service that resolves the ZooKeeper quorum
> from "hbase://cluster_name". The job ran on a shared YARN cluster.
>
> The performance is affected by many factors, but we haven't found the cause
> yet. It would be great to hear your suggestions.
>
>
> On Fri, Aug 15, 2014 at 1:34 PM, lars hofhansl <larsh@apache.org> wrote:
>
> > What version of HBase? How are you running CopyTable? A day for 1.8T is
> > not what we would expect.
> > You can definitely take a snapshot and then export the snapshot to another
> > cluster, which will move the actual files; but CopyTable should not be so
> > slow.
> >
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: tobe <tobeg3oogle@gmail.com>
> > To: "user@hbase.apache.org" <user@hbase.apache.org>
> > Cc: dev@hbase.apache.org
> > Sent: Thursday, August 14, 2014 8:18 PM
> > Subject: A better way to migrate the whole cluster?
> >
> >
> > Sometimes our users want to upgrade their servers or move to a new
> > datacenter, and then we have to migrate the data out of HBase. Currently we
> > enable replication from the old cluster to the new cluster and run
> > CopyTable to move the older data.
> >
> > It's a little inefficient: it takes more than one day to migrate 1.8T of
> > data, and more time to verify. Is there a better way to do this, like
> > snapshots or copying the HDFS files directly?
> >
> > What's the best practice, and what has your experience been?
> >
>