hbase-user mailing list archives

From: Ted Yu <yuzhih...@gmail.com>
Subject: Re: Which is the fastest way to dump the content of Hbase table?
Date: Wed, 19 Aug 2015 15:51:53 GMT
> 'spilling map output' occupied most of the total time.

Do you mind giving more detail on the above (what percentage of the job runtime)?
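As a quick illustration of the figure being asked for, a minimal sketch: the spill time and total job runtime (both in seconds) are hypothetical inputs, for example read off the job history UI, and the numbers in the usage line are made up, not taken from this thread.

```python
def spill_share(spill_seconds: float, job_seconds: float) -> float:
    """Return time spent spilling map output as a percentage of total job runtime."""
    if job_seconds <= 0:
        raise ValueError("job runtime must be positive")
    return 100.0 * spill_seconds / job_seconds


# Hypothetical numbers: 540 s of spilling in a 720 s job.
print(f"{spill_share(540, 720):.1f}%")  # → 75.0%
```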

Which release of Hadoop / HBase are you using?

Cheers

On Tue, Aug 18, 2015 at 11:11 PM, dong.yajun <dongtalk@gmail.com> wrote:

> Hello,
>
> What is the fastest way to dump the content of an HBase table to HDFS? Is
> it possible to use an HBase snapshot + Spark to do this?
>
> We already use an HBase snapshot + MapReduce v2 (reading the HFiles directly,
> not through HTable) to convert the HFiles to ORC files, but we found that
> 'spilling map output' occupied most of the total time. Could Spark reduce this cost?
>
> Map task: read the HFile and convert it to KeyValues.
>
> Reduce task: merge the KeyValues that share the same row key.
>
> Thanks.
>
> --
> *Ric Dong*
>
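The map/reduce flow described in the question can be sketched in plain Python to show where the shuffle sits: the map side emits one (row key, cell) pair per KeyValue, the framework groups those pairs by row key (the step that sorts and spills to local disk in a real MapReduce job), and the reducer merges each row. This is a minimal in-memory simulation under stated assumptions, not HBase or Hadoop code: the `Cell` tuple is a hypothetical stand-in for a KeyValue, the function names are hypothetical, and "keep the newest version of each column" is an assumed merge rule (deletes and multi-version reads are ignored).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for an HBase KeyValue: (row_key, column, timestamp, value).
Cell = Tuple[bytes, bytes, int, bytes]


def map_phase(hfile_cells: Iterable[Cell]) -> Iterator[Tuple[bytes, Cell]]:
    """Emit one (row_key, cell) pair per cell; each pair goes through the shuffle."""
    for cell in hfile_cells:
        yield cell[0], cell


def shuffle(pairs: Iterable[Tuple[bytes, Cell]]) -> Dict[bytes, List[Cell]]:
    """Group pairs by row key, sorted by key (done on local disk in real MapReduce)."""
    groups: Dict[bytes, List[Cell]] = defaultdict(list)
    for key, cell in pairs:
        groups[key].append(cell)
    return dict(sorted(groups.items()))


def reduce_phase(row_key: bytes, cells: List[Cell]) -> Dict[bytes, bytes]:
    """Merge one row's cells, keeping the newest value per column (assumed rule)."""
    newest: Dict[bytes, Tuple[int, bytes]] = {}
    for _, column, ts, value in cells:
        if column not in newest or ts > newest[column][0]:
            newest[column] = (ts, value)
    return {col: val for col, (_, val) in newest.items()}


# Two hypothetical HFiles; row b"r1" has cells in both.
hfile_a = [(b"r1", b"cf:a", 1, b"old"), (b"r2", b"cf:a", 1, b"x")]
hfile_b = [(b"r1", b"cf:a", 2, b"new"), (b"r1", b"cf:b", 2, b"y")]

pairs = list(map_phase(hfile_a)) + list(map_phase(hfile_b))
rows = {key: reduce_phase(key, cells) for key, cells in shuffle(pairs).items()}
print(len(pairs))  # → 4
print(rows)  # → {b'r1': {b'cf:a': b'new', b'cf:b': b'y'}, b'r2': {b'cf:a': b'x'}}
```

In this shape every cell, not every row, becomes a shuffled pair, so the volume the map side sorts and spills grows with the number of KeyValues; that is one plausible reason spilling dominates a job built this way.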
