hbase-user mailing list archives

From Arun Allamsetty <arun.allamse...@gmail.com>
Subject Re: Best practice for writing to HFileOutputFormat(2) with multiple Column Families
Date Fri, 01 Aug 2014 16:24:44 GMT
Hi Jianshi,

Do you mean that you want to sort the row keys? If so, you don't have to
worry about it: HBase sorts the row keys on its own, but note that the order
is lexicographic.
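
To illustrate what lexicographic ordering means for row keys, here is a minimal Python sketch. It assumes keys are compared as raw byte strings; the key names are made up for illustration and are not from the original dataset.

```python
# Hypothetical row keys, encoded as bytes so they compare byte by byte.
row_keys = [b"row2", b"row10", b"row1", b"row20"]

# Python orders bytes objects lexicographically, so "row10" lands before "row2".
lexicographic = sorted(row_keys)

# Zero-padding the numeric part makes lexicographic order match numeric order.
padded = sorted(b"row%04d" % n for n in (2, 10, 1, 20))

print(lexicographic)
print(padded)
```

If numeric order matters, padding (or a fixed-width binary encoding) in the row key is one common way to get it.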

Cheers,
Arun

Sent from a mobile device. Please don't mind the typos.
On Jul 30, 2014 9:02 PM, "Jianshi Huang" <jianshi.huang@gmail.com> wrote:

> I need to generate HFiles from a 2TB dataset and explode it into 4 column
> families.
>
> The result dataset is likely to be 20TB or more. I'm currently using Spark
> so I sorted the (rk, cf, cq) myself. It's huge and I'm considering how to
> optimize it.
>
> My question is:
> Should I sort and write each column family one by one, or should I put them
> all together and then sort and write?
>
> Does my question make sense?
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
>
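
The (rk, cf, cq) ordering described in the quoted message can be sketched in plain Python as below. This is an illustration only, not Spark or HBase code: the cell keys are made up, and it assumes each component is compared as raw bytes. It shows that sorting all column families together and then splitting by family gives the same per-family order as sorting each family separately. It says nothing about which approach is cheaper at 20TB.

```python
# Hypothetical cell keys as (row key, column family, column qualifier).
cells = [
    (b"r2", b"cf1", b"a"),
    (b"r1", b"cf2", b"b"),
    (b"r1", b"cf1", b"b"),
    (b"r1", b"cf1", b"a"),
    (b"r2", b"cf2", b"a"),
]


def only_family(keys, cf):
    """Keep the keys that belong to one column family, preserving order."""
    return [k for k in keys if k[1] == cf]


families = sorted({k[1] for k in cells})

# Option A: sort everything together, then split by column family.
together = sorted(cells)
split_after_sort = {cf: only_family(together, cf) for cf in families}

# Option B: split by column family first, then sort each family on its own.
sorted_per_family = {cf: sorted(only_family(cells, cf)) for cf in families}

# Both options produce the same ordering within each family.
print(split_after_sort == sorted_per_family)
```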
