hbase-user mailing list archives

From Nick Xie <nick.xie.had...@gmail.com>
Subject Re: HBase 6x bigger than raw data
Date Mon, 27 Jan 2014 22:35:58 GMT
Hi Ted,

It is 0.92.1. Does the version matter?

Thanks,

Nick


On Mon, Jan 27, 2014 at 2:32 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Which HBase release are you using?
>
>
> On Mon, Jan 27, 2014 at 2:12 PM, Nick Xie <nick.xie.hadoop@gmail.com>
> wrote:
>
> > I'm importing a set of data into HBase. The CSV file contains 82 entries
> > per line. Each line starts with an 8-byte ID, followed by a 16-byte date;
> > the remaining 80 entries are numbers of 4 bytes each.
> >
> > The current HBase schema is: the ID as row key, the date in a 'date'
> > family with a 'value' qualifier, and the rest in another family called
> > 'readings' with 'P0', 'P1', 'P2', ... through 'P79' as qualifiers.
> >
> > I'm testing this on a single-node cluster with HBase running in
> > pseudo-distributed mode (no replication, no compression for HBase). After
> > importing a 150MB CSV file stored in HDFS (no replication), I checked the
> > table size, and it shows ~900MB, which is 6x larger than the file in
> > HDFS.
> >
> > Why is there such a large overhead? Am I doing anything wrong here?
> >
> > Thanks,
> >
> > Nick
> >
>
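The per-cell overhead behind a result like this can be estimated by hand. The Python sketch below is a rough model, not a measurement. It assumes the standard HBase KeyValue serialization (4-byte key length, 4-byte value length, then a key made of a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte key type, followed by the value). It also treats the field widths from the message as binary sizes and ignores HFile block headers, indexes, bloom filters and the WAL. The function names are illustrative only.

```python
# Rough per-row storage model for the schema described in the thread.
# Assumes the classic KeyValue cell layout with no compression and no
# data block encoding; HFile blocks, indexes and bloom filters are ignored.

ROW_KEY_LEN = 8      # 8-byte ID used as the row key
DATE_LEN = 16        # 16-byte date stored in 'date:value'
READING_LEN = 4      # each reading is 4 bytes
NUM_READINGS = 80    # qualifiers 'P0' .. 'P79' in family 'readings'


def keyvalue_size(row_len: int, family: str, qualifier: str, value_len: int) -> int:
    """Serialized size in bytes of one KeyValue cell (hypothetical helper)."""
    key_len = (
        2 + row_len          # row length (short) + row bytes
        + 1 + len(family)    # family length (byte) + family bytes
        + len(qualifier)     # qualifier bytes (length is implied)
        + 8 + 1              # timestamp (long) + key type (byte)
    )
    return 4 + 4 + key_len + value_len  # key length + value length headers


def raw_record_size() -> int:
    """Bytes per record if the fields were stored back to back."""
    return ROW_KEY_LEN + DATE_LEN + NUM_READINGS * READING_LEN


def hbase_record_size() -> int:
    """Bytes per record: one 'date:value' cell plus 80 'readings:Pn' cells."""
    total = keyvalue_size(ROW_KEY_LEN, "date", "value", DATE_LEN)
    for i in range(NUM_READINGS):
        total += keyvalue_size(ROW_KEY_LEN, "readings", f"P{i}", READING_LEN)
    return total


if __name__ == "__main__":
    raw = raw_record_size()
    stored = hbase_record_size()
    print(f"raw fields: {raw} bytes, KeyValues: {stored} bytes, "
          f"ratio {stored / raw:.1f}x")
```

Under these assumptions a record whose fields total 344 bytes is serialized as about 3483 bytes of KeyValues, because every 4-byte reading repeats the row key, family, qualifier and timestamp. The model is not directly comparable to the 150MB CSV, since CSV text has its own delimiter and formatting overhead, but it shows why small values in many columns inflate an HBase table. Shortening the family name from 'readings' to a single character would, in this model, save 7 bytes per reading cell.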
