hbase-user mailing list archives

From Hari Sreekumar <hsreeku...@clickable.com>
Subject Re: Data taking up too much space when put into HBase
Date Wed, 10 Nov 2010 06:54:59 GMT
I checked the "browse filesystem" link in the web interface (50070). HBase
creates a directory named after the table, and in that directory there are
files which are 5-6 MB in size, on average. Some are in KBs, and there are
some of 12-13 MB size, but most are around 6 MB. I was thinking these files
are stored in 64 MB blocks, leading to the space usage.

hari
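
On the block-size question, a rough sketch of how HDFS accounts for space may help. It assumes the usual HDFS behaviour that the block size is only an upper bound, so a file smaller than one block occupies its actual length on disk (times the replication factor), ignoring the small checksum metadata. The function name and defaults below are illustrative, not part of any Hadoop API.

```python
import math

MB = 1024 ** 2

def hdfs_footprint(file_size_bytes, block_size_bytes=64 * MB, replication=3):
    """Return (block_count, raw_bytes_used) for one file.

    Assumption: a partially filled block consumes only the bytes it
    holds, so raw usage is file size times replication, not block size
    times replication. Checksum metadata is ignored.
    """
    block_count = math.ceil(file_size_bytes / block_size_bytes)
    raw_bytes_used = file_size_bytes * replication
    return block_count, raw_bytes_used

blocks, raw = hdfs_footprint(6 * MB)
print(blocks, raw // MB)  # one block, 18 MB of raw disk across the cluster
```

Under that assumption a 6 MB file costs 18 MB of raw disk with replication=3, not 192 MB, so small store files alone would not account for the expansion.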

On Wed, Nov 10, 2010 at 11:56 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> I'm pretty sure that's not how it's reported by the "du" command, but
> I wouldn't expect to see files of 5MB on average. Can you be more
> specific?
>
> J-D
>
> On Tue, Nov 9, 2010 at 9:58 PM, Hari Sreekumar <hsreekumar@clickable.com> wrote:
> > Ah, so the bloat is not because of the files being 5-6 MB in size? Wouldn't
> > a 6 MB file occupy 64 MB if I set block size as 64 MB?
> >
> > hari
> >
> > On Wed, Nov 10, 2010 at 11:16 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >
> >> Each value is stored with its full key, e.g. row key + family +
> >> qualifier + timestamp + offsets. You don't give any information
> >> regarding how you stored the data, but if you have large enough keys
> >> then it should easily explain the bloat.
> >>
> >> J-D
> >>
> >> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar <hsreekumar@clickable.com> wrote:
> >> > Hi,
> >> >
> >> >     Data seems to be taking up too much space when I put it into HBase.
> >> > E.g., I have a 2 GB text file which seems to be taking up ~70 GB when I
> >> > dump it into HBase. I have block size set to 64 MB and replication=3,
> >> > which I think is the possible reason for this expansion. But if that is
> >> > the case, how can I prevent it? Decreasing the block size will have a
> >> > negative impact on performance, so is there a way I can increase the
> >> > average size of HBase-created files to be comparable to 64 MB? Right now
> >> > they are ~5 MB on average. Or is this an entirely different thing at
> >> > work here?
> >> >
> >> > thanks,
> >> > hari
> >> >
> >>
> >
>
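
To put a number on the per-cell key overhead J-D describes in the quoted reply, here is a minimal sketch of the uncompressed KeyValue size. It assumes the standard KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type) and ignores HFile block indexes, compression and write-ahead logs. The row key, family and qualifier in the example are hypothetical; the thread does not say what schema was used.

```python
def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate serialized size in bytes of one HBase cell.

    Assumed fixed fields: key length (4) + value length (4) +
    row length (2) + family length (1) + timestamp (8) + key type (1).
    """
    fixed = 4 + 4 + 2 + 1 + 8 + 1
    return fixed + len(row) + len(family) + len(qualifier) + len(value)

# Hypothetical cell: a 4-byte value stored under a long, descriptive key.
size = keyvalue_size(b"user-000000000001", b"details", b"first_name", b"Hari")
print(size)             # 58 bytes on disk for 4 bytes of payload
print(size / len(b"Hari"))  # 14.5x expansion before replication
```

With replication=3 on top, a table of short values under long keys can grow by well over an order of magnitude, which is in the range reported at the start of the thread. Shorter row keys, family names and qualifiers shrink every cell.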
