hbase-user mailing list archives

From Matt Corgan <mcor...@hotpads.com>
Subject Re: Data size
Date Thu, 01 Apr 2010 02:05:59 GMT
Out of curiosity, why is it necessary to store the family and row with
every cell?  Aren't all the contents of a family confined to the same file,
and couldn't a row length be stored at the beginning of each row or in a
block index?  Is this true for values in the caches and memstore as well?

It could have drastic implications for storing rows with many small values
but with long keys, long column names, and innocently verbose column family
names.
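
To put rough numbers on that, here is a minimal Python sketch of the per-cell cost. It assumes the KeyValue layout as I understand it (a 4-byte key length, a 4-byte value length, then a key made of a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type, followed by the value). It ignores block indexes, file metadata and compression. The row, family and qualifier names are made up for illustration, and the replication factor of 3 is only the HDFS default, not something confirmed for this cluster.

```python
# Rough per-cell size estimate, assuming the KeyValue layout described above.
# All sizes are in bytes; indexes, file metadata and compression are ignored.
KEY_LEN_FIELD = 4      # int: length of the key part
VALUE_LEN_FIELD = 4    # int: length of the value part
ROW_LEN_FIELD = 2      # short: length of the row key
FAMILY_LEN_FIELD = 1   # byte: length of the family name
TIMESTAMP_FIELD = 8    # long: cell timestamp
TYPE_FIELD = 1         # byte: key type (put, delete, ...)


def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate serialized size of a single cell."""
    key = (ROW_LEN_FIELD + len(row)
           + FAMILY_LEN_FIELD + len(family)
           + len(qualifier)
           + TIMESTAMP_FIELD + TYPE_FIELD)
    return KEY_LEN_FIELD + VALUE_LEN_FIELD + key + len(value)


# Hypothetical cell: long row key, verbose names, tiny value.
row = b"customer-00000012345"
value = b"\x00\x00\x00\x01"
verbose = keyvalue_size(row, b"customer_details", b"last_login_ts", value)
terse = keyvalue_size(row, b"d", b"l", value)
print("verbose names:", verbose, "bytes for a", len(value), "byte value")
print("terse names:  ", terse, "bytes for a", len(value), "byte value")

# Figures from the original question: 35G of source files, 562G used in HDFS.
# ASSUMPTION: the HDFS usage figure counts every replica and dfs.replication
# is at its default of 3.
hdfs_used_gb, source_gb, replication = 562, 35, 3
total_blowup = hdfs_used_gb / source_gb
per_replica_blowup = total_blowup / replication
print(f"total blow-up: {total_blowup:.1f}x, per replica: {per_replica_blowup:.1f}x")
```

If the layout assumption holds, the fixed fields alone cost 20 bytes per cell before any row, family or qualifier bytes are counted, so short family and qualifier names matter a lot when values are small.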

Matt

2010/3/31 alex kamil <alex.kamil@gmail.com>

> I would also suggest checking the dfs.replication setting in HDFS (in
> /conf/hdfs-site.xml)
>
> A-K
>
> 2010/3/31 Jean-Daniel Cryans <jdcryans@apache.org>
>
> > HBase is column-oriented; every cell is stored with the row, family,
> > qualifier and timestamp so every piece of data will bring a larger
> > disk usage. Without any knowledge of your keys, I can't comment much
> > more.
> >
> > Then HDFS keeps a trash so every file compacted will end up there...
> > if you just did the import, there will be a lot of these.
> >
> > Finally if you imported the data more than once, hbase keeps 3
> > versions by default.
> >
> > So in short, is it reasonable? Answer: it depends!
> >
> > J-D
> >
> > 2010/3/31  <y_823910@tsmc.com>:
> > > Hi,
> > >
> > > We've dumped Oracle data to files then put these files into different
> > > HBase tables.
> > > The size of these files is 35G; we saw the HDFS usage up to 562G after
> > > putting it into hbase.
> > > Is that reasonable?
> > > Thanks
> > >
> > >
> > >
> > > Fleming Chiu(邱宏明)
> > > 707-6128
> > > y_823910@tsmc.com
> > > 週一無肉日吃素救地球(Meat Free Monday Taiwan)
