hbase-user mailing list archives

From William Kang <weliam.cl...@gmail.com>
Subject Re: when HBase opens a region, what does it do? problem with super big row (1.5GB data in one row)
Date Tue, 09 Mar 2010 02:58:38 GMT
Hi,
Can you give me some more details about how the information in a row is
fetched? I understand that a 1.5 GB row may be spread across multiple HFiles
on a region server. If the client wants to access one column label's value
in that row, what is going to happen? My understanding: after HBase finds
the region that stores this row, it goes to the region's metadata and finds
the HFile that stores the column family. The HFile index has the offsets of
the key-value pairs, so HBase can seek to the key-value pair and get the
value for a certain column label.
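In client-API terms, I imagine the access looks something like this (a
rough sketch only; the table, row, and column names are made up):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleCellGet {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(new HBaseConfiguration(), "mytable");
    Get get = new Get(Bytes.toBytes("the-big-row"));
    // Ask for exactly one column, not the whole 1.5 GB row.
    get.addColumn(Bytes.toBytes("myfamily"), Bytes.toBytes("mylabel"));
    Result result = table.get(get);
    byte[] value = result.getValue(Bytes.toBytes("myfamily"),
                                   Bytes.toBytes("mylabel"));
    System.out.println(value == null ? "not found" : value.length + " bytes");
  }
}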

Why does the whole row need to be read into memory?

If HBase does not read the whole row at once, what causes the inefficiency?
Thanks.


William

On Mon, Mar 8, 2010 at 3:44 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Hi,
>
> At this time, truly massive rows such as the one you described may
> behave non-optimally in HBase. While in previous versions of HBase,
> reading an entire row required you to actually read and send the
> entire row in one go, there is a new API that allows you to
> effectively stream rows.  There are still some read paths that may
> read more data than necessary, so your performance mileage may vary.
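>
> For example, using the intra-row batching on Scan (rough, untested
> sketch; imports elided, and it assumes an open HTable named "table"
> and a made-up row key):
>
>   Scan scan = new Scan(Bytes.toBytes("the-big-row"));
>   scan.setBatch(10000);            // at most 10000 cells per Result
>   ResultScanner scanner = table.getScanner(scan);
>   try {
>     for (Result chunk : scanner) {
>       // each Result carries one batch of the row's cells, so you
>       // never hold the whole 1.5GB row in memory at once
>     }
>   } finally {
>     scanner.close();
>   }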
>
>
>
> On Sun, Mar 7, 2010 at 3:56 AM, Ahmed Suhail Manzoor
> <suhailski@gmail.com> wrote:
> > Hi,
> >
> > This might prove to be a blatantly obvious question, but wouldn't it
> > make sense to store large files directly in HDFS and keep the metadata
> > about the file in HBase? One could, for instance, serialize the details
> > of the HDFS file into a Java object and store that in HBase. This object
> > could expose the reading of the HDFS file, so that one is left with
> > clean code.
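> >
> > Roughly like this (a sketch only; imports elided, and the paths,
> > table, and column names are invented for illustration):
> >
> >   Configuration conf = new Configuration();
> >   FileSystem fs = FileSystem.get(conf);
> >   Path path = new Path("/data/blobs/the-big-file");
> >   // ... write the 1.5GB payload via fs.create(path) ...
> >
> >   // Keep only the metadata in HBase, keyed by file name.
> >   HTable table = new HTable(new HBaseConfiguration(), "filemeta");
> >   Put put = new Put(Bytes.toBytes("the-big-file"));
> >   put.add(Bytes.toBytes("meta"), Bytes.toBytes("hdfs.path"),
> >           Bytes.toBytes(path.toString()));
> >   table.put(put);
> >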
> > Anything wrong in implementing things this way?
> >
> > Cheers
> > su./hail
> >
> > On 07/03/2010 09:21, tsuna wrote:
> >>
> >> On Sat, Mar 6, 2010 at 9:14 PM, steven zhuang
> >> <steven.zhuang.1984@gmail.com>  wrote:
> >>
> >>>
> >>>          I have a table which may contain super big rows, e.g. with
> >>> millions of cells in one row, 1.5GB in size.
> >>>
> >>>          now I have a problem emitting data into the table, probably
> >>> because these super big rows are too large for my regionserver (with
> >>> only 1GB heap).
> >>>
> >>
> >> A row can't be split, and whatever you do that needs that row (like
> >> reading it) requires that HBase load the entire row into memory.  If
> >> the row is 1.5GB and your regionserver has only 1GB of memory, it
> >> won't be able to use that row.
> >>
> >> I'm not 100% sure about that because I'm still an HBase n00b too, but
> >> that's my understanding.
> >>
> >>
> >
> >
>
