hbase-user mailing list archives

From William Kang <weliam.cl...@gmail.com>
Subject Re: when Hbase open a region, what does it do? problem with super big row(1.5GB data in one row)
Date Wed, 10 Mar 2010 03:52:50 GMT
Hi St.Ack,
Thanks for your reply. It is very helpful.


William

On Tue, Mar 9, 2010 at 1:26 AM, Stack <stack@duboce.net> wrote:

> On Mon, Mar 8, 2010 at 6:58 PM, William Kang <weliam.cloud@gmail.com>
> wrote:
> > Hi,
> > Can you give me some more details about how the information in a row
> > can be fetched? I understand that a 1.5 GB row may have multiple
> > HFiles in a region server. If the client wants to access a column
> > label value in that row, what is going to happen?
>
> Only that cell is fetched if you specify an explicit column name
> (column family + qualifier).
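
For illustration, here is a minimal sketch of fetching just that one cell
from the Java client. It assumes the modern-style client API
(Connection/Table/Get); the 0.20-era HTable API discussed in this thread
names things a bit differently, and the table, row, and column names below
are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SingleCellGet {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mytable"))) {
          Get get = new Get(Bytes.toBytes("bigrow"));
          // Naming family + qualifier means only this one cell is fetched
          // and shipped to the client, not the whole 1.5GB row.
          get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("label"));
          Result r = table.get(get);
          byte[] value = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("label"));
          System.out.println(Bytes.toStringBinary(value));
        }
      }
    }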
>
> > After HBase finds the region that stores this row, it goes to the
> > region's .META. and finds the index of the HFile that stores the column
> > family. The HFile has the offsets of the key-value pairs, so HBase can
> > go to the key-value pair and get the value for a certain column label.
> >
>
> Yes.
>
>
> > Why does the whole row need to be read into memory?
> >
>
> If you ask for the whole row, HBase will try to load it all in order to
> deliver it all to you.  There is no "streaming" API per se.  Rather, a
> Result object is passed from server to client which holds everything in
> the row, keyed by column name.
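
Continuing the sketch above, a Get that names no columns brings the whole
row back in a single Result (this also needs org.apache.hadoop.hbase.Cell
and org.apache.hadoop.hbase.CellUtil):

    // No addColumn() call: the entire row is materialized and shipped
    // over in one Result, keyed by family:qualifier.
    Result row = table.get(new Get(Bytes.toBytes("bigrow")));
    for (Cell cell : row.rawCells()) {
      System.out.println(
          Bytes.toStringBinary(CellUtil.cloneFamily(cell)) + ":"
          + Bytes.toStringBinary(CellUtil.cloneQualifier(cell)) + " = "
          + Bytes.toStringBinary(CellUtil.cloneValue(cell)));
    }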
>
> That said, if you want the whole row and you are scanning as opposed
> to getting, TRUNK has HBASE-1537 applied, which allows for intra-row
> scanning -- you call setBatch to set the maximum number of cells
> returned within a row -- and the 0.20 branch has HBASE-1996, which
> allows you to set the maximum size returned on a next invocation (in
> both cases, if the row is not exhausted, the next 'next' invocation
> will return more out of the current row, and so on, until the row is
> exhausted).
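
A rough sketch of that intra-row scanning, again written against the
modern client API (Scan and ResultScanner from
org.apache.hadoop.hbase.client); setBatch caps the number of cells per
Result, and setMaxResultSize is the byte-size counterpart:

    // Scan starting at the big row; each next() returns at most 1000 cells,
    // so a 1.5GB row comes back in slices rather than one huge Result.
    Scan scan = new Scan().withStartRow(Bytes.toBytes("bigrow"));
    scan.setBatch(1000);                        // max cells per Result
    scan.setMaxResultSize(2 * 1024 * 1024);     // rough cap on bytes per call
    try (ResultScanner scanner = table.getScanner(scan)) {
      for (Result partial : scanner) {
        // partial.rawCells() holds the next slice of the current row;
        // iteration keeps returning slices until the row is exhausted.
      }
    }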
>
> > If HBase does not read the whole row at once, what causes its
> > inefficiency?
>
> I think Ryan is just allowing that the above means of scanning parts
> of rows may have bugs that we've not yet squashed.
>
> St.Ack
>
>
> > Thanks.
> >
> >
> > William
> >
> > On Mon, Mar 8, 2010 at 3:44 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> At this time, truly massive rows such as the one you described may
> >> behave non-optimally in HBase. While in previous versions of HBase
> >> reading an entire row required you to be able to actually read and
> >> send the entire row in one go, there is a new API that allows you to
> >> effectively stream rows.  There are still some read paths that may
> >> read more data than necessary, so your performance mileage may vary.
> >>
> >>
> >>
> >> On Sun, Mar 7, 2010 at 3:56 AM, Ahmed Suhail Manzoor
> >> <suhailski@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > This might prove to be a blatantly obvious question, but wouldn't
> >> > it make sense to store large files directly in HDFS and keep the
> >> > metadata about the file in HBase? One could, for instance,
> >> > serialize the details of the HDFS file in a Java object and store
> >> > that in HBase. This object could expose the reading of the HDFS
> >> > file, for instance, so that one is left with clean code. Anything
> >> > wrong with implementing things this way?
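
A minimal sketch of that pattern, reusing the table and conf from the
sketch above plus org.apache.hadoop.fs.FileSystem, Path, and
FSDataInputStream; every table, column, and path name here is made up:

    // Write: store only the HDFS path (plus any other metadata) in HBase.
    Put put = new Put(Bytes.toBytes("file-0001"));
    put.addColumn(Bytes.toBytes("meta"), Bytes.toBytes("hdfs_path"),
        Bytes.toBytes("/data/blobs/file-0001.bin"));
    table.put(put);

    // Read: look up the pointer in HBase, then stream the payload from HDFS
    // instead of pulling 1.5GB through a region server.
    Result r = table.get(new Get(Bytes.toBytes("file-0001")));
    String path = Bytes.toString(
        r.getValue(Bytes.toBytes("meta"), Bytes.toBytes("hdfs_path")));
    FileSystem fs = FileSystem.get(conf);
    try (FSDataInputStream in = fs.open(new Path(path))) {
      // in.read(...) or IOUtils.copyBytes(...) to consume the file
    }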
> >> >
> >> > Cheers
> >> > su./hail
> >> >
> >> > On 07/03/2010 09:21, tsuna wrote:
> >> >>
> >> >> On Sat, Mar 6, 2010 at 9:14 PM, steven zhuang
> >> >> <steven.zhuang.1984@gmail.com>  wrote:
> >> >>
> >> >>>
> >> >>>          I have a table which may contain super big rows, e.g.
> >> >>> with millions of cells in one row, 1.5GB in size.
> >> >>>
> >> >>>          Now I have a problem emitting data into the table,
> >> >>> probably because these super big rows are too large for my
> >> >>> region server (with only 1GB heap).
> >> >>>
> >> >>
> >> >> A row can't be split and whatever you do that needs that row (like
> >> >> reading it) requires that HBase loads the entire row in memory.  If
> >> >> the row is 1.5GB and your regionserver has only 1G of memory, it
> >> >> won't be able to use that row.
> >> >>
> >> >> I'm not 100% sure about that because I'm still a HBase n00b too, but
> >> >> that's my understanding.
> >> >>
> >> >>
> >> >
> >> >
> >>
> >
>
