hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: when Hbase open a region, what does it do? problem with super big row(1.5GB data in one row)
Date Thu, 11 Mar 2010 15:47:46 GMT
Yes.

Specify a column family, or a column family + column qualifier, to load
less than the full row.

St.Ack
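For illustration, a minimal sketch of such a narrow Get, assuming an
already-opened HTable; the family "cf" and qualifier "qual" below are
placeholder names, not from this thread:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class NarrowGetExample {
  // Fetch a single cell of a potentially huge row instead of the whole row.
  public static byte[] fetchOneCell(HTable table, String row) throws IOException {
    Get g = new Get(Bytes.toBytes(row));
    // Restrict the Get to one column family + qualifier so the server only
    // has to materialize that cell, not the entire row.
    g.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
    Result r = table.get(g);
    return r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
  }
}

Calling g.addFamily(Bytes.toBytes("cf")) instead would restrict the fetch
to one whole family rather than one cell.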

On Wed, Mar 10, 2010 at 11:36 PM, William Kang <weliam.cloud@gmail.com> wrote:
> Hi,
> I have another question. If we do something like the following:
>
> "Get g = new Get(Bytes.toBytes("rowname"));
>
>
> Result r = table.get(g);"
>
>
> Will HBase load the entire row into memory? Thanks.
>
>
>
> William
>
>
> On Tue, Mar 9, 2010 at 1:26 AM, Stack <stack@duboce.net> wrote:
>
>> On Mon, Mar 8, 2010 at 6:58 PM, William Kang <weliam.cloud@gmail.com>
>> wrote:
>> > Hi,
>> > Can you give me some more details about how the information in a row
>> > can be fetched? I understand that a row of around 1.5 GB may span
>> > multiple HFiles in a region server. If the client wants to access a
>> > column label value in that row, what is going to happen?
>>
>> Only that cell is fetched if you specify an explicit column name
>> (column family + qualifier).
>>
>> > After HBase finds the region that stores this row, it goes to the
>> > region .META. entry and finds the index of the HFile that stores the
>> > column family. And the HFile has the offsets of the keyvalue pairs.
>> > Then HBase can go to the keyvalue pair and get the value for a certain
>> > column label.
>> >
>>
>> Yes.
>>
>>
>> > Why does the whole row need to be read into memory?
>> >
>>
>> If you ask for the whole row, it will try to load it all to deliver it
>> all to you.  There is no "streaming" api per se.  Rather, a Result
>> object is passed from server to client which contains everything in
>> the row, keyed by column name.
>>
>> That said, if you want the whole row and you are scanning as opposed
>> to getting, TRUNK has HBASE-1537 applied, which allows for intra-row
>> scanning -- you call setBatch to set the maximum number of cells
>> returned within a row -- and the 0.20 branch has HBASE-1996, which
>> lets you set the maximum size returned on a next invocation (in both
>> cases, if the row is not exhausted, the next 'next' invocation will
>> return more out of the current row, and so on, until the row is
>> exhausted).
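A minimal sketch of the intra-row scanning described above (the setBatch
call is the HBASE-1537 API; the table handle, row key, and batch size here
are illustrative):

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class IntraRowScanExample {
  // Walk one very wide row in chunks instead of loading it all at once.
  public static void scanWideRow(HTable table, byte[] row) throws IOException {
    // Limit the scan to this single row: the stop row (exclusive) is the
    // smallest key that sorts after 'row'.
    Scan scan = new Scan(row, Bytes.add(row, new byte[] { 0 }));
    scan.setBatch(1000); // at most 1000 cells come back per next() call
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result chunk : scanner) {
        // each 'chunk' holds at most 1000 cells out of the current row;
        // successive next() calls return more until the row is exhausted
      }
    } finally {
      scanner.close();
    }
  }
}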
>>
>> > If HBase does not read the whole row at once, what causes its
>> > inefficiency?
>>
>> I think Ryan is just allowing that the above means of scanning parts
>> of rows may have bugs that we've not yet squashed.
>>
>> St.Ack
>>
>>
>> > Thanks.
>> >
>> >
>> > William
>> >
>> > On Mon, Mar 8, 2010 at 3:44 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>> >
>> >> Hi,
>> >>
>> >> At this time, truly massive rows such as the one you described
>> >> may behave non-optimally in HBase. While in previous versions of
>> >> HBase, reading an entire row required you to be able to actually read
>> >> and send the entire row in one go, there is a new API that effectively
>> >> lets you stream rows.  There are still some read paths that
>> >> may read more data than necessary, so your performance mileage may
>> >> vary.
>> >>
>> >>
>> >>
>> >> On Sun, Mar 7, 2010 at 3:56 AM, Ahmed Suhail Manzoor
>> >> <suhailski@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > This might prove to be a blatantly obvious question, but wouldn't it
>> >> > make sense to store large files directly in HDFS and keep the
>> >> > metadata about the file in HBase? One could, for instance, serialize
>> >> > the details of the HDFS file in a Java object and store that in
>> >> > HBase. This object could expose the reading of the HDFS file, for
>> >> > instance, so that one is left with clean code. Anything wrong in
>> >> > implementing things this way?
>> >> >
>> >> > Cheers
>> >> > su./hail
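A minimal sketch of the pattern Ahmed describes -- keep the big file in
HDFS and store only a pointer plus metadata in HBase; the table, family,
and qualifier names below are made up for illustration:

import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HdfsPointerExample {
  // Record where a large file lives in HDFS; no HBase row ever holds the blob.
  public static void recordFile(HTable metaTable, String rowKey,
                                String hdfsPath, long sizeBytes) throws IOException {
    Put p = new Put(Bytes.toBytes(rowKey));
    p.add(Bytes.toBytes("meta"), Bytes.toBytes("hdfs_path"), Bytes.toBytes(hdfsPath));
    p.add(Bytes.toBytes("meta"), Bytes.toBytes("size"), Bytes.toBytes(sizeBytes));
    metaTable.put(p);
  }
}

A reader would then Get the small metadata row from HBase and open the
stored HDFS path with the ordinary FileSystem API to stream the bytes.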
>> >> >
>> >> > On 07/03/2010 09:21, tsuna wrote:
>> >> >>
>> >> >> On Sat, Mar 6, 2010 at 9:14 PM, steven zhuang
>> >> >> <steven.zhuang.1984@gmail.com>  wrote:
>> >> >>
>> >> >>>
>> >> >>>          I have a table which may contain super big rows, e.g.
>> >> >>> with millions of cells in one row, 1.5GB in size.
>> >> >>>
>> >> >>>          Now I have a problem emitting data into the table,
>> >> >>> probably because these super big rows are too large for my
>> >> >>> regionserver (with only 1GB heap).
>> >> >>>
>> >> >>
>> >> >> A row can't be split and whatever you do that needs that row (like
>> >> >> reading it) requires that HBase loads the entire row in memory.  If
>> >> >> the row is 1.5GB and your regionserver has only 1G of memory, it
>> >> >> won't be able to use that row.
>> >> >>
>> >> >> I'm not 100% sure about that because I'm still a HBase n00b too,
>> >> >> but that's my understanding.
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >>
>> >
>>
>
