hbase-dev mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject Re: How to put() and get() when setAutoFlush(false)?
Date Tue, 23 Nov 2010 09:34:25 GMT
Hi Xin,

You can always ask for the write buffer from the table using
HTable.getWriteBuffer(), but all you get is a list of the uncommitted
Puts. You would need to handle them yourself to get values back.

Lars
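
One way to avoid reading back from the write buffer at all is to keep the
per-key column count on the client while loading. The sketch below is only
an illustration under stated assumptions: ColumnIndexTracker and its methods
are hypothetical names (not part of HBase), the rows being loaded are assumed
not to exist in the table beforehand, and a single loader process is assumed
to see every triple. It uses plain Java collections, so the Put construction
itself is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: remembers how many
// (property, value) pairs have been assigned to each row key so far.
class ColumnIndexTracker {
    private final Map<String, Integer> pairsPerKey = new HashMap<>();

    /** Returns the 1-based index to use for the next pair of this key. */
    int nextIndex(String rowKey) {
        // merge() stores 1 for a new key, otherwise adds 1 to the old count,
        // and returns the updated count.
        return pairsPerKey.merge(rowKey, 1, Integer::sum);
    }

    /** Number of pairs seen so far for this key; 0 if the key is new. */
    int count(String rowKey) {
        return pairsPerKey.getOrDefault(rowKey, 0);
    }

    public static void main(String[] args) {
        // Sample (k, p, v) triples; the same key may appear non-consecutively.
        String[][] triples = {
            {"k1", "color", "red"},
            {"k2", "size", "large"},
            {"k1", "shape", "round"},
        };
        ColumnIndexTracker tracker = new ColumnIndexTracker();
        for (String[] t : triples) {
            int n = tracker.nextIndex(t[0]);
            // In a real loader these values would go into a buffered Put:
            // qualifiers "p<n>" and "v<n>", plus the updated numOfCol.
            System.out.println(t[0] + ": p" + n + "=" + t[1]
                + ", v" + n + "=" + t[2] + ", numOfCol=" + n);
        }
    }
}
```

With the count held in memory, each triple becomes a single put() and no
get() is needed, so setAutoFlush(false) can stay in place. The trade-off is
that the map grows with the number of distinct keys, which should be
manageable for a file of about 2 million lines.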

On Tue, Nov 23, 2010 at 4:12 AM, Xin Wang <andywx@gmail.com> wrote:
> Hi Ryan,
>    Thank you for your reply. Actually, my source data file is a sequence of
> triples. Each line is a triple of the form (k, p, v), which means the key k
> has a property p whose value is v. A key k can have multiple different
> properties and values. And the triples for the same key may not occur in the
> data file consecutively. However, I want to store a key k with all its
> properties and values, p1, v1, p2, v2, ... pn, vn, as a row in HTable. My
> HTable structure is as follows:
>
> ------------------------------------------------------------------------
>                            MyColumnFamily
> ------------------------------------------------------------------------
>  | numOfCol |  p1  |  v1  |  p2  |  v2  |  ...  |  pn  |  vn  |
> ------------------------------------------------------------------------
> where numOfCol records the number of column pairs p1, v1, p2, v2, ...,
> pn, vn that follow it. I need numOfCol because when I later read a triple
> that is the (n+1)th property and value for key k, I use (numOfCol + 1) to
> build the column names of the (n+1)th property and value.
>
> So, as your code snippet shows, keeping a reference to lastPut is not
> enough for my case. This is why I have to use HTable.get() to retrieve a row
> key (i.e., a key k) that was HTable.put() before. Of course I don't want to
> use setAutoFlush(true); it's too slow. Do you have any suggestions?
>
> Thank you so much!
>
> Best wishes,
>
> --
> Xin
>
> 2010/11/23 Ryan Rawson <ryanobjc@gmail.com>
>
>> Hi,
>>
>> You could implement this in a code structure like so:
>>
>> HTable table = new HTable(tableName, conf);
>> Put lastPut = null;
>> while ( moreData ) {
>>    Put put = makeNewPutBasedOnLastPutToo( lastPut, dataSource );
>>    table.put(put);
>>    lastPut = put;
>>    dataSource.next();
>> }
>>
>> If that is unsatisfactory, you may access the write buffer via
>> HTable.getWriteBuffer().
>>
>> -ryan
>>
>>
>> On Mon, Nov 22, 2010 at 5:41 PM, Xin Wang <andywx@gmail.com> wrote:
>> > Hello everyone,
>> >
>> >  I am a beginner with HBase. I want to load a data file of 2 million
>> > lines into an HBase table.
>> >  I want to load the data as fast as possible, so I called
>> > HTable.setAutoFlush(false) at the beginning. However, when I
>> > HTable.put() a row and then HTable.get() the same row, the result is
>> > empty. I know this is because setAutoFlush(false) makes put() write
>> > into the buffer. But the algorithm in my loading process needs to read
>> > the value that was just put into the HTable cell. I have tried
>> > setAutoFlush(true): the previous value can then be read, but the
>> > loading process slows down by about an order of magnitude. Can I get()
>> > a value directly from the write buffer? Are there any other solutions
>> > to this problem that I do not know of? Thank you in advance!
>> >
>> >  Best regards,
>> >
>> > Xin Wang
>> >
>>
>
