hbase-dev mailing list archives

From Xin Wang <and...@gmail.com>
Subject Re: How to put() and get() when setAutoFlush(false)?
Date Tue, 23 Nov 2010 03:12:10 GMT
Hi Ryan,
    Thank you for your reply. My source data file is a sequence of triples.
Each line is a triple of the form (k, p, v), meaning that key k has a
property p whose value is v. A key k can have multiple properties and
values, and the triples for the same key may not occur consecutively in the
data file. However, I want to store each key k with all of its properties
and values, p1, v1, p2, v2, ..., pn, vn, as a single row in an HTable. My
HTable structure is as follows:


    | numOfCol | p1 | v1 | p2 | v2 | ... | pn | vn |

where numOfCol records the number of column pairs p1, v1, p2, v2, ..., pn,
vn that follow it. I need numOfCol because when I later read a triple that
is the (n+1)th property and value for key k, I use (numOfCol + 1) to build
the column names for that (n+1)th property and value.
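
The naming scheme can be sketched in plain Java, with an ordinary Map standing in for one HTable row. The class name RowLayout and the literal column names "p<i>" and "v<i>" are assumptions made for illustration; only numOfCol and the (numOfCol + 1) rule come from the description above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a Map stands in for one HTable row; no HBase calls are made.
class RowLayout {
    static final String NUM_OF_COL = "numOfCol";

    // Assumed naming: the i-th property is stored in "p<i>", its value in "v<i>".
    static String propertyColumn(int i) { return "p" + i; }
    static String valueColumn(int i) { return "v" + i; }

    // Stores (p, v) as pair number (numOfCol + 1), then updates numOfCol.
    static void appendPair(Map<String, String> row, String p, String v) {
        int n = Integer.parseInt(row.getOrDefault(NUM_OF_COL, "0"));
        int next = n + 1;
        row.put(propertyColumn(next), p);
        row.put(valueColumn(next), v);
        row.put(NUM_OF_COL, Integer.toString(next));
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        appendPair(row, "color", "red");
        appendPair(row, "size", "large");
        System.out.println(row);
    }
}
```

Note that appendPair has to read the current numOfCol before it can write, which is exactly the read-after-write dependency in the loading process.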

So, unlike in your code snippet, a reference to lastPut alone is not enough
in my case. This is why I have to use HTable.get() to retrieve a row (i.e.,
a key k) that was written with HTable.put() earlier. Of course, I don't
want to use setAutoFlush(true); it is too slow. Do you have any suggestions?
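
One possible direction, sketched here in plain Java without any HBase calls, is to track the pairs per key on the client so that numOfCol is known without reading the table back. The class TripleGrouper and its methods are hypothetical, and the sketch assumes the per-key state for the 2 million lines fits in client memory.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper: groups (k, p, v) triples by key in memory.
class TripleGrouper {
    // key -> ordered list of {property, value} pairs seen so far
    private final Map<String, List<String[]>> pending = new LinkedHashMap<>();

    // Records one triple; triples for the same key need not be consecutive.
    void add(String k, String p, String v) {
        pending.computeIfAbsent(k, x -> new ArrayList<>()).add(new String[] {p, v});
    }

    // Current number of pairs for a key, available without any table read.
    int numOfCol(String k) {
        List<String[]> pairs = pending.get(k);
        return pairs == null ? 0 : pairs.size();
    }

    // All grouped rows, in first-seen key order.
    Map<String, List<String[]>> rows() {
        return pending;
    }

    public static void main(String[] args) {
        TripleGrouper g = new TripleGrouper();
        g.add("k1", "color", "red");
        g.add("k2", "size", "large");
        g.add("k1", "shape", "round");
        System.out.println("k1 has " + g.numOfCol("k1") + " pairs");
    }
}
```

Under that assumption, each entry of rows() could be turned into a single Put per key once the file has been read, so the buffered put() calls would never need a matching get(). If the full pairs do not fit in memory, keeping only an integer counter per key would serve the same purpose.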

Thank you so much!

Best wishes,


2010/11/23 Ryan Rawson <ryanobjc@gmail.com>

> Hi,
> You could implement this in a code structure like so:
> HTable table = new HTable(conf, tableName);
> Put lastPut = null;
> while ( moreData ) {
>    Put put = makeNewPutBasedOnLastPutToo( lastPut, dataSource );
>    table.put(put);
>    lastPut = put;
>    dataSource.next();
> }
> If that is unsatisfactory, you may access the write buffer via
> HTable.getWriteBuffer().
> -ryan
> On Mon, Nov 22, 2010 at 5:41 PM, Xin Wang <andywx@gmail.com> wrote:
> > Hello everyone,
> >
> >  I am a beginner with HBase. I want to load a data file of 2 million lines
> > into an HBase table.
> >  I want to load the data as fast as possible, so I called
> > HTable.setAutoFlush(false) at the beginning. However, when I HTable.put() a
> > row and then HTable.get() the same row, the result is empty. I know this is
> > because setAutoFlush(false) makes put() write into the buffer. But the
> > algorithm in my loading process needs to read the value that was just put
> > into the HTable cell. I have tried setAutoFlush(true); the previous value
> > can then be read, but the loading process slows down by about an order of
> > magnitude. Can I get() a value directly from the write buffer? Are there
> > any other solutions to this problem that I do not know of? Thank you in
> > advance!
> >
> >  Best regards,
> >
> > Xin Wang
> >
