hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Sequential put/get/delete don't work
Date Mon, 01 Mar 2010 21:49:27 GMT
HBase is not like your typical database. It doesn't overwrite data in
place and it doesn't delete data from disk right away; instead it uses delete
markers (aka tombstones).  What it does do is keep multiple versions of each
cell and use timestamps with millisecond accuracy to discern new data from
old. When you do rapid puts/deletes you can run into this situation:

Put TS=0
Delete TS=0 (applies to the previous line)
Put TS=0  (oops, this is being masked by the previous delete marker)
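
Translated into client code, that collision looks roughly like this (a minimal
sketch against the 0.20 client API, reusing the table/column names from the
test further down the thread):

  HTable table = new HTable(new HBaseConfiguration(), "t1");
  table.setAutoFlush(true);              // send each put immediately
  byte[] row = "row-id".getBytes();

  Put put = new Put(row);
  put.add("f1".getBytes(), "c1".getBytes(), "v1".getBytes());
  table.put(put);                        // cell written at some millisecond T

  table.delete(new Delete(row));         // delete marker also lands on T

  put = new Put(row);
  put.add("f1".getBytes(), "c1".getBytes(), "v2".getBytes());
  table.put(put);                        // still T, so the marker above masks it

  Result res = table.get(new Get(row));  // res.list() comes back null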

In a production environment this is less of an issue because of the
ping time between machines.  The best ping time is about 0.1 ms, and on top
of that there is application processing time plus another RPC round trip.

A potential solution is to use microsecond resolution, but this is not
easy because it would require serious changes to the data format and
would invalidate old installations.  There may be a solution with clever
bit-level tricks that lets the two formats co-exist, but that is just
speculation.

Another strategy is to set the timestamps by hand so the new put sorts above
the delete marker, but be careful.  You might do this:
long now = System.currentTimeMillis();
Delete delete = new Delete(row, now);
// specify what to delete
table.delete(delete);
Put put = new Put(row, now+1);
// specify what to put
table.put(put);

-ryan

On Mon, Mar 1, 2010 at 1:27 PM, Dan Washusen <dan@reactive.org> wrote:
> Comments inline...
>
> On 2 March 2010 08:05, Yura Taras <yura.taras@gmail.com> wrote:
>
>> Thanks, Dan
>>
>> 2. flush/autoCommit didn't help.
>>
>> 1. Do I understand correctly that deleting a row will ensure that I
>> won't be able to insert data into that row again? IMO that's weird.
>>
>
> Deleted rows are removed when a compaction occurs.  To be clear, you can put
> the same row again as long as you use a different timestamp.  It does take
> some getting used to... :)
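>
> For example, the second put in your test would show up again if it used a
> timestamp newer than the delete marker (rough sketch; the marker gets the
> current wall-clock time, so another small constant like 2L would still be
> masked):
>
>   table.delete(new Delete("row-id".getBytes()));
>
>   Put put = new Put("row-id".getBytes());
>   // strictly newer than the delete marker's timestamp
>   put.add("f1".getBytes(), "c1".getBytes(), System.currentTimeMillis() + 1,
>       "v2".getBytes());
>   table.put(put);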
>
>
>> Anyway, this wouldn't normally affect me, but I wanted to write unit tests
>> which use HBase and delete data between runs. Looks like I either have to
>> drop/create the tables, or store the data timestamp as a value, not as the
>> version.
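>>
>> A drop/recreate between runs might look roughly like this (a sketch, assuming
>> the 0.20 HBaseAdmin API):
>>
>>   HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
>>   if (admin.tableExists("t1")) {
>>     admin.disableTable("t1");
>>     admin.deleteTable("t1");
>>   }
>>   HTableDescriptor desc = new HTableDescriptor("t1");
>>   desc.addFamily(new HColumnDescriptor("f1"));
>>   admin.createTable(desc);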
>>
>
> Another option is to delete then
> call org.apache.hadoop.hbase.client.HBaseAdmin#majorCompact.  However, it's
> an async operation and it might be hard to track...
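>
> Something along these lines between runs might do it (a sketch; since the
> compaction is asynchronous there's no easy way to wait for the delete markers
> to actually be dropped):
>
>   HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
>   admin.majorCompact("t1");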
>
>
>> BTW, if I change
>>   table.delete(new Delete("row-id".getBytes()));
>> to
>>   table.delete(new Delete("row-id".getBytes()).deleteColumn("f1:c1".getBytes()));
>> the test passes. It fails if I try to delete the family:
>>   table.delete(new Delete("row-id".getBytes()).deleteFamily("f1".getBytes()));
>>
>> On Mon, Mar 1, 2010 at 10:43 PM, Dan Washusen <dan@reactive.org> wrote:
>> > Hi Yura,
>> > Having a quick look at your code I can see the following issues:
>> >
>> >   1. Deletes are actually just flags to tell HBase that timestamps/versions
>> >   of a row are to be deleted eventually.  In your test you are putting the
>> >   same version twice with a delete in between.  Instead of calling
>> >   Put#add(byte[], byte[], long, byte[]) and providing a specific timestamp,
>> >   call Put#add(byte[], byte[], byte[]).  This way you get a new
>> >   timestamp/version for each put and your delete call will do what you're
>> >   expecting it to do...
>> >   2. HTable batches up puts into a buffer and sends them to the server once
>> >   the configured buffer size has been reached.  In your test you can either
>> >   call org.apache.hadoop.hbase.client.HTable#flushCommits after each put or
>> >   turn on auto flush (org.apache.hadoop.hbase.client.HTable#setAutoFlush);
>> >   see the sketch just below.
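>> >
>> > Putting both points together (a rough sketch, reusing the names from your
>> > test):
>> >
>> >   table.setAutoFlush(true);  // or call table.flushCommits() after each put
>> >
>> >   Put put = new Put("row-id".getBytes());
>> >   // no explicit timestamp, so each put gets a fresh timestamp/version
>> >   put.add("f1".getBytes(), "c1".getBytes(), "v1".getBytes());
>> >   table.put(put);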
>> >
>> > Cheers,
>> > Dan
>> >
>> > On 2 March 2010 06:44, Yura Taras <yura.taras@gmail.com> wrote:
>> >
>> >> Hi all
>> >>
>> >> I'm learning HBase and I've run into the following problem. I'm running
>> >> HBase 0.20.3 on Windows+Cygwin (just bin/start-hbase.sh). I'm creating a
>> >> simple table in the shell:
>> >> hbase(main):033:0> create 't1', 'f1'
>> >> 0 row(s) in 2.0630 seconds
>> >>
>> >> Then I try to execute the following JUnit test, and it fails on the last
>> >> assert. Literally, it inserts a value, deletes the row and inserts a value
>> >> for the same row id again. When it tries to Get data for that row, null
>> >> is returned. When I rerun the test, it fails on the first assert. To fix
>> >> that I have to run 'truncate 't1'' in the shell.
>> >> I can't believe there's a bug in such a straightforward use case, so
>> >> either I'm doing something wrong in the code or there's a problem with my
>> >> configuration.
>> >>
>> >> Thanks.
>> >> JUnit test:
>> >>    @Test
>> >>    public void sequentUses() throws IOException {
>> >>        HTable table = pool.getTable("t1");
>> >>        try {
>> >>            Put put = new Put("row-id".getBytes());
>> >>            put.add("f1".getBytes(), "c1".getBytes(), 1L, "v1".getBytes());
>> >>            table.put(put);
>> >>
>> >>            Get get = new Get("row-id".getBytes());
>> >>            get.setMaxVersions(Integer.MAX_VALUE);
>> >>            Result res = table.get(get);
>> >>            assertNotNull(res.list());
>> >>            table.delete(new Delete("row-id".getBytes()));
>> >>
>> >>            put = new Put("row-id".getBytes());
>> >>            put.add("f1".getBytes(), "c1".getBytes(), 1L, "v2".getBytes());
>> >>            table.put(put);
>> >>
>> >>            res = table.get(get);
>> >>            assertNotNull(res.list());
>> >>        } finally {
>> >>            pool.putTable(table);
>> >>        }
>> >>    }
>> >>
>> >
>>
>
