hbase-user mailing list archives

From Yura Taras <yura.ta...@gmail.com>
Subject Re: Sequential put/get/delete don't work
Date Tue, 02 Mar 2010 10:13:58 GMT
Thanks for the clarification, guys! Now I understand why this issue
exists and will rewrite my code to avoid it.

On Mon, Mar 1, 2010 at 11:49 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> HBase is not like a typical database. It doesn't overwrite data in
> place, and it doesn't delete data from disk right away; instead it
> writes delete markers (aka tombstones).  What it does do is keep
> multiple versions, and it uses timestamps with millisecond accuracy to
> distinguish new data from old. When you do rapid put/deletes you can
> run into this situation:
>
> Put TS=0
> Delete TS=0 (applies to the previous line)
> Put TS=0  (oops, this is being masked by the previous delete marker)
>
> In a production environment this is less of an issue because of the
> ping time between machines.  The best ping time is around 0.1 ms, and
> on top of that there is application processing time plus another RPC.
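Ryan's masked-put scenario can be sketched with a toy, plain-JDK model of a single column's versions. This is only an illustration of the semantics, not the real HBase internals; the class and method names here are made up:

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model (NOT real HBase code) of one row/column's cell versions,
// showing how a delete marker ("tombstone") masks puts at the same or
// an older timestamp.
public class TombstoneDemo {
    // timestamp -> value for this row/column
    private final TreeMap<Long, String> cells = new TreeMap<>();
    // newest delete marker seen; masks every cell with ts <= this value
    private long tombstoneTs = Long.MIN_VALUE;

    public void put(long ts, String value) {
        cells.put(ts, value);
    }

    public void delete(long ts) {
        tombstoneTs = Math.max(tombstoneTs, ts);
    }

    // Return the newest cell NOT covered by the tombstone, or null.
    public String get() {
        for (Map.Entry<Long, String> e : cells.descendingMap().entrySet()) {
            if (e.getKey() > tombstoneTs) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TombstoneDemo col = new TombstoneDemo();
        col.put(0L, "v1");     // Put TS=0
        col.delete(0L);        // Delete TS=0 masks the put above...
        col.put(0L, "v2");     // ...and ALSO this re-put at the same TS
        System.out.println(col.get()); // prints null: still masked
        col.put(1L, "v3");     // a newer timestamp escapes the tombstone
        System.out.println(col.get()); // prints v3
    }
}
```

In the real system the tombstone survives until a major compaction removes it, which is why truncating or compacting the table makes the row writable at the old timestamps again.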
>
> A potential solution is to use microsecond resolution, but this is not
> as easy because it would require serious changes to the data format
> and invalidate old installations.  There may be a clever bit-level
> solution that would let the two formats co-exist, but that is just
> speculation.
>
> Another strategy is to set timestamps by hand, but be careful.  You
> might do this:
> long now = System.currentTimeMillis();
> Delete delete = new Delete(row, now);
> // specify what to delete
> table.delete(delete);
> Put put = new Put(row, now+1);
> // specify what to put
> table.put(put);
>
> -ryan
>
> On Mon, Mar 1, 2010 at 1:27 PM, Dan Washusen <dan@reactive.org> wrote:
>> Comments inline...
>>
>> On 2 March 2010 08:05, Yura Taras <yura.taras@gmail.com> wrote:
>>
>>> Thanks, Dan
>>>
>>> 2. flush/autoCommit didn't help.
>>>
>>> 1. Do I understand correctly that deleting a row ensures that I
>>> won't be able to insert data into that row again? IMO that's weird.
>>>
>>
>> Deleted rows are physically removed when a major compaction occurs.  To be
>> clear, you can put the same row again as long as you use a different
>> timestamp.  It does take some getting used to... :)
>>
>>
>>> Anyway, this normally wouldn't affect me, but I want to write unit
>>> tests which use HBase and delete data between runs. It looks like I
>>> either have to drop/create the tables, or store the timestamp as a
>>> value rather than as the version.
>>>
>>
>> Another option is to delete then
>> call org.apache.hadoop.hbase.client.HBaseAdmin#majorCompact.  However, it's
>> an async operation and it might be hard to track...
>>
>>
>>> BTW, if I change
>>>   table.delete(new Delete("row-id".getBytes()));
>>> to
>>>   table.delete(new Delete("row-id".getBytes()).deleteColumn("f1:c1".getBytes()));
>>> the test passes. It fails if I try to delete the family:
>>>   table.delete(new Delete("row-id".getBytes()).deleteFamily("f1".getBytes()));
>>>
>>> On Mon, Mar 1, 2010 at 10:43 PM, Dan Washusen <dan@reactive.org> wrote:
>>> > Hi Yura,
>>> > Having a quick look at your code I can see the following issues;
>>> >
>>> >   1. Deletes are actually just flags (tombstones) telling HBase that
>>> >   certain timestamps/versions of a row are to be deleted eventually.
>>> >   In your test you are putting the same version twice with a delete
>>> >   in between.  Instead of calling Put#add(byte[], byte[], long,
>>> >   byte[]) with an explicit timestamp, call Put#add(byte[], byte[],
>>> >   byte[]).  This way you get a new timestamp/version for each put,
>>> >   and your delete will do what you're expecting it to do...
>>> >   2. HTable batches puts into a client-side buffer and sends them to
>>> >   the server once the configured buffer size has been reached.  In
>>> >   your test, either call
>>> >   org.apache.hadoop.hbase.client.HTable#flushCommits after each put
>>> >   or turn on auto flush
>>> >   (org.apache.hadoop.hbase.client.HTable#setAutoFlush).
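Dan's second point, the client-side write buffer, can be sketched with a similar toy stand-in. Again this is just an illustration with simplified, made-up names; HTable's real buffer is sized in bytes, not by put count:

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch (NOT the real HTable implementation) of a client-side
// write buffer: puts accumulate locally and only reach the server when
// the buffer fills, flushCommits() is called, or auto-flush is on.
public class WriteBufferDemo {
    private final List<String> buffer = new ArrayList<>();
    private final List<String> server = new ArrayList<>(); // stand-in for the region server
    private final int bufferSize;
    private boolean autoFlush = false;

    public WriteBufferDemo(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    public void setAutoFlush(boolean autoFlush) {
        this.autoFlush = autoFlush;
    }

    public void put(String put) {
        buffer.add(put);
        if (autoFlush || buffer.size() >= bufferSize) {
            flushCommits();
        }
    }

    public void flushCommits() {
        server.addAll(buffer);
        buffer.clear();
    }

    // How many puts are visible to a read against the "server".
    public int committed() {
        return server.size();
    }

    public static void main(String[] args) {
        WriteBufferDemo table = new WriteBufferDemo(10);
        table.put("put-1");
        System.out.println(table.committed()); // prints 0: still buffered
        table.flushCommits();
        System.out.println(table.committed()); // prints 1: now visible
        table.setAutoFlush(true);
        table.put("put-2");
        System.out.println(table.committed()); // prints 2: sent immediately
    }
}
```

This is why a get issued right after a put can miss the data in a test: without a flush (or auto-flush), the put may still be sitting in the client buffer.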
>>> >
>>> > Cheers,
>>> > Dan
>>> >
>>> > On 2 March 2010 06:44, Yura Taras <yura.taras@gmail.com> wrote:
>>> >
>>> >> Hi all
>>> >>
>>> >> I'm learning HBase and I've run into the following problem. I'm
>>> >> running HBase 0.20.3 on Windows+Cygwin (just bin/start-hbase.sh).
>>> >> I create a simple table in the shell:
>>> >> hbase(main):033:0> create 't1', 'f1'
>>> >> 0 row(s) in 2.0630 seconds
>>> >>
>>> >> Then I execute the following JUnit test, and it fails on the last
>>> >> assert. It inserts a value, deletes the row, and inserts a value
>>> >> for the same id again. When it then tries to Get data for that
>>> >> row, null is returned. When I rerun the test, it fails on the
>>> >> first assert; to fix that I have to run 'truncate 't1'' in the
>>> >> shell.
>>> >> I can't believe there's a bug in such a straightforward use case,
>>> >> so either I'm doing something wrong in the code or there's a
>>> >> problem with my configuration.
>>> >>
>>> >> Thanks.
>>> >> JUnit test:
>>> >>    @Test
>>> >>    public void sequentUses() throws IOException {
>>> >>        HTable table = pool.getTable("t1");
>>> >>        try {
>>> >>            Put put = new Put("row-id".getBytes());
>>> >>            put.add("f1".getBytes(), "c1".getBytes(), 1L, "v1".getBytes());
>>> >>            table.put(put);
>>> >>
>>> >>            Get get = new Get("row-id".getBytes());
>>> >>            get.setMaxVersions(Integer.MAX_VALUE);
>>> >>            Result res = table.get(get);
>>> >>            assertNotNull(res.list());
>>> >>            table.delete(new Delete("row-id".getBytes()));
>>> >>
>>> >>            put = new Put("row-id".getBytes());
>>> >>            put.add("f1".getBytes(), "c1".getBytes(), 1L, "v2".getBytes());
>>> >>            table.put(put);
>>> >>
>>> >>            res = table.get(get);
>>> >>            assertNotNull(res.list());
>>> >>        } finally {
>>> >>            pool.putTable(table);
>>> >>        }
>>> >>    }
>>> >>
>>> >
>>>
>>
>
