hbase-user mailing list archives

From "Slava Gorelik" <slava.gore...@gmail.com>
Subject Re: BatchUpdate
Date Thu, 18 Sep 2008 19:00:04 GMT
Yes, that is exactly what I'm trying to implement myself, but the linked
issue doesn't say in which version this functionality will be
implemented.
P.S. What I'm trying to implement is the same thing, but if I work through
HTable I will use many more RPCs than if I do it directly in the HRegionServer.
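
To make the check concrete, here is a minimal sketch of "lock the row -> check the timestamp -> update only if unchanged". It is not HBase code: VersionedCellStore, putIfUnchanged and the "row/column" key format are hypothetical names, an in-memory map stands in for the table, and a synchronized method stands in for the row lock.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a table; not part of the HBase API.
final class VersionedCellStore {
    // A cell value together with the timestamp it was written with.
    private static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    // Unconditional write, like a plain put with an explicit timestamp.
    synchronized void put(String key, String value, long timestamp) {
        cells.put(key, new Cell(value, timestamp));
    }

    // Returns the stored value, or null if the cell does not exist.
    synchronized String valueOf(String key) {
        Cell cell = cells.get(key);
        return cell == null ? null : cell.value;
    }

    // Returns the stored timestamp, or -1 if the cell does not exist.
    synchronized long timestampOf(String key) {
        Cell cell = cells.get(key);
        return cell == null ? -1L : cell.timestamp;
    }

    // Writes only if the cell still carries the timestamp the caller read earlier.
    // The check and the write happen under one lock, so no other writer can
    // slip in between them.
    synchronized boolean putIfUnchanged(String key, long expectedTimestamp,
                                        String newValue, long newTimestamp) {
        Cell current = cells.get(key);
        if (current == null || current.timestamp != expectedTimestamp) {
            return false; // someone else changed (or removed) the cell
        }
        cells.put(key, new Cell(newValue, newTimestamp));
        return true;
    }

    public static void main(String[] args) {
        VersionedCellStore store = new VersionedCellStore();
        store.put("row1/info:name", "original", 100L);

        long seen = store.timestampOf("row1/info:name");
        // First writer still holds the current timestamp, so the update succeeds.
        System.out.println(store.putIfUnchanged("row1/info:name", seen, "first", 200L));  // true
        // Second writer holds the stale timestamp, so the update is rejected.
        System.out.println(store.putIfUnchanged("row1/info:name", seen, "second", 300L)); // false
    }
}
```

Done through a client API, the read, the lock, the check and the write would each be a separate round trip, which is the RPC cost I'm concerned about; here they collapse into one call because everything runs in one place.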

On Thu, Sep 18, 2008 at 9:38 PM, Billy Pearson
<sales@pearsonwholesale.com> wrote:

> I think what you are looking for is here:
> HBASE-493
> https://issues.apache.org/jira/browse/HBASE-493
>
> Billy Pearson
>
> "Slava Gorelik" <slava.gorelik@gmail.com> wrote in message
> news:fdc46e690809181053l1a14459fv55389f6c564cfd46@mail.gmail.com...
>
>  Hi. Thank you for the quick response.
>> About question 3, I want to clarify:
>> For example, I have a row that I need to update (the latest version). I
>> read the row, perform some operations on some cells, and now I want to
>> update it. Before I update, I want to check whether another user
>> (application instance) has already changed this specific row, because my
>> update would overwrite his changes and lose his data. To avoid this, I
>> want to check that the row (the specific cells) I'm going to update still
>> has the same timestamp that I hold, i.e. that nobody has changed them.
>>
>> Best Regards.
>>
>>
>> On Thu, Sep 18, 2008 at 7:50 PM, Jean-Daniel Cryans <jdcryans@apache.org>
>> wrote:
>>
>>  Slava,
>>>
>>> Answers in-line.
>>>
>>> J-D
>>>
>>> On Wed, Sep 17, 2008 at 2:49 PM, Slava Gorelik <slava.gorelik@gmail.com>
>>> wrote:
>>>
>>> > Hi. A few small questions:
>>> > 1) BatchUpdate.getTimestamp()
>>> > <http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/io/BatchUpdate.html#getTimestamp()>
>>> > - If I understand correctly, this method should return the timestamp
>>> > that the row will be committed with.
>>> >  But how does the BatchUpdate know the timestamp? Shouldn't this
>>> > timestamp be known only after the row is written?
>>> >  Anyway, the value returned is always the same and is not correct.
>>>
>>>
>>> If you do not specify a timestamp, the value returned will be
>>> HConstants.LATEST_TIMESTAMP, which is Long.MAX_VALUE. HBase interprets
>>> this as "if BU.timestamp = LATEST_TIMESTAMP, replace it with the current
>>> timestamp".
>>> The timestamp returned will be different if you created the BatchUpdate
>>> with a specified timestamp; see my answer to your second question.
>>>
>>>
>>> >
>>> >
>>> > 2) Delete Cell - I saw in the FAQ that I need to add a delete record and
>>> > commit it with exactly the same timestamp as the original
>>> >   row, but I didn't find any commit method that takes a timestamp.
>>>
>>>
>>> See the BatchUpdate constructor
>>> <http://hadoop.apache.org/hbase/docs/r0.2.1/api/org/apache/hadoop/hbase/io/BatchUpdate.html#BatchUpdate%28java.lang.String,%20long%29>
>>> that uses a timestamp.
>>>
>>>
>>> >
>>> >
>>> > 3) For my update operation I need to check whether the row that my
>>> > application holds still contains the most recent data, and only in
>>> >   that case will I update some cells. To do this I need to lock the row
>>> > -> check the timestamp of the particular cell -> update it if the
>>> >   timestamp is the same one the application holds. All of those
>>> > operations, if they are performed on HTable, will take a number of
>>> >   RPCs. I think that doing those operations directly on the
>>> > HRegionServer would help me get rid of all the extra RPCs. Is
>>> >   there some way to work with the specific HRegionServer that the row
>>> > belongs to? If yes, how can I get the HRegionServer for this
>>> >   specific row?
>>>
>>>
>>> It is best to let the client abstract how HBase works, or this could
>>> become a mess. For example, you would have to reimplement finding the
>>> region server for a region, with retries. Instead of updating by
>>> deleting/inserting, you should just do a put so it will be inserted with
>>> the current timestamp; by default, HBase retrieves the cell with the
>>> latest timestamp for a get or a scan. How HBase works is very different
>>> from your typical RDBMS ;)
>>>
>>>
>>> >
>>> >
>>> >
>>> > Thank You and Best Regards.
>>> > Slava.
>>> >
>>>
>>>
>>
>
>
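
The LATEST_TIMESTAMP rule from J-D's answer to question 1 can be sketched in a few lines. This is only an illustration of the substitution he describes, not the HBase implementation: TimestampSketch and resolveTimestamp are hypothetical names, and the constant is redefined locally with the value quoted in the thread (Long.MAX_VALUE).

```java
// Hypothetical illustration; not the HBase BatchUpdate or HConstants classes.
final class TimestampSketch {
    // Same value the thread gives for HConstants.LATEST_TIMESTAMP.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // If the caller left the timestamp at the LATEST_TIMESTAMP sentinel,
    // substitute the current time; otherwise keep the explicit timestamp.
    static long resolveTimestamp(long requested, long now) {
        return requested == LATEST_TIMESTAMP ? now : requested;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // No timestamp specified: the sentinel is replaced by the current time.
        System.out.println(resolveTimestamp(LATEST_TIMESTAMP, now) == now); // true
        // Explicit timestamp: returned unchanged.
        System.out.println(resolveTimestamp(1234L, now)); // 1234
    }
}
```

This is why getTimestamp() on a BatchUpdate created without a timestamp always returns the same value: it reports the sentinel, and the real timestamp is only assigned when the update is applied.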
