hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: BatchUpdate
Date Mon, 22 Sep 2008 22:40:11 GMT
Slava Gorelik wrote:
> Yes, that is exactly what I'm trying to implement myself, but the link
> doesn't say in which version this functionality will be implemented.
>   
Slava: The issue has neither a version nor a person assigned, so it won't
be done until someone takes up the cause.

> P.S. What I'm trying to implement is the same, but if I work with HTable I
> will use many more RPCs than if I do it directly in HRegionServer.
>   

Agreed.  Is there anything in the related issue, HBASE-803, that you could
work with to get a patch together, either for yourself or to apply to HBase?

Thanks,
St.Ack

> On Thu, Sep 18, 2008 at 9:38 PM, Billy Pearson
> <sales@pearsonwholesale.com> wrote:
>
>   
>> I think what you are looking for is here:
>> HBASE-493
>> https://issues.apache.org/jira/browse/HBASE-493
>>
>> Billy Pearson
>>
>> "Slava Gorelik" <slava.gorelik@gmail.com> wrote in message
>> news:fdc46e690809181053l1a14459fv55389f6c564cfd46@mail.gmail.com...
>>
>>> Hi. Thank you for the quick response.
>>     
>>> About question 3, I want to clarify myself:
>>> For example, I have a row (the latest version) that I need to update. I
>>> read the row, perform some operations on some cells, and now I want to
>>> write it back. Before I update, I want to check whether another user
>>> (application instance) has already changed this specific row, because my
>>> update would overwrite his changes and his data would be lost. To avoid
>>> this, I want to check that the row (the specific cells) I'm going to
>>> update still has the same timestamp that I hold, i.e. that nobody has
>>> changed it.
>>>
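The read / check timestamp / update sequence described above could be sketched roughly as follows. This is only an in-memory illustration: CheckAndUpdateSketch, Cell and updateIfUnchanged are hypothetical names, not HBase API, and the synchronized methods merely stand in for the row lock.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a single row; every name here is hypothetical, not HBase API.
final class CheckAndUpdateSketch {
    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Cell> row = new HashMap<>();

    synchronized Cell read(String column) {
        return row.get(column);
    }

    // Unconditional write, as a plain update would do.
    synchronized void put(String column, String value, long timestamp) {
        row.put(column, new Cell(value, timestamp));
    }

    // Writes only if the cell still carries the timestamp the caller read earlier.
    synchronized boolean updateIfUnchanged(String column, long expectedTimestamp,
                                           String newValue, long newTimestamp) {
        Cell current = row.get(column);
        if (current == null || current.timestamp != expectedTimestamp) {
            return false; // someone else wrote in between; the caller should re-read
        }
        row.put(column, new Cell(newValue, newTimestamp));
        return true;
    }

    public static void main(String[] args) {
        CheckAndUpdateSketch store = new CheckAndUpdateSketch();
        store.put("info:name", "v1", 100L);
        long seen = store.read("info:name").timestamp;
        store.put("info:name", "v2", 200L); // a concurrent writer gets in first
        System.out.println(store.updateIfUnchanged("info:name", seen, "mine", 300L)); // prints false
        System.out.println(store.updateIfUnchanged("info:name", 200L, "mine", 300L)); // prints true
    }
}
```

Done from a client, each of the three steps would be a separate round trip, which is the RPC cost the question is about.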
>>> Best Regards.
>>>
>>>
>>> On Thu, Sep 18, 2008 at 7:50 PM, Jean-Daniel Cryans <jdcryans@apache.org>
>>> wrote:
>>>
>>>> Slava,
>>>>
>>>> Answers in-line.
>>>>
>>>> J-D
>>>>
>>>> On Wed, Sep 17, 2008 at 2:49 PM, Slava Gorelik <slava.gorelik@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi. A few small questions:
>>>>> 1) BatchUpdate.getTimestamp()
>>>>> <http://hadoop.apache.org/hbase/docs/current/api/org/apache/hadoop/hbase/io/BatchUpdate.html#getTimestamp()>
>>>>> - If I understand correctly, this method should return the timestamp
>>>>> that the row will be committed with.
>>>>>  But how can the BatchUpdate know the timestamp? Shouldn't this
>>>>> timestamp only be known after the row is written?
>>>>>  In any case, the value returned is always the same and not correct.
>>>>>
>>>> If you do not specify a timestamp, the value returned will be
>>>> HConstants.LATEST_TIMESTAMP, which is Long.MAX_VALUE. HBase interprets
>>>> this as "if BU.timestamp = LATEST_TIMESTAMP, replace it with current
>>>> timestamp". The timestamp returned will be different if you created the
>>>> BatchUpdate with a specified timestamp; see my answer to your second
>>>> question.
>>>>
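As a rough illustration of the rule quoted above, here is a minimal sketch. TimestampSketch and resolve are hypothetical names rather than HBase source code; the only thing taken from the thread is that the sentinel equals Long.MAX_VALUE.

```java
// Hypothetical stand-in for the substitution rule described above; not HBase source code.
final class TimestampSketch {
    // Same value the thread gives for HConstants.LATEST_TIMESTAMP.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    // Returns the timestamp a cell would be stored with under that rule.
    static long resolve(long requested, long now) {
        return requested == LATEST_TIMESTAMP ? now : requested;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // No explicit timestamp: the sentinel is swapped for the current time.
        System.out.println(resolve(LATEST_TIMESTAMP, now) == now); // prints true
        // Explicit timestamp: kept as given.
        System.out.println(resolve(1221700000000L, now)); // prints 1221700000000
    }
}
```

This would also explain why getTimestamp() appears to return the same value every time when no timestamp was given: the object still holds the sentinel, and under this reading the real time is only filled in when the update is applied.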
>>>>
>>>>         
>>>>> 2) Delete Cell - I saw in the FAQ that I need to add a delete record
>>>>> and commit it with exactly the same timestamp as the original row,
>>>>> but I didn't find any commit method that takes a timestamp.
>>>>>
>>>> See the BatchUpdate constructor that uses a timestamp:
>>>> http://hadoop.apache.org/hbase/docs/r0.2.1/api/org/apache/hadoop/hbase/io/BatchUpdate.html#BatchUpdate%28java.lang.String,%20long%29
>>>>
>>>>
>>>>         
>>>>> 3) For my update operation I need to check whether the row that my
>>>>> application holds still contains the most recent data, and only in
>>>>> that case will I update some cells. To do this I need to lock the
>>>>> row -> check the timestamp of the particular cell -> update it if the
>>>>> timestamp is the same one the application holds. All of those
>>>>> operations, if they are performed on HTable, will take a number of
>>>>> RPCs. I think that doing them directly on the HRegionServer, if
>>>>> possible, would help me get rid of all the extra RPCs. Is there some
>>>>> way to work with the specific HRegionServer that the row belongs to?
>>>>> If yes, how can I get the HRegionServer for this specific row?
>>>>>
>>>> It is best to leave how HBase works abstracted behind the client, or
>>>> this could become a mess. For example, you would have to reimplement
>>>> finding the region server for a region, with retries. Instead of
>>>> updating by deleting/inserting, you should just do a put, so the cell
>>>> is inserted with the current timestamp; by default, HBase retrieves
>>>> the cell with the latest timestamp for a get or a scan. How HBase
>>>> works is very different from your typical RDBMS ;)
>>>>
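The "just do a put, and reads return the latest version" behaviour described above can be pictured with a small in-memory sketch. LatestVersionSketch is a hypothetical name and models a single cell only; it is not how HBase stores data internally.

```java
import java.util.TreeMap;

// In-memory model of one cell that keeps several timestamped versions.
// Hypothetical illustration only; not HBase API or internals.
final class LatestVersionSketch {
    // Versions keyed by timestamp, kept in ascending order.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // A put adds a new version and never deletes older ones.
    void put(String value, long timestamp) {
        versions.put(timestamp, value);
    }

    // A default read returns the value with the highest timestamp, or null if empty.
    String get() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        LatestVersionSketch cell = new LatestVersionSketch();
        cell.put("first", 100L);
        cell.put("second", 200L);
        System.out.println(cell.get()); // prints second
    }
}
```

Under this model an update needs no delete step: writing a newer version is enough for default reads to see it.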
>>>>
>>>>         
>>>>>
>>>>> Thank You and Best Regards.
>>>>> Slava.
>>>>>
>>>>>           
>>>>         
>>     
>
>   

