phoenix-dev mailing list archives

From Pedro Boado <pedro.bo...@gmail.com>
Subject Re: Help: setting hbase row timestamp in phoenix upserts ?
Date Tue, 10 Jul 2018 20:31:21 GMT
Hi guys, just resurfacing this thread from the @user list.

Would it be of interest to support setting the HBase timestamp on a
per-row basis as part of an UPSERT VALUES?

For a table defined as
CREATE TABLE T0001 ( k VARCHAR PRIMARY KEY, v INTEGER)

Could a hint take the value of a "virtual" column and use it to override
the HBase put timestamp?
UPSERT /*+ ROW_TIMESTAMP(ts) */ INTO T0001(k,v,ts) VALUES
('a',1, 1531253959043)

If the column exists and has an appropriate type, it would also be populated
with the same value.
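
To illustrate why a per-row timestamp matters for the use case quoted below
(out-of-order, at-least-once delivery into a table with VERSIONS=1), here is a
minimal sketch in plain Java. It is only a toy model of "highest timestamp
wins" for a single-version row, not HBase or Phoenix code; the class name
LatestWinsStore and its methods are hypothetical and exist only for this
illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy single-version store: keeps, per row key, the value with the highest timestamp. */
class LatestWinsStore {

    /** A value together with the (source event) timestamp it was written with. */
    private static final class Versioned {
        final long timestamp;
        final int value;

        Versioned(long timestamp, int value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final Map<String, Versioned> rows = new HashMap<>();

    /**
     * Writes a value with an explicit timestamp. An older timestamp never
     * replaces a newer one; on an equal timestamp the later write is kept,
     * so a redelivered duplicate event is harmless.
     */
    void put(String key, int value, long timestamp) {
        Versioned current = rows.get(key);
        if (current == null || timestamp >= current.timestamp) {
            rows.put(key, new Versioned(timestamp, value));
        }
    }

    /** Returns the visible value for the key, or null if the row does not exist. */
    Integer get(String key) {
        Versioned v = rows.get(key);
        return v == null ? null : v.value;
    }
}
```

If the newer change (timestamp 1531253959043) arrives first and an older change
for the same key arrives afterwards, a read still returns the newer value,
which is the behaviour I would like to get from an UPSERT carrying its own
timestamp, without a checkAndPut.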

Thanks,
Pedro.


On Fri, 1 Dec 2017 at 07:15, James Taylor <jamestaylor@apache.org> wrote:

> The only way I can think of accomplishing this is by using the raw HBase
> APIs to write the data but using our utilities to write it in a Phoenix
> compatible manner. For example, you could run an UPSERT VALUES statement,
> use the PhoenixRuntime.getUncommittedDataIterator() method to get the Cells
> that would have been written, update the Cell timestamp as needed, and do
> an htable.batch() call to commit them.
>
> On Wed, Nov 29, 2017 at 11:46 AM Pedro Boado <pedro.boado@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm looking for a little bit of help trying to shed some light on
>> ROW_TIMESTAMP.
>>
>> Some background on the problem (simplified): I'm working on a
>> project that needs to create an "enriched" replica of an RDBMS table based on
>> a stream of CDC changes off that table.
>>
>> Each CDC event contains the timestamp of the change plus all the column
>> values 'before' and 'after' the change, and each event is pushed to a
>> Kafka topic. Because of certain "non-negotiable" design decisions, Kafka
>> guarantees delivering each event at least once, but doesn't guarantee
>> ordering for changes to the same row in the source table.
>>
>> The final step of the kafka-based flow is sinking the information into
>> HBase/Phoenix.
>>
>> As I cannot get an in-order delivery guarantee from Kafka, I need to use the
>> CDC event timestamp to ensure that HBase keeps the latest change to a row.
>>
>> This fits perfectly well with an HBase table design with VERSIONS=1 and
>> using the source event timestamp as the HBase row/cell timestamp.
>>
>> The thing is that I cannot find a way to define the timestamp of the HBase
>> cell from a Phoenix upsert.
>>
>> I came across the ROW_TIMESTAMP functionality, but I've just found (I'm
>> devastated now) that ROW_TIMESTAMP columns store the date in both
>> HBase's cell timestamp and in the primary key, meaning that I cannot
>> leverage that functionality to keep only the latest change.
>>
>> Is there a way of defining hbase's row timestamp when doing the UPSERT -
>> even by setting it through some obscure hidden jdbc property - ?
>>
>> I want to avoid by all means doing a checkAndPut, as the volume of changes
>> is going to be quite big.
>>
>>
>>
>> --
>> Un saludo.
>> Pedro Boado.
>>
>

-- 
Un saludo.
Pedro Boado.
