phoenix-dev mailing list archives

From James Taylor <jamestay...@apache.org>
Subject Re: Help: setting hbase row timestamp in phoenix upserts ?
Date Wed, 11 Jul 2018 14:49:13 GMT
I think the answer is PHOENIX-4552. There's an outline of the work involved
on the JIRA. I think passing through data like that for hints would get
unwieldy quickly.

On Tue, Jul 10, 2018 at 1:31 PM, Pedro Boado <pedro.boado@gmail.com> wrote:

> Hi guys, just a refloat from the @user list.
>
> Would it be of interest to have this functionality for defining HBase
> timestamps on a per-row basis as part of an UPSERT VALUES?
>
> For a table defined as
> CREATE TABLE T0001 ( k VARCHAR PRIMARY KEY, v INTEGER)
>
> Allow a hint to extract and override hbase put timestamp through a
> "virtual" column?
> UPSERT /*+ ROW_TIMESTAMP(ts) */ INTO T0001(k,v,ts) VALUES
> ('a',1, 1531253959043)
>
> If the column existed and had an appropriate type it would also be populated
> with the same value.
>
> Thanks,
> Pedro.
>
>
> On Fri, 1 Dec 2017 at 07:15, James Taylor <jamestaylor@apache.org> wrote:
>
> > The only way I can think of accomplishing this is by using the raw HBase
> > APIs to write the data but using our utilities to write it in a Phoenix
> > compatible manner. For example, you could run an UPSERT VALUES statement,
> > use the PhoenixRuntime.getUncommittedDataIterator() method to get the
> > Cells that would have been written, update the Cell timestamp as needed,
> > and do an htable.batch() call to commit them.
> >
> > On Wed, Nov 29, 2017 at 11:46 AM Pedro Boado <pedro.boado@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> I'm looking for a little bit of help trying to shed some light on
> >> ROW_TIMESTAMP.
> >>
> >> Some background on the problem (simplified): I'm working on a
> >> project that needs to create an "enriched" replica of an RDBMS table
> >> based on a stream of cdc changes off that table.
> >>
> >> Each cdc event contains the timestamp of the change plus all the column
> >> values 'before' and 'after' the change . And each event is pushed to a
> >> kafka topic.  Because of certain "non-negotiable" design decisions kafka
> >> guarantees delivering each event at least once, but doesn't guarantee
> >> ordering for changes over the same row in the source table.
> >>
> >> The final step of the kafka-based flow is sinking the information into
> >> HBase/Phoenix.
> >>
> >> As I cannot get an in-order delivery guarantee from Kafka I need to use
> >> the cdc event timestamp to ensure that HBase keeps the latest change
> >> over a row.
> >>
> >> This fits perfectly well with an HBase table design with VERSIONS=1 and
> >> using the source event timestamp as HBase row/cells timestamp
> >>
> >> The thing is that I cannot find a way to define the timestamp of the
> >> HBase cell from a Phoenix upsert.
> >>
> >> I came across the ROW_TIMESTAMP functionality, but I've just found ( I'm
> >> devastated now ) that the ROW_TIMESTAMP columns store the date in both
> >> hbase's cell timestamp and in the primary key, meaning that I cannot
> >> leverage that functionality to keep only the latest change.
> >>
> >> Is there a way of defining hbase's row timestamp when doing the UPSERT -
> >> even by setting it through some obscure hidden jdbc property - ?
> >>
> >> I want to avoid by all means doing a checkAndPut as the volume of
> >> changes is going to be quite big.
> >>
> >>
> >>
> >> --
> >> Un saludo.
> >> Pedro Boado.
> >>
> >
>
> --
> Un saludo.
> Pedro Boado.
>
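
Editor's sketch: the thread relies on the idea that a table with VERSIONS=1, written with the source event timestamp as the cell timestamp, ends up holding the latest change even when events arrive out of order or more than once. The toy model below illustrates only that idea in plain Java. It is not Phoenix or HBase code, and the class name LatestWinsStore, its methods, and the "equal timestamp overwrites" rule are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a single-version (VERSIONS=1) store. Hypothetical; not HBase or Phoenix code. */
class LatestWinsStore {

    /** A stored value together with the timestamp it was written with. */
    static final class Versioned {
        final long timestamp;
        final int value;

        Versioned(long timestamp, int value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final Map<String, Versioned> rows = new HashMap<>();

    /**
     * Applies a write carrying an explicit timestamp. The incoming write is kept
     * only if its timestamp is not older than the one already stored
     * (assumption: an equal timestamp overwrites).
     */
    void put(String key, int value, long timestamp) {
        rows.merge(key, new Versioned(timestamp, value),
                (stored, incoming) -> incoming.timestamp >= stored.timestamp ? incoming : stored);
    }

    /** Returns the visible value for the key, or null if the key was never written. */
    Integer get(String key) {
        Versioned v = rows.get(key);
        return v == null ? null : v.value;
    }

    public static void main(String[] args) {
        LatestWinsStore store = new LatestWinsStore();
        store.put("a", 2, 1531253959043L); // newer change arrives first
        store.put("a", 1, 1531253950000L); // older change arrives late and is not kept
        store.put("a", 2, 1531253959043L); // duplicate delivery changes nothing
        System.out.println(store.get("a")); // prints 2
    }
}
```

Because the outcome depends only on the timestamps and not on arrival order, at-least-once and unordered delivery both converge on the same row state, and no read-before-write such as checkAndPut is needed in this model.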
