From: James Taylor
Date: Wed, 11 Jul 2018 07:49:13 -0700
Subject: Re: Help: setting hbase row timestamp in phoenix upserts ?
To: dev@phoenix.apache.org

I think the answer is PHOENIX-4552. There's an outline of the work
involved on the JIRA. I think passing through data like that for hints
would get unwieldy quickly.

On Tue, Jul 10, 2018 at 1:31 PM, Pedro Boado wrote:

> Hi guys, just a refloat from the @user list.
>
> Would it be of interest to have this functionality for defining HBase
> timestamps on a per-row basis as part of an UPSERT VALUES?
>
> For a table defined as
>
>   CREATE TABLE T0001 (k VARCHAR PRIMARY KEY, v INTEGER)
>
> allow a hint to extract and override the HBase put timestamp through a
> "virtual" column?
>
>   UPSERT /*+ ROW_TIMESTAMP(ts) */ INTO T0001(k,v,ts) VALUES
>   ('a',1, 1531253959043)
>
> If the column existed and had the appropriate type, it would also be
> populated with the same value.
>
> Thanks,
> Pedro.
>
>
> On Fri, 1 Dec 2017 at 07:15, James Taylor wrote:
>
> > The only way I can think of accomplishing this is by using the raw
> > HBase APIs to write the data but using our utilities to write it in
> > a Phoenix-compatible manner. For example, you could run an UPSERT
> > VALUES statement, use the
> > PhoenixRuntime.getUncommittedDataIterator() method to get the Cells
> > that would have been written, update the Cell timestamp as needed,
> > and do an htable.batch() call to commit them.
> >
> > On Wed, Nov 29, 2017 at 11:46 AM Pedro Boado wrote:
> >
> >> Hi,
> >>
> >> I'm looking for a little bit of help trying to shed some light on
> >> ROW_TIMESTAMP.
> >>
> >> Some background on the problem (simplified): I'm working on a
> >> project that needs to create an "enriched" replica of an RDBMS
> >> table based on a stream of CDC changes off that table.
> >>
> >> Each CDC event contains the timestamp of the change plus all the
> >> column values 'before' and 'after' the change. Each event is
> >> pushed to a Kafka topic.
> >> Because of certain "non-negotiable" design decisions, Kafka
> >> guarantees delivering each event at least once, but doesn't
> >> guarantee ordering for changes to the same row in the source
> >> table.
> >>
> >> The final step of the Kafka-based flow is sinking the information
> >> into HBase/Phoenix.
> >>
> >> As I cannot get an in-order delivery guarantee from Kafka, I need
> >> to use the CDC event timestamp to ensure that HBase keeps the
> >> latest change to a row.
> >>
> >> This fits perfectly well with an HBase table design with
> >> VERSIONS=1 and using the source event timestamp as the HBase
> >> row/cell timestamp.
> >>
> >> The thing is that I cannot find a way to define the timestamp of
> >> the HBase cell from a Phoenix upsert.
> >>
> >> I came across the ROW_TIMESTAMP functionality, but I've just found
> >> (I'm devastated now) that ROW_TIMESTAMP columns store the date in
> >> both HBase's cell timestamp and the primary key, meaning that I
> >> cannot leverage that functionality to keep only the latest change.
> >>
> >> Is there a way of defining HBase's row timestamp when doing the
> >> UPSERT, even by setting it through some obscure hidden JDBC
> >> property?
> >>
> >> I want to avoid by all means doing a checkAndPut, as the volume of
> >> changes is going to be quite big.
> >>
> >> --
> >> Regards.
> >> Pedro Boado.
> >
>
> --
> Regards.
> Pedro Boado.
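
For readers following the thread, the "keep only the latest change"
behaviour that the original question relies on (VERSIONS=1 plus the CDC
event timestamp used as the cell timestamp) can be illustrated with a
small in-memory model. This is only a sketch under stated assumptions:
it is plain Java, not HBase or Phoenix code, and the names
LatestWinsStore, Versioned, put and get are hypothetical. Letting the
later arrival win on equal timestamps is also an assumption of this
toy model.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of a single-version (VERSIONS=1) store; not HBase code. */
final class LatestWinsStore {

    /** A stored value together with the timestamp it was written with. */
    static final class Versioned {
        final long timestamp;
        final int value;

        Versioned(long timestamp, int value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Hypothetical stand-in for one column of a table keyed by row key.
    private final Map<String, Versioned> rows = new HashMap<>();

    /**
     * Writes a value with an explicit event timestamp. A write carrying an
     * older timestamp never replaces a newer one, whatever the arrival order.
     * Assumption of this sketch: on equal timestamps the later arrival wins.
     */
    void put(String key, int value, long eventTimestamp) {
        rows.merge(key, new Versioned(eventTimestamp, value),
                (current, incoming) ->
                        incoming.timestamp >= current.timestamp ? incoming : current);
    }

    /** Returns the visible value for the key, or null if it was never written. */
    Integer get(String key) {
        Versioned v = rows.get(key);
        return v == null ? null : v.value;
    }

    public static void main(String[] args) {
        LatestWinsStore store = new LatestWinsStore();
        store.put("a", 2, 2000L); // newer change arrives first
        store.put("a", 1, 1000L); // older change arrives late and is ignored
        store.put("a", 2, 2000L); // duplicate from at-least-once delivery
        System.out.println(store.get("a"));
    }
}
```

Because the outcome depends only on the timestamps and not on arrival
order, out-of-order and duplicated events converge to the same row state
without a read-before-write step such as checkAndPut.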