From: Vinoth Chandar
Date: Tue, 11 Jun 2019 09:15:50 -0700
Subject: Re: Possible ambiguity in HoodieKey
To: dev@hudi.apache.org

Thanks for the link.
I was grabbing it in parallel as well :)

So, the KeyGenerator class works off a GenericRecord, and the JSON ->
GenericRecord conversion is already built into the DeltaStreamer. I don't
think this will add any particular performance overhead per se. It may be
worth pulling this class (tweaked to suit your needs) out of the PR and
merging it into master?

On Tue, Jun 11, 2019 at 9:11 AM Jaimin Shah wrote:

> Forgot to add the class link:
>
> https://github.com/apache/incubator-hudi/blob/e916b21cc5989ab00791467fcc11a02bb0de093a/hoodie-bench/src/main/java/com/uber/hoodie/integrationsuite/generator/ComplexKeyGenerator.java
>
> This is the class I am referring to.
>
> On Tuesday, 11 June 2019, Jaimin Shah wrote:
>
> > Hi Vinoth,
> >
> > Thanks for the prompt reply. This class was shared earlier on the
> > mailing list by someone to handle complex keys. I was thinking maybe we
> > can create a JSON object and then convert it to a string to create the
> > key. That would be foolproof, because we don’t control the characters
> > in the input data.
> >
> > I am not sure about the performance implications of doing so; maybe
> > you can help there.
> >
> > Thanks,
> > Jaimin
> >
> > On Tuesday, 11 June 2019, Vinoth Chandar wrote:
> >
> >> Hi Jaimin,
> >>
> >> True. Is this a custom class you have? If we separate the concatenated
> >> parts with a standard special character, it should be fine? For
> >> example, CA#US and C#AUS?
> >>
> >> Thanks
> >> Vinoth
> >>
> >> On Mon, Jun 10, 2019 at 4:53 AM Jaimin Shah wrote:
> >>
> >> > Hi
> >> > I was going through the ComplexKeyGenerator class. I found that the
> >> > class generates the key by concatenating all the keys to make a
> >> > compound key. But I am wondering whether some cases could arise
> >> > later that create problems.
> >> >
> >> > For example, our data has 2 attributes as the key:
> >> >
> >> > key1  key2  data
> >> > CA    US    xyz
> >> > C     AUS   abc
> >> >
> >> > In this case the key for both rows will be the same. Will it cause
> >> > any problem?
> >> > Instead of keeping the keys as a string, would keeping them as a map
> >> > solve the problem?
> >> >
> >> > Thanks
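
The ambiguity raised in this thread, and the two fixes suggested, can be sketched in a few lines. This is only an illustration in Python, not Hudi code: the function names (`concat_key`, `delimited_key`, `json_key`) and the default `#` separator are assumptions made for the example, and the real ComplexKeyGenerator may behave differently. The sketch shows that plain concatenation maps (CA, US) and (C, AUS) to the same key, that a separator fixes this case but fails again when a field itself contains the separator, and that serializing the fields as a JSON array keeps the field boundaries intact.

```python
import json


def concat_key(fields):
    """Naive compound key: plain concatenation, as described in the thread."""
    return "".join(fields)


def delimited_key(fields, sep="#"):
    """Join fields with a separator. Still ambiguous if a field contains sep."""
    return sep.join(fields)


def json_key(fields):
    """Serialize the fields as a JSON array; quoting preserves field boundaries."""
    return json.dumps(list(fields), separators=(",", ":"))


if __name__ == "__main__":
    rows = [("CA", "US"), ("C", "AUS")]
    # Both rows collapse to the same concatenated key.
    print([concat_key(r) for r in rows])
    # A separator distinguishes them.
    print([delimited_key(r) for r in rows])

    # Input data that happens to contain the separator collides again.
    tricky = [("A#", "B"), ("A", "#B")]
    print([delimited_key(r) for r in tricky])
    # The JSON form keeps the two rows distinct.
    print([json_key(r) for r in tricky])
```

A separator with escaping would also work; the JSON form is used here only because it is the option proposed in the thread and needs no custom escaping rules. Its cost is a longer key and a serialization step per record, which is the performance question left open above.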