From: Jaimin Shah
Date: Tue, 11 Jun 2019 21:49:50 +0530
Subject: Re: Possible ambiguity in HoodieKey
To: "dev@hudi.apache.org"

Sure, I will update once I make those changes. Thanks.

On Tuesday, 11 June 2019, Vinoth Chandar wrote:

> Thanks for the link. I was grabbing it in parallel as well :)
>
> So, the KeyGenerator class works off a GenericRecord, and
> JSON -> GenericRecord is already built in to the DeltaStreamer.
> I don't think this will add any particular performance overhead per se.
>
> It may be worth pulling this class (tweaked to suit your needs) out of
> the PR and merging it into master?
>
> On Tue, Jun 11, 2019 at 9:11 AM Jaimin Shah wrote:
>
> > Forgot to add the class link:
> >
> > https://github.com/apache/incubator-hudi/blob/e916b21cc5989ab00791467fcc11a02bb0de093a/hoodie-bench/src/main/java/com/uber/hoodie/integrationsuite/generator/ComplexKeyGenerator.java
> >
> > This is the class I am referring to.
> >
> > On Tuesday, 11 June 2019, Jaimin Shah wrote:
> >
> > > Hi Vinoth,
> > >
> > > Thanks for the prompt reply. This class was shared earlier on the
> > > mailing list by someone to handle complex keys. I was thinking maybe
> > > we can create a JSON object and then serialize it to a string to
> > > create the key. Then it will be foolproof, because we don’t control
> > > the characters in the input data.
> > >
> > > I am not sure about the performance implications of doing so; maybe
> > > you can help there.
> > >
> > > Thanks,
> > > Jaimin
> > >
> > > On Tuesday, 11 June 2019, Vinoth Chandar wrote:
> > >
> > >> Hi Jaimin,
> > >>
> > >> True. Is this a custom class you have? If we separate the
> > >> concatenated parts by a standard special character, it should be
> > >> fine? E.g. CA#US, C#AUS?
> > >>
> > >> Thanks
> > >> Vinoth
> > >>
> > >> On Mon, Jun 10, 2019 at 4:53 AM Jaimin Shah wrote:
> > >>
> > >> > Hi
> > >> > I was going through the ComplexKeyGenerator class. I found that
> > >> > the class generates the key by concatenating all the keys to make
> > >> > a compound key. But I am wondering whether some cases can arise
> > >> > later that create problems.
> > >> >
> > >> > For example, our data has 2 attributes as the key:
> > >> >
> > >> > key1  key2  data
> > >> > CA    US    xyz
> > >> > C     AUS   abc
> > >> >
> > >> > In this case the key for both rows will be the same. Will it
> > >> > cause any problem? Would keeping the keys as a map instead of a
> > >> > string solve the problem?
> > >> >
> > >> > Thanks
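
Illustrative note on the collision discussed in this thread: the sketch below reproduces the CA/US versus C/AUS example and compares the three options mentioned (plain concatenation, a "#" separator, and a JSON-serialized key). It is written in Python for brevity and is not Hudi code. The function names concat_key, delimited_key and json_key are hypothetical, and the behaviour of plain concatenation is taken from the description in the thread rather than checked against the linked class.

```python
import json


def concat_key(*parts):
    """Plain concatenation, as described in the first message (assumed behaviour)."""
    return "".join(parts)


def delimited_key(*parts, sep="#"):
    """Join the parts with a separator character, as suggested in the reply."""
    return sep.join(parts)


def json_key(**fields):
    """Serialize field names and values as JSON.

    JSON quotes and escapes every value, so field boundaries stay
    unambiguous whatever characters the data contains. sort_keys makes
    the output independent of argument order.
    """
    return json.dumps(fields, sort_keys=True, separators=(",", ":"))


# The two rows from the thread collide under plain concatenation.
print(concat_key("CA", "US"), concat_key("C", "AUS"))

# A separator fixes that case ...
print(delimited_key("CA", "US"), delimited_key("C", "AUS"))

# ... but collides again once the data itself contains the separator.
print(delimited_key("A#", "B"), delimited_key("A", "#B"))

# The JSON form keeps these two rows distinct.
print(json_key(key1="A#", key2="B"), json_key(key1="A", key2="#B"))
```

A separator is cheaper and more readable, but it is only safe if the separator is escaped in, or guaranteed absent from, the key values. The JSON form trades a longer key for not needing that guarantee.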