From user-return-1620-archive-asf-public=cust-asf.ponee.io@kudu.apache.org  Wed Mar  6 09:14:14 2019
Mailing-List: contact user-help@kudu.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:user-help@kudu.apache.org>
List-Unsubscribe: <mailto:user-unsubscribe@kudu.apache.org>
List-Post: <mailto:user@kudu.apache.org>
List-Id: <user.kudu.apache.org>
Reply-To: user@kudu.apache.org
Delivered-To: mailing list user@kudu.apache.org
MIME-Version: 1.0
References: <CACENP0aPbNUoWMVGSPCh44OCFGBAHk78LE+=koB5eMQwr-oTig@mail.gmail.com>
 <CAMcOB6OVDkBRLzP6kGStsHkauQWQ0ouBCByKwY_fb5TtgwMOLw@mail.gmail.com> <CACENP0YGvf-K+Fx9dPDsMmN0M0Chi6dbbHH6018DG6dvVKOLJQ@mail.gmail.com>
In-Reply-To: <CACENP0YGvf-K+Fx9dPDsMmN0M0Chi6dbbHH6018DG6dvVKOLJQ@mail.gmail.com>
From: Adar Lieber-Dembo <adar@cloudera.com>
Date: Wed, 6 Mar 2019 01:07:20 -0800
Message-ID: <CAMcOB6PNZhNUay=HQFqmh-K3gAVJO_g9uFKLNANvRwwF1bHwhA@mail.gmail.com>
Subject: Re: Check existing range partitions using the Java API
To: user@kudu.apache.org
Content-Type: multipart/alternative; boundary="000000000000d2021205836951de"

--000000000000d2021205836951de
Content-Type: text/plain; charset="UTF-8"

FWIW, you can use a newer Kudu client with an older server, since we take
care to preserve backwards compatibility. Decoupling the client and server
artifacts makes sense anyway: the server artifacts live on the cluster
nodes, while the client artifacts are typically distributed along with the
application.

In any case, I agree that I don't see an obvious way to get at the
underlying per-row errors if you're using the KuduContext. Maybe someone
more familiar with the Kudu Spark bindings can chime in with suggestions.
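
To make the "optimistic" option concrete, here is a minimal, self-contained
Java sketch of the control flow only: write everything, collect the rows
rejected as non-covered, add each missing range once, then retry just the
rejected rows. It does not use the Kudu client at all. FakeTable,
writeWithRetry, and the fixed WIDTH of 100 keys per range are hypothetical
names and assumptions made up for illustration; with a real client the
rejected rows would have to come from the per-row errors mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class OptimisticRangeWrite {
    /** Hypothetical stand-in for a range-partitioned table (not a Kudu class). */
    static final class FakeTable {
        // Lower bound (inclusive) -> upper bound (exclusive) of each range partition.
        final TreeMap<Long, Long> ranges = new TreeMap<>();
        final List<Long> rows = new ArrayList<>();

        /** True if some existing range contains the key. */
        boolean covers(long key) {
            Map.Entry<Long, Long> e = ranges.floorEntry(key);
            return e != null && key < e.getValue();
        }

        /** Stores every covered key and returns the keys rejected as non-covered. */
        List<Long> write(List<Long> keys) {
            List<Long> rejected = new ArrayList<>();
            for (long k : keys) {
                if (covers(k)) {
                    rows.add(k);
                } else {
                    rejected.add(k);
                }
            }
            return rejected;
        }
    }

    /** Assumed width of each range partition in this sketch. */
    static final long WIDTH = 100;

    /** Optimistic write: try first, add missing ranges once, retry only the failures. */
    static void writeWithRetry(FakeTable table, List<Long> keys) {
        List<Long> rejected = table.write(keys);
        for (long k : rejected) {
            long lower = Math.floorDiv(k, WIDTH) * WIDTH;
            table.ranges.putIfAbsent(lower, lower + WIDTH);
        }
        List<Long> stillRejected = table.write(rejected);
        if (!stillRejected.isEmpty()) {
            throw new IllegalStateException("rows still not covered: " + stillRejected);
        }
    }
}
```

The intent of this shape is that each missing range is added once, from a
single place, and only the rejected rows are written again, rather than
every failing task reacting to the same error independently.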

On Wed, Mar 6, 2019 at 12:57 AM Nabeelah Harris <nabeelah.harris@impact.com>
wrote:

> Hi Adar
>
> Thanks
>
> Option 1 isn't really viable, since we're running Cloudera with Kudu 1.7,
> thus using the 1.7 client libraries. Option 2 seems to be the way to go,
> though since I am using KuduContext, I'm not sure that there is a clean way
> for me to check for errors row by row. I tried naively wrapping my
> kuduContext.upsert call in a try...catch and running an alterTable when a
> SparkException is caught. I am able to catch the SparkException that occurs
> with 'java.lang.RuntimeException: failed to write 1 rows from DataFrame to
> Kudu; sample errors: Not found: non-covered range' on the tasks, but of
> course I still end up with a bunch of failed tasks, and the partition is
> only added once all my tasks have failed.
>
> Do you perhaps have some guidance in this regard?
>
> On Wed, Mar 6, 2019 at 7:58 AM Adar Lieber-Dembo <adar@cloudera.com>
> wrote:
>
>> Here are some other options:
>> 1. Use the new KuduPartitioner class, available in master but not yet
>> in any releases. Given a PartialRow (i.e. a row to be inserted), you
>> can find its "partition index" and, more importantly for your use
>> case, receive an exception if no partition exists for the row.
>> 2. Insert the data anyway, and rely on per-row errors to tell you that
>> a partition is missing. This is a more "optimistic" approach, but a
>> somewhat expensive one at that.
>>
>> Would either of these work for you?
>>
>> On Tue, Mar 5, 2019 at 6:33 AM Nabeelah Harris
>> <nabeelah.harris@impact.com> wrote:
>> >
>> > Hi there
>> >
>> > Currently, the only method available on KuduTable to check which
>> > partitions already exist is 'KuduTable.getFormattedRangePartitions'.
>> > This however looks to be experimental and only intended for use by
>> > Impala. Other than replicating the logic used in the above-mentioned
>> > method, is there any way I can easily retrieve the range partitions
>> > (or partitions at all) using the Java API? My use-case at the moment
>> > is to create range partitions based on the data I am about to insert,
>> > and to do so I want to first check if that range partition already
>> > exists, to prevent errors.
>> >
>> > Thanks
>> > Nabeelah
>>
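
Option 1 in the quoted message above (look up a row's partition index, and
get an exception when no partition covers the row) can be illustrated without
any Kudu dependency. The following is a rough sketch of that idea for a single
integer range key only; it ignores hash partitioning and multi-column keys.
RangeLookup, partitionIndex, and UncoveredKeyException are hypothetical names
invented for this example and are not part of the Kudu Java API.

```java
import java.util.Arrays;

class RangeLookup {
    /** Hypothetical exception for a key that no range partition covers. */
    static final class UncoveredKeyException extends RuntimeException {
        UncoveredKeyException(long key) {
            super("no range partition covers key " + key);
        }
    }

    private final long[] lowers; // inclusive lower bounds, sorted ascending
    private final long[] uppers; // exclusive upper bounds, aligned with lowers

    /** bounds[i] = {lower, upper}; ranges are assumed sorted and non-overlapping. */
    RangeLookup(long[][] bounds) {
        lowers = new long[bounds.length];
        uppers = new long[bounds.length];
        for (int i = 0; i < bounds.length; i++) {
            lowers[i] = bounds[i][0];
            uppers[i] = bounds[i][1];
        }
    }

    /** Returns the index of the range containing key, or throws if none does. */
    int partitionIndex(long key) {
        int pos = Arrays.binarySearch(lowers, key);
        // On a miss, binarySearch returns -(insertionPoint) - 1, so the last
        // range whose lower bound is <= key sits at insertionPoint - 1.
        int idx = pos >= 0 ? pos : -pos - 2;
        if (idx < 0 || key >= uppers[idx]) {
            throw new UncoveredKeyException(key);
        }
        return idx;
    }
}
```

A caller could run every key of a batch through partitionIndex before
writing, and add a range for each key that throws, so that no write fails
for lack of a partition.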

--000000000000d2021205836951de--