From: Adar Lieber-Dembo
Date: Wed, 6 Mar 2019 01:07:20 -0800
Subject: Re: Check existing range partitions using the Java API
To: user@kudu.apache.org
Reply-To: user@kudu.apache.org
Mailing-List: contact user-help@kudu.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000d2021205836951de"

--000000000000d2021205836951de
Content-Type: text/plain; charset="UTF-8"

FWIW, you can use a newer Kudu client with an older server, as we take
care to preserve backwards compatibility. The decoupling of client and
server artifacts sort of makes sense anyway, because the server
artifacts are found on the cluster nodes and the client artifacts are
typically distributed along with the application.

In any case, I agree that I don't see an obvious way to get at the
underlying per-row errors if you're using the KuduContext. Maybe
someone more familiar with the Kudu Spark bindings can chime in with
suggestions.

On Wed, Mar 6, 2019 at 12:57 AM Nabeelah Harris wrote:

> Hi Adar
>
> Thanks
>
> Option 1 isn't really viable, since we're running Cloudera with Kudu 1.7,
> thus using the 1.7 client libraries. Option 2 seems to be the way to go,
> though since I am using KuduContext, I'm not sure that there is a clean way
> for me to check for errors row by row. Based on naively wrapping my
> kuduContext.upsert call in a try...catch, and running an alterTable if a
> SparkException is caught - I'm able to catch the SparkException that occurs
> with 'java.lang.RuntimeException: failed to write 1 rows from DataFrame to
> Kudu; sample errors: Not found: non-covered range' on the tasks, but of
> course I still end up with a bunch of failed tasks, and the partition is
> only added once all my tasks have failed.
>
> Do you perhaps have some guidance in this regard?
>
> On Wed, Mar 6, 2019 at 7:58 AM Adar Lieber-Dembo wrote:
>
>> Here are some other options:
>> 1. Use the new KuduPartitioner class, available in master but not yet
>> in any releases. Given a PartialRow (i.e. a row to be inserted), you
>> can find its "partition index" and, more importantly for your use
>> case, receive an exception if no partition exists for the row.
>> 2. Insert the data anyway, and rely on per-row errors to tell you that
>> a partition is missing. This is a more "optimistic" approach, but a
>> somewhat expensive one at that.
>>
>> Would either of these work for you?
>>
>> On Tue, Mar 5, 2019 at 6:33 AM Nabeelah Harris wrote:
>> >
>> > Hi there
>> >
>> > Currently, the only method available on KuduTable to check which
>> > partitions already exist is 'KuduTable.getFormattedRangePartitions'.
>> > This however looks to be experimental and only intended for use by
>> > Impala. Other than replicating the logic used in the above-mentioned
>> > method, is there any way I can easily retrieve the range partitions
>> > (or partitions at all) using the Java API? My use-case at the moment
>> > is to create range partitions based on the data I am about to insert,
>> > and to do so I want to first check if that range partition already
>> > exists, to prevent errors.
>> >
>> > Thanks
>> > Nabeelah
>> >
>
> --
> Nabeelah Harris
> nabeelah.harris@impact.com | https://impact.com

--000000000000d2021205836951de--
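
The pre-check asked about in the original question (does a range partition already cover the row I am about to insert?) can be illustrated without any Kudu dependency. The class below is only a sketch: it is a hypothetical in-memory model, not the Kudu client API, and the names RangeCoverageSketch, addRange, isCovered and uncovered are assumptions made for illustration. It also assumes a single long range column and ranges with an inclusive lower bound and an exclusive upper bound.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a range-partitioned table (not Kudu API). */
final class RangeCoverageSketch {
    // Maps each range's inclusive lower bound to its exclusive upper bound.
    private final TreeMap<Long, Long> ranges = new TreeMap<>();

    /** Registers the range [lower, upper); rejects empty or overlapping ranges. */
    void addRange(long lower, long upper) {
        if (lower >= upper) {
            throw new IllegalArgumentException("empty range");
        }
        // Nearest existing range starting at or below the new lower bound.
        Map.Entry<Long, Long> below = ranges.floorEntry(lower);
        // Nearest existing range starting at or above the new lower bound.
        Long above = ranges.ceilingKey(lower);
        boolean overlapsBelow = below != null && below.getValue() > lower;
        boolean overlapsAbove = above != null && above < upper;
        if (overlapsBelow || overlapsAbove) {
            throw new IllegalArgumentException("overlapping range");
        }
        ranges.put(lower, upper);
    }

    /** True if some registered range contains the key. */
    boolean isCovered(long key) {
        Map.Entry<Long, Long> candidate = ranges.floorEntry(key);
        return candidate != null && key < candidate.getValue();
    }

    /** Returns the keys that fall outside every registered range, in input order. */
    List<Long> uncovered(List<Long> keys) {
        List<Long> missing = new ArrayList<>();
        for (long key : keys) {
            if (!isCovered(key)) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        RangeCoverageSketch table = new RangeCoverageSketch();
        table.addRange(0L, 100L);
        table.addRange(200L, 300L);
        List<Long> batch = Arrays.asList(5L, 150L, 250L, 300L);
        System.out.println("Keys needing a new range partition: " + table.uncovered(batch));
    }
}
```

Running main prints "Keys needing a new range partition: [150, 300]". In a real job the set of existing ranges would have to come from the cluster, which is exactly the part the thread notes has no clean public accessor in the 1.7 client, so this only models the lookup that would follow.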