From: Paul Brannan
Date: Sun, 26 Feb 2017 18:53:52 -0500
Subject: Re: mixing range and hash partitioning
To: user@kudu.apache.org

Is that 4TB per tablet server, regardless of how many tablets it has?

If I have 128GB of data per day, then each tablet server hits the recommended limit after about a month. To store 10 years of data, I would need 120 tablet servers to avoid going over the limit. Is that the best solution or is there another alternative?

How many cores are recommended per tablet server? If I typically only scan one day of data at a time, could a single core service multiple tablet servers?

On Fri, Feb 24, 2017 at 11:22 PM, Paul Brannan <paul.brannan@thesystech.com> wrote:

> The test doesn't exactly reproduce what I did in my sample program.
>
> I'm able to successfully drop the unbounded partition in both cases
> (calling set_range_partition_columns only vs calling
> set_range_partition_columns+add_hash_partitions). However, if I omit the
> call to DropRangePartition, then AddRangePartition succeeds in the first
> case and fails in the second case. I expect it to succeed in both cases or
> fail in both cases.
>
> I've attached a simple program which demonstrates this.
>
> On Fri, Feb 24, 2017 at 7:09 PM, Dan Burkert <danburkert@apache.org> wrote:
>
>> Hi Paul,
>>
>> I can't reproduce the behavior you are describing; I always get a single
>> unbounded range partition when creating the table without specifying range
>> bounds or splits (regardless of hash partitioning). I searched and couldn't
>> find a unit test for this behavior, so I wrote one - you might compare your
>> code against my test. https://gerrit.cloudera.org/#/c/6153/
>>
>> Thanks,
>> Dan
>>
>> On Fri, Feb 24, 2017 at 2:41 PM, Paul Brannan <paul.brannan@thesystech.com> wrote:
>>
>>> I can verify that dropping the unbounded range partition allows me to
>>> later add bounded partitions.
>>>
>>> If I only have range partitioning (by commenting out the call to
>>> add_hash_partitions), adding a bounded partition succeeds, regardless of
>>> whether I first drop the unbounded partition. This seems surprising; why
>>> the difference?
>>>
>>> On Fri, Feb 24, 2017 at 4:20 PM, Dan Burkert <danburkert@apache.org> wrote:
>>>
>>>> Hi Paul,
>>>>
>>>> I think the issue you are running into is that if you don't add a range
>>>> partition explicitly during table creation (by calling add_range_partition
>>>> or inserting a split with add_range_partition_split), Kudu will default to
>>>> creating 1 unbounded range partition. So your two options are to add the
>>>> range partition during table creation time, or, if you only know which
>>>> partition you want at a later time, to drop the existing partition
>>>> (alterer->DropRangePartition with two empty rows) and then add the range
>>>> partition. Note that dropping the range partition will effectively
>>>> truncate the table. This can be done with the same alterer in a single
>>>> transaction. If you want to see a bunch of examples, you can check out
>>>> this unit test:
>>>> https://github.com/apache/kudu/blob/master/src/kudu/integration-tests/alter_table-test.cc#L1106
>>>>
>>>> - Dan
>>>>
>>>> On Fri, Feb 24, 2017 at 10:53 AM, Paul Brannan <paul.brannan@thesystech.com> wrote:
>>>>
>>>>> I'm trying to create a table with one column range-partitioned and
>>>>> another column hash-partitioned. Documentation for add_hash_partitions and
>>>>> set_range_partition_columns suggests this should be possible ("Tables must
>>>>> be created with either range, hash, or range and hash partitioning").
>>>>>
>>>>> I have a schema with three INT64 columns ("time", "key", and
>>>>> "value"). When I create the table, I set up the partitioning:
>>>>>
>>>>>     (*table_creator)
>>>>>       .table_name("test_table")
>>>>>       .schema(&schema)
>>>>>       .add_hash_partitions({"key"}, 2)
>>>>>       .set_range_partition_columns({"time"})
>>>>>       .num_replicas(1)
>>>>>       .Create()
>>>>>
>>>>> I later try to add a partition:
>>>>>
>>>>>     auto timesplit(KuduSchema & schema, std::int64_t t) {
>>>>>       auto split = schema.NewRow();
>>>>>       check_ok(split->SetInt64("time", t));
>>>>>       return split;
>>>>>     }
>>>>>
>>>>>     alterer->AddRangePartition(
>>>>>       timesplit(schema, date_start),
>>>>>       timesplit(schema, next_date_start));
>>>>>
>>>>>     check_ok(alterer->Alter());
>>>>>
>>>>> But I get an error "Invalid argument: New range partition conflicts
>>>>> with existing range partition".
>>>>>
>>>>> How are hash and range partitioning intended to be mixed?
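
The conflict error in the quoted thread follows from the explanation Dan gives: a table created without explicit range bounds starts with one unbounded range partition, and any bounded range overlaps it. The sketch below is a toy model of that overlap check only. ToyRange, ToyTable, add_range and drop_range are hypothetical names invented for illustration; this is not Kudu's implementation or API, and it does not model the range-only case reported above, where the add succeeded without a drop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy half-open range [lower, upper). A false has_* flag means that
// side is unbounded. Hypothetical type, not part of the Kudu client.
struct ToyRange {
  bool has_lower;
  std::int64_t lower;
  bool has_upper;
  std::int64_t upper;
};

ToyRange unbounded() { return ToyRange{false, 0, false, 0}; }
ToyRange bounded(std::int64_t lo, std::int64_t hi) {
  return ToyRange{true, lo, true, hi};
}

// Two half-open ranges overlap when each one starts before the other ends.
bool overlaps(const ToyRange& a, const ToyRange& b) {
  bool a_starts_before_b_ends = !a.has_lower || !b.has_upper || a.lower < b.upper;
  bool b_starts_before_a_ends = !b.has_lower || !a.has_upper || b.lower < a.upper;
  return a_starts_before_b_ends && b_starts_before_a_ends;
}

// Exact match on bounds, used to find the partition to drop.
bool same(const ToyRange& a, const ToyRange& b) {
  return a.has_lower == b.has_lower && a.has_upper == b.has_upper &&
         (!a.has_lower || a.lower == b.lower) &&
         (!a.has_upper || a.upper == b.upper);
}

class ToyTable {
 public:
  // Assumption taken from the thread: with no explicit bounds, the table
  // starts with a single unbounded range partition.
  ToyTable() : ranges_(1, unbounded()) {}

  // Returns false (a "conflict") if the new range overlaps an existing one.
  bool add_range(const ToyRange& r) {
    for (std::size_t i = 0; i < ranges_.size(); ++i) {
      if (overlaps(ranges_[i], r)) return false;
    }
    ranges_.push_back(r);
    return true;
  }

  // Removes a partition whose bounds match exactly; false if none matches.
  bool drop_range(const ToyRange& r) {
    for (std::size_t i = 0; i < ranges_.size(); ++i) {
      if (same(ranges_[i], r)) {
        ranges_.erase(ranges_.begin() + i);
        return true;
      }
    }
    return false;
  }

 private:
  std::vector<ToyRange> ranges_;
};
```

In this model, add_range(bounded(10, 20)) on a fresh ToyTable is rejected, while drop_range(unbounded()) followed by the same add_range call is accepted. That mirrors the drop-then-add sequence described in the thread (alterer->DropRangePartition with two empty rows, then alterer->AddRangePartition).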
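
The capacity estimate at the top of this message can be checked with a few lines of integer arithmetic. This is only a sketch under stated assumptions: 4TB is read as 4096GB, a year is 365 days, and compression and any on-disk overhead are ignored. The function names days_until_limit and servers_needed are made up for this example.

```cpp
#include <cstdint>

// Whole days of ingest before one tablet server reaches the per-server limit.
// Assumes all data lands on that one server.
std::int64_t days_until_limit(std::int64_t limit_gb, std::int64_t daily_gb) {
  return limit_gb / daily_gb;
}

// Minimum number of tablet servers needed to hold `days` of data without any
// server exceeding limit_gb, assuming data spreads evenly across servers.
// `replicas` multiplies the stored volume (the example table uses 1).
std::int64_t servers_needed(std::int64_t daily_gb, std::int64_t days,
                            std::int64_t replicas, std::int64_t limit_gb) {
  std::int64_t total_gb = daily_gb * days * replicas;
  return (total_gb + limit_gb - 1) / limit_gb;  // ceiling division
}
```

With these assumptions, days_until_limit(4096, 128) gives 32 days, which matches "about a month", and servers_needed(128, 3650, 1, 4096) gives 115, close to the estimate of 120 obtained by counting one server per month. With three replicas instead of one, the same call gives 343.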