From: "Dickson, Matt MR" <matt.dickson@defence.gov.au>
To: user@accumulo.apache.org
Date: Tue, 9 Apr 2013 10:43:07 +1000
Subject: RE: Removing splits [SEC=UNCLASSIFIED]

UNCLASSIFIED

All queries will include a date range and a particular family value, which specifies the shard of data. The splits have been set up to prevent hotspotting on load and, because the most recent data is queried most heavily, striping each day's data across the cluster will ensure query distribution.

My understanding was that the splits were only used while loading the data, so once the data is loaded they are redundant. Is that correct?
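The striping argument above depends on split points acting as tablet boundaries. The following is a minimal pure-JDK model, not Accumulo code. It assumes Accumulo's convention that each tablet holds the rows in (previous split, split], with a final unbounded tablet after the last split. The class name `TabletLookup` and the sample row keys are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class TabletLookup {
    // Split points, kept in lexicographic order (String.compareTo matches
    // byte-wise order for ASCII row keys).
    private final TreeSet<String> splits;

    public TabletLookup(List<String> splitPoints) {
        this.splits = new TreeSet<>(splitPoints);
    }

    // A tablet covers (previous split, split], so a row belongs to the tablet
    // whose end row is the smallest split point >= the row. Returns null when
    // the row falls past the last split, i.e. into the final, unbounded tablet.
    public String tabletEndRow(String row) {
        return splits.ceiling(row);
    }

    public static void main(String[] args) {
        TabletLookup table = new TabletLookup(
                Arrays.asList("20130407-1", "20130407-2", "20130407-3"));
        // Hypothetical rows for one day; each lands in a different tablet.
        for (String row : Arrays.asList("20130407-0", "20130407-1a", "20130407-2a", "20130407-3a")) {
            System.out.println(row + " -> " + table.tabletEndRow(row));
        }
    }
}
```

Because split points remain in the table as tablet boundaries after ingest, a lookup of this kind (which the Accumulo client resolves from table metadata) still determines which tablet, and therefore which tablet server, serves each query.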


From: David Medinets [mailto:david.medinets@gmail.com]
Sent: Tuesday, 9 April 2013 10:27
To: accumulo-user
Subject: Re: Removing splits [SEC=UNCLASSIFIED]

What advantage do you feel you'll gain by removing the splits? Do you know how you'll be querying the data?


On Mon, Apr 8, 2013 at 7:35 PM, Dickson, Matt MR <matt.dickson@defence.gov.au> wrote:

UNCLASSIFIED

Hi guys,
 
Just a simple question. We ingest data in daily batches and create splits on the data to distribute the load, e.g. the splits are 20130407-1, 20130407-2, ... 20130407-n.
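A minimal sketch of generating one day's split points in the `<yyyyMMdd>-<i>` form described above. `SplitPoints` and `dailySplits` are hypothetical names, and the optional zero-padding is an illustrative assumption, not part of the original scheme. Sorting with a `TreeSet` mirrors lexicographic row ordering for ASCII keys, which shows that unpadded shard numbers interleave once n reaches 10.

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class SplitPoints {
    // Builds the split points <day>-<i> for i = 1..n (e.g. 20130407-1 ... 20130407-n).
    // padWidth > 0 zero-pads the shard number (hypothetical option);
    // padWidth <= 0 keeps it unpadded, as in the original naming.
    public static SortedSet<String> dailySplits(String day, int n, int padWidth) {
        // TreeSet orders with String.compareTo, which matches byte-wise order for ASCII keys.
        SortedSet<String> splits = new TreeSet<>();
        for (int i = 1; i <= n; i++) {
            String shard = padWidth > 0
                    ? String.format("%0" + padWidth + "d", i)
                    : Integer.toString(i);
            splits.add(day + "-" + shard);
        }
        return splits;
    }

    public static void main(String[] args) {
        // Unpadded: "20130407-10" sorts before "20130407-2".
        System.out.println(dailySplits("20130407", 12, 0));
        // Padded to two digits: numeric and lexicographic order agree.
        System.out.println(dailySplits("20130407", 12, 2));
    }
}
```

In an Accumulo client, each string would be wrapped in `org.apache.hadoop.io.Text` and the resulting `SortedSet<Text>` passed to `TableOperations.addSplits(tableName, splits)`.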
 
Once this data is loaded, the splits will not be required again. Is there a maximum number of splits a table can have? How can splits be removed once they are no longer required? I can't see any command in the API.
 
Thanks in advance,
Matt Dickson 

IMPORTANT: This email remains the property of the Department of Defence and is subject to the jurisdiction of section 70 of the Crimes Act 1914. If you have received this email in error, you are requested to contact the sender and delete the email.

