From: "Seidl, Ed"
To: "user@accumulo.apache.org"
Date: Fri, 20 Jul 2012 13:34:01 -0700
Subject: Re: appending data to tables (partitioning?)

On 7/20/12 1:23 PM, "Billie J Rinaldi" wrote:

>
>One thing you should think about is making it so that you only have one
>file per tablet, i.e. that you create a new split point for every new
>file that you import. This should be doable if your files are pretty
>large and you don't end up having too many tablets. If there is only one
>file per tablet, it won't compact unless you tell it to.

Awesome...that's exactly the case...I'll have one file per tablet, and all
the files should be more-or-less the same size (within 10% or so), on the
order of a gigabyte each. Thanks for the split point tip...I hadn't thought
of that. This should do exactly what I want.

Thanks!
Ed

>
>If you want to have multiple files per tablet, there are a number of
>parameters you should think about. However, you should make sure that
>you don't have too many files per tablet because 1) query performance
>will suffer and 2) there is a limit to the number of files that a tablet
>server will open. The limit to open files is adjustable. For scan, it
>defaults to 100 files for all the tablets, and for major compaction it
>defaults to 10 files per tablet (but the compaction can be performed in
>stages).
>
>To change the compaction criteria, adjust table.file.max and
>table.compaction.major.ratio. table.file.max is the maximum number of
>files that a tablet can have. If a tablet has more files than this, it
>will compact. table.compaction.major.ratio governs when compaction
>occurs when a tablet has fewer files than the maximum.
>It also governs which files are compacted together in either case. Raising
>the ratio will make compactions happen less often. If table.file.max is
>larger than the number of files you expect to have per tablet, setting
>table.compaction.major.ratio to the same value as table.file.max should
>keep it from compacting unless there is high variation in your file
>sizes. A set of files is compacted into a single file if the size of the
>largest file times the ratio is <= the sum of the sizes of the files.
>
>Billie
>
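Billie's suggestion of creating one split point per imported file can be sketched in plain Java. This is a minimal sketch under stated assumptions: each file covers a contiguous row range that does not overlap any other file, row keys are ASCII (so Java String ordering matches Accumulo's byte ordering), and FileRange and SplitPlanner are hypothetical names, not part of Accumulo. Because a tablet holds the rows in (previous end row, end row], splitting at each file's last row should give every file its own tablet.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical description of one file prepared for bulk import:
// the first and last row keys it contains. Assumption: ranges of
// different files never overlap.
final class FileRange {
    final String firstRow;
    final String lastRow;

    FileRange(String firstRow, String lastRow) {
        this.firstRow = firstRow;
        this.lastRow = lastRow;
    }
}

// Hypothetical helper, not an Accumulo class.
final class SplitPlanner {
    // Returns split points that place each file in its own tablet.
    // Tablet extents are (prevEndRow, endRow], so splitting at a file's
    // last row keeps that row in the tablet to its left.
    // Assumes ASCII row keys, where String order matches byte order.
    static SortedSet<String> splitsForOneFilePerTablet(List<FileRange> files) {
        TreeSet<String> lastRows = new TreeSet<>();
        for (FileRange f : files) {
            lastRows.add(f.lastRow);
        }
        // The highest file needs no split after it: the last tablet
        // is open-ended and already holds everything above it.
        if (!lastRows.isEmpty()) {
            lastRows.remove(lastRows.last());
        }
        return lastRows;
    }
}
```

In the 1.4-era client API, the returned rows would be wrapped in org.apache.hadoop.io.Text and passed to TableOperations.addSplits(tableName, splits) before the bulk import; treat that wiring as an assumption to check against your Accumulo version.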
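The two compaction triggers Billie describes can be written as a simplified model. This sketch only encodes the rules as stated in the email (a tablet with more than table.file.max files compacts; a set of files compacts when the largest file size times the ratio is <= the total size). It does not reproduce how Accumulo chooses which subset of files to compact together, and CompactionRules is a hypothetical name.

```java
import java.util.Arrays;

// Simplified model of the two triggers described in the email.
// Not Accumulo's implementation; it only encodes the stated rules.
final class CompactionRules {
    // table.file.max rule: a tablet with MORE files than the maximum compacts.
    static boolean exceedsFileMax(int fileCount, int tableFileMax) {
        return fileCount > tableFileMax;
    }

    // table.compaction.major.ratio rule: a set of files is compacted into
    // one file if (largest file size * ratio) <= (sum of all sizes in the set).
    static boolean ratioSatisfied(long[] fileSizes, double ratio) {
        if (fileSizes.length == 0) {
            return false; // nothing to compact
        }
        long largest = Arrays.stream(fileSizes).max().getAsLong();
        long total = Arrays.stream(fileSizes).sum();
        return largest * ratio <= total;
    }
}
```

For k files of equal size s, the condition s * ratio <= k * s reduces to ratio <= k. That is why setting table.compaction.major.ratio to a table.file.max value larger than the expected file count should keep similar-sized files, like Ed's roughly 1 GB files, from compacting.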