From: Eric Newton <eric.newton@gmail.com>
Date: Thu, 6 Dec 2012 16:07:39 -0500
Subject: Fwd: Tuning & Compactions
To: user@accumulo.apache.org

Keith noted that my response didn't go back to the whole list.

-Eric

---------- Forwarded message ----------
From: Eric Newton <eric.newton@gmail.com>
Date: Tue, Dec 4, 2012 at 2:25 PM
Subject: Re: Tuning & Compactions
To: chris@burrell.me.uk

By "small indexes" I mean they are small to read off disk. If you write a
gigabyte of indexes, it's going to take some time to read them into RAM.
The index is a subset of all the keys in the RFile. If you have lots of
keys in the index, lookups can be faster, but it takes more time to load
those keys into RAM. Keep your keys small, and try to keep the subset of
keys in the index small, so that the first lookup is fast. A million index
keys for a billion key/values is not unreasonable. We have used even
smaller ratios, especially when the files to be imported are constructed
to fit the current split points.
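
The main knob here is the data block size: the RFile index keeps roughly
one key per data block, so bigger blocks mean a smaller index. A minimal
sketch of that change through the Java client API (the table name and the
Connector "conn" are assumptions; the property name is the 1.4-era data
block size setting):

    import org.apache.accumulo.core.client.Connector;

    public class IndexTuning {
        // Raise the data block size from the 100K default so each RFile
        // carries fewer index entries (roughly one per data block).
        public static void shrinkIndex(Connector conn) throws Exception {
            conn.tableOperations().setProperty(
                "mytable", "table.file.compress.blocksize", "512K");
        }
    }
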
You can have an effectively unlimited number of families and qualifiers.
However, if you ever want to put families into locality groups, they are
easier to configure if the number of families in each group is small. A
group separates families by name.

Using the example from the Google BigTable paper: you can store small
indexed items, like URLs, separately from large value items, like whole
web pages, which gives you faster searches over the small items while
logically keeping everything in the same sorted index. URLs would go into
one group, which would be stored separately from another group containing
the whole web page and maybe something like image data. A search on URLs
would not need to decompress and skip over large values while scanning.
Further, URLs are more similar to one another than they are to images, and
so are likely to compress better when stored together.

To complicate things further, Accumulo does not create separate files for
each family group, as implied in the BigTable paper. The groups are stored
in separate sections of the same RFile. They are also created lazily: as
the data is re-written, it is gradually organized according to the
locality group specifications. You can force a re-write if you like.
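
For example, declaring two groups and forcing the re-write looks roughly
like this (a sketch only: the table and family names are made up, and a
Connector "conn" is assumed):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.hadoop.io.Text;

    public class LocalityGroups {
        public static void configure(Connector conn) throws Exception {
            Map<String, Set<Text>> groups = new HashMap<String, Set<Text>>();
            // Small, frequently scanned families in one group...
            groups.put("urls", Collections.singleton(new Text("url")));
            // ...bulky values in another.
            groups.put("content", Collections.singleton(new Text("page")));
            conn.tableOperations().setLocalityGroups("webtable", groups);

            // Groups apply lazily as files are re-written; a full
            // compaction forces the re-write now (flush, don't wait).
            conn.tableOperations().compact("webtable", null, null, true, false);
        }
    }
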
If you find yourself wanting to put extensions in the column family that
have nothing to do with locality groups, just move them over to the column
qualifier. We put carefully structured, binary data in the column
qualifier all the time.
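
For instance, something like the following (a sketch: the table name, row
layout, and numbers are made up; the BatchWriter signature is the 1.4-era
one, taking max memory, max latency in ms, and write threads):

    import java.nio.ByteBuffer;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.Text;

    public class QualifierAsKeyExtension {
        public static void write(Connector conn) throws Exception {
            BatchWriter bw =
                conn.createBatchWriter("mytable", 50000000L, 60000L, 4);
            try {
                // The qualifier carries a fixed-width binary suffix that
                // acts as the last part of the key: here, a big-endian
                // long timestamp.
                byte[] qual = ByteBuffer.allocate(8)
                    .putLong(System.currentTimeMillis()).array();
                Mutation m = new Mutation(new Text("row_0001"));
                m.put(new Text("meta"), new Text(qual),
                    new Value("v".getBytes()));
                bw.addMutation(m);
            } finally {
                bw.close();
            }
        }
    }
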

-Eric


On Tue, Dec 4, 2012 at 1:06 PM, Chris Burrell <chris@burrell.me.uk> wrote:

> Thanks for all the comments below. Very helpful!
>
> On the last point, around "small indexes", do you mean that the set of
> keys is small, but with many column families and column qualifiers? What
> order of magnitude would you consider to be small? A few million keys,
> or a few billion? Or, put another way, keys with tens or hundreds of
> column families/qualifiers?
>
> I have another question around the use of column families and
> qualifiers. Would it be good or bad practice to have many column
> families/qualifiers per row? I was just wondering if there would be any
> point in using these almost as extensions to the keys, i.e. the column
> family/qualifier would end up being the last part of the key. I
> understand column families can also be used to control how the data gets
> stored, to maximize scanning, too. I was just wondering if there would
> be drawbacks to having many of these.
>
> Chris
>
>
> On 28 November 2012 20:31, Eric Newton <eric.newton@gmail.com> wrote:
>
>> Some comments inlined below:
>>
>> On Wed, Nov 28, 2012 at 2:49 PM, Chris Burrell <chris@burrell.me.uk>
>> wrote:
>>
>>> Hi
>>>
>>> I am trialling Accumulo on a small (tiny) cluster and wondering how
>>> best to tune it. I have 1 master + 2 tservers. The master has 8GB of
>>> RAM and the tservers have 16GB each.
>>>
>>> I have set the walog size to 2GB with an external memory map of 9GB.
>>> The ratio is still the default of 3. I've also upped the heap size of
>>> each tserver to 2GB.
>>>
>>> I'm trying to achieve high-speed ingest via batch writers held on
>>> several other servers. I'm loading two separate tables.
>>>
>>> Here are some questions I have:
>>> - Does the config above sound sensible, or is it overkill?
>>
>> Looks good to me, assuming you aren't doing other things (like
>> map/reduce) on the machines.
>>
>>> - Is it preferable to have more servers with lower specs?
>>
>> Yes. Mostly to get more drives.
>>
>>> - Is this the best way to maximise use of the memory?
>>
>> It's not bad. You may want to have larger block caches and a smaller
>> in-memory map. But if you want to write mostly and read little, this
>> is good.
>>
>>> - Does the fact I have 3x2GB walogs mean that the remaining 3GB in the
>>> external memory map can be used while compactions occur?
>>
>> Yes. You will want to increase the size or number of logs. With so few
>> servers, failures will hopefully be very rare. I would go with changing
>> 3 to 8. Having lots of logs on a tablet is no big deal if you have disk
>> space and don't expect many failures.
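>>
>> For reference, a sketch of that change through the client API (1.4-era
>> property names; I'm assuming the "3" here is the minor-compaction logs
>> threshold, and that "conn" is a Connector with admin rights):
>>
>>     import org.apache.accumulo.core.client.Connector;
>>
>>     public class WalogTuning {
>>         public static void relaxLogThreshold(Connector conn)
>>                 throws Exception {
>>             // Allow up to 8 write-ahead logs per tablet before forcing
>>             // a minor compaction (the default threshold is 3).
>>             conn.instanceOperations().setProperty(
>>                 "table.compaction.minor.logs.threshold", "8");
>>             // Or make each log bigger instead of allowing more of them.
>>             conn.instanceOperations().setProperty(
>>                 "tserver.walog.max.size", "2G");
>>         }
>>     }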
>>
>>> - When minor compactions occur, does this halt ingest on that
>>> particular tablet, or on the whole tablet server?
>>
>> Only if memory fills before the compactions finish. The monitor page
>> will indicate this by displaying "hold time." When this happens, the
>> tserver will self-tune and start minor compactions earlier during
>> future ingest.
>>
>>> - I have pre-split the tables six ways, but I'm not entirely sure
>>> that's preferable when I only have 2 servers while trying it out.
>>> Perhaps 2 ways might be better?
>>
>> Not for that reason, but to be able to use more cores concurrently. Aim
>> for 50-100 tablets per node.
>>
>>> - Does batch upload through the shell client give significantly better
>>> performance?
>>
>> Using map/reduce to create RFiles is more efficient. But it also
>> increases latency: you can only see the data once the whole file is
>> loaded.
>>
>> When a file is batch-loaded, its index is read, and the file is
>> assigned to the matching tablets. With small indexes, you can
>> batch-load terabytes in minutes.
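>>
>> A minimal sketch of the bulk-load step itself (paths and table name are
>> made up; importDirectory is the 1.4-era TableOperations call):
>>
>>     import org.apache.accumulo.core.client.Connector;
>>
>>     public class BulkLoad {
>>         public static void load(Connector conn) throws Exception {
>>             // RFiles under /bulk/files were written by the map/reduce
>>             // job; files that can't be assigned are moved to
>>             // /bulk/failures. The final flag leaves timestamps as-is.
>>             conn.tableOperations().importDirectory(
>>                 "mytable", "/bulk/files", "/bulk/failures", false);
>>         }
>>     }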
>>
>> -Eric