From: Eric Newton <eric.newton@gmail.com>
Date: Thu, 6 Dec 2012 16:07:39 -0500
Subject: Fwd: Tuning & Compactions
To: user@accumulo.apache.org

Keith noted that my response didn't go back to the whole list.

-Eric

---------- Forwarded message ----------
From: Eric Newton <eric.newton@gmail.com>
Date: Tue, Dec 4, 2012 at 2:25 PM
Subject: Re: Tuning & Compactions
To: chris@burrell.me.uk

By "small indexes" I mean they are small to read off disk. If you write a
gigabyte of indexes, it's going to take some time to read them into RAM.
The index is a subset of all the keys in the RFile. If you have lots of
keys in the index, lookups can be faster, but it takes more time to load
those keys into RAM. Keep your keys small, and try to keep the subset of
keys in the index small, so that the first lookup is fast. A million index
keys for a billion key/values is not unreasonable. We have used even
smaller ratios, especially when the files to be imported are constructed
to fit the current split points.
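
The main knob here is the data block size: the RFile index keeps roughly
one key per data block, so bigger blocks mean a smaller index. A minimal
sketch of that change through the Java client API (the table name and the
Connector "conn" are assumptions; the property name is the 1.4-era data
block size setting):

    import org.apache.accumulo.core.client.Connector;

    public class IndexTuning {
        // Raise the data block size from the 100K default so each RFile
        // carries fewer index entries (roughly one per data block).
        public static void shrinkIndex(Connector conn) throws Exception {
            conn.tableOperations().setProperty(
                "mytable", "table.file.compress.blocksize", "512K");
        }
    }
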
You can have an effectively unlimited number of families and qualifiers.
However, if you ever want to put families into locality groups, they are
easier to configure if the number of families in each group is small. A
group separates families by name.

Using the example from the Google BigTable paper: you can store small
indexed items, like URLs, separately from large value items, like whole
web pages, which gives you faster searches over the small items while
logically keeping everything in the same sorted index. URLs would go into
one group, which would be stored separately from another group containing
the whole web page and maybe something like image data. A search on URLs
would not need to decompress and skip over large values while scanning.
Further, URLs are more similar to one another than they are to images, and
so are likely to compress better when stored together.

To complicate things further, Accumulo does not create separate files for
each family group, as implied in the BigTable paper. The groups are stored
in separate sections of the same RFile. They are also created lazily: as
the data is re-written, it is gradually organized according to the
locality group specifications. You can force a re-write if you like.
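
For example, declaring two groups and forcing the re-write looks roughly
like this (a sketch only: the table and family names are made up, and a
Connector "conn" is assumed):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.hadoop.io.Text;

    public class LocalityGroups {
        public static void configure(Connector conn) throws Exception {
            Map<String, Set<Text>> groups = new HashMap<String, Set<Text>>();
            // Small, frequently scanned families in one group...
            groups.put("urls", Collections.singleton(new Text("url")));
            // ...bulky values in another.
            groups.put("content", Collections.singleton(new Text("page")));
            conn.tableOperations().setLocalityGroups("webtable", groups);

            // Groups apply lazily as files are re-written; a full
            // compaction forces the re-write now (flush, don't wait).
            conn.tableOperations().compact("webtable", null, null, true, false);
        }
    }
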
If you find yourself wanting to put extensions in the column family that
have nothing to do with locality groups, just move them over to the column
qualifier. We put carefully structured, binary data in the column
qualifier all the time.
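
For instance, something like the following (a sketch: the table name, row
layout, and numbers are made up; the BatchWriter signature is the 1.4-era
one, taking max memory, max latency in ms, and write threads):

    import java.nio.ByteBuffer;

    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.Text;

    public class QualifierAsKeyExtension {
        public static void write(Connector conn) throws Exception {
            BatchWriter bw =
                conn.createBatchWriter("mytable", 50000000L, 60000L, 4);
            try {
                // The qualifier carries a fixed-width binary suffix that
                // acts as the last part of the key: here, a big-endian
                // long timestamp.
                byte[] qual = ByteBuffer.allocate(8)
                    .putLong(System.currentTimeMillis()).array();
                Mutation m = new Mutation(new Text("row_0001"));
                m.put(new Text("meta"), new Text(qual),
                    new Value("v".getBytes()));
                bw.addMutation(m);
            } finally {
                bw.close();
            }
        }
    }
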

-Eric


On Tue, Dec 4, 2012 at 1:06 PM, Chris Burrell <chris@burrell.me.uk> wrote:

> Thanks for all the comments below. Very helpful!
>
> On the last point, around "small indexes", do you mean that the set of
> keys is small, but with many column families and column qualifiers? What
> order of magnitude would you consider to be small? A few million keys,
> or a few billion? Or, put another way, keys with tens or hundreds of
> column families/qualifiers?
>
> I have another question around the use of column families and
> qualifiers. Would it be good or bad practice to have many column
> families/qualifiers per row? I was just wondering if there would be any
> point in using these almost as extensions to the keys, i.e. the column
> family/qualifier would end up being the last part of the key. I
> understand column families can also be used to control how the data gets
> stored, to maximize scanning, too. I was just wondering if there would
> be drawbacks to having many of these.
>
> Chris
>
>
> On 28 November 2012 20:31, Eric Newton <eric.newton@gmail.com> wrote:
>
>> Some comments inlined below:
>>
>> On Wed, Nov 28, 2012 at 2:49 PM, Chris Burrell <chris@burrell.me.uk>
>> wrote:
>>
>>> Hi
>>>
>>> I am trialling Accumulo on a small (tiny) cluster and wondering how
>>> best to tune it. I have 1 master + 2 tservers. The master has 8GB of
>>> RAM and the tservers have 16GB each.
>>>
>>> I have set the walog size to 2GB with an external memory map of 9GB.
>>> The ratio is still the default of 3. I've also upped the heap size of
>>> each tserver to 2GB.
>>>
>>> I'm trying to achieve high-speed ingest via batch writers held on
>>> several other servers. I'm loading two separate tables.
>>>
>>> Here are some questions I have:
>>> - Does the config above sound sensible, or is it overkill?
>>
>> Looks good to me, assuming you aren't doing other things (like
>> map/reduce) on the machines.
>>
>>> - Is it preferable to have more servers with lower specs?
>>
>> Yes. Mostly to get more drives.
>>
>>> - Is this the best way to maximise use of the memory?
>>
>> It's not bad. You may want to have larger block caches and a smaller
>> in-memory map. But if you want to write mostly and read little, this
>> is good.
>>
>>> - Does the fact I have 3x2GB walogs mean that the remaining 3GB in the
>>> external memory map can be used while compactions occur?
>>
>> Yes. You will want to increase the size or number of logs. With so few
>> servers, failures will hopefully be very rare. I would go with changing
>> 3 to 8. Having lots of logs on a tablet is no big deal if you have disk
>> space and don't expect many failures.
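>>
>> For reference, a sketch of that change through the client API (1.4-era
>> property names; I'm assuming the "3" here is the minor-compaction logs
>> threshold, and that "conn" is a Connector with admin rights):
>>
>>     import org.apache.accumulo.core.client.Connector;
>>
>>     public class WalogTuning {
>>         public static void relaxLogThreshold(Connector conn)
>>                 throws Exception {
>>             // Allow up to 8 write-ahead logs per tablet before forcing
>>             // a minor compaction (the default threshold is 3).
>>             conn.instanceOperations().setProperty(
>>                 "table.compaction.minor.logs.threshold", "8");
>>             // Or make each log bigger instead of allowing more of them.
>>             conn.instanceOperations().setProperty(
>>                 "tserver.walog.max.size", "2G");
>>         }
>>     }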
>>
>>> - When minor compactions occur, does this halt ingest on that
>>> particular tablet, or on the whole tablet server?
>>
>> Only if memory fills before the compactions finish. The monitor page
>> will indicate this by displaying "hold time." When this happens, the
>> tserver will self-tune and start minor compactions earlier during
>> future ingest.
>>
>>> - I have pre-split the tables six ways, but I'm not entirely sure
>>> that's preferable when I only have 2 servers while trying it out.
>>> Perhaps 2 ways might be better?
>>
>> Not for that reason, but to be able to use more cores concurrently. Aim
>> for 50-100 tablets per node.
>>
>>> - Does batch upload through the shell client give significantly better
>>> performance?
>>
>> Using map/reduce to create RFiles is more efficient. But it also
>> increases latency: you can only see the data once the whole file is
>> loaded.
>>
>> When a file is batch-loaded, its index is read, and the file is
>> assigned to the matching tablets. With small indexes, you can
>> batch-load terabytes in minutes.
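>>
>> A minimal sketch of the bulk-load step itself (paths and table name are
>> made up; importDirectory is the 1.4-era TableOperations call):
>>
>>     import org.apache.accumulo.core.client.Connector;
>>
>>     public class BulkLoad {
>>         public static void load(Connector conn) throws Exception {
>>             // RFiles under /bulk/files were written by the map/reduce
>>             // job; files that can't be assigned are moved to
>>             // /bulk/failures. The final flag leaves timestamps as-is.
>>             conn.tableOperations().importDirectory(
>>                 "mytable", "/bulk/files", "/bulk/failures", false);
>>         }
>>     }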
>>
>> -Eric