Subject: Re: increase "running scans" in monitor?
From: Marc Reichman <marcreichman@gmail.com>
To: user@accumulo.apache.org
Date: Thu, 4 Apr 2013 09:47:29 -0500

Hi Keith,

I think I've concluded that my map task is somewhat CPU-bound, and that when I'm not in my map code I'm running scans; so if I'm running a lot of scans, I'm not spending enough time in my map task. Sound reasonable? We were pointed at this by a customer who suggested we might have a setting wrong, since we were "not seeing enough scans" given the map task slot count.

Our use of Accumulo is fairly basic at this point. Our map task could work fine with a SequenceFile or MapFile directly, but we went with HBase and then Accumulo because we will need to add/remove single pieces of data regularly, which doesn't really scale with the direct-HDFS approach. As such, my MapReduce jobs use the AccumuloRowInputFormat and operate on every row in sequence, with no other explicit iterators, no locality groups, and no authorizations.

I will run listscans next time a job is running and see if it paints a better picture.

Thanks for writing back!

Marc
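
For context, the input setup for these jobs is roughly along the following lines. This is a minimal sketch against the 1.4-era API, not a verbatim copy of our driver; the static configuration calls (setZooKeeperInstance/setInputInfo) may differ slightly between minor versions, and the instance name, ZooKeeper quorum, table, and credentials are placeholders.

    import java.io.IOException;
    import java.util.Map.Entry;

    import org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.accumulo.core.util.PeekingIterator;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class RowScanJob {

        // AccumuloRowInputFormat hands each mapper one whole row per map() call:
        // the row id plus an iterator over that row's key/value pairs.
        public static class RowMapper
                extends Mapper<Text, PeekingIterator<Entry<Key, Value>>, Text, Text> {
            @Override
            protected void map(Text row, PeekingIterator<Entry<Key, Value>> columns,
                    Context context) throws IOException, InterruptedException {
                while (columns.hasNext()) {
                    Entry<Key, Value> e = columns.next();
                    // CPU-heavy processing of e.getValue() goes here; while this
                    // runs, the mapper's scan session sits idle on the tserver,
                    // which keeps the "Running Scans" count low.
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "row-scan-job");
            job.setJarByClass(RowScanJob.class);
            job.setInputFormatClass(AccumuloRowInputFormat.class);
            job.setMapperClass(RowMapper.class);
            job.setNumReduceTasks(0);
            job.setOutputFormatClass(NullOutputFormat.class);

            // 1.4-style static setters on the input format; instance name, ZK
            // quorum, user, password, and table name are placeholders.
            AccumuloRowInputFormat.setZooKeeperInstance(job.getConfiguration(),
                    "myinstance", "zkhost:2181");
            AccumuloRowInputFormat.setInputInfo(job.getConfiguration(), "user",
                    "password".getBytes(), "mytable", new Authorizations());

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }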

On Wed, Apr 3, 2013 at 8:15 PM, Keith Turner <keith@deenlo.com> wrote:

> On Tue, Apr 2, 2013 at 11:35 AM, Marc Reichman <marcreichman@gmail.com> wrote:
> > I apologize, I neglected to include row counts. For the above split sizes
> > mentioned, there are roughly ~55K rows, ~300K rows, ~800K rows, and ~2M rows.
> >
> > I'm not necessarily hard-set on the idea that lower "running scans" are
> > affecting my overall job time negatively, and I realize that my jobs
> > themselves may simply be starving the tablet servers (CPU-wise). In my
> > experience thus far, running all 8 CPU cores per node leads to quicker
> > overall job completion than pulling one core out of the mix to give
> > Accumulo itself more breathing room.
>
> Scans in Accumulo fetch batches of key/values. When a scan is
> fetching one of these batches and storing it in a buffer on the tablet
> server, it's counted as running. While that batch is being serialized
> and sent to the client, it's not counted as running. In my experience
> the speed at which a batch of key/values can be read from RFiles is
> much faster than the speed at which a batch can be serialized, sent to
> the client, and then deserialized. Maybe this explains what you are
> seeing.
>
> Have you tried running listscans in the shell while your map reduce
> job is running? This will show all of the mappers' scan sessions. For
> each scan session you can see its state; the running state should
> correspond to the run count on the monitor page.
>
> I suspect if you ran a map reduce job that pushed a lot of work into
> iterators on the tablet servers, then you would see much higher
> running scans counts. For example, if your mappers set up a filter
> that only returned 1/20th of the data, then scans would spend a lot
> more time reading a batch of data relative to the time spent
> transmitting a batch of data.
>
> > On Tue, Apr 2, 2013 at 10:20 AM, Marc Reichman <marcreichman@gmail.com> wrote:
> >>
> >> Hi Josh,
> >>
> >> Thanks for writing back. I am doing all explicit splits using addSplits in
> >> the Java API, since the keyspace is easy to divide evenly. Depending on the
> >> table size for some of these experiments, I've had 128, 256, 512, or
> >> 1024 splits. My jobs are executing properly, MR-wise, in the sense that I do
> >> have the proper number of map tasks created (matching the split counts above,
> >> respectively). My concern is that the jobs may not be quite as busy as they
> >> can be, dataflow-wise, and I think the "Running Scans" per table/tablet
> >> server seem to be good indicators of that.
> >>
> >> My data is a 32-byte key (an MD5 value), and I have one column family with
> >> 3 columns which contain "bigger" data, anywhere from 50-100K to an
> >> occasional 10M-15M piece.
> >>
> >> On Tue, Apr 2, 2013 at 10:06 AM, Josh Elser <josh.elser@gmail.com> wrote:
> >>>
> >>> Hi Marc,
> >>>
> >>> How many tablets are in the table you're running MR over (see the
> >>> monitor)? Might adding some more splits to your table (`addsplits` in the
> >>> Accumulo shell) get you better parallelism?
> >>>
> >>> What does your data look like in your table? Lots of small rows? Few very
> >>> large rows?
> >>>
> >>> On 4/2/13 10:56 AM, Marc Reichman wrote:
> >>>>
> >>>> Hello,
> >>>>
> >>>> I am running an Accumulo-based MR job using the AccumuloRowInputFormat on
> >>>> 1.4.1. Config is more-or-less default, using the native-standalone 3GB
> >>>> template, but with the TServer memory put up to 2GB in accumulo-env.sh from
> >>>> its default. accumulo-site.xml has tserver.memory.maps.max at 1G,
> >>>> tserver.cache.data.size at 50M, and tserver.cache.index.size at 512M.
> >>>>
> >>>> My tables are created with maxversions for all three types (scan, minc,
> >>>> majc) at 1 and compress type as gz.
> >>>>
> >>>> I am finding, on an 8-node test cluster with 64 map task slots, that
> >>>> when a job is running, the 'Running Scans' count in the monitor is roughly
> >>>> 0-4 on average for each tablet server. Viewed at the table level, this
> >>>> puts the running scans anywhere from 4-24 on average. I would expect/hope
> >>>> the scans to be somewhere close to the map task count. To me, this means one
> >>>> of the following:
> >>>> 1. There is a configuration setting inhibiting the number of scans from
> >>>> accumulating (excuse the pun) to about the same number as my map tasks
> >>>> 2. My map task job is CPU-intensive enough to introduce delays between
> >>>> scans, and everything is fine
> >>>> 3. Some combination of 1 and 2.
> >>>>
> >>>> On an alternate cluster, 40 nodes with 320 task slots, we haven't seen
> >>>> anywhere near full-capacity scanning with map tasks which have the same
> >>>> performance, and the problem seems much worse.
> >>>>
> >>>> I am experimenting with some of the readahead configuration variables
> >>>> for the tablet servers in the meantime, but haven't found any smoking guns
> >>>> yet.
> >>>>
> >>>> Thank you,
> >>>> Marc
> >>>>
> >>>> --
> >>>> http://saucyandbossy.wordpress.com
> >>
> >> --
> >> http://saucyandbossy.wordpress.com
> >
> > --
> > http://saucyandbossy.wordpress.com
>
> --
> http://saucyandbossy.wordpress.com
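
As a follow-up to Keith's point about pushing work into server-side iterators, here is a minimal sketch of what such a filtering setup might look like against the 1.4-era API. The SampleFilter class, its priority/name, and the Configuration-based addIterator call are illustrative assumptions, not code from this thread.

    import org.apache.accumulo.core.client.IteratorSetting;
    import org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.iterators.Filter;
    import org.apache.hadoop.conf.Configuration;

    // Hypothetical server-side filter: keeps roughly 1/20th of the rows by hashing
    // the row id, so most key/values are discarded on the tablet servers instead of
    // being serialized and shipped to the mappers. The class needs to be on the
    // tablet servers' classpath (e.g. lib/ext).
    public class SampleFilter extends Filter {

        @Override
        public boolean accept(Key k, Value v) {
            return (k.getRow().hashCode() & 0x7fffffff) % 20 == 0;
        }

        // Call from the job driver after the usual setZooKeeperInstance/setInputInfo
        // calls (1.4-style, Configuration-based). Priority 50 and the name "sample"
        // are arbitrary placeholders.
        public static void attachTo(Configuration conf) {
            AccumuloRowInputFormat.addIterator(conf,
                    new IteratorSetting(50, "sample", SampleFilter.class));
        }
    }

With most of the data filtered server-side, each scan spends more of its time reading a batch (counted as "running") relative to transmitting it to the client, which is the effect Keith describes above.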