accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Supporting large values
Date Wed, 28 May 2014 14:24:45 GMT
On 5/28/14, 9:39 AM, Bill Havanki wrote:
> Thanks Josh!
>
> - This is indeed under CDH 4.6.0. If there is a particular line number you
> want to see code for, just name it and I'll look it up.

I was generally curious to see what kind of batching the DFSOutputStream 
does (it looked like it was checksumming small chunks of data), but I 
can look into that some more to satisfy my curiosity.

> - Re #2, the test client is sending mutations of only one cell each, so a
> mutation should be 100 MB + a little, due to the large value. It's
> inefficient, but it seems to be a good idea just for getting this test to
> survive. Maybe the logger code is hanging on to mutations in memory before
> writing them out? (That would surprise me, but I dunno.)

Well, I think you're going to have to be able to keep "about" two copies 
in memory (what I was trying to get at before). The tserver is going to 
get the Mutation objects from the client, so that's one instance of, 
say, 100MB. Before that write finishes, you'll also need to write those 
out to the WAL, which means serializing each Mutation using the Writable 
methods. That isn't quite the same as having a discrete object of that 
size on the heap, but you're still writing those bytes to a DataOutput, 
and they are going to be buffered through JVM heap.

> Another fact I didn't mention is that I am running 2 writers and 2 readers
> for the test. Perhaps 612 and 613 are the write threads, and then 615 is
> one scan, which might leave 614 as the remains of the other scan, which has
> already failed and is logging an OOME (which is what the monitor shows)?

Perhaps! That might make sense.

> My thought from looking at this again is that Thrift is running out of
> space forming the scan result message as it fills up a
> ByteArrayOutputStream. Maybe there is some way to force Thrift to break
> things up?

I don't know of anything inside of thrift that we could use to do that.
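
To illustrate why filling a ByteArrayOutputStream can be costly for a large value, here is a minimal sketch that simulates the buffer's growth. It assumes the stream behaves like java.io.ByteArrayOutputStream (32-byte default capacity, capacity doubled on overflow, old and new arrays both live during the copy). The function name `simulate_growth` and the 64 KB chunk size are hypothetical choices for illustration, not anything taken from Thrift or Accumulo.

```python
MIB = 1024 * 1024


def simulate_growth(total_bytes, chunk_bytes, initial_capacity=32):
    """Simulate a doubling byte buffer; return (final_capacity, peak_bytes).

    peak_bytes counts the old and new backing arrays together, because
    both are reachable while the contents are copied across.
    """
    capacity = initial_capacity
    count = 0
    peak = capacity
    while count < total_bytes:
        needed = count + min(chunk_bytes, total_bytes - count)
        if needed > capacity:
            # Assumed growth policy: double, or jump to the required size.
            new_capacity = max(capacity * 2, needed)
            peak = max(peak, capacity + new_capacity)
            capacity = new_capacity
        count = needed
    return capacity, peak


# One 100 MB value written in 64 KB chunks (chunk size is an assumption).
final_capacity, peak = simulate_growth(100 * MIB, 64 * 1024)
print(final_capacity // MIB, peak // MIB)  # prints "128 192"
```

Under these assumptions, a single 100 MB value briefly needs about 192 MB of heap for the buffer alone, before counting the source data it is being copied from.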

Overall, though, what's your intent by testing this? Is it to have a 
better understanding of server-side memory usage? Generally speaking, if 
you have clients getting back 100MB values and the server is writing 
100MB values, that would intuitively use up a bit of heap space.

I could see merit in constructing a general formula for memory 
consumption based on avg key-value size, number of threads available to 
read, number of threads available to write, and number of MinC/MajC 
threads. It probably wouldn't be much more valuable than a starting 
point due to variance, but it would be a starting point!
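
As a rough illustration of what such a formula might look like, here is a sketch. The function name and the per-thread copy multipliers are hypothetical: only the "about two copies" figure for writes comes from the discussion above, and the read and compaction multipliers are placeholders that would need to be measured.

```python
MB = 1024 * 1024


def estimate_in_flight_heap_bytes(avg_kv_bytes, read_threads, write_threads,
                                  compaction_threads, copies_per_read=1.0,
                                  copies_per_write=2.0,
                                  copies_per_compaction=1.0):
    """Estimate heap held by key-values in flight (a starting point only).

    copies_per_write=2.0 reflects the "about two copies" guess above;
    the other two multipliers are unmeasured placeholders.
    """
    in_flight_copies = (read_threads * copies_per_read
                        + write_threads * copies_per_write
                        + compaction_threads * copies_per_compaction)
    return int(avg_kv_bytes * in_flight_copies)


# 2 readers and 2 writers as in the test; 2 compaction threads is assumed.
estimate = estimate_in_flight_heap_bytes(100 * MB, read_threads=2,
                                         write_threads=2,
                                         compaction_threads=2)
print(estimate // MB)  # prints 800
```

The result ignores caches, buffer growth, and garbage that has not yet been collected, so it is better read as a lower bound on what the workload needs.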

> Thanks for burning cycles on this.
>
> Bill
>
>
> On Tue, May 27, 2014 at 7:11 PM, Josh Elser <josh.elser@gmail.com> wrote:
>
>> Well, for this one, it looks to me that you have two threads writing data
>> (ClientPool 612 and 613), with 612 being blocked by 613. There are two
>> threads reading data, but they both appear to be in nativemap code, so I
>> don't expect too much memory usage from them. ClientPool 615 is the thrift
>> call for one of those scans. I'm not quite sure what ClientPool 614 is
>> doing.
>>
>> My hunch is that 613 is what actually pushed you into the OOME. I can't
>> really say much more because I assume you're running on CDH as the line
>> numbers don't match up to the Hadoop sources I have locally.
>>
>> I don't think there's much inside the logger code that will hold onto
>> duplicate mutations, so the two things I'm curious about are:
>>
>> 1. Any chunking/buffering done inside of the DFSOutputStream (and if we
>> should be using/configuring something differently). I see some signs of
>> this from the method names in the stack trace.
>>
>> 2. Figuring out a formula for the heap used by Mutations, whether directly
>> (as (Server)Mutation objects on heap) or indirectly (while being written
>> out to some OutputStream, like the DFSOutputStream previously mentioned),
>> relative to the Accumulo configuration.
>>
>> I imagine #2 is where we could gain the most value.
>>
>> Hopefully that brain dump is helpful :)
>>
>>
>> On 5/27/14, 6:19 PM, Bill Havanki wrote:
>>
>>> Stack traces are here:
>>>
>>> https://gist.github.com/kbzod/e6e21ea15cf5670ba534
>>>
>>> This time something showed up in the monitor; often there is no stack
>>> trace there. The thread dump is from setting ACCUMULO_KILL_CMD to
>>> "kill -3 %p".
>>>
>>> Thanks again
>>> Bill
>>>
>>>
>>> On Tue, May 27, 2014 at 5:09 PM, Bill Havanki <bhavanki@clouderagovt.com>
>>> wrote:
>>>
>>>> I left the default key size constraint in place. I had set the tserver
>>>> message size up from 1 GB to 1.5 GB, but it didn't help. (I forgot that
>>>> config item.)
>>>>
>>>> Stack trace(s) coming up! I got tired of failures all day so I'm running
>>>> a different test that will hopefully work. I'll re-break it shortly :D
>>>>
>>>>
>>>> On Tue, May 27, 2014 at 5:04 PM, Josh Elser <josh.elser@gmail.com>
>>>> wrote:
>>>>
>>>>> Stack traces would definitely be helpful, IMO.
>>>>>
>>>>> (or interesting if nothing else :D)
>>>>>
>>>>>
>>>>> On 5/27/14, 4:55 PM, Bill Havanki wrote:
>>>>>
>>>>>> No sir. I am seeing general out of heap space messages, nothing about
>>>>>> direct buffers. One specific example would be while Thrift is writing
>>>>>> to a ByteArrayOutputStream to send off scan results. (I can get an
>>>>>> exact stack trace - easily :} - if it would be helpful.) It seems as
>>>>>> if there just isn't enough heap left, after controlling for what I
>>>>>> have so far.
>>>>>>
>>>>>> As a clarification of my original email: each row has 100 cells, and
>>>>>> each cell has a 100 MB value. So, one row would occupy just over 10 GB.
>>>>>>
>>>>>>
>>>>>> On Tue, May 27, 2014 at 4:49 PM, <dlmarion@comcast.net> wrote:
>>>>>>
>>>>>>> Are you seeing something similar to the error in
>>>>>>> https://issues.apache.org/jira/browse/ACCUMULO-2495?
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>
>>>>>>> From: "Bill Havanki" <bhavanki@clouderagovt.com>
>>>>>>> To: "Accumulo Dev List" <dev@accumulo.apache.org>
>>>>>>> Sent: Tuesday, May 27, 2014 4:30:59 PM
>>>>>>> Subject: Supporting large values
>>>>>>>
>>>>>>> I'm trying to run a stress test where each row in a table has 100
>>>>>>> cells, each with a value of 100 MB of random data. (This is using
>>>>>>> Bill Slacum's memory stress test tool). Despite fiddling with the
>>>>>>> cluster configuration, I always run out of tablet server heap space
>>>>>>> before too long.
>>>>>>>
>>>>>>> Here are the configurations I've tried so far, with valuable
>>>>>>> guidance from Busbey and madrob:
>>>>>>>
>>>>>>> - native maps are enabled, tserver.memory.maps.max = 8G
>>>>>>> - table.compaction.minor.logs.threshold = 8
>>>>>>> - tserver.walog.max.size = 1G
>>>>>>> - Tablet server has 4G heap (-Xmx4g)
>>>>>>> - table is pre-split into 8 tablets (split points 0x20, 0x40, 0x60,
>>>>>>>   ...), 5 tablet servers are available
>>>>>>> - tserver.cache.data.size = 256M
>>>>>>> - tserver.cache.index.size = 40M (keys are small - 4 bytes - in this
>>>>>>>   test)
>>>>>>> - table.scan.max.memory = 256M
>>>>>>> - tserver.readahead.concurrent.max = 4 (default is 16)
>>>>>>>
>>>>>>> It's often hard to tell where the OOM error comes from, but I have
>>>>>>> seen it frequently coming from Thrift as it is writing out scan
>>>>>>> results.
>>>>>>>
>>>>>>> Does anyone have any good conventions for supporting large values?
>>>>>>> (Warning: I'll want to work on large keys (and tiny values) next! :) )
>>>>>>>
>>>>>>> Thanks very much
>>>>>>> Bill
>>>>>>>
>>>>>>> --
>>>>>>> // Bill Havanki
>>>>>>> // Solutions Architect, Cloudera Govt Solutions
>>>>>>> // 443.686.9283
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>> --
>>>> // Bill Havanki
>>>> // Solutions Architect, Cloudera Govt Solutions
>>>> // 443.686.9283
>>>>
>>>>
>>>
>>>
>>>
>
>
