accumulo-user mailing list archives

From Chris Bennight <ch...@slowcar.net>
Subject Re: Persistent WAL files on 1.5
Date Sun, 30 Mar 2014 17:07:59 GMT
Nope, no entries in memory (well, a few hundred on the trace + metadata tables) -
while trying the easy solutions I had already cycled through the existing tables
and run a flush/compact cycle on each.
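
For anyone following along, the flush/compact pass was just the stock shell
commands - a rough sketch (user/password are placeholders, and the output of
"tables" may need filtering for the system tables):

    # flush and compact every table, waiting for each to finish
    for t in $(accumulo shell -u root -p secret -e "tables" 2>/dev/null); do
      accumulo shell -u root -p secret -e "flush -t $t -w"
      accumulo shell -u root -p secret -e "compact -t $t -w"
    done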

But thanks for the GC pointer (and apologies; in retrospect it should have
been obvious).

It looks like I needed to bump up the memory for the GC process (it was at
256MB; I increased it to 512MB). I had been tailing the GC log with a filter for
ERROR or WARN and didn't see anything, so I assumed things were fine.
(Instead, the process was simply getting killed due to heap space.)
There were just over 1M candidates for deletion - the collection took about
10 minutes, but it's back to normal now.
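
For the archives: in a stock install the GC heap is set in conf/accumulo-env.sh,
so the change amounts to something like the line below (the exact default varies
by which example config you started from), followed by a restart of the gc
process. If I'm reading the 1.5 metadata layout right, the deletion candidates
show up as ~del rows in !METADATA, so the second command gives a rough count of
what's pending (user/password are placeholders):

    # conf/accumulo-env.sh - raise the standalone GC process heap to 512m
    test -z "$ACCUMULO_GC_OPTS" && export ACCUMULO_GC_OPTS="-Xmx512m -Xms256m"

    # rough count of pending file-deletion candidates
    accumulo shell -u root -p secret -e "scan -t !METADATA -b ~del -e ~dem" | wc -l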

Thanks!


On Sun, Mar 30, 2014 at 10:42 AM, Sean Busbey <busbey+lists@cloudera.com> wrote:

> Hi Chris!
>
> If you look at the "master" page on the monitor, does it show a large
> number of entries in memory?
>
> If you look at the "garbage collector" page on the monitor, what does it
> report for the last cycle?
>
> Can you upload the log file(s) for the GC and a tablet server somewhere?
>
> -Sean
> On Mar 30, 2014 9:09 AM, "Chris Bennight" <chris@slowcar.net> wrote:
>
>> Cluster is a 5-node, VM-based Accumulo 1.5 / CDH 4.5 instance, with a
>> replication factor of 2.
>>
>> It's a dev instance, so nothing critical (though I would like to not
>> lose data there, as it represents a week or so to re-ingest & process).
>>
>> It recently ran out of space during an ingest, so I cleared out some
>> tables that were no longer being used. I didn't recover much of the
>> free space, and the total usage (~6TB) seemed much higher than the
>> number of entries (~50 billion) would suggest, given that none of the
>> entries were especially large.
>>
>> -bash-4.1$ hadoop fs -du -h /accumulo/
>> 0            /accumulo/instance_id
>> 58.5K     /accumulo/lib
>> 3.5G      /accumulo/recovery
>> 118.2G  /accumulo/tables
>> 0           /accumulo/version
>> 2.5T      /accumulo/wal
>>
>> -bash-4.1$ hadoop fs -du -h /accumulo/wal/
>> 495.2G     /accumulo/wal/10.10.10.51:9997
>> 541.3G     /accumulo/wal/10.10.10.52:9997
>> 515.7G     /accumulo/wal/10.10.10.53:9997
>> 474.3G     /accumulo/wal/10.10.10.54:9997
>> 562.5G     /accumulo/wal/10.10.10.55:9997
>>
>>
>> As I mentioned, it's a dev cluster, so it's entirely possible some weird
>> confluence of events happened previously to cause this - what I'm more
>> concerned about is how I recover that space. I'm not worried at this
>> point about any information that might be in the WAL files.
>>
>> Accumulo itself has been restarted a few times for various reasons.
>>
>> The only notable log entry is in the tserver log file are the
>> [tabletserver.TabletServer] WARN : Running low on memory
>> occuring ~ 15 times a second.   Tserver memory settings don't seem to
>> impact this  (8GB allocated to tservers, bloom filters are on, as are block
>> cache (2GB), index cache (1GB), native memory maps (1GB)
>>
>> Otherwise I don't see anything out of the norm in the master, monitor, gc, or
>> tracer logs (on the master).
>>
>>
>>
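
For anyone searching the archive later: the tserver memory settings referenced in
my original message above live in conf/accumulo-site.xml. The property names below
are the stock 1.5 ones and the values just mirror what I described - double-check
them against your own install; the 8GB tserver heap itself is set via
ACCUMULO_TSERVER_OPTS in accumulo-env.sh.

    <!-- conf/accumulo-site.xml (excerpt) -->
    <property>
      <name>tserver.memory.maps.max</name>
      <value>1G</value>       <!-- in-memory map size -->
    </property>
    <property>
      <name>tserver.memory.maps.native.enabled</name>
      <value>true</value>     <!-- use native (off-heap) memory maps -->
    </property>
    <property>
      <name>tserver.cache.data.size</name>
      <value>2G</value>       <!-- block (data) cache -->
    </property>
    <property>
      <name>tserver.cache.index.size</name>
      <value>1G</value>       <!-- index cache -->
    </property>
    <property>
      <name>table.bloom.enabled</name>
      <value>true</value>     <!-- bloom filters on by default -->
    </property>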
