accumulo-user mailing list archives

From Chris Bennight
Subject Persistent WAL files on 1.5
Date Sun, 30 Mar 2014 14:09:12 GMT
The cluster is a 5-node, VM-based Accumulo 1.5 / CDH 4.5 instance with a
replication factor of 2.

It's a dev instance, so nothing critical (though I would like to not lose
data there, as it represents a week or so to re-ingest and process).

It recently ran out of space during an ingest, so I cleared out some tables
which were no longer being used. I didn't recover much free space, and the
total usage (~6TB) seemed much higher than the number of entries
(~50 billion) would warrant, given that none of the entries were
especially large.

-bash-4.1$ hadoop fs -du -h /accumulo/
0          /accumulo/instance_id
58.5K      /accumulo/lib
3.5G       /accumulo/recovery
118.2G     /accumulo/tables
0          /accumulo/version
2.5T       /accumulo/wal

-bash-4.1$ hadoop fs -du -h /accumulo/wal/
495.2G     /accumulo/wal/
541.3G     /accumulo/wal/
515.7G     /accumulo/wal/
474.3G     /accumulo/wal/
562.5G     /accumulo/wal/
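
As a rough cross-check of the two listings above, here is a minimal sketch
that adds up the figures exactly as printed. It assumes (this is not stated
in the listings) that du reports sizes before replication and that the K/G/T
suffixes are binary multiples; the helper name to_gb is made up for
illustration.

```python
# Binary multiples relative to one gigabyte (assumption about du -h units).
UNIT_IN_GB = {"K": 1 / (1024 * 1024), "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def to_gb(text):
    """Convert a du -h style figure such as '2.5T' or '0' to gigabytes."""
    text = text.strip()
    if text[-1] in UNIT_IN_GB:
        return float(text[:-1]) * UNIT_IN_GB[text[-1]]
    return float(text) / (1024 ** 3)  # bare number: plain bytes


# Figures copied from the listings above.
top_level = {
    "instance_id": "0",
    "lib": "58.5K",
    "recovery": "3.5G",
    "tables": "118.2G",
    "version": "0",
    "wal": "2.5T",
}
wal_dirs = ["495.2G", "541.3G", "515.7G", "474.3G", "562.5G"]

REPLICATION = 2  # replication factor stated for this cluster

wal_total_gb = sum(to_gb(s) for s in wal_dirs)
logical_total_gb = sum(to_gb(s) for s in top_level.values())
raw_total_gb = logical_total_gb * REPLICATION  # assumes du is pre-replication
wal_share = to_gb(top_level["wal"]) / logical_total_gb

print(f"wal subdirectories: {wal_total_gb / 1024:.2f}T")
print(f"/accumulo logical:  {logical_total_gb / 1024:.2f}T")
print(f"/accumulo raw (x{REPLICATION}): {raw_total_gb / 1024:.2f}T")
print(f"wal share of /accumulo: {wal_share:.0%}")
```

Under those assumptions the five wal directories add up to the 2.5T shown
for /accumulo/wal, the WAL files account for roughly 95% of /accumulo, and
the replicated footprint comes to a little over 5T, which is the same order
as the ~6TB of total usage.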

As I mentioned, it's a dev cluster, so it's entirely possible some weird
confluence of events happened previously to cause this. What I'm more
concerned about is how I recover that space. I'm not worried at this
point about any information that might be in the WAL files.

Accumulo itself has been restarted a few times for various reasons.

The only notable entries in the tserver log file are
[tabletserver.TabletServer] WARN : Running low on memory
occurring ~15 times a second. Tserver memory settings don't seem to
affect this (8GB allocated to tservers; bloom filters are on, as are the
block cache (2GB), index cache (1GB), and native memory maps (1GB)).
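
For reference, the memory figures above can be laid out as a simple budget.
This is only a sketch under two assumptions that the message does not
confirm: that the 8GB is the tserver JVM maximum heap, and that the block
and index caches live on that heap while the native maps are allocated
outside it. The variable names are illustrative.

```python
GB = 1024 ** 3

# Figures quoted in the message above.
tserver_memory = 8 * GB  # assumed to be the tserver JVM max heap
block_cache = 2 * GB     # data block cache
index_cache = 1 * GB     # index cache
native_maps = 1 * GB     # native in-memory maps

# Assumption: both caches are on-heap, native maps are off-heap.
on_heap_reserved = block_cache + index_cache
heap_left = tserver_memory - on_heap_reserved
process_footprint = tserver_memory + native_maps

print(f"heap reserved for caches: {on_heap_reserved // GB}GB")
print(f"heap left for everything else: {heap_left // GB}GB")
print(f"approximate process footprint: {process_footprint // GB}GB")
```

If those assumptions hold, the caches reserve 3GB of the heap and leave 5GB
for everything else, with the native maps adding 1GB on top of the heap.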

Otherwise I don't see anything out of the norm in the master, monitor, gc,
or tracer files (on the master).
