accumulo-user mailing list archives

From Peter Tillotson <>
Subject WAL - rate limiting factor x4.67
Date Wed, 04 Dec 2013 10:14:33 GMT
I've been trying to get the most out of streaming data into Accumulo 1.5 (Hadoop, Cloudera CDH4).
Having tried a number of settings, re-written client code, etc., I finally switched off the Write
Ahead Log (table.walog.enabled=false) and saw a huge leap in ingest performance.

Ingest with table.walog.enabled=true:   ~6 MB/s
Ingest with table.walog.enabled=false:  ~28 MB/s

That is a speed improvement of about x4.67.

My use case could probably live without a WAL, or work around not having one, but I wondered
whether this is a known issue? (I didn't see anything in JIRA.) The WAL seems to be a significant
rate limiter, which is either endemic to Accumulo or an HDFS / setup issue. Given that everything
is in HDFS these days and IO otherwise flies, the Accumulo WAL looks like the most likely culprit.

I don't believe this is an IO issue on the box: with the WAL off there is significantly more
IO (up to 80 MB/s reported by dstat) than with the WAL on (up to 12 MB/s reported by dstat).
An FIO sequential write test on the box gives 160 MB/s.
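
To put the figures above side by side, here is a small worked calculation. It only uses the numbers quoted in this message; the variable names and the "disk bytes per ingested byte" ratio are my own illustration, and the dstat values are reported peaks rather than averages, so treat the ratios as rough.

```python
# Figures quoted in the message, all in MB/s.
# dstat values are the reported peaks ("up to"), so the ratios are rough.
ingest = {"wal_on": 6.0, "wal_off": 28.0}    # observed ingest rate
disk_io = {"wal_on": 12.0, "wal_off": 80.0}  # peak disk IO seen in dstat
fio_seq_write = 160.0                        # FIO sequential write on the box

# Ingest speedup from disabling the WAL.
speedup = ingest["wal_off"] / ingest["wal_on"]

# Share of the measured sequential write rate that each mode reaches.
disk_fraction = {mode: io / fio_seq_write for mode, io in disk_io.items()}

# Disk traffic generated per unit of ingested data (illustrative ratio).
disk_per_ingest = {mode: disk_io[mode] / ingest[mode] for mode in ingest}

print(f"speedup with WAL off: x{speedup:.2f}")
for mode in ("wal_on", "wal_off"):
    print(
        f"{mode}: {disk_fraction[mode]:.1%} of FIO rate, "
        f"{disk_per_ingest[mode]:.2f} MB of disk IO per MB ingested"
    )
```

In both modes the disk stays at or below half of what FIO measured, which is consistent with the box itself not being the bottleneck.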

Further info: 
Hadoop 2.00 (Cloudera cdh4)
Accumulo (1.5.0)
Zookeeper ( with Netty, minor improvement of <1MB/s  )
Filesystem ( HDFS is ZFS, compression=on, dedup=on, otherwise ext4 )

With large imports from scratch I now start off CPU bound, and as more shuffling is needed
the import becomes disk bound later on, as expected. So I know pre-splitting would probably
sort that out.
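
As a rough sketch of what pre-splitting could look like, the snippet below generates evenly spaced split points. It assumes row keys start with a fixed-width, roughly uniformly distributed lowercase hex prefix, which is purely an assumption for illustration (nothing above says how the keys are laid out). The function name `even_hex_splits` and its parameters are hypothetical.

```python
def even_hex_splits(num_tablets, width=4):
    """Return num_tablets - 1 evenly spaced hex split points.

    Assumes (hypothetically) that row keys begin with a lowercase hex
    prefix of `width` characters, spread uniformly over that key space.
    """
    if num_tablets < 2:
        return []  # a single tablet needs no split points
    space = 16 ** width  # number of distinct prefixes of this width
    return [
        format(i * space // num_tablets, f"0{width}x")
        for i in range(1, num_tablets)
    ]


# Example: split points for 8 tablets using 2-character prefixes.
splits = even_hex_splits(8, width=2)
print("\n".join(splits))
```

One split point per line is the sort of list that could then be given to the Accumulo shell's addsplits command before the import starts. Real keys are rarely perfectly uniform, so the points would need adjusting to the actual data.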

