accumulo-dev mailing list archives

From "Eric Newton (JIRA)"
Subject [jira] [Commented] (ACCUMULO-578) consider using hdfs for the walog
Date Thu, 24 May 2012 16:32:29 GMT


Eric Newton commented on ACCUMULO-578:

More thinking about file GC: we can eliminate the possibility of removing a log file that
is still in use by asking the tserver to remove the log, instead of having the GC delete it directly.
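A minimal Python sketch of the idea above, under simplifying assumptions. `TabletServer`, `request_remove`, and `gc_pass` are hypothetical names, not Accumulo APIs. The !METADATA references are reduced to a plain set, and "in use" means only that the server still has the log open. The GC proposes deleting unreferenced logs, but the owning server makes the final decision. A log that is open but not yet referenced, for example one that was just created, therefore survives the GC pass.

```python
from dataclasses import dataclass, field


@dataclass
class TabletServer:
    """Hypothetical model of a tablet server that owns its walog files."""
    address: str
    # Logs the server still has open (i.e. is still using).
    open_logs: set = field(default_factory=set)
    # Log files in this server's directory (named for its address in the proposal).
    files: set = field(default_factory=set)

    def request_remove(self, log: str) -> bool:
        """Handle a GC removal request; return True only if the file was deleted."""
        if log in self.open_logs:
            # Still in use: ignore the request instead of deleting live data.
            return False
        self.files.discard(log)
        return True


def gc_pass(metadata_refs: set, ts: TabletServer) -> list:
    """Ask the owning server to remove every log with no !METADATA reference."""
    removed = []
    for log in sorted(ts.files):  # sorted() copies, so deleting during the loop is safe
        if log not in metadata_refs and ts.request_remove(log):
            removed.append(log)
    return removed


ts = TabletServer("host1:9997", open_logs={"log2"}, files={"log1", "log2", "log3"})
print(gc_pass({"log3"}, ts))  # ['log1']: log2 is unreferenced but still open
```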

# open the file and begin using it, in a directory named for the tserver address
# write references to the log into the !METADATA table, as the log is used
# tablet server removes references to logs as tablets flush to disk
# gc asks the tserver to remove the file when it sees no !METADATA table references
#* the tablet server ignores the request if it is still using the log
# master will assign log sorts when it finds an unassigned tablet with log references
#* log sorts need to recover the lease on the file to prevent stray updates from appearing
#* log sorts should be monitored, perhaps made into a FATE operation
# once a tablet's logs have been sorted, the tablet is assigned by the master
# gc will remove sorted log entries when all references to the logs have been removed
# as always, checks against the !METADATA table have to use the special consistency checking
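The recovery half of the steps above can be modeled with a similar sketch. It covers assigning log sorts, assigning a tablet only once all of its logs are sorted, and collecting sorted output once nothing references it. `Metadata`, `Master`, `tick`, `finish_sort`, and `gc_sorted` are hypothetical names, not Accumulo APIs. Lease recovery and the distributed sort are reduced to bookkeeping, and neither FATE nor the special !METADATA consistency checks are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Hypothetical stand-in for the walog references kept in !METADATA."""
    log_refs: dict = field(default_factory=dict)  # tablet -> set of log names

    def add_ref(self, tablet: str, log: str) -> None:
        self.log_refs.setdefault(tablet, set()).add(log)

    def flush(self, tablet: str) -> None:
        # Once a tablet flushes to disk, its log references are removed.
        self.log_refs.pop(tablet, None)

    def referenced(self) -> set:
        return set().union(*self.log_refs.values())


@dataclass
class Master:
    meta: Metadata
    pending_sorts: set = field(default_factory=set)  # sorts handed out, not yet done
    sorted_logs: set = field(default_factory=set)    # logs whose sorted output exists
    assigned: set = field(default_factory=set)

    def tick(self, unassigned: set) -> None:
        """One scan over unassigned tablets: queue missing sorts or assign the tablet."""
        for tablet in unassigned:
            missing = self.meta.log_refs.get(tablet, set()) - self.sorted_logs
            if missing:
                # A real sorter would first recover the HDFS lease on each log.
                self.pending_sorts |= missing
            else:
                self.assigned.add(tablet)

    def finish_sort(self, log: str) -> None:
        self.pending_sorts.discard(log)
        self.sorted_logs.add(log)


def gc_sorted(meta: Metadata, sorted_logs: set) -> set:
    """Drop sorted output for logs that no tablet references any more."""
    removable = sorted_logs - meta.referenced()
    sorted_logs.difference_update(removable)
    return removable


meta = Metadata()
meta.add_ref("t1", "log1")
meta.add_ref("t1", "log2")
meta.add_ref("t2", "log2")
master = Master(meta)
master.tick({"t1", "t2"})          # nothing assigned yet; log1 and log2 queued for sorting
master.finish_sort("log1")
master.finish_sort("log2")
master.tick({"t1", "t2"})          # both tablets are now assigned
meta.flush("t1")
print(gc_sorted(meta, master.sorted_logs))  # {'log1'}: log2 is still referenced by t2
```

Keeping pending sorts in a set makes repeated scans idempotent: a tablet seen again before its sorts finish does not cause duplicate sort assignments.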

> consider using hdfs for the walog
> ---------------------------------
>                 Key: ACCUMULO-578
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: logger, tserver
>    Affects Versions: 1.5.0-SNAPSHOT
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>         Attachments: HDFS_WAL_states.pdf, NNOpsComparison.pdf, comparison.png
> Using HDFS for walogs would fix:
>  * ACCUMULO-84: any node can read the replicated files
>  * ACCUMULO-558: wouldn't need to monitor loggers
>  * ACCUMULO-544: log references wouldn't include hostnames
>  * ACCUMULO-423: wouldn't need to monitor loggers
>  * ACCUMULO-258: hdfs has load balancing already
> To implement it, we would need the ability to distribute log sorts.
> Continuing to use loggers helps us avoid:
>  * the hdfs pipeline strategy
>  * losing fine-grained insight when a single node makes dfs slow
>  * additional namenode pressure
>  * flexibility: for example, we can add fadvise() calls to the logger before HDFS supports them

This message is automatically generated by JIRA.