hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-842) Serialize NN edits log as avro records
Date Tue, 12 Jan 2010 18:57:54 GMT

    [ https://issues.apache.org/jira/browse/HDFS-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799323#action_12799323 ]

Konstantin Shvachko commented on HDFS-842:

A 25% loss in performance during startup seems very high. Longer startup adds to
cluster downtime, which most of my customers consider a critical metric. I can live
with 10%, but I haven't asked those customers yet.

A 50% increase in write time is simply unacceptable, imo. It slows down all name-node
operation, because the name-node logs every transaction that modifies the namespace.

Periodic checkpoints happen either when the checkpoint period ends or when the edits file
reaches a certain size. If Avro doubles the size of the edits, checkpoints will happen twice
as often, because the edits size threshold will be reached faster on the same data. Saving
the edits is itself a negligible part of the checkpoint process, because you always save an
empty edits file once the log is digested. It is the frequency of checkpoints that suffers.
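
The checkpoint-frequency argument can be sketched as a small model. This is only an
illustration, not name-node code: the class name, method names, the one-hour period, the
64 MB size threshold, and the 100,000 bytes/second edits rate are all assumptions chosen
for the example.

```java
public class CheckpointModel {

    /** True when either trigger fires: the period has elapsed or the edits file is big enough. */
    public static boolean shouldCheckpoint(long secondsSinceLast, long editsBytes,
                                           long periodSeconds, long sizeThresholdBytes) {
        return secondsSinceLast >= periodSeconds || editsBytes >= sizeThresholdBytes;
    }

    /**
     * Seconds between checkpoints under a steady edits write rate.
     * The size trigger fires after threshold / rate seconds; the period
     * trigger caps the interval at periodSeconds.
     */
    public static double secondsBetweenCheckpoints(double editsBytesPerSecond,
                                                   long periodSeconds,
                                                   long sizeThresholdBytes) {
        if (editsBytesPerSecond <= 0) {
            return periodSeconds; // no edits written: only the period trigger applies
        }
        double secondsToFill = sizeThresholdBytes / editsBytesPerSecond;
        return Math.min(periodSeconds, secondsToFill);
    }

    public static void main(String[] args) {
        long period = 3600;                  // assumed checkpoint period, in seconds
        long threshold = 64L * 1024 * 1024;  // assumed edits size threshold, in bytes
        double rate = 100_000;               // hypothetical edits bytes per second

        double before = secondsBetweenCheckpoints(rate, period, threshold);
        double after = secondsBetweenCheckpoints(2 * rate, period, threshold);

        System.out.printf("interval with current format: %.1f s%n", before);
        System.out.printf("interval with doubled edits:  %.1f s%n", after);
        System.out.printf("checkpoint frequency ratio:   %.2f%n", before / after);
    }
}
```

In this model the frequency doubles only while the size trigger is the one that fires. On a
lightly loaded name-node, where the period ends before the threshold is reached, the larger
edits would not change the checkpoint interval.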

> Serialize NN edits log as avro records
> --------------------------------------
>                 Key: HDFS-842
>                 URL: https://issues.apache.org/jira/browse/HDFS-842
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: name-node
>            Reporter: Todd Lipcon
> Right now, the edits log is a mishmash of ad-hoc serialization and Writables. Switching
> it over to Avro records would be really useful for operator tools - an "offline edits viewer"
> would become trivial ("avrocat")

