hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9260) Improve the performance and GC friendliness of NameNode startup and full block reports
Date Tue, 10 Jan 2017 19:12:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815889#comment-15815889 ]

Daryn Sharp commented on HDFS-9260:
-----------------------------------

I'd like to propose that this patch be reverted on the NN, but perhaps not the DN, partly because
it complicates porting HDFS-11310 and partly because of performance concerns.

Synopsis of the design change: Reference-dense data structures with high mutation rates are
the enemy of short young gen gc pauses.  When a tenured reference is updated, a dirty card tracks
the memory region of the changed reference.  The next young gen gc checks traceability to
references in the dirty card regions.  FBR processing rewrites triplets references to identify
unreported blocks, causing a spike in young gen gc load.  A folded/sorted tree is a clever
means to avoid rewriting the triplets pointer during a FBR, thus reducing young gen gc pressure.
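
To make the card-marking argument concrete, here is a toy Java model of a card table. It is a sketch under stated assumptions, not HotSpot's implementation and not NameNode code: the class `CardTableSketch`, the 512-byte card size, the 8-byte reference size, and the flat layout of three slots per block are all illustrative assumptions. It only counts how many distinct cards are dirtied when one reference per block is rewritten, compared with a single isolated update.

```java
import java.util.BitSet;

/**
 * Toy card table: a flat run of tenured reference slots covered by fixed-size cards.
 * Illustrative only; this is not HotSpot's implementation or NameNode code.
 */
final class CardTableSketch {
    static final int CARD_BYTES = 512; // assumed card size
    static final int REF_BYTES = 8;    // assumed uncompressed 64-bit references
    static final int SLOTS_PER_CARD = CARD_BYTES / REF_BYTES;

    private final BitSet dirty = new BitSet();

    /** Models the write barrier: a store to a slot dirties the card covering it. */
    void writeRef(int slotIndex) {
        dirty.set(slotIndex / SLOTS_PER_CARD);
    }

    /** Number of cards the next young collection would have to scan. */
    int dirtyCards() {
        return dirty.cardinality();
    }

    /** Rewrites one slot in every block, as a pass that relinks each reported block would. */
    static int dirtyCardsWhenRelinkingAll(int blocks, int slotsPerBlock) {
        CardTableSketch table = new CardTableSketch();
        for (int b = 0; b < blocks; b++) {
            table.writeRef(b * slotsPerBlock);
        }
        return table.dirtyCards();
    }

    public static void main(String[] args) {
        int blocks = 1_000_000;
        int slotsPerBlock = 3; // hypothetical: one triplet of slots per block
        System.out.println("cards dirtied by relinking every block: "
            + dirtyCardsWhenRelinkingAll(blocks, slotsPerBlock));

        CardTableSketch single = new CardTableSketch();
        single.writeRef(12345);
        System.out.println("cards dirtied by one update: " + single.dirtyCards());
    }
}
```

In this model a pass that touches every block dirties essentially every card covering the structure, which is the young gen load the sorted tree is meant to avoid during a FBR.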

That said, here are the main issues I see:
# FBRs: DNs must send sorted block reports, else the NN creates a temporary sorted/folded
tree for iteration.  A new block report encoding was designed (by me) for in-place iteration
to reduce high object allocation rates that made FBR processing unacceptable at scale.  Building
the sorted tree undoes the benefit by exploding the entire report into a highly fragmented
tree (out of order insertion).  During a rolling upgrade this will place extreme pressure
on the NN until all DNs are upgraded.
# IBRs: The performance of the tree is predicated on increasing block ids to avoid fragmentation
while filling the tree.  However, block churn and organic replication from dead nodes, decommissioning
nodes, failed storages, block balancing, etc will pepper a node with random blocks.  The IBRs
will cause trees to quickly fragment.  The tree mutations are likely to cause more dirty cards
than simply linking/unlinking a block into the triplets.
# Tree compaction:  Every 10 mins all sufficiently fragmented storage trees will be compacted.
 In practice this may be a large portion of the cluster storages due to IBRs, translating
to bursts of very heavy gc load.  Heap growth will increase due to defunct tree nodes.
# The CMS remark time reduction is not compelling when cycles should occur every few days
or a week if the heap is adequately sized.
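
The fragmentation concern in the first two items can be illustrated with a small stand-in structure. The sketch below is not the tree implementation from the patch: `LeafFillSketch`, its node capacity of 64, and the split-in-half policy are assumptions chosen only to show the general effect. It inserts ids into sorted, fixed-capacity nodes and reports the average fill ratio, so that in-order ids can be compared with the same ids arriving in random order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sorted fixed-capacity nodes that split when full; a stand-in, not the patch's tree. */
final class LeafFillSketch {
    static final int NODE_CAPACITY = 64; // hypothetical node size

    /** Inserts ids in the given order and returns stored entries / allocated slots. */
    static double fillRatio(List<Long> ids) {
        List<List<Long>> nodes = new ArrayList<>();
        nodes.add(new ArrayList<>());
        int stored = 0;
        for (long id : ids) {
            // Locate the last node whose smallest id is <= the new id.
            int n = 0;
            while (n + 1 < nodes.size() && nodes.get(n + 1).get(0) <= id) {
                n++;
            }
            List<Long> node = nodes.get(n);
            int pos = Collections.binarySearch(node, id);
            if (pos >= 0) {
                continue; // duplicate id, nothing to store
            }
            pos = -pos - 1;
            if (node.size() == NODE_CAPACITY) {
                if (pos == NODE_CAPACITY && n == nodes.size() - 1) {
                    // Appending past the tail: start a fresh node, previous one stays full.
                    node = new ArrayList<>();
                    nodes.add(node);
                    pos = 0;
                } else {
                    // Inserting into the middle of a full node: split it in half.
                    int half = NODE_CAPACITY / 2;
                    List<Long> right = new ArrayList<>(node.subList(half, NODE_CAPACITY));
                    node.subList(half, NODE_CAPACITY).clear();
                    nodes.add(n + 1, right);
                    if (pos > half) {
                        node = right;
                        pos -= half;
                    }
                }
            }
            node.add(pos, id);
            stored++;
        }
        return stored / (double) (nodes.size() * NODE_CAPACITY);
    }

    /** Ids 0..count-1 in increasing order, standing in for monotonically allocated block ids. */
    static List<Long> sequentialIds(int count) {
        List<Long> ids = new ArrayList<>(count);
        for (long i = 0; i < count; i++) {
            ids.add(i);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Long> inOrder = sequentialIds(6400);
        List<Long> shuffled = new ArrayList<>(inOrder);
        Collections.shuffle(shuffled, new Random(42));
        System.out.println("fill ratio, increasing ids: " + fillRatio(inOrder));
        System.out.println("fill ratio, shuffled ids:   " + fillRatio(shuffled));
    }
}
```

In this model increasing ids leave every node full, while shuffled ids force splits that leave nodes partly empty. Real clusters sit somewhere in between, which is why measurements from a loaded cluster matter more than a toy like this.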

The doc primarily focuses on FBRs with a footnote of 4X increase in IBR processing and negative
impacts to balancing.  Impacts to balancing are equivalent to replication from dead nodes,
failed storages, decommissioning nodes, invalidations when dead nodes rejoin, etc.  FBR savings
are great, but not at the expense of increased processing and gc load from IBRs.

Unless there are real-world metrics with a large cluster under load that dispute my concerns,
I think we should revert.

> Improve the performance and GC friendliness of NameNode startup and full block reports
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-9260
>                 URL: https://issues.apache.org/jira/browse/HDFS-9260
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode, performance
>    Affects Versions: 2.7.1
>            Reporter: Staffan Friberg
>            Assignee: Staffan Friberg
>             Fix For: 3.0.0-alpha1
>
>         Attachments: FBR processing.png, HDFS Block and Replica Management 20151013.pdf,
HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.003.patch, HDFS-7435.004.patch, HDFS-7435.005.patch,
HDFS-7435.006.patch, HDFS-7435.007.patch, HDFS-9260.008.patch, HDFS-9260.009.patch, HDFS-9260.010.patch,
HDFS-9260.011.patch, HDFS-9260.012.patch, HDFS-9260.013.patch, HDFS-9260.014.patch, HDFS-9260.015.patch,
HDFS-9260.016.patch, HDFS-9260.017.patch, HDFS-9260.018.patch, HDFSBenchmarks.zip, HDFSBenchmarks2.zip
>
>
> This patch changes the datastructures used for BlockInfos and Replicas to keep them sorted.
This allows faster and more GC friendly handling of full block reports.
> Would like to hear people's feedback on this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


