Date: Mon, 8 Jul 2019 19:51:00 +0000 (UTC)
From: "Daryn Sharp (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-14617) Improve fsimage load time by writing sub-sections to the fsimage index

    [ https://issues.apache.org/jira/browse/HDFS-14617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880666#comment-16880666 ]

Daryn Sharp commented on HDFS-14617:
------------------------------------

Was asked to take a look at
this. I think this can be done with no image format incompatibility and minor changes.

How about the image reading thread just adds the inodes to a queue for a thread pool to process? Perhaps just a single thread consuming the queue will be sufficient since it will avoid synchronization overhead.

> Improve fsimage load time by writing sub-sections to the fsimage index
> ----------------------------------------------------------------------
>
>                 Key: HDFS-14617
>                 URL: https://issues.apache.org/jira/browse/HDFS-14617
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>         Attachments: HDFS-14617.001.patch
>
>
> Loading an fsimage is basically a single-threaded process. The current fsimage is written out in sections, e.g. iNode, iNode_Directory, Snapshots, Snapshot_Diff etc. Then at the end of the file, an index is written that contains the offset and length of each section. The image loader code uses this index to initialize an input stream to read and process each section. It is important that one section is fully loaded before another is started, as the next section depends on the results of the previous one.
> What I would like to propose is the following:
> 1. When writing the image, we can optionally output sub_sections to the index. That way, a given section would effectively be split into several sections, e.g.:
> {code:java}
>    inode_section offset 10 length 1000
>      inode_sub_section offset 10 length 500
>      inode_sub_section offset 510 length 500
>
>    inode_dir_section offset 1010 length 1000
>      inode_dir_sub_section offset 1010 length 500
>      inode_dir_sub_section offset 1510 length 500
> {code}
> Here you can see we still have the original section index, but then we also have sub-section entries that cover the entire section.
> Then a processor can either read the full section in serial, or read each sub-section in parallel.
> 2. In the Image Writer code, we should set a target number of sub-sections, and then based on the total inodes in memory, it will create that many sub-sections per major image section. I think the only sections worth doing this for are inode, inode_reference, inode_dir and snapshot_diff. All others tend to be fairly small in practice.
> 3. If there are fewer than some threshold of inodes (e.g. 10M) then don't bother with the sub-sections, as a serial load only takes a few seconds at that scale.
> 4. The image loading code can then have a switch to enable 'parallel loading' and a 'number of threads' where it uses the sub-sections, or if not enabled falls back to the existing logic to read the entire section in serial.
> Working with a large image of 316M inodes and 35GB on disk, I have a proof of concept of this change working, allowing just inode and inode_dir to be loaded in parallel, but I believe inode_reference and snapshot_diff can be made parallel with the same technique.
> Some benchmarks I have are as follows:
> {code:java}
> Threads      1    2    3    4
> --------------------------------
> inodes     448  290  226  189
> inode_dir  326  211  170  161
> Total      927  651  535  488 (MD5 calculation about 100 seconds)
> {code}
> The above table shows the time in seconds to load the inode section and the inode_directory section, and then the total load time of the image.
> With 4 threads using the above technique, we are able to more than halve the load time of the two sections. With the patch in HDFS-13694 it would take a further 100 seconds off the run time, going from 927 seconds to 388, which is a significant improvement. Adding more threads beyond 4 has diminishing returns, as there are some synchronized points in the loading code to protect the in-memory structures.
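
The sub-section proposal quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the attached patch: the class SubSectionSketch, the Range type, and the split, loadRange and loadSection methods are all hypothetical names. It splits on plain byte counts, whereas a real writer would presumably have to end each sub-section on a record boundary, and it uses a byte sum as a stand-in for parsing inodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from the Hadoop code base.
class SubSectionSketch {

  /** One index entry: a byte range (offset, length) within the image. */
  static final class Range {
    final long offset;
    final long length;
    Range(long offset, long length) { this.offset = offset; this.length = length; }
  }

  /** Splits a section into count contiguous sub-sections that cover it exactly. */
  static List<Range> split(Range section, int count) {
    if (count <= 0) {
      throw new IllegalArgumentException("count must be positive");
    }
    List<Range> subs = new ArrayList<>();
    long base = section.length / count;
    long remainder = section.length % count;
    long offset = section.offset;
    for (int i = 0; i < count; i++) {
      long len = base + (i < remainder ? 1 : 0); // spread leftover bytes over the first entries
      subs.add(new Range(offset, len));
      offset += len;
    }
    return subs;
  }

  /** Stand-in for processing one byte range: sums the bytes so results can be compared. */
  static long loadRange(byte[] image, Range r) {
    long sum = 0;
    for (long i = r.offset; i < r.offset + r.length; i++) {
      sum += image[(int) i];
    }
    return sum;
  }

  /** Loads via sub-sections when parallel loading is enabled, else reads the whole section serially. */
  static long loadSection(byte[] image, Range section, List<Range> subs, boolean parallel, int threads) {
    if (!parallel || subs.isEmpty()) {
      return loadRange(image, section); // fallback: existing serial behaviour
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Long>> results = new ArrayList<>();
      for (Range sub : subs) {
        Callable<Long> task = () -> loadRange(image, sub);
        results.add(pool.submit(task));
      }
      long total = 0;
      for (Future<Long> f : results) {
        total += f.get(); // wait for every sub-section before the section counts as loaded
      }
      return total;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("sub-section load failed", e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    byte[] image = new byte[2010];
    for (int i = 0; i < image.length; i++) {
      image[i] = (byte) (i % 7);
    }
    Range inodeDir = new Range(1010, 1000);
    List<Range> subs = split(inodeDir, 2);
    for (Range r : subs) {
      System.out.println("inode_dir_sub_section offset " + r.offset + " length " + r.length);
    }
    long serial = loadSection(image, inodeDir, subs, false, 1);
    long parallel = loadSection(image, inodeDir, subs, true, 2);
    System.out.println("serial and parallel results match: " + (serial == parallel));
  }
}
```

In this sketch each task returns an independent result that is combined afterwards, so no locking is needed. The real loader updates shared in-memory structures, which is where the synchronized points and the diminishing returns beyond 4 threads mentioned above would come from.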