From: Wei-Chiu Chuang
Date: Thu, 9 Jan 2020 19:27:07 +0800
Subject: Re: Reminder: Hadoop Storage Online Meetup tomorrow (Hadoop 2->3 upgrade)
To: Hadoop Common, Hdfs-dev

Thanks to Wangda's help, I was able to retrieve the recording of this
session. Please feel free to download the recording at:
https://cloudera.zoom.us/rec/share/7MF_dLX0339OY5391xvkZP8NLrXieaa8gyZK-fYJnUkGOUUXvaUh5cl_6AVYetQl

Non-Mandarin speakers, please send me your feedback on this session. I
served as the translator this time, and I need your feedback to improve
next time.

On Fri, Jan 3, 2020 at 10:01 PM Wei-Chiu Chuang wrote:

> Hi, it was a well-attended session with more than 40 attendees!
> Thanks to Fei Hui for giving us such a great talk.
>
> Here's the summary for your reference.
>
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
>
> 01/02/2020 Didi talked about their large-scale HDFS cluster upgrade
> experience.
>
> Slides:
> https://drive.google.com/open?id=1iwJ1asalYfgnOCBuE-RfeG-NpSocjIcy
>
> Didi studied two upgrade approaches from the community documentation:
> express upgrade and rolling upgrade. Rolling upgrade was selected.
>
> The upgrade involved the HDFS server side only. Clients are still on
> Hadoop 2.7 because applications such as Hive and Spark do not support
> Hadoop 3 yet.
>
> ZooKeeper was not upgraded.
>
> Didi practiced upgrade + downgrade more than 10 times before doing it for
> real.
>
> Didi’s largest cluster has 5 federated namespaces and 10+ thousand nodes.
> The upgrade took a month: JournalNodes took 1 week, NameNodes 2 weeks,
> and DataNodes 1 week.
>
> During an upgrade, HDFS does not clean up trash. Because the upgrade
> window was a month long, trash became a concern: it could exhaust all
> available space. Didi has a (script?) to clean trash daily.
>
> One problem was encountered that may not be related to the upgrade:
> clients were occasionally unable to close files. Reviewing the DataNode
> logs showed that blocks were not reported in time because deleting
> blocks took too long.
>
> Two parameters were changed to address the issue:
>
> Increase dfs.client.block.write.locateFollowingBlock.retries, and
>
> Reduce dfs.block.invalidate.limit (from the default 1000 to 500).
>
> Didi believes the new upstream change HDFS-14997 can alleviate this issue.
>
> Timeline:
>
> May 2019: verified that the plan is good.
>
> July: trial run with a 100-node cluster; rolling upgrade completed
> successfully.
>
> Oct: 300+ node cluster rolling upgrade completed.
>
> Nov: 10-thousand-node cluster rolling upgrade completed.
>
> Offline test:
>
> Ran the full Spark, Hive, and Hadoop test sets. Verified that
> upgrade/downgrade has no impact.
>
> Reviewed the 4000+ patches between Hadoop 2.7 and 3.2 to make sure
> there are no incompatible changes.
>
> Authored 40+ internal wikis to document the process.
>
> Future:
>
> Didi is interested in Ozone to address the small-file problem.
>
> Wants to incorporate the Consistent Read from Standby feature to increase
> NameNode RPC performance.
>
> Finally, DataNode upgrade is hard. Will look into HDFS Maintenance Mode to
> make this easier in the future.
>
> This was an HDFS-only upgrade. The YARN upgrade is planned for the second
> half of 2020. Since the main purpose is to use EC (erasure coding) to
> reduce space usage, Didi ported the EC client-side code to Hadoop 2.7
> clients, and these clients can read/write EC blocks!
>
>
> On Wed, Jan 1, 2020 at 7:42 PM Wei-Chiu Chuang wrote:
>
>> Hi,
>> This is a gentle reminder for tomorrow's online meetup. Fei Hui from DiDi
>> is going to give a presentation about DiDi's Hadoop 2 -> Hadoop 3 upgrade
>> experience.
>>
>> We will extend this session to 1 hour. Fei will speak in Mandarin and I
>> will help translate, so non-Mandarin speakers should feel free to join!
>>
>> Time/Date:
>> Jan 1 10PM (US west coast, PST) / Jan 2 2pm (Beijing, China, CST) / Jan 2
>> 11:30am (India, IST) / Jan 2 3pm (Tokyo, Japan, JST)
>>
>> Join Zoom Meeting
>>
>> https://cloudera.zoom.us/j/880548968
>>
>> One tap mobile
>>
>> +16465588656,,880548968# US (New York)
>>
>> +17207072699,,880548968# US
>>
>> Dial by your location
>>
>> +1 646 558 8656 US (New York)
>>
>> +1 720 707 2699 US
>>
>> 877 853 5257 US Toll-free
>>
>> 888 475 4499 US Toll-free
>>
>> Meeting ID: 880 548 968
>> Find your local number: https://zoom.us/u/acaGRDfMVl
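
For anyone who wants to try the two parameter changes mentioned in the summary, here is a minimal Python sketch that renders them as Hadoop-style configuration overrides. It is only an illustration, not Didi's actual configuration: the value 500 for dfs.block.invalidate.limit comes from the summary, but the summary does not say what dfs.client.block.write.locateFollowingBlock.retries was raised to, so the 10 below is an assumed placeholder. The helper name render_hdfs_site is also made up for this example.

```python
import xml.etree.ElementTree as ET

# dfs.block.invalidate.limit = 500 is taken from the meetup summary.
# The retries value is NOT stated in the summary; 10 is an assumed placeholder.
OVERRIDES = {
    "dfs.client.block.write.locateFollowingBlock.retries": "10",  # assumption
    "dfs.block.invalidate.limit": "500",
}


def render_hdfs_site(overrides):
    """Render a name -> value map as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_hdfs_site(OVERRIDES))
```

As far as I understand, the retries setting is read on the client side while the invalidate limit is read by the NameNode, so in practice the two properties would likely end up in different hdfs-site.xml files. Please check against your own deployment before applying anything.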