hadoop-hdfs-dev mailing list archives

From Wei-Chiu Chuang <weic...@apache.org>
Subject Re: Reminder: Hadoop Storage Online Meetup tomorrow (Hadoop 2->3 upgrade)
Date Thu, 09 Jan 2020 11:27:07 GMT
Thanks to Wangda's help, I was able to retrieve the recording of this
session.

Please feel free to download the recording at:
https://cloudera.zoom.us/rec/share/7MF_dLX0339OY5391xvkZP8NLrXieaa8gyZK-fYJnUkGOUUXvaUh5cl_6AVYetQl

Non-Mandarin speakers, please send me feedback on what you thought of
the session. I served as the translator this time, and I need your
feedback to improve next time.

On Fri, Jan 3, 2020 at 10:01 PM Wei-Chiu Chuang <weichiu@apache.org> wrote:

>
> Hi, it was a well-attended session with more than 40 attendees!
> Thanks Fei Hui for giving us such a great talk.
>
> Here's the summary for your reference.
>
>
> https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit?usp=sharing
> 01/02/2020 Didi talked about their large scale HDFS cluster upgrade
> experience.
>
> Slides:
> https://drive.google.com/open?id=1iwJ1asalYfgnOCBuE-RfeG-NpSocjIcy
>
> Didi studied two upgrade approaches from the community documentation:
> express upgrade and rolling upgrade. Rolling upgrade was selected.
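
For readers unfamiliar with the rolling upgrade approach mentioned above, here is a rough sketch of the admin command sequence for one namespace, based on the community HDFS rolling upgrade documentation. It is not Didi's actual procedure: the helper name `rolling_upgrade_commands` and the DataNode address are hypothetical, and the JournalNode and NameNode restarts are only noted in a comment.

```python
# Sketch only: lists the dfsadmin commands used around a rolling upgrade.
# The helper name and the DataNode address below are hypothetical.

def rolling_upgrade_commands(datanodes):
    """Return the admin commands, in order, as argument lists."""
    cmds = [
        # Ask the NameNode to create a rollback fsimage.
        ["hdfs", "dfsadmin", "-rollingUpgrade", "prepare"],
        # Re-run this until the rollback image is reported as ready.
        ["hdfs", "dfsadmin", "-rollingUpgrade", "query"],
    ]
    # Not listed here: JournalNodes and NameNodes are restarted on the new
    # version at this point; upgraded NameNodes are started with
    # "hdfs namenode -rollingUpgrade started".
    for dn in datanodes:
        # Shut the DataNode down for upgrade, then check that it has stopped
        # before restarting it on the new version.
        cmds.append(["hdfs", "dfsadmin", "-shutdownDatanode", dn, "upgrade"])
        cmds.append(["hdfs", "dfsadmin", "-getDatanodeInfo", dn])
    # Finalize only after everything is upgraded and verified.
    cmds.append(["hdfs", "dfsadmin", "-rollingUpgrade", "finalize"])
    return cmds


if __name__ == "__main__":
    # "dn1.example.com:9867" is a placeholder host:ipc_port.
    for cmd in rolling_upgrade_commands(["dn1.example.com:9867"]):
        print(" ".join(cmd))
```

In a federated cluster such as the one described below, the prepare and finalize steps would presumably be repeated for each namespace.
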
>
> The upgrade involved the HDFS server side only. Clients are still on Hadoop
> 2.7 because applications such as Hive and Spark do not support Hadoop 3
> yet.
>
> Zookeeper was not upgraded.
>
> Didi practiced upgrade + downgrade more than 10 times before doing it for
> real.
>
> Didi’s largest cluster has 5 federated namespaces, and 10+ thousand nodes.
> The upgrade took a month. JournalNodes took 1 week; NameNode: 2 weeks;
> DataNodes took a week.
>
> During upgrade, HDFS does not clean up trash. Because the upgrade window
> was a month long, the trash became a concern because it could exhaust all
> available space. Didi has a (script?) to clean trash daily.
>
> A problem was encountered which may not be related to the upgrade: clients
> were occasionally unable to close files. Reviewing the DataNode logs showed
> that blocks were not being reported in time, because block deletion took
> too long.
>
> Two parameters were changed to address the issue:
>
> Increase dfs.client.block.write.locateFollowingBlock.retries and
>
> Reduce dfs.block.invalidate.limit (from the default 1000 to 500)
>
> Didi believes the new upstream change HDFS-14997 can alleviate this issue.
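
To make the two parameter changes above concrete, here is a small sketch that renders them as `hdfs-site.xml` properties. The value 500 for `dfs.block.invalidate.limit` comes from the summary; the summary does not say what the retry count was raised to, so the 10 below is purely a placeholder (my understanding is that the shipped default is 5). The helper name `render_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Values to apply. Only the 500 is taken from the meeting summary;
# the retry count of 10 is an assumed placeholder, not Didi's setting.
TUNED = {
    # Client-side: how many times the client retries when asking the
    # NameNode for the next block or when completing a file.
    "dfs.client.block.write.locateFollowingBlock.retries": 10,
    # Maximum number of blocks to invalidate per DataNode per deletion
    # command (reduced from the default 1000 to 500).
    "dfs.block.invalidate.limit": 500,
}


def render_properties(settings):
    """Render a dict of settings as an hdfs-site.xml style document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_properties(TUNED))
```

Note that the first property belongs in the client configuration and the second in the server-side configuration, so in practice they would likely end up in different files.
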
>
> Timeline:
>
> May 2019, verified the plan is good.
>
> July: trial run with a 100-node cluster, completed rolling upgrade
> successfully.
>
> Oct: 300+ node cluster rolling upgrade completed.
>
> Nov: 10-thousand node cluster rolling upgrade completed.
>
> Offline test
>
> Ran the full Spark, Hive and Hadoop test suites. Verified that the
> upgrade/downgrade has no impact.
>
> Reviewed the 4000+ patches between Hadoop 2.7 and 3.2, to make sure
> there are no incompatible changes.
>
> Authored 40+ internal wikis to document the process.
>
> Future:
>
> Didi is interested in Ozone to address the small-file problem.
>
> Want to incorporate the Consistent Read from Standby feature to increase
> NameNode RPC performance.
>
> Finally, DataNode upgrade is hard. Will look into HDFS Maintenance Mode to
> make this easier in the future.
>
> This was an HDFS-only upgrade. A YARN upgrade is planned for the second
> half of 2020. Since the main purpose is to use EC to reduce space usage,
> Didi ported EC client side code to Hadoop 2.7 clients, and these clients
> can read/write EC blocks!
>
>
> On Wed, Jan 1, 2020 at 7:42 PM Wei-Chiu Chuang <weichiu@apache.org> wrote:
>
>> Hi,
>> This is a gentle reminder for tomorrow's online meetup. Fei Hui from DiDi
>> is going to give a presentation about DiDi's Hadoop 2 -> Hadoop 3 upgrade
>> experience.
>>
>> We will extend this session to 1 hour. Fei will speak in Mandarin and I
>> will help translate, so non-Mandarin speakers should feel free to join!
>>
>> Time/Date:
>> Jan 1 10PM (US west coast PST) / Jan 2 2pm (Beijing, China CST) / Jan 2
>> 11:30am (India, IST) / Jan 2 3pm (Tokyo, Japan, JST)
>>
>> Join Zoom Meeting
>>
>> https://cloudera.zoom.us/j/880548968
>>
>> One tap mobile
>>
>> +16465588656,,880548968# US (New York)
>>
>> +17207072699,,880548968# US
>>
>> Dial by your location
>>
>>         +1 646 558 8656 US (New York)
>>
>>         +1 720 707 2699 US
>>
>>         877 853 5257 US Toll-free
>>
>>         888 475 4499 US Toll-free
>>
>> Meeting ID: 880 548 968
>> Find your local number: https://zoom.us/u/acaGRDfMVl
>>
>
