From: davor@apache.org
To: commits@beam.incubator.apache.org
Date: Tue, 08 Nov 2016 23:44:19 -0000
Subject: [1/2] incubator-beam-site git commit: Add tool to fix links.

Repository: incubator-beam-site
Updated Branches:
  refs/heads/asf-site 268cadca4 -> 81bb48952


Add tool to fix links.

Signed-off-by: Jason Kuster


Project: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/commit/e5828ee4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/tree/e5828ee4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-beam-site/diff/e5828ee4

Branch: refs/heads/asf-site
Commit: e5828ee4a886bf02dda0099c3c60e15ac429ece3
Parents: 268cadc
Author: Jason Kuster
Authored: Tue Nov 8 14:52:06 2016 -0800
Committer: Davor Bonaci
Committed: Tue Nov 8 15:43:58 2016 -0800

----------------------------------------------------------------------
 tools/append_index_html_to_internal_links.py | 76 +++++++++++++++++++++++
 1 file changed, 76 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-beam-site/blob/e5828ee4/tools/append_index_html_to_internal_links.py
----------------------------------------------------------------------
diff --git a/tools/append_index_html_to_internal_links.py b/tools/append_index_html_to_internal_links.py
new file mode 100644
index 0000000..da87f57
--- /dev/null
+++ b/tools/append_index_html_to_internal_links.py
@@ -0,0 +1,76 @@
+"""Script to fix the links in the staged website.
+Finds all internal links which do not have index.html at the end and appends
+index.html in the appropriate place (preserving anchors, etc).
+
+Usage:
+  From root directory, after running the jekyll build, execute
+  'python tools/append_index_html_to_internal_links.py'.
+
+Dependencies:
+  beautifulsoup4
+  Installable via pip as 'sudo pip install beautifulsoup4' or apt via
+  'sudo apt-get install python-beautifulsoup4'.
+
+"""
+
+import fnmatch
+import os
+import re
+from bs4 import BeautifulSoup
+
+# Original link match. Matches any string which starts with '/' and doesn't
+# have a file extension.
+linkMatch = r'^\/(.*\.(?!([^\/]+)$))?[^.]*$'
+
+# Regex which matches strings of type /internal/link/#anchor. Breaks into two
+# groups for ease of inserting 'index.html'.
+anchorMatch1 = r'(.+\/)(#[^\/]+$)'
+
+# Regex which matches strings of type /internal/link#anchor. Breaks into two
+# groups for ease of inserting 'index.html'.
+anchorMatch2 = r'(.+\/[a-zA-Z0-9]+)(#[^\/]+$)'
+
+
+matches = []
+# Recursively walk content directory and find all html files.
+for root, dirnames, filenames in os.walk('content'):
+  for filename in fnmatch.filter(filenames, '*.html'):
+    # Javadoc does not have the index.html problem, so omit it.
+    if 'javadoc' not in root:
+      matches.append(os.path.join(root, filename))
+
+print 'Matches: ' + str(len(matches))
+# Iterates over each matched file looking for link matches.
+for match in matches:
+  print 'Fixing links in: ' + match
+  mf = open(match)
+  soup = BeautifulSoup(mf, "lxml")
+  # Iterates over every <a>
+  for a in soup.findAll('a'):
+    try:
+      hr = a['href']
+      if re.match(linkMatch, hr) is not None:
+        if hr.endswith('/'):
+          # /internal/link/
+          a['href'] = hr + 'index.html'
+        elif re.match(anchorMatch1, hr) is not None:
+          # /internal/link/#anchor
+          mat = re.match(anchorMatch1, hr)
+          a['href'] = mat.group(1) + 'index.html' + mat.group(2)
+        elif re.match(anchorMatch2, hr) is not None:
+          # /internal/link#anchor
+          mat = re.match(anchorMatch2, hr)
+          a['href'] = mat.group(1) + '/index.html' + mat.group(2)
+        else:
+          # /internal/link
+          a['href'] = hr + '/index.html'
+        mf.close()
+
+        html = soup.prettify("utf-8")
+        # Write back to the file.
+        with open(match, "wb") as f:
+          print 'Replacing ' + hr + ' with: ' + a['href']
+          f.write(html)
+    except KeyError as e:
+      # Some tags don't have an href.
+      continue
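
For readers following the four rewrite cases in the script (/internal/link/, /internal/link/#anchor, /internal/link#anchor, /internal/link), the rule can be isolated as a pure function. The following is a minimal Python 3 sketch, not the committed code: the function name append_index_html is hypothetical, and it swaps the script's linkMatch/anchorMatch1/anchorMatch2 regexes for urllib.parse.urlsplit, so its behavior is close to but not identical to the commit. For example, as far as can be read from the regexes, a single-segment link such as /documentation#io matches neither anchor regex and falls through to the last branch, whereas the sketch places index.html before the anchor.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def append_index_html(href):
    """Return href with index.html appended when it is an internal,
    extensionless link; return it unchanged otherwise.

    Hypothetical helper for illustration; not part of the commit.
    """
    parts = urlsplit(href)
    # Only site-internal absolute paths are rewritten. Links with a scheme
    # or host, and relative or fragment-only links, are left alone.
    if parts.scheme or parts.netloc or not parts.path.startswith('/'):
        return href
    # A dot in the last path segment is treated as a file extension
    # (e.g. /images/logo.png), so such links are left alone as well.
    if '.' in posixpath.basename(parts.path):
        return href
    # Normalize to a trailing slash, then append index.html. The query and
    # fragment are re-attached after the path by urlunsplit.
    path = parts.path if parts.path.endswith('/') else parts.path + '/'
    return urlunsplit(('', '', path + 'index.html',
                       parts.query, parts.fragment))
```

Under these assumptions, append_index_html('/documentation/#io') returns '/documentation/index.html#io' and append_index_html('/get-started/quickstart') returns '/get-started/quickstart/index.html'. Keeping the rule separate from the file walk also makes it possible to rewrite every link in a page first and write the file back once.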