impala-reviews mailing list archives

From "Taras Bobrovytsky (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5181: Extract PYPI metadata from a webpage
Date Fri, 07 Apr 2017 16:21:43 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-5181: Extract PYPI metadata from a webpage
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6579/1/infra/python/deps/pip_download.py
File infra/python/deps/pip_download.py:

Line 86:     regex = r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>'
> Can you add a comment explaining why we're using regexes to parse the HTML,
I considered using BeautifulSoup, but the problem is that we would have to download and install
it before this script could use it. Let me know if you have any ideas on how we can do
this (I think it's definitely a better solution).
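One possible idea, offered only as a hedged sketch and not part of this patch: Python's standard library already ships an event-driven HTML parser (`html.parser.HTMLParser` in Python 3, `HTMLParser.HTMLParser` in Python 2), so nothing has to be downloaded or installed first. The names `SimpleIndexParser` and `parse_simple_index`, the sample page, and its URLs and md5 digests are all hypothetical. The returned tuples are shaped like the three groups of the line-86 regex (path under `packages/`, md5 digest, file name).

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2 fallback, in case the script runs there


class SimpleIndexParser(HTMLParser):
    """Hypothetical helper: collects (href, link text) pairs for every <a> element."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []    # (href, text) tuples in document order
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text chunks seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs; entities are already decoded.
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text)))
            self._href = None


def parse_simple_index(page):
    """Returns (path, md5, file name) tuples, mirroring the three regex groups."""
    parser = SimpleIndexParser()
    parser.feed(page)
    parser.close()
    results = []
    for href, text in parser.links:
        url, _, fragment = href.partition('#')
        _, sep, path = url.rpartition('packages/')
        # Skip links that lack a packages/ path or an md5 fragment.
        if not sep or not fragment.startswith('md5='):
            continue
        results.append((path, fragment[len('md5='):], text))
    return results


# Hypothetical page: both anchors are deliberately on ONE line to show that
# the parser does not depend on line breaks. URLs and digests are made up.
SAMPLE_PAGE = (
    '<!DOCTYPE html><html><body><h1>Links for example-pkg</h1>'
    '<a href="https://pypi.example.org/packages/aa/bb/example-pkg-1.0.tar.gz'
    '#md5=0123456789abcdef0123456789abcdef">example-pkg-1.0.tar.gz</a><br/>'
    '<a href="https://pypi.example.org/packages/cc/dd/example-pkg-1.1.tar.gz'
    '#md5=fedcba9876543210fedcba9876543210" data-requires-python="&gt;=2.6">'
    'example-pkg-1.1.tar.gz</a><br/>'
    '</body></html>\n'
)

if __name__ == '__main__':
    for path, md5, file_name in parse_simple_index(SAMPLE_PAGE):
        print('{0} {1} {2}'.format(file_name, md5, path))
```

Unlike a single-line pattern, this does not care how anchors are split across lines or what extra attributes they carry, at the cost of a few dozen more lines in the script.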

Since the PEP 503 specification guarantees that the HTML is structured a certain way,
I think it's OK to parse it with a regex.
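To make the PEP 503 argument concrete, here is a minimal sketch that applies the regex from line 86 (copied verbatim) to a made-up simple-index page. The page contents, URLs, md5 digests, and the `parse_simple_index` helper name are hypothetical, not taken from the script.

```python
import re

# Regex quoted verbatim from line 86 of infra/python/deps/pip_download.py (patch set 1).
REGEX = r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>'

# Hypothetical PEP 503 "simple" page for an example package, one anchor per
# line; the URLs and md5 digests are made up for illustration.
SAMPLE_PAGE = """<!DOCTYPE html>
<html>
  <head><title>Links for example-pkg</title></head>
  <body>
    <h1>Links for example-pkg</h1>
    <a href="https://pypi.example.org/packages/aa/bb/example-pkg-1.0.tar.gz#md5=0123456789abcdef0123456789abcdef">example-pkg-1.0.tar.gz</a><br/>
    <a href="https://pypi.example.org/packages/cc/dd/example-pkg-1.1.tar.gz#md5=fedcba9876543210fedcba9876543210" data-requires-python="&gt;=2.6">example-pkg-1.1.tar.gz</a><br/>
  </body>
</html>
"""


def parse_simple_index(page):
    """Hypothetical helper: returns (path, md5, file name) tuples, one per anchor."""
    # '.' does not match newlines, so each match is confined to a single line.
    return re.findall(REGEX, page)


if __name__ == '__main__':
    for path, md5, file_name in parse_simple_index(SAMPLE_PAGE):
        print('{0} {1} {2}'.format(file_name, md5, path))
```

The three groups come back as (path under `packages/`, md5 digest, file name). Because the `.*` quantifiers are greedy and `.` does not cross newlines, this relies on each anchor sitting on its own line, as PyPI's simple pages typically lay them out; PEP 503 requires one anchor element per file but does not appear to say anything about line breaks.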


-- 
To view, visit http://gerrit.cloudera.org:8080/6579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes
