impala-reviews mailing list archives

From "Lars Volker (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5181: Extract PYPI metadata from a webpage
Date Fri, 07 Apr 2017 12:08:30 GMT
Lars Volker has posted comments on this change.

Change subject: IMPALA-5181: Extract PYPI metadata from a webpage
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6579/1/infra/python/deps/pip_download.py
File infra/python/deps/pip_download.py:

Line 86:     regex = r'<a href=\".*packages/(.*)#md5=(.*?)\".*>(.*)<\/a>'
> I think this would be more robust if all of the * quantifiers were non-gree
Can you add a comment explaining why we're using a regex to parse the HTML, i.e. why we can't
use BeautifulSoup, lxml, or a similar parser?
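
For illustration only, here is a minimal sketch of the two options raised in this thread: the line 86 pattern with every quantifier made non-greedy, and a parser-based alternative. It rests on several assumptions. The sample HTML and the names SAMPLE_HTML, NON_GREEDY, LinkCollector, links_via_regex and links_via_parser are hypothetical and do not come from the patch. The parser variant uses Python 3's standard-library html.parser as a stand-in for beautifulsoup or lxml, so it is not what either library's API looks like.

```python
import re
from html.parser import HTMLParser

# Hypothetical listing page with two links on one line (not real PyPI output).
SAMPLE_HTML = (
    '<a href="../../packages/ab/cd/foo-1.0.tar.gz#md5=0123abcd" rel="internal">'
    'foo-1.0.tar.gz</a>'
    '<a href="../../packages/ef/gh/foo-1.1.tar.gz#md5=4567ef01" rel="internal">'
    'foo-1.1.tar.gz</a>'
)

# The pattern from line 86 with all quantifiers non-greedy, so one match
# cannot run across several <a> elements that share a line.
NON_GREEDY = r'<a href=".*?packages/(.*?)#md5=(.*?)".*?>(.*?)</a>'


def links_via_regex(html):
    """Return (path, md5, link text) tuples using the non-greedy regex."""
    return re.findall(NON_GREEDY, html)


class LinkCollector(HTMLParser):
    """Collects (path, md5, link text) for <a> tags whose href has #md5=."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            path, sep, md5 = self._href.partition("#md5=")
            if sep and "packages/" in path:
                rel_path = path.split("packages/", 1)[1]
                self.links.append((rel_path, md5, "".join(self._text)))
            self._href = None


def links_via_parser(html):
    """Return the same tuples as links_via_regex, using html.parser."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

On this sample both functions return the same two tuples. The parser version does not depend on attribute order or on how the links are laid out across lines, which is the kind of trade-off the requested code comment could spell out.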


-- 
To view, visit http://gerrit.cloudera.org:8080/6579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstrong@cloudera.com>
Gerrit-HasComments: Yes
