impala-issues mailing list archives

From "Taras Bobrovytsky (JIRA)" <>
Subject [jira] [Resolved] (IMPALA-5181) Make it possible to get Python package metadata from an HTML web page in
Date Sat, 08 Apr 2017 01:54:41 GMT


Taras Bobrovytsky resolved IMPALA-5181.
       Resolution: Fixed
    Fix Version/s: Impala 2.9.0

commit 4a79c9e7e3928f919b5fb60bab4145ba886d6252
Author: Taras Bobrovytsky <>
Date:   Thu Mar 30 13:08:21 2017 -0700

    IMPALA-5181: Extract PYPI metadata from a webpage
    There were some build failures due to a failure to download a JSON file
    containing package metadata from PYPI. We need to switch to downloading
    this from a PYPI mirror. In order to be able to download the metadata
    from a PYPI mirror, we need to be able to extract the data from a web page,
    because PYPI mirrors do not always have a JSON interface.
    We implement a regex-based HTML parser in this patch. Also, we increase
    the number of download attempts and randomly vary the amount of time
    between each attempt.
    - Tested locally against PYPI and a PYPI mirror.
    - Ran a private build that passed (which used a PYPI mirror).
    Change-Id: If3845a0d5f568d4352e3cc4883596736974fd7de
    Reviewed-by: Tim Armstrong <>
    Tested-by: Impala Public Jenkins
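
For readers of the archive, the retry behaviour described in the commit message (more download attempts, with a randomly varied wait between them) could look roughly like the sketch below. This is not the code from the patch: the function name retry_with_jitter, its parameters, and the default values are all assumptions chosen for illustration.

```python
import random
import time


def retry_with_jitter(fetch, attempts=5, base_delay=1.0, max_jitter=2.0,
                      sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are used up.

    Hypothetical helper, not taken from the Impala patch. Between failed
    attempts it waits base_delay plus a random extra of up to max_jitter
    seconds, so that parallel builds do not all retry at the same moment.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError:
            # Network errors raised by urllib are subclasses of OSError.
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the failure.
            sleep(base_delay + random.uniform(0, max_jitter))
```

The sleep function is a parameter only so that the helper can be exercised without real waiting; a caller would normally pass just the download function.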

> Make it possible to get Python package metadata from an HTML web page in
> ----------------------------------------------------------------------------------------
>                 Key: IMPALA-5181
>                 URL:
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Infrastructure
>            Reporter: Taras Bobrovytsky
>            Assignee: Taras Bobrovytsky
>             Fix For: Impala 2.9.0
> Currently allows retrieving Python package metadata only from a JSON
> file, for example This is a problem because some
> PYPI mirrors might not implement this interface.
> The data in the JSON file should also be accessible through a web interface - for example,
> should be able to parse the web page and extract the information we need.
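
As an illustration of the idea in the description above, here is a minimal sketch of regex-based extraction of package file information from an HTML index page. It is not the parser from the patch. It assumes the page lists each file as an anchor tag whose href may carry a "#<algorithm>=<hex digest>" fragment; the names LINK_RE and extract_package_links are hypothetical.

```python
import re

# Assumed page layout (not confirmed by the issue): one <a> tag per file,
# e.g. <a href="path/pkg-1.0.tar.gz#sha256=abc123">pkg-1.0.tar.gz</a>
LINK_RE = re.compile(
    r'<a\s+[^>]*href="([^"#]+)(?:#(\w+)=([0-9a-fA-F]+))?"[^>]*>([^<]+)</a>',
    re.IGNORECASE,
)


def extract_package_links(html):
    """Return one dict per file link found in the HTML text."""
    results = []
    for url, algo, digest, name in LINK_RE.findall(html):
        results.append({
            "filename": name.strip(),
            "url": url,
            # findall yields '' for an optional group that did not match.
            "hash_algo": algo or None,
            "digest": digest or None,
        })
    return results
```

A regex is adequate only while the mirror's markup stays this simple; a page with different quoting or nested tags inside the link would need a real HTML parser.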

This message was sent by Atlassian JIRA
