pig-dev mailing list archives

From "Philip (flip) Kromer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3877) Getting Geo Latitude/Longitude from Address Lines
Date Sun, 18 May 2014 03:37:15 GMT

    [ https://issues.apache.org/jira/browse/PIG-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000968#comment-14000968 ]

Philip (flip) Kromer commented on PIG-3877:

* This makes separate HTTP calls for the latitude and then for the longitude. Better to have one
method that returns a tuple prepared from the fully parsed response and let the caller project
what they want.
* What happens on a response that fails to geocode, or for any other reason doesn't have a
latLng element? The JSONObject latLng = (JSONObject) ((JSONObject)locations.get(0)).get("latLng");
geolongitude = (String) latLng.get("lng"); sequence feels like a recipe for an NPE.
* Is the Intuit backend ready for people who might use this in production? Or even for Apache's
and the world's automated build systems to hit it without that traffic coming across as abusive?
* I worry about having Pig make a network call on every record. There's no facility for throttling,
backoff, or HTTP keep-alive.
* Even with those, the only way I can imagine to make this workable at production scale using
an over-the-network geocoder would be to deploy an instance on each machine. Pete Warden's
[Data Science Toolkit|http://petewarden.com/2013/10/06/geocode-the-world-with-the-new-data-science-toolkit/]
has a [Standalone Geocoder|http://www.datasciencetoolkit.org/developerdocs#googlestylegeocoder];
this should target that and refer to it (or an acceptable alternative) in the docs.
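
To make the first two points concrete, here is a minimal sketch of pulling both coordinates out of one already-parsed response with null checks at each step. It is not taken from the patch: the class name LatLngExtractor, the method extract, and the "lat" key are assumptions (only locations, "latLng", and "lng" appear in the quoted snippet). It takes plain java.util collections so it runs without extra jars; json-simple's JSONObject and JSONArray extend HashMap and ArrayList, so they could be passed in directly.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of PIG-3877.1.patch.
final class LatLngExtractor {
    private LatLngExtractor() {}

    /**
     * Returns {lat, lng} from the first entry of the parsed "locations" list,
     * or null when any expected element is missing or malformed.
     */
    static double[] extract(List<?> locations) {
        if (locations == null || locations.isEmpty()) {
            return null; // nothing geocoded
        }
        Object first = locations.get(0);
        if (!(first instanceof Map)) {
            return null;
        }
        Object latLng = ((Map<?, ?>) first).get("latLng");
        if (!(latLng instanceof Map)) {
            return null; // no latLng element: avoid the NPE
        }
        Map<?, ?> coords = (Map<?, ?>) latLng;
        Double lat = toDouble(coords.get("lat")); // "lat" key is an assumption
        Double lng = toDouble(coords.get("lng"));
        if (lat == null || lng == null) {
            return null;
        }
        return new double[] {lat, lng};
    }

    // Accepts numeric or string values, since the backend's type is not known here.
    private static Double toDouble(Object value) {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        if (value instanceof String) {
            try {
                return Double.parseDouble(((String) value).trim());
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return null;
    }
}
```

In a UDF, the exec method would make the single HTTP call, pass the parsed list to something like extract, and wrap the result in a two-field tuple, returning null for the record when extract returns null.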

> Getting Geo Latitude/Longitude from Address Lines
> -------------------------------------------------
>                 Key: PIG-3877
>                 URL: https://issues.apache.org/jira/browse/PIG-3877
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>    Affects Versions: 0.10.1
>            Reporter: Rekha Joshi
>            Assignee: Rekha Joshi
>              Labels: patch, piggybank
>             Fix For: 0.10.1
>         Attachments: PIG-3877.1.patch
> In many dataset-mining use cases, latitude and longitude have to be obtained from the
> address lines alone. The IP fields are missing.
> The attached UDFs get the geo latitude/longitude from address lines.

This message was sent by Atlassian JIRA