beam-commits mailing list archives

From "Vikas Kedigehalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1800) Can't save datastore objects
Date Fri, 24 Mar 2017 17:05:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940746#comment-15940746 ]

Vikas Kedigehalli commented on BEAM-1800:
-----------------------------------------

That is weird, because the protobuf message is serialized to a string before it is sent to httplib
(https://github.com/GoogleCloudPlatform/google-cloud-datastore/blob/master/python/googledatastore/connection.py#L191).

[~mlambert] could you bypass Beam and try writing directly with this library to see if
it fails? That would tell us whether it's a Beam issue or not.
https://github.com/GoogleCloudPlatform/google-cloud-datastore/blob/master/python/googledatastore/connection.py#L127
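
It may also help to check which piece of the request is unicode rather than str just before the HTTP call. A minimal sketch, assuming the URL, headers, and body can be inspected at that point; {{find_text_parts}} is a hypothetical helper for illustration, not part of googledatastore or Beam:

```python
# Hypothetical debugging helper (not part of googledatastore or Beam).
# TEXT_TYPE is `unicode` on Python 2 and `str` on Python 3.
TEXT_TYPE = type(u"")


def find_text_parts(url, headers, body):
    """Return the sorted names of request parts that are text, not bytes."""
    parts = {"url": url, "body": body}
    for name, value in headers.items():
        parts["header:" + name] = value
    return sorted(
        name for name, value in parts.items() if isinstance(value, TEXT_TYPE)
    )


if __name__ == "__main__":
    # Example values only; substitute the real request pieces.
    print(find_text_parts(
        b"https://example.invalid/commit",
        {"Content-Type": b"application/x-protobuf"},
        u"payload",
    ))
```

Anything the helper reports is a candidate for the value that httplib is complaining about; an empty list would suggest the problem lies elsewhere.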

> Can't save datastore objects
> ----------------------------
>
>                 Key: BEAM-1800
>                 URL: https://issues.apache.org/jira/browse/BEAM-1800
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Mike Lambert
>            Assignee: Vikas Kedigehalli
>
> I can't seem to save my database objects using {{WriteToDatastore}}, as it errors out on a strange unicode issue when trying to write a batch. Stacktrace follows:
> {noformat}
> File "apache_beam/runners/common.py", line 195, in apache_beam.runners.common.DoFnRunner.receive (apache_beam/runners/common.c:5142)
>   self.process(windowed_value)
> File "apache_beam/runners/common.py", line 267, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7201)
>   self.reraise_augmented(exn)
> File "apache_beam/runners/common.py", line 279, in apache_beam.runners.common.DoFnRunner.reraise_augmented (apache_beam/runners/common.c:7590)
>   raise type(exn), args, sys.exc_info()[2]
> File "apache_beam/runners/common.py", line 263, in apache_beam.runners.common.DoFnRunner.process (apache_beam/runners/common.c:7090)
>   self._dofn_simple_invoker(element)
> File "apache_beam/runners/common.py", line 198, in apache_beam.runners.common.DoFnRunner._dofn_simple_invoker (apache_beam/runners/common.c:5262)
>   self._process_outputs(element, self.dofn_process(element.value))
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py", line 354, in process
>   self._flush_batch()
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/datastoreio.py", line 363, in _flush_batch
>   helper.write_mutations(self._datastore, self._project, self._mutations)
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py", line 187, in write_mutations
>   commit(commit_request)
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 174, in wrapper
>   return fun(*args, **kwargs)
> File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/datastore/v1/helper.py", line 185, in commit
>   datastore.commit(req)
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 140, in commit
>   datastore_pb2.CommitResponse)
> File "/usr/local/lib/python2.7/dist-packages/googledatastore/connection.py", line 199, in _call_method
>   method='POST', body=payload, headers=headers)
> File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 631, in new_request
>   redirections, connection_type)
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1609, in request
>   (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1351, in _request
>   (response, content) = self._conn_request(conn, request_uri, method, body, headers)
> File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1273, in _conn_request
>   conn.request(method, request_uri, body, headers)
> File "/usr/lib/python2.7/httplib.py", line 1039, in request
>   self._send_request(method, url, body, headers)
> File "/usr/lib/python2.7/httplib.py", line 1073, in _send_request
>   self.endheaders(body)
> File "/usr/lib/python2.7/httplib.py", line 1035, in endheaders
>   self._send_output(message_body)
> File "/usr/lib/python2.7/httplib.py", line 877, in _send_output
>   msg += message_body
> TypeError: must be str, not unicode
> [while running 'write to datastore/Convert to Mutation']
> {noformat}
> My code is basically:
> {noformat}
>         | 'convert from entity' >> beam.Map(ConvertFromEntity)
>         | 'write to datastore' >> WriteToDatastore(client.project)
> {noformat}
> Where {{ConvertFromEntity}} converts from a google.cloud.datastore object (which has a nice API/interface) into the underlying protobuf (which is what the beam gcp/datastore library expects):
> {noformat}
> from google.cloud.datastore import helpers
> def ConvertFromEntity(entity):
>     return helpers.entity_to_protobuf(entity)
> {noformat}
> I assume entity_to_protobuf works fine/normally, since it's also what is used by {{google/cloud/datastore/batch.py}} to write a bunch of {{entity_pb2.Entity}} objects into the {{datastore_pb2.CommitRequest.mutations[n].upsert}}:
> In batch.py: {{put() -> _assign_entity_to_pb() -> entity_to_protobuf()}}.
> In datastoreio.py: {{WriteToDatastore->DatastoreWriteFn.to_upsert_mutation->_Mutate.DatastoreWriteFn->helper.write_mutations}}
> Any idea what's going on here and why this doesn't work? Yes, I may have some unicode in my objects...but it works in my appengine DB/NDB usage. I will attempt to skip WriteToDatastore and just put unbatched entities using the datastore library and see if that goes any better for me...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
