beam-commits mailing list archives

From "Etienne Chauchot (JIRA)" <>
Subject [jira] [Commented] (BEAM-3201) ElasticsearchIO should deal with documents id
Date Tue, 28 Nov 2017 11:31:00 GMT


Etienne Chauchot commented on BEAM-3201:

Hi [~nerdynick]. OK, understood: the partition transform does not fit your use case.
Of course, the JSON string will be deserialized only once, inside {{writeFn.ProcessElement}},
and the deserialized object will be passed to the three {{with[id|type|index]Fn}} functions.
The deserialized object cannot be a jackson JSONObject because that class is not serializable,
which would prevent Beam from calling the three user-defined {{with[id|type|index]Fn}} functions.
We can choose whatever object representation of JSON we like, as long as it is serializable.
The {{with[id|type|index]Fn}} functions will take this object representation as a parameter
and output a {{String}} value (the id, the index, or the type) that the user determines from
the object representation of the ES document. Beam will not add the _id, _type, and _index
metadata to the message payload, or remove them from it, in Read and Write (to avoid a
deserialize/parse/re-serialize cycle). If users want to add these fields to their documents
in order to read them back in {{with[id|type|index]Fn}}, or simply derive the values from
other fields, that is fine, but these fields will then be stored as part of the payload
(Beam leaves the document untouched).
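
To make the proposal concrete, here is a minimal sketch of the idea in plain Java. It is not the ElasticsearchIO API: {{FieldFn}}, {{bulkActionLine}} and the class name are hypothetical, and a {{HashMap}} stands in for whatever serializable JSON representation ends up being chosen. It assumes the payload has already been parsed once, and it only builds the action line of an Elasticsearch bulk request from the three user functions, without touching the document itself.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.function.Function;

public class DocumentMetadataSketch {

  /** Hypothetical serializable extractor: parsed document in, String metadata value out. */
  public interface FieldFn extends Function<HashMap<String, String>, String>, Serializable {}

  /**
   * Builds the action line of a bulk "index" request from the three user functions.
   * The parsed document is only read, never modified.
   * Note: values are not JSON-escaped here; a real implementation would need to do that.
   */
  public static String bulkActionLine(
      HashMap<String, String> parsedDoc, FieldFn idFn, FieldFn typeFn, FieldFn indexFn) {
    return String.format(
        "{ \"index\" : { \"_index\" : \"%s\", \"_type\" : \"%s\", \"_id\" : \"%s\" } }",
        indexFn.apply(parsedDoc), typeFn.apply(parsedDoc), idFn.apply(parsedDoc));
  }

  public static void main(String[] args) {
    // Stand-in for the result of deserializing the JSON payload once.
    HashMap<String, String> doc = new HashMap<>();
    doc.put("user", "alice");
    doc.put("kind", "profile");

    FieldFn idFn = d -> d.get("user");    // id taken from a payload field
    FieldFn typeFn = d -> d.get("kind");  // type taken from a payload field
    FieldFn indexFn = d -> "users";       // index fixed by the user

    System.out.println(bulkActionLine(doc, idFn, typeFn, indexFn));
  }
}
```

Because each function only reads from the parsed object, the payload sent to Elasticsearch stays exactly as the user provided it, which matches the "leave the document untouched" point above.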

> ElasticsearchIO should deal with documents id
> ---------------------------------------------
>                 Key: BEAM-3201
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Chet Aldrich
> Today the ESIO only inserts the payload of the ES documents. Elasticsearch generates
a document id for each record inserted. So each new insertion is considered as a new document.
Users want to be able to update documents using the IO. So, for the write part of the IO,
users should be able to provide a document id so that they could update already stored documents.
Providing an id for the documents could also help users achieve idempotency.

This message was sent by Atlassian JIRA
