beam-commits mailing list archives

From "Nicholas Verbeck (JIRA)" <>
Subject [jira] [Commented] (BEAM-3201) ElasticsearchIO should deal with documents id
Date Wed, 22 Nov 2017 23:02:00 GMT


Nicholas Verbeck commented on BEAM-3201:

[~chetaldrich] I'm not against the user functions. I just feel that to really support them efficiently,
the ESIO.Write() signature would need to change from String to something else: Map, Object,
JSONObject, etc. In fact, I talked about this when I tried to start the discussion on BEAM-3222
on the dev mailing list.

The use case I'm trying to solve is neither unique nor new. In most cases, including
my own currently, it involves time-series data, where you'd bucket the data into separate indexes
by day, hour, etc. It'd be impractical to launch separate jobs or define an unlimited list
of partitions for each time bucket, especially when streaming data from Kafka. Data shows
up late, and other issues would make this very difficult if you couldn't change the index/type
dynamically as the data flows by. ES already supports this use case with the Bulk API,
and index templates further enhance the ability to do this.
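
To illustrate the Bulk API point, here is a minimal sketch (not ESIO code) of how each document could carry its own target index in the bulk action line, so late data still lands in the right daily index. The class and method names (DailyIndexBulk, indexFor, bulkEntry) and the "prefix-yyyy.MM.dd" naming scheme are assumptions made up for this example.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Beam or ElasticsearchIO.
final class DailyIndexBulk {
    // Assumed naming scheme: one index per UTC day, e.g. "logs-2017.11.22".
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyy.MM.dd").withZone(ZoneOffset.UTC);

    // Derive the index name from the event time, not the processing time,
    // so a late record is routed to the day it belongs to.
    static String indexFor(String prefix, Instant eventTime) {
        return prefix + "-" + DAY.format(eventTime);
    }

    // Build one Bulk API entry: an action/metadata line naming the index and
    // type, then the document source, each terminated by a newline.
    // Assumes prefix and type contain no characters that need JSON escaping.
    static String bulkEntry(String prefix, String type, Instant eventTime, String jsonDoc) {
        String action = "{\"index\":{\"_index\":\"" + indexFor(prefix, eventTime)
            + "\",\"_type\":\"" + type + "\"}}";
        return action + "\n" + jsonDoc + "\n";
    }
}
```

Because the index is chosen per entry, a single bulk request can mix documents bound for different days, which is what a String-only Write() can't express today.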

> ElasticsearchIO should deal with documents id
> ---------------------------------------------
>                 Key: BEAM-3201
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Chet Aldrich
> Today the ESIO only inserts the payload of the ES documents. Elasticsearch generates
> a document id for each record inserted, so each new insertion is considered a new document.
> Users want to be able to update documents using the IO. So, for the write part of the IO,
> users should be able to provide a document id so that they can update already stored documents.
> Providing an id for the documents could also help the user with idempotency.

This message was sent by Atlassian JIRA