flink-issues mailing list archives

From cjolif <...@git.apache.org>
Subject [GitHub] flink pull request #5374: [FLINK-8101][flink-connectors] Elasticsearch 5.3+ ...
Date Wed, 31 Jan 2018 21:06:29 GMT
Github user cjolif commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5374#discussion_r165186422
  
    --- Diff: flink-connectors/flink-connector-elasticsearch5.3/src/main/java/org/apache/flink/streaming/connectors/elasticsearch53/BulkProcessorIndexer.java ---
    @@ -0,0 +1,57 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.streaming.connectors.elasticsearch53;
    +
    +import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
    +
    +import org.elasticsearch.action.ActionRequest;
    +import org.elasticsearch.action.DocWriteRequest;
    +import org.elasticsearch.action.bulk.BulkProcessor;
    +
    +import java.util.concurrent.atomic.AtomicLong;
    +
    +/**
    + * Implementation of a {@link RequestIndexer}, using a {@link BulkProcessor}.
    + * {@link ActionRequest ActionRequests} will be converted to {@link DocWriteRequest}
    + * and will be buffered before sending a bulk request to the Elasticsearch cluster.
    + */
    +public class BulkProcessorIndexer implements RequestIndexer {
    +
    +	private final BulkProcessor bulkProcessor;
    +	private final boolean flushOnCheckpoint;
    +	private final AtomicLong numPendingRequestsRef;
    +
    +	public BulkProcessorIndexer(BulkProcessor bulkProcessor,
    +								boolean flushOnCheckpoint,
    +								AtomicLong numPendingRequests) {
    +		this.bulkProcessor = bulkProcessor;
    +		this.flushOnCheckpoint = flushOnCheckpoint;
    +		this.numPendingRequestsRef = numPendingRequests;
    +	}
    +
    +	@Override
    +	public void add(ActionRequest... actionRequests) {
    +		for (ActionRequest actionRequest : actionRequests) {
    +			if (flushOnCheckpoint) {
    +				numPendingRequestsRef.getAndIncrement();
    +			}
    +			this.bulkProcessor.add((DocWriteRequest) actionRequest);
    --- End diff --
    
    This is actually from the commit I brought into the PR from @zjureel's original PR. That
said, I think the answer is definitely yes in the case that matters for Flink. Indeed:
    
    * The `ActionRequest` values here come from the implementation of the `ElasticsearchSinkFunction.process`
method, which should create `ActionRequest` instances and add them to the indexer.
    * The idea here, as I understand it, is not to create any sort of `ActionRequest` you
could possibly dream of, but only indexing requests.
    * The way to create an `ActionRequest` for indexing in Elasticsearch is to use `org.elasticsearch.action.index.IndexRequest`.
    * Starting with Elasticsearch 5.3, `IndexRequest` inherits from `DocWriteRequest`, which
was not the case before 5.3.
    
    See: 
    
    ![image](https://user-images.githubusercontent.com/623171/35646706-5723ab78-06d0-11e8-8d50-5b4545047a1f.png)
    
    vs
    
    ![image](https://user-images.githubusercontent.com/623171/35646719-63d7f1b2-06d0-11e8-8308-c330b3c11dad.png)
    
    So the only case I see where this could not be a `DocWriteRequest` would be if someone
created something other than an index request in the `ElasticsearchSinkFunction`. But
I don't really see why anyone would.
    
    That said, this raises the question of why the API was not typed against `IndexRequest`
instead of `ActionRequest` from the start, as this would avoid those questions and force the
user to return an `IndexRequest`.
    
    In any case there is little choice, because starting with 5.3 Elasticsearch's `BulkProcessor`
no longer accepts `ActionRequest` but only `IndexRequest`/`DocWriteRequest`.
    
    Do you have a suggestion on how to handle this better? I can of course add documentation
saying that, starting with 5.3, the sink function MUST return `DocWriteRequest`. But is that
enough for you?
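
    Beyond documentation, one option might be to replace the blind cast with a checked one
that fails fast with a readable message. The following is only a sketch: `ActionRequest`,
`DocWriteRequest`, `IndexRequest` and `SearchRequest` are simplified stand-ins declared locally
so the example compiles on its own (they are not the real Elasticsearch classes), and
`toDocWriteRequest` is a hypothetical helper name, not something in the PR.

```java
// Sketch of a checked cast. All types below are simplified stand-ins, declared
// here for illustration only; they are NOT the real org.elasticsearch classes.
final class CheckedCastSketch {

    // Stand-in for org.elasticsearch.action.ActionRequest.
    abstract static class ActionRequest {}

    // Stand-in for org.elasticsearch.action.DocWriteRequest.
    interface DocWriteRequest {}

    // Stand-in for an index request: an ActionRequest that is also a DocWriteRequest.
    static final class IndexRequest extends ActionRequest implements DocWriteRequest {}

    // Stand-in for any request type that is not a document write.
    static final class SearchRequest extends ActionRequest {}

    // Hypothetical helper: returns the request as a DocWriteRequest, or throws
    // an IllegalArgumentException naming the offending type.
    static DocWriteRequest toDocWriteRequest(ActionRequest request) {
        if (request instanceof DocWriteRequest) {
            return (DocWriteRequest) request;
        }
        String actual = (request == null) ? "null" : request.getClass().getName();
        throw new IllegalArgumentException(
            "Expected a DocWriteRequest (for example an IndexRequest) but got " + actual);
    }

    public static void main(String[] args) {
        // An index request passes through unchanged.
        DocWriteRequest ok = toDocWriteRequest(new IndexRequest());
        System.out.println("accepted: " + ok.getClass().getSimpleName());

        // Anything else is rejected with a descriptive message.
        try {
            toDocWriteRequest(new SearchRequest());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

    In `BulkProcessorIndexer.add` this would mean calling such a helper instead of casting
directly, so a user who returns something else gets an explicit error rather than a
`ClassCastException`.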



---
