From: Robert Metzger
Date: Mon, 9 Mar 2020 14:25:53 +0100
Subject: Re: Writing a DataSet to ElasticSearch
To: Niels Basjes
Cc: user@flink.apache.org

Hey Niels,

For the OOM problem: did you try the RocksDB state backend?

I don't think Flink provides an ES OutputFormat.
I guess there's no way around implementing your own OutputFormat for ES if you want to use the DataSet API. It should not be too hard to implement.

On Sun, Mar 1, 2020 at 1:42 PM Niels Basjes <Niels@basjes.nl> wrote:
> Hi,
>
> I have a job in Flink 1.10.0 which creates data that I need to write to
> ElasticSearch.
> Because it really is a batch (and doing it as a stream keeps giving OOM
> problems: big + unordered + groupby), I'm trying to do it as a real batch.
>
> To write a DataSet to some output (that is not a file), an OutputFormat
> implementation is needed:
>
> public DataSink<T> output(OutputFormat<T> outputFormat)
>
> The problem I have is that I have not been able to find an OutputFormat
> for ElasticSearch.
> Adding ES as a Sink to a DataStream is trivial because a Sink is provided
> out of the box.
>
> The only alternative I came up with is to write the output of my batch to
> a file and then load that (with a stream) into ES.
>
> What is the proper solution?
> Is there an OutputFormat for ES I can use that I overlooked?
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
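A minimal sketch of such a custom OutputFormat, assuming documents are sent to Elasticsearch in bulk requests of a fixed maximum size. To stay self-contained it depends on neither Flink nor the Elasticsearch client: `SimpleOutputFormat` is a hypothetical local interface that mirrors the `open` / `writeRecord` / `close` lifecycle of Flink's `OutputFormat`, and `BulkClient`, `BufferingEsOutputFormat` and `RecordingBulkClient` are hypothetical names, not part of either library.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical local stand-in mirroring the lifecycle of Flink's
// org.apache.flink.api.common.io.OutputFormat: open, writeRecord per record, close.
// configure(Configuration) is omitted so the sketch compiles without Flink.
interface SimpleOutputFormat<T> {
    void open(int taskNumber, int numTasks) throws IOException;
    void writeRecord(T record) throws IOException;
    void close() throws IOException;
}

// Hypothetical wrapper around an Elasticsearch bulk call (not a real ES API).
interface BulkClient {
    void bulkIndex(String index, List<Map<String, Object>> docs) throws IOException;
    void close() throws IOException;
}

// Buffers documents and sends them in bulk requests of at most maxBatchSize.
class BufferingEsOutputFormat<T> implements SimpleOutputFormat<T> {
    private final String index;
    private final int maxBatchSize;
    private final Function<T, Map<String, Object>> toDocument;
    private final Supplier<BulkClient> clientFactory;

    // Created per parallel task in open(), never in the constructor.
    private transient BulkClient client;
    private transient List<Map<String, Object>> buffer;

    BufferingEsOutputFormat(String index, int maxBatchSize,
                            Function<T, Map<String, Object>> toDocument,
                            Supplier<BulkClient> clientFactory) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be > 0");
        }
        this.index = index;
        this.maxBatchSize = maxBatchSize;
        this.toDocument = toDocument;
        this.clientFactory = clientFactory;
    }

    @Override
    public void open(int taskNumber, int numTasks) throws IOException {
        client = clientFactory.get();
        buffer = new ArrayList<>(maxBatchSize);
    }

    @Override
    public void writeRecord(T record) throws IOException {
        buffer.add(toDocument.apply(record));
        if (buffer.size() >= maxBatchSize) {
            flush(); // batch is full: send it as one bulk request
        }
    }

    @Override
    public void close() throws IOException {
        // Null checks guard against close() running without a successful open().
        try {
            if (buffer != null && !buffer.isEmpty()) {
                flush(); // send the last, partially filled batch
            }
        } finally {
            if (client != null) {
                client.close();
            }
        }
    }

    private void flush() throws IOException {
        List<Map<String, Object>> batch = new ArrayList<>(buffer);
        buffer.clear(); // clear first so a failed request is not re-sent from close()
        client.bulkIndex(index, batch);
    }
}

// In-memory BulkClient for trying the sketch without an Elasticsearch cluster.
class RecordingBulkClient implements BulkClient {
    final List<List<Map<String, Object>>> batches = new ArrayList<>();
    boolean closed = false;

    @Override
    public void bulkIndex(String index, List<Map<String, Object>> docs) {
        batches.add(docs);
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

For example, with `maxBatchSize` set to 2, writing five records yields bulk requests of 2, 2 and 1 documents, the last one sent from `close()`. In an actual job the class would instead implement Flink's `OutputFormat<T>` (adding a `configure(Configuration)` method, possibly empty) or extend `RichOutputFormat<T>`, `BulkClient` would wrap the Elasticsearch client's bulk API, and the format would be passed to `DataSet#output`. Because Flink serializes the format object and ships it to the workers, the client is only created in `open()`, and the mapping function and client factory would also need to be serializable.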