Date: Thu, 14 Apr 2016 18:47:53 +0200
Subject: Re: FLINK-3750 (JDBCInputFormat)
From: Flavio Pompermaier
To: dev@flink.apache.org

ok thanks! Just one last question: is an InputFormat instantiated once
per task slot or once per task manager?

On 14 Apr 2016 18:07, "Chesnay Schepler" wrote:

> no.
>
> if (connection == null) {
>     establishConnection();
> }
>
> done. same connection for all splits.
>
> On 14.04.2016 17:59, Flavio Pompermaier wrote:
>
>> I didn't understand what you mean by "it should also be possible to
>> reuse the same connection of an InputFormat across InputSplits, i.e.,
>> calls of the open() method".
>> At the moment the open() method calls establishConnection(), so a new
>> connection is created for each split.
>> If I understood correctly, you're suggesting to create a pool in the
>> InputFormat and simply call pool.borrow() in open() rather than
>> establishConnection()?
>>
>> On 14 Apr 2016 17:28, "Chesnay Schepler" wrote:
>>
>>> On 14.04.2016 17:22, Fabian Hueske wrote:
>>>
>>>> Hi Flavio,
>>>>
>>>> those are good questions.
>>>>
>>>> 1) Replacing null values by default values and simply forwarding
>>>> records is very dangerous, in my opinion.
>>>> I see two alternatives: A) We use a data type that tolerates null
>>>> values. This could be a POJO that the user has to provide, or Row.
>>>> The drawback of Row is that it is untyped and not easy to handle.
>>>> B) We use Tuple and add an additional field that holds an Integer
>>>> which serves as a bitset to mark null fields. This would be a
>>>> pretty low-level API though. I am leaning towards the user-provided
>>>> POJO option.
>>>
>>> i would also lean towards the POJO option.
>>>
>>>> 2) The JDBCInputFormat is located in a dedicated Maven module. I
>>>> think we can add a dependency to that module. However, it should
>>>> also be possible to reuse the same connection of an InputFormat
>>>> across InputSplits, i.e., calls of the open() method. Wouldn't that
>>>> be sufficient?
>>>
>>> this is the right approach imo.
>>>
>>>> Best, Fabian
>>>>
>>>> 2016-04-14 16:59 GMT+02:00 Flavio Pompermaier:
>>>>
>>>>> Hi guys,
>>>>>
>>>>> I'm integrating Chesnay's comments into my PR, but there are a
>>>>> couple of things that I'd like to discuss with the core
>>>>> developers.
>>>>>
>>>>> 1. About the JDBC type mapping (addValue() method at [1]): at the
>>>>>    moment, if I find a null value for a Double, JDBC's getDouble()
>>>>>    returns 0.0. Is that really the correct behaviour? Wouldn't it
>>>>>    be better to use a POJO or the Row of datatable, which can
>>>>>    handle null values? Moreover, the mapping between SQL types and
>>>>>    Java types varies a lot from one JDBC implementation to
>>>>>    another. Wouldn't it be better to rely on the Java type
>>>>>    returned by resultSet.getObject() to get such a mapping, rather
>>>>>    than using the ResultSetMetaData types?
>>>>> 2. I'd like to handle connections very efficiently, because we
>>>>>    have a use case with billions of records and thus millions of
>>>>>    splits, and establishing a new connection each time could be
>>>>>    expensive. Would it be a problem to add the Apache pool
>>>>>    dependency to the JDBC batch connector in order to reuse the
>>>>>    created connections?
>>>>>
>>>>> [1]
>>>>> https://github.com/fpompermaier/flink/blob/FLINK-3750/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.java
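
For illustration only: option B in Fabian's mail above (a Tuple plus an
extra Integer field used as a bitset to mark null fields) could look
roughly like the sketch below. NullMask and its methods are hypothetical
names, not part of Flink or of the PR, and the sketch assumes at most 32
fields per record so that the mask fits into a single int.

```java
/**
 * Hypothetical helper (not a Flink class): records which positions of a
 * flat record were null, using one bit per field in an int.
 */
final class NullMask {

    private NullMask() {}

    /** Returns the given mask with the bit for field {@code index} set. */
    static int markNull(int mask, int index) {
        checkIndex(index);
        return mask | (1 << index);
    }

    /** Returns true if field {@code index} is marked as null in the mask. */
    static boolean isNull(int mask, int index) {
        checkIndex(index);
        return (mask & (1 << index)) != 0;
    }

    /** Builds the mask for a whole record: bit i is set iff values[i] is null. */
    static int maskOf(Object[] values) {
        if (values.length > Integer.SIZE) {
            throw new IllegalArgumentException(
                "assumption of this sketch: at most " + Integer.SIZE + " fields");
        }
        int mask = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                mask = markNull(mask, i);
            }
        }
        return mask;
    }

    private static void checkIndex(int index) {
        if (index < 0 || index >= Integer.SIZE) {
            throw new IndexOutOfBoundsException("field index out of range: " + index);
        }
    }

    public static void main(String[] args) {
        // Fields 1 and 3 are null, so bits 1 and 3 are set.
        Object[] row = {42, null, "flink", null};
        int mask = maskOf(row);
        System.out.println(Integer.toBinaryString(mask)); // prints 1010
    }
}
```

A reader of such a record would have to check isNull(mask, i) before
trusting field i, which is the "pretty low-level API" drawback mentioned
in the thread and one reason both replies lean towards the POJO option.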