From: Fabian Hueske
Date: Wed, 16 Sep 2015 17:33:58 +0200
Subject: Re: CompactingHashTable question
To: "dev@flink.apache.org"

Yes, probing the HashTable with a key that does not exist will yield a join
function call with a null value (or empty iterator in case of CoGroup).
The semantics of the join are the same regardless of the hash table
implementation. The fact that the error only occurs with the managed HT,
indicates that there is a bug somewhere :-(

2015-09-16 17:26 GMT+02:00 Vasiliki Kalavri:

> Hi,
>
> thanks a lot Fabian!
>
> I didn't know that join with the solution set is an outer join. That's a
> surprise :)
>
> So, if I understand correctly, I should have a null value when my other
> input to the join contains some key that doesn't exist in the solution set,
> right? That's not the case in my application; I'm not generating any new
> keys.
>
> Also, when setting the solutionSetUnManaged option, the exception doesn't
> occur anymore. Are the join semantics different when the solution set is in
> unmanaged memory?
>
> Cheers,
> Vasia.
>
>
> On 16 September 2015 at 16:50, Fabian Hueske wrote:
>
> > Hi Vasia,
> >
> > I looked into the code. A serializer should never return null when
> > deserializing. Either it does not detect that something went wrong with the
> > deserialization or it should throw an exception.
> >
> > Regarding the handling of null returns in the Drivers. If there is no entry
> > in the HT for a certain key, the HT will return null which is expected.
> > If a CoGroupWithSolutionSet*Driver receives a null value, it gives an empty
> > iterator to the user function. The JoinWithSolutionSet*Driver calls the
> > join function with a null value. Both behaviors are expected. A join with a
> > solution set is actually an outer join and a join function in such a join
> > needs to be able to handle null values on the solution set side.
> >
> > Cheers, Fabian
> >
> >
> > 2015-09-15 17:41 GMT+02:00 Vasiliki Kalavri:
> >
> > > Hello to my squirrels,
> > >
> > > I ran into an NPE for some iterations code and it looks like what's
> > > described in FLINK-2443 <https://issues.apache.org/jira/browse/FLINK-2443>.
> > > I'm trying to understand the problem and I could really use your help :)
> > >
> > > So far, it seems that the exception is caused by a null value returned by
> > > CompactingHashTable.*getMatchFor*(PT probeSideRecord).
> > >
> > > This method returns null in the following cases:
> > > - when the hash table is "closed"
> > > - when the segment is done
> > > - if the serializer actually returns a null record
> > >
> > > It seems that on the join/cogroup driver side there is no check or special
> > > handling when the build side record is null, i.e. the null record is still
> > > passed to the join function.
> > > Is this correct and if not, what should the driver do in this case?
> > >
> > > Thank you!
> > >
> > > Cheers,
> > > Vasia.
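
For readers following the thread, the outer-join behavior described above can be sketched in plain Java. This is only a rough model and not Flink code: a java.util.HashMap stands in for the solution-set hash table, and the names SolutionSetJoinSketch, JoinFn, CoGroupFn, joinAll and coGroupAll are hypothetical, introduced purely for illustration. The sketch assumes at most one solution-set record per key. It shows the two behaviors mentioned in the replies: a join function is called with null when the probed key is missing, and a co-group function receives an empty iterator instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the drivers discussed in the thread; not Flink code.
class SolutionSetJoinSketch {

    // Hypothetical join function: must tolerate a null solution-set record.
    interface JoinFn<P, S, O> {
        O join(P probe, S solutionOrNull);
    }

    // Hypothetical co-group function: the solution-set iterator may be empty.
    interface CoGroupFn<P, S, O> {
        O coGroup(P probe, Iterator<S> solution);
    }

    // Models the described join driver: a missing key still triggers a call,
    // with null on the solution-set side (outer-join semantics).
    static <K, S, O> List<O> joinAll(Map<K, S> solutionSet, List<K> probes, JoinFn<K, S, O> fn) {
        List<O> out = new ArrayList<>();
        for (K key : probes) {
            S match = solutionSet.get(key); // null when the key is absent
            out.add(fn.join(key, match));
        }
        return out;
    }

    // Models the described co-group driver: a missing key yields an empty iterator.
    static <K, S, O> List<O> coGroupAll(Map<K, S> solutionSet, List<K> probes, CoGroupFn<K, S, O> fn) {
        List<O> out = new ArrayList<>();
        for (K key : probes) {
            S match = solutionSet.get(key);
            Iterator<S> it = (match == null)
                    ? Collections.<S>emptyIterator()
                    : Collections.singletonList(match).iterator();
            out.add(fn.coGroup(key, it));
        }
        return out;
    }

    static Map<String, Integer> sampleSolutionSet() {
        Map<String, Integer> solutionSet = new HashMap<>();
        solutionSet.put("a", 1);
        solutionSet.put("b", 2);
        return solutionSet;
    }

    // Probes one existing key ("a") and one missing key ("c").
    static List<String> demoJoin() {
        return joinAll(sampleSolutionSet(), Arrays.asList("a", "c"),
                (k, v) -> v == null ? k + ":missing" : k + ":" + v);
    }

    static List<String> demoCoGroup() {
        return coGroupAll(sampleSolutionSet(), Arrays.asList("a", "c"), (k, it) -> {
            int n = 0;
            while (it.hasNext()) {
                it.next();
                n++;
            }
            return k + ":" + n; // number of solution-set records seen for this key
        });
    }

    public static void main(String[] args) {
        System.out.println(demoJoin());
        System.out.println(demoCoGroup());
    }
}
```

The point of the null check in the join lambda is that a user function which dereferences the solution-set record without it would throw an NPE for the key "c". That is a separate situation from the one reported in the thread, where no new keys are generated and the null is attributed to a suspected bug in the managed hash table.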