Date: Fri, 27 Jul 2018 14:31:19 +0100
From: Stuart Macdonald
To: dev@ignite.apache.org
Message-ID: <93689565832C457580F8B479E47F4088@stuwee.org>
Subject: Re: Spark DataFrames With Cache Key and Value Objects

Here’s the ticket:

https://issues.apache.org/jira/browse/IGNITE-9108

Stuart.

On Friday, 27 July 2018 at 14:19, Nikolay Izhikov wrote:
> Sure.
>
> Please send the ticket number in this thread.
>
> On Fri, 27 Jul 2018 at 16:16, Stuart Macdonald wrote:
>
> > Thanks Nikolay. For both options, if the cache object isn’t a simple type,
> > we’d probably do something like this in our Ignite SQL statement:
> >
> > select cast(_key as binary), cast(_val as binary), ...
> >
> > Which would give us the BinaryObject’s byte[], then for option 1 we keep
> > the Ignite format and introduce a new Spark Encoder for Ignite binary types
> > (https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Encoder.html),
> > so that the end user interface would be something like:
> >
> > IgniteSparkSession session = ...
> > Dataset<Row> dataFrame = ...
> > Dataset<MyValClass> valDataSet =
> > dataFrame.select("_val").as(session.binaryObjectEncoder(MyValClass.class))
> >
> > Or for option 2 we have a behind-the-scenes Ignite-to-Kryo UDF so that the
> > user interface would be standard Spark:
> >
> > Dataset<Row> dataFrame = ...
> > Dataset<MyValClass> dataSet =
> > dataFrame.select("_val").as(Encoders.kryo(MyValClass.class))
> >
> > I’ll create a ticket and maybe put together a test case for further
> > discussion?
> >
> > Stuart.
> >
> > On 27 Jul 2018, at 09:50, Nikolay Izhikov wrote:
> >
> > Hello, Stuart.
> >
> > I like your idea.
> >
> > 1. Ignite BinaryObjects, in which case we’d need to supply a Spark Encoder
> > implementation for BinaryObjects
> >
> > 2. Kryo-serialised versions of the objects.
> >
> > Seems like the first option is a simple adapter. Am I right?
> > If yes, I think it's a more efficient way compared with transforming
> > each object to some other (Kryo) format.
> >
> > Can you provide some additional links for both options?
> > Where can I find the API and/or examples?
> >
> > As a second step, we can apply the same approach to the regular key-value
> > caches.
> >
> > Feel free to create a ticket.
> >
> > On Fri, 27/07/2018 at 09:37 +0100, Stuart Macdonald wrote:
> >
> > Ignite Dev Community,
> >
> > Within Ignite-supplied Spark DataFrames, I’d like to propose adding support
> > for _key and _val columns which represent the cache key and value objects
> > similar to the current _key/_val column semantics in Ignite SQL.
> >
> > If the cache key or value objects are standard SQL types (e.g. String, Int,
> > etc.) they will be represented as such in the DataFrame schema, otherwise
> > they are represented as Binary types encoded as either: 1. Ignite
> > BinaryObjects, in which case we’d need to supply a Spark Encoder
> > implementation for BinaryObjects, or 2. Kryo-serialised versions of the
> > objects. Option 1 would probably be more efficient but option 2 would be
> > more idiomatic Spark.
> >
> > This feature would be controlled with an optional parameter in the Ignite
> > data source, defaulting to the current implementation which doesn’t supply
> > _key or _val columns. The rationale behind this is the same as the Ignite
> > SQL _key and _val columns: to allow access to the full cache objects from a
> > SQL context.
> >
> > Can I ask for feedback on this proposal please?
> >
> > I’d be happy to contribute this feature if we agree on the concept.
> >
> > Stuart.
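
The column rule proposed in this thread (simple SQL types pass through unchanged, anything else travels as a binary column and is decoded on the Spark side) can be sketched in plain Java without Spark or Ignite on the classpath. This is only an illustration under loud assumptions: Java's built-in serialization stands in for both the Ignite binary format and Kryo, the list of "simple" types is made up for the example, and the names KeyValColumnSketch, toColumnValue and decode are hypothetical, not part of any Ignite or Spark API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Set;

class KeyValColumnSketch {
    /** Hypothetical cache value class, standing in for MyValClass from the thread. */
    static final class MyValClass implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int count;

        MyValClass(String name, int count) {
            this.name = name;
            this.count = count;
        }
    }

    // Assumption: an illustrative set of types that map directly to SQL column types.
    private static final Set<Class<?>> SIMPLE_SQL_TYPES =
        Set.of(String.class, Integer.class, Long.class, Double.class, Boolean.class);

    /** Simple types are returned as-is; any other object becomes a byte[] column value. */
    static Object toColumnValue(Object cacheObject) throws IOException {
        if (SIMPLE_SQL_TYPES.contains(cacheObject.getClass())) {
            return cacheObject;
        }
        return serialize(cacheObject);
    }

    /** Stand-in encoding step (Java serialization, NOT Kryo or the Ignite binary format). */
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    /** Stand-in for what an Encoder would do when turning the binary column back into T. */
    static <T> T decode(byte[] bytes, Class<T> type) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return type.cast(in.readObject());
        }
    }

    public static void main(String[] args) throws Exception {
        Object simple = toColumnValue("abc");                          // stays a String
        Object complex = toColumnValue(new MyValClass("widget", 3));   // becomes byte[]
        MyValClass back = decode((byte[]) complex, MyValClass.class);
        System.out.println(simple + " " + back.name + " " + back.count); // prints: abc widget 3
    }
}
```

In the real proposal the decode step would be hidden behind either a new Ignite-specific Spark Encoder (option 1) or Spark's existing Encoders.kryo (option 2); the sketch only shows why the user-facing call can stay a single select-then-decode step in both cases.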