From: Bill Slacum
To: user@accumulo.apache.org
Subject: Re: AccumuloInputFormat with pyspark?
Date: Thu, 16 Jul 2015 12:22:32 -0500
Message-Id: <642C1A80-CDF1-4720-A134-E192C143889E@gmail.com>

I would think the Thrift proxy may have definitions for those classes, but they may not map 1:1 to the regular old Java objects.

I'm unfortunately not too familiar with the way Python + Spark works. The big thing will probably be making sure whatever structs you create for the token and the auths serialize in exactly the same manner as the Java versions.
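For what it's worth, here's a rough sketch of that idea in Python. Both encodings below are guesses on my part, so diff the output against the values a Java- or Scala-built Configuration actually contains:

import base64

def serialize_authorizations(auths):
    # Guess: Authorizations.serialize() comma-joins the auth strings,
    # e.g. ["vis1", "vis2"] -> "vis1,vis2". Verify against the Java side.
    return ",".join(auths)

def encode_password_token(password):
    # Guess: the connector token ends up Base64-encoded in the job
    # configuration. Check what setConnectorInfo actually writes.
    return base64.b64encode(password.encode("utf-8")).decode("ascii")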
> On Jul 16, 2015, at 12:13 PM, Kina Winoto <winoto.kina.s@gmail.com> wrote:
>
> Thanks William! I found that function yesterday, actually, but what was more helpful is that I ended up building a configuration object in Scala that is used to connect to Accumulo and seeing the keys that way too. My next blocker is that I need to build equivalent PasswordToken and Authorizations objects in Python. Any ideas there? Is the best route to just reimplement them in Python to pass to Hadoop?
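One alternative to reimplementing them, sketched below and untested: since pyspark already drives a JVM, you may be able to construct the real Java objects through the py4j gateway and let Accumulo's own code do the serializing. The class paths are assumptions for a 1.6-era client on the driver's classpath:

from pyspark import SparkContext

sc = SparkContext(appName="accumulo-auth-sketch")

# Reach into the JVM that pyspark already runs and build the real
# Accumulo objects there (class paths assumed, per the note above).
jvm = sc._jvm
token = jvm.org.apache.accumulo.core.client.security.tokens.PasswordToken("secret")
auths = jvm.org.apache.accumulo.core.security.Authorizations()

# Authorizations.serialize() yields the string form the Java client
# itself would place into a job configuration.
print(auths.serialize())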
>
>> On Wed, Jul 15, 2015 at 9:49 PM, William Slacum <wslacum@gmail.com> wrote:
>> Look in ConfiguratorBase for how it converts enums to config keys. These are the two methods that are used:
>>
>>   /**
>>    * Provides a configuration key for a given feature enum, prefixed by the implementingClass
>>    *
>>    * @param implementingClass
>>    *          the class whose name will be used as a prefix for the property configuration key
>>    * @param e
>>    *          the enum used to provide the unique part of the configuration key
>>    * @return the configuration key
>>    * @since 1.6.0
>>    */
>>   protected static String enumToConfKey(Class<?> implementingClass, Enum<?> e) {
>>     return implementingClass.getSimpleName() + "." + e.getDeclaringClass().getSimpleName() + "." + StringUtils.camelize(e.name().toLowerCase());
>>   }
>>
>>   /**
>>    * Provides a configuration key for a given feature enum.
>>    *
>>    * @param e
>>    *          the enum used to provide the unique part of the configuration key
>>    * @return the configuration key
>>    */
>>   protected static String enumToConfKey(Enum<?> e) {
>>     return e.getDeclaringClass().getSimpleName() + "." + StringUtils.camelize(e.name().toLowerCase());
>>   }
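If it helps, that key scheme is simple to mirror in Python. The sketch below assumes Hadoop's StringUtils.camelize turns a constant like FOO_BAR into FooBar:

def enum_to_conf_key(implementing_class, enum_class, enum_name):
    # Mirrors enumToConfKey above: the implementing class's simple name,
    # the enum's declaring class, then the camelized enum constant.
    camelized = "".join(w.capitalize() for w in enum_name.lower().split("_"))
    return "{}.{}.{}".format(implementing_class, enum_class, camelized)

# Hypothetical names, for illustration only:
# enum_to_conf_key("AccumuloInputFormat", "ConnectorInfo", "PRINCIPAL")
#   -> "AccumuloInputFormat.ConnectorInfo.Principal"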
>>
>>> On Wed, Jul 15, 2015 at 11:20 AM, Kina Winoto <winoto.kina.s@gmail.com> wrote:
>>> Has anyone used the Python Spark API and AccumuloInputFormat?
>>>
>>> Using AccumuloInputFormat in Scala and Java within Spark is straightforward, but the Python Spark API's newAPIHadoopRDD function takes its configuration via a Python dict (https://spark.apache.org/docs/1.1.0/api/python/pyspark.context.SparkContext-class.html#newAPIHadoopRDD), and there isn't an obvious set of job configuration keys to use. From looking at the Accumulo source, it seems job configuration values are stored with keys that are Java enums, and it's unclear to me what to use for configuration keys in my Python dict.
>>>
>>> Any thoughts as to how to do this would be helpful!
>>>
>>> Thanks,
>>>
>>> Kina
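Putting the pieces together on the pyspark side, the call would look something like the sketch below. The Accumulo class names are real, but every key in the conf dict is a hypothetical placeholder to confirm by printing a Java- or Scala-built Configuration:

from pyspark import SparkContext

sc = SparkContext(appName="accumulo-pyspark")

# Hypothetical keys, derived via the enumToConfKey scheme above;
# confirm them against a configuration built on the JVM side.
conf = {
    "AccumuloInputFormat.ConnectorInfo.Principal": "user",
    "AccumuloInputFormat.ConnectorInfo.Token": "<serialized token>",
    "AccumuloInputFormat.ScanOpts.TableName": "mytable",
}

rdd = sc.newAPIHadoopRDD(
    "org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat",
    "org.apache.accumulo.core.data.Key",
    "org.apache.accumulo.core.data.Value",
    conf=conf)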