accumulo-user mailing list archives

From Kina Winoto <winoto.kin...@gmail.com>
Subject Re: AccumuloInputFormat with pyspark?
Date Thu, 16 Jul 2015 17:13:35 GMT
Thanks, William! I actually found that function yesterday, but what helped
more was building the configuration object in Scala that connects to
Accumulo and inspecting the keys that way. My next blocker is that I need
equivalents of the PasswordToken and Authorizations objects in Python. Any
ideas there? Is the best route just to reimplement them in Python and pass
the results to Hadoop?
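
One possible route, instead of reimplementing them: a minimal, untested
sketch that assumes the Accumulo core jars are on the Spark driver's
classpath (e.g. via --driver-class-path) and uses pyspark's py4j gateway
(sc._jvm) to let Accumulo's own configurator methods serialize the token
and authorizations into a Hadoop Job, then copies the resulting entries
into the dict that newAPIHadoopRDD expects. The instance, table, user, and
authorization names below are placeholders:

    from pyspark import SparkContext

    sc = SparkContext(appName="accumulo-pyspark")
    jvm = sc._jvm  # py4j view of the driver JVM

    # Let Accumulo's static configurator methods serialize everything into
    # a Hadoop Job, instead of re-implementing PasswordToken in Python.
    job = jvm.org.apache.hadoop.mapreduce.Job.getInstance()
    fmt = jvm.org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
    token = jvm.org.apache.accumulo.core.client.security.tokens.PasswordToken("secret")
    fmt.setConnectorInfo(job, "root", token)
    fmt.setInputTableName(job, "mytable")
    # Deprecated in 1.6 in favor of the ClientConfiguration variant,
    # but simpler to call from py4j.
    fmt.setZooKeeperInstance(job, "myinstance", "zkhost:2181")

    # Authorizations(String...) is varargs, so build a Java String[] explicitly.
    auths = sc._gateway.new_array(jvm.java.lang.String, 1)
    auths[0] = "vis1"
    fmt.setScanAuthorizations(job, jvm.org.apache.accumulo.core.security.Authorizations(auths))

    # Copy the serialized key/value pairs into a plain Python dict.
    # (Optionally filter to keys starting with "AccumuloInputFormat.",
    # the prefix enumToConfKey produces, to keep the dict small.)
    conf = {}
    it = job.getConfiguration().iterator()
    while it.hasNext():
        entry = it.next()
        conf[entry.getKey()] = entry.getValue()

    rdd = sc.newAPIHadoopRDD(
        "org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat",
        "org.apache.accumulo.core.data.Key",
        "org.apache.accumulo.core.data.Value",
        conf=conf)

Caveat: pyspark only knows how to convert the basic Writable types back to
Python objects, and Accumulo's Key and Value aren't among them, so this
likely also needs custom keyConverter/valueConverter classes on the JVM
side.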

On Wed, Jul 15, 2015 at 9:49 PM, William Slacum <wslacum@gmail.com> wrote:

> Look in ConfiguratorBase for how it converts enums to config keys. These
> are the two methods that are used:
>
>   /**
>    * Provides a configuration key for a given feature enum, prefixed by
> the implementingClass
>    *
>    * @param implementingClass
>    *          the class whose name will be used as a prefix for the
> property configuration key
>    * @param e
>    *          the enum used to provide the unique part of the
> configuration key
>    * @return the configuration key
>    * @since 1.6.0
>    */
>   protected static String enumToConfKey(Class<?> implementingClass,
> Enum<?> e) {
>     return implementingClass.getSimpleName() + "." +
> e.getDeclaringClass().getSimpleName() + "." + StringUtils.camelize(e.name
> ().toLowerCase());
>   }
>
>   /**
>    * Provides a configuration key for a given feature enum.
>    *
>    * @param e
>    *          the enum used to provide the unique part of the
> configuration key
>    * @return the configuration key
>    */
>   protected static String enumToConfKey(Enum<?> e) {
>     return e.getDeclaringClass().getSimpleName() + "." +
> StringUtils.camelize(e.name().toLowerCase());
>   }
>
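(A worked example of the pattern above: enumToConfKey(AccumuloInputFormat.class,
ConnectorInfo.PRINCIPAL) concatenates the simple class name, the enum's
declaring class name, and the camelized enum name. The exact casing comes
from Hadoop's StringUtils.camelize, so treat these illustrative keys as
unverified:)

    # Hypothetical keys derived from the pattern above; assumes
    # StringUtils.camelize("is_configured") returns "IsConfigured".
    conf = {
        "AccumuloInputFormat.ConnectorInfo.IsConfigured": "true",
        "AccumuloInputFormat.ConnectorInfo.Principal": "root",
    }
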
> On Wed, Jul 15, 2015 at 11:20 AM, Kina Winoto <winoto.kina.s@gmail.com>
> wrote:
>
>> Has anyone used the Python Spark API with AccumuloInputFormat?
>>
>> Using AccumuloInputFormat from Scala or Java within Spark is
>> straightforward, but the Python Spark API's newAPIHadoopRDD function takes
>> its configuration as a Python dict (
>> https://spark.apache.org/docs/1.1.0/api/python/pyspark.context.SparkContext-class.html#newAPIHadoopRDD),
>> and there isn't an obvious set of job configuration keys to use. From
>> looking at the Accumulo source, job configuration values appear to be
>> stored under keys derived from Java enums, and it's unclear to me what to
>> use as configuration keys in my Python dict.
>>
>> Any thoughts as to how to do this would be helpful!
>>
>> Thanks,
>>
>> Kina
>>
>
