accumulo-user mailing list archives

From Bill Slacum <wsla...@gmail.com>
Subject Re: AccumuloInputFormat with pyspark?
Date Thu, 16 Jul 2015 17:22:32 GMT
I would think the thrift proxy may have definitions for those classes, but they may not map
1:1 to the regular old Java objects.

I'm unfortunately not too familiar with the way Python + Spark works. The big thing will probably be making sure whatever structs you create for the token and the auths serialize in exactly the same manner as the Java versions.
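If it helps, here's a rough Python sketch of the kind of byte-for-byte compatibility I mean. I *believe* PasswordToken's Writable form is a 4-byte big-endian length followed by the raw password bytes, and that Authorizations.serialize() is just the authorization strings joined by commas -- but verify both against the source for your Accumulo version before trusting this:

  import struct

  def serialize_password_token(password):
      # Sketch of what I think PasswordToken.write() emits: a 4-byte
      # big-endian length followed by the raw password bytes.
      return struct.pack(">i", len(password)) + password

  def serialize_authorizations(auths):
      # Sketch of what I think Authorizations.serialize() produces:
      # the authorization strings joined by commas.
      return ",".join(auths)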



> On Jul 16, 2015, at 12:13 PM, Kina Winoto <winoto.kina.s@gmail.com> wrote:
> 
> Thanks William! I found that function yesterday, actually, but what was more helpful was building the configuration object in Scala that's used to connect to Accumulo and inspecting the keys that way. My next blocker is that I need to build equivalent PasswordToken and Authorizations objects in Python. Any ideas there? Is the best route just to reimplement them in Python and pass them to Hadoop?
> 
>> On Wed, Jul 15, 2015 at 9:49 PM, William Slacum <wslacum@gmail.com> wrote:
>> Look in ConfiguratorBase for how it converts enums to config keys. These are the
two methods that are used:
>> 
>>   /**
>>    * Provides a configuration key for a given feature enum, prefixed by the implementingClass
>>    *
>>    * @param implementingClass
>>    *          the class whose name will be used as a prefix for the property configuration key
>>    * @param e
>>    *          the enum used to provide the unique part of the configuration key
>>    * @return the configuration key
>>    * @since 1.6.0
>>    */
>>   protected static String enumToConfKey(Class<?> implementingClass, Enum<?> e) {
>>     return implementingClass.getSimpleName() + "." + e.getDeclaringClass().getSimpleName()
>>         + "." + StringUtils.camelize(e.name().toLowerCase());
>>   }
>> 
>>   /**
>>    * Provides a configuration key for a given feature enum.
>>    *
>>    * @param e
>>    *          the enum used to provide the unique part of the configuration key
>>    * @return the configuration key
>>    */
>>   protected static String enumToConfKey(Enum<?> e) {
>>     return e.getDeclaringClass().getSimpleName() + "." + StringUtils.camelize(e.name().toLowerCase());
>>   }
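>> 
>> A quick Python equivalent of that key-building logic might look like the sketch below. The camelize helper mimics what I understand Hadoop's StringUtils.camelize to do (title-case each underscore-separated piece), and the "AccumuloInputFormat" prefix in the comment is just an example -- check which implementingClass your job actually passes in:
>> 
>>   def camelize(s):
>>       # Mimics Hadoop's StringUtils.camelize: "is_configured" -> "IsConfigured"
>>       return "".join(part.capitalize() for part in s.split("_"))
>> 
>>   def enum_to_conf_key(implementing_class, enum_class, enum_name):
>>       # e.g. ("AccumuloInputFormat", "ConnectorInfo", "IS_CONFIGURED")
>>       #   -> "AccumuloInputFormat.ConnectorInfo.IsConfigured"
>>       return ".".join([implementing_class, enum_class, camelize(enum_name.lower())])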
>> 
>>> On Wed, Jul 15, 2015 at 11:20 AM, Kina Winoto <winoto.kina.s@gmail.com> wrote:
>>> Has anyone used the python Spark API and AccumuloInputFormat?
>>> 
>>> Using AccumuloInputFormat in Scala and Java within Spark is straightforward, but the Python Spark API's newAPIHadoopRDD function takes its configuration via a Python dict (https://spark.apache.org/docs/1.1.0/api/python/pyspark.context.SparkContext-class.html#newAPIHadoopRDD), and there isn't an obvious set of job configuration keys to use. From looking at the Accumulo source, it seems job configuration values are stored under keys derived from Java enums, and it's unclear to me what to use as the keys in my Python dict.
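>>> 
>>> For concreteness, here's roughly what I'm trying to write -- the conf keys below are placeholders, since those key names are exactly what I can't figure out:
>>> 
>>>   rdd = sc.newAPIHadoopRDD(
>>>       "org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat",
>>>       "org.apache.accumulo.core.data.Key",
>>>       "org.apache.accumulo.core.data.Value",
>>>       conf={
>>>           # ??? what key names does AccumuloInputFormat expect here ???
>>>           "some.config.key": "some value",
>>>       })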
>>> 
>>> Any thoughts as to how to do this would be helpful!
>>> 
>>> Thanks,
>>> 
>>> Kina
> 
