accumulo-user mailing list archives

From Billie Rinaldi <billie.rina...@gmail.com>
Subject Re: spark with AccumuloRowInputFormat?
Date Mon, 04 May 2015 16:33:59 GMT
http://stackoverflow.com/questions/24450540/how-to-use-sparks-newapihadooprdd-from-java
This issue seems to indicate your analysis is correct, that the compilation
error has to do with there being an intermediate AbstractInputFormat.  I'd
be curious whether a different compiler would help or not, as they suggest.
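
One possible reading of the error, offered as a sketch rather than a confirmed
fix: the class literal PeekingIterator.class has the raw type
Class<PeekingIterator>, so V is inferred as raw PeekingIterator, while
AccumuloRowInputFormat is an InputFormat<Text, PeekingIterator<Map.Entry<Key, Value>>>.
The example below reproduces that shape with hypothetical stand-in classes
(InputFormat, PeekingIterator, RowInputFormat and the newAPIHadoopRDD method here
are NOT the real Hadoop, Accumulo, or Spark types) and shows an unchecked double
cast of the class literal that lets the bound be satisfied. It has not been tried
against Spark 1.3.1 or Accumulo 1.6.1.

```java
import java.util.Map;

// Self-contained sketch. Every type below is a hypothetical stand-in that only
// mirrors the generic shape of the real classes discussed in this thread.
final class RowFormatGenericsSketch {

    // Stand-in for org.apache.hadoop.mapreduce.InputFormat<K, V>.
    abstract static class InputFormat<K, V> {}

    // Stand-in for org.apache.accumulo.core.util.PeekingIterator<E>.
    static class PeekingIterator<E> {}

    // Stand-in for AccumuloRowInputFormat: its value type is itself generic.
    static class RowInputFormat
            extends InputFormat<String, PeekingIterator<Map.Entry<String, String>>> {}

    // Same generic signature shape as the newAPIHadoopRDD method quoted below.
    static <K, V, F extends InputFormat<K, V>> String newAPIHadoopRDD(
            Class<F> fClass, Class<K> kClass, Class<V> vClass) {
        return fClass.getSimpleName() + "<" + kClass.getSimpleName()
                + "," + vClass.getSimpleName() + ">";
    }

    // PeekingIterator.class is a Class<PeekingIterator> (raw). Casting through
    // Class<?> yields a Class of the parameterized type. The cast is unchecked
    // and changes nothing at runtime, because generics are erased.
    @SuppressWarnings("unchecked")
    static Class<PeekingIterator<Map.Entry<String, String>>> rowValueClass() {
        return (Class<PeekingIterator<Map.Entry<String, String>>>)
                (Class<?>) PeekingIterator.class;
    }

    static String describe() {
        // Passing PeekingIterator.class directly here fails to compile with an
        // "inferred type does not conform to declared bound(s)" style error.
        return newAPIHadoopRDD(RowInputFormat.class, String.class, rowValueClass());
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the same idea carries over, the real call would pass a similarly cast
Class<PeekingIterator<Map.Entry<Key, Value>>> as the last argument instead of
PeekingIterator.class. That is an assumption, not something verified here.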


On Mon, May 4, 2015 at 8:46 AM, Marc Reichman <mreichman@pixelforensics.com>
wrote:

> Has anyone done any testing with Spark and AccumuloRowInputFormat? I have
> no problem doing this for AccumuloInputFormat:
>
> JavaPairRDD<Key, Value> pairRDD = sparkContext.newAPIHadoopRDD(job.getConfiguration(),
>         AccumuloInputFormat.class,
>         Key.class, Value.class);
>
> But I run into a snag trying to do a similar thing:
>
> JavaPairRDD<Text, PeekingIterator<Map.Entry<Key, Value>>> pairRDD =
>         sparkContext.newAPIHadoopRDD(job.getConfiguration(),
>         AccumuloRowInputFormat.class,
>         Text.class, PeekingIterator.class);
>
> The compilation error is (big, sorry):
>
> Error:(141, 97) java: method newAPIHadoopRDD in class org.apache.spark.api.java.JavaSparkContext cannot be applied to given types;
>   required: org.apache.hadoop.conf.Configuration,java.lang.Class<F>,java.lang.Class<K>,java.lang.Class<V>
>   found: org.apache.hadoop.conf.Configuration,java.lang.Class<org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat>,java.lang.Class<org.apache.hadoop.io.Text>,java.lang.Class<org.apache.accumulo.core.util.PeekingIterator>
>   reason: inferred type does not conform to declared bound(s)
>     inferred: org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat
>     bound(s): org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.accumulo.core.util.PeekingIterator>
>
> I've tried a few things. The signature of the function is:
>
> public <K, V, F extends org.apache.hadoop.mapreduce.InputFormat<K, V>>
>         JavaPairRDD<K, V> newAPIHadoopRDD(Configuration conf, Class<F> fClass,
>         Class<K> kClass, Class<V> vClass)
>
> I guess it's having trouble with the format extending InputFormatBase with
> its own additional generic parameters (the Map.Entry inside
> PeekingIterator).
>
> This may be an issue to chase with Spark vs Accumulo, unless something can
> be tweaked on the Accumulo side or I could wrap the InputFormat with my own
> somehow.
>
> Accumulo 1.6.1, Spark 1.3.1, JDK 7u71.
>
> Stopping short of this, can anyone think of a good way to use
> AccumuloInputFormat to get what I'm getting from the Row version in a
> performant way? It doesn't necessarily have to be an iterator approach, but
> I'd need all my values with the key in one consuming function. I'm looking
> into ways to do it in spark functions but trying to avoid any major
> performance hits.
>
> Thanks,
>
> Marc
>
> p.s. The summit was absolutely great, thank you all for having it!
>
>
