crunch-dev mailing list archives

From Vincent Fabro <vincent.fabro.nu...@gmail.com>
Subject (CRUNCH-184) Gora backend implementation: first attempt
Date Mon, 25 May 2015 08:06:50 GMT
Dear all,

A patch with a first, crude Gora backend implementation is attached. I
started from a copy of the HBase implementation and modified it.

I have a few questions before taking it further:

- HBaseSourceTarget implements TableSource<..., ...>, but
GoraSourceTarget implements Source<Pair<K, V>>, because a Gora DataStore is
a map and not a multimap. Should it be a TableSource anyway?

- I wrote some simple examples in GoraSourceIT (they will be removed; there
are no proper tests yet). Reading from and writing to a GoraSourceTarget
works with MemPipeline, but MRPipeline gives the following error when
reading from a Gora MemStore (GoraSourceIT.testGoraTarget()):
1035 [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2205 [Thread-2] WARN  org.apache.hadoop.mapreduce.JobSubmitter  - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2207 [Thread-2] WARN  org.apache.hadoop.mapreduce.JobSubmitter  - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2925 [Thread-2] INFO  org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob  - Running job "org.apache.crunch.io.gora.GoraSourceIT: GoraDataStore(org.apache.gora.memory.store.MemStore@2b3b2... ID=1 (1/1)"
2925 [Thread-2] INFO  org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob  - Job status available at: http://localhost:8080/
java.util.NoSuchElementException
    at java.util.TreeMap.key(TreeMap.java:1221)
    at java.util.TreeMap.firstKey(TreeMap.java:285)
    at org.apache.gora.memory.store.MemStore.execute(MemStore.java:125)
    at org.apache.gora.query.impl.QueryBase.execute(QueryBase.java:73)
    at org.apache.gora.mapreduce.GoraRecordReader.executeQuery(GoraRecordReader.java:68)
    at org.apache.gora.mapreduce.GoraRecordReader.nextKeyValue(GoraRecordReader.java:110)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

- Should there be Gora equivalents of HBaseTypes.puts and
HBaseTypes.deletes?

- When I imported Crunch into Eclipse, the following problem appeared in
crunch-hbase/pom.xml:
Plugin execution not covered by lifecycle configuration:
 org.apache.maven.plugins:maven-dependency-plugin:2.8:build-classpath
 (execution: create-mrapp-generated-classpath, phase:
 generate-test-resources)
What could be the reason? For the moment I let Eclipse fix the problem
automatically.

- More generally, how is the code quality? Any feedback is welcome (I am
still junior).
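
To make the first question (Source versus TableSource) more concrete, here
is a minimal sketch. It does not use the real Crunch types: SimpleSource,
SimpleTableSource and fromMap are hypothetical stand-ins invented for this
illustration, and java.util.Map.Entry stands in for Pair. As far as I
understand, a table source is a source of pairs that additionally knows it
is keyed, so a map-backed store can be read as a table in which every key
occurs exactly once.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SourceShapeSketch {

    // Hypothetical stand-in for a plain source: a flat collection of records.
    public interface SimpleSource<T> {
        List<T> read();
    }

    // Hypothetical stand-in for a table source: the same pairs, plus the
    // knowledge that they are keyed, which makes grouping by key possible.
    public interface SimpleTableSource<K, V> extends SimpleSource<Map.Entry<K, V>> {
        default Map<K, List<V>> groupByKey() {
            Map<K, List<V>> grouped = new LinkedHashMap<>();
            for (Map.Entry<K, V> e : read()) {
                grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
            return grouped;
        }
    }

    // Exposes a map-backed store as a table source. Because the store is a
    // map, each key contributes exactly one pair.
    public static <K, V> SimpleTableSource<K, V> fromMap(Map<K, V> store) {
        return () -> {
            List<Map.Entry<K, V>> pairs = new ArrayList<>();
            for (Map.Entry<K, V> e : store.entrySet()) {
                pairs.add(new SimpleEntry<>(e.getKey(), e.getValue()));
            }
            return pairs;
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> store = new TreeMap<>();
        store.put("a", 1);
        store.put("b", 2);
        SimpleTableSource<String, Integer> source = fromMap(store);
        System.out.println(source.read().size());   // number of pairs read
        System.out.println(source.groupByKey());    // one value per key
    }
}
```

In this sketch a map is simply the special case of a table where each
group has a single value, which is why I am unsure whether the Gora source
should stay a plain Source.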

I don't know if this is headed in the right direction, so thanks in advance
for your guidance.
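
Regarding the NoSuchElementException in the trace above: the top frames
show MemStore.execute calling TreeMap.firstKey(), and the JDK documents
that firstKey() throws NoSuchElementException when the map is empty. One
possible reading, which I have not verified, is that the MemStore seen by
the map task holds no records. The sketch below only reproduces the JDK
behaviour and shows a guarded alternative; it does not use Gora, and
firstKeyOrNull and firstKeyThrows are hypothetical helper names.

```java
import java.util.NoSuchElementException;
import java.util.TreeMap;

public class EmptyTreeMapSketch {

    // Returns the first key, or null when the map is empty, instead of
    // letting firstKey() throw.
    public static <K> K firstKeyOrNull(TreeMap<K, ?> map) {
        return map.isEmpty() ? null : map.firstKey();
    }

    // Reports whether an unguarded firstKey() call throws for this map.
    public static boolean firstKeyThrows(TreeMap<String, String> map) {
        try {
            map.firstKey();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        TreeMap<String, String> records = new TreeMap<>();

        // Empty map: the unguarded call throws, the guarded one returns null.
        System.out.println(firstKeyThrows(records));
        System.out.println(firstKeyOrNull(records));

        // After one insert both variants succeed.
        records.put("row1", "value");
        System.out.println(firstKeyThrows(records));
        System.out.println(firstKeyOrNull(records));
    }
}
```

If that reading is right, the question is why the store is empty on the
MRPipeline path but not with MemPipeline.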

Vincent
