accumulo-user mailing list archives

From Aji Janis <aji1...@gmail.com>
Subject Re: importdirectory in accumulo
Date Fri, 05 Apr 2013 15:36:56 GMT
I agree with you that changing HADOOP_CLASSPATH as you describe is the
right fix. I couldn't quite do that just yet, though (people have jobs
running and I don't want to risk disrupting them).

However, I did find a workaround. Working from the theory that my
HADOOP_CLASSPATH is bad, so it can't pick up all the libraries I am
passing to it, I decided to package all the libraries I needed into a
single jar (following
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/).
I downloaded the source code and built a shaded (uber) jar that includes
all the libraries I needed, then submitted the Hadoop job with my uber
jar like any other MapReduce job. My mappers and reducers finish the
job, but I get an exception from waitForTableOperation. I think this
supports my bad-classpath theory, but clearly I have more issues to deal
with. If you have any suggestions on how to debug this, that would be
awesome!

My console output (with a lot of server-specific detail removed for
security) is below. I modified BulkIngestExample.java to add some print
statements; the modified lines are also shown below.


[user@nodebulk]$ /opt/hadoop/bin/hadoop jar uber-BulkIngestExample.jar
instance zookeepers user password table inputdir tmp/bulk

13/04/05 11:20:52 INFO input.FileInputFormat: Total input paths to process : 1
13/04/05 11:20:53 INFO mapred.JobClient: Running job: job_201304021611_0045
13/04/05 11:20:54 INFO mapred.JobClient:  map 0% reduce 0%
13/04/05 11:21:10 INFO mapred.JobClient:  map 100% reduce 0%
13/04/05 11:21:25 INFO mapred.JobClient:  map 100% reduce 50%
13/04/05 11:21:26 INFO mapred.JobClient:  map 100% reduce 100%
13/04/05 11:21:31 INFO mapred.JobClient: Job complete: job_201304021611_0045
13/04/05 11:21:31 INFO mapred.JobClient: Counters: 25
13/04/05 11:21:31 INFO mapred.JobClient:   Job Counters
13/04/05 11:21:31 INFO mapred.JobClient:     Launched reduce tasks=2
13/04/05 11:21:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=15842
13/04/05 11:21:31 INFO mapred.JobClient:     Total time spent by all
reduces waiting after reserving slots (ms)=0
13/04/05 11:21:31 INFO mapred.JobClient:     Total time spent by all maps
waiting after reserving slots (ms)=0
13/04/05 11:21:31 INFO mapred.JobClient:     Rack-local map tasks=1
13/04/05 11:21:31 INFO mapred.JobClient:     Launched map tasks=1
13/04/05 11:21:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=25891
13/04/05 11:21:31 INFO mapred.JobClient:   File Output Format Counters
13/04/05 11:21:31 INFO mapred.JobClient:     Bytes Written=496
13/04/05 11:21:31 INFO mapred.JobClient:   FileSystemCounters
13/04/05 11:21:31 INFO mapred.JobClient:     FILE_BYTES_READ=312
13/04/05 11:21:31 INFO mapred.JobClient:     HDFS_BYTES_READ=421
13/04/05 11:21:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=68990
13/04/05 11:21:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=496
13/04/05 11:21:31 INFO mapred.JobClient:   File Input Format Counters
13/04/05 11:21:31 INFO mapred.JobClient:     Bytes Read=280
13/04/05 11:21:31 INFO mapred.JobClient:   Map-Reduce Framework
13/04/05 11:21:31 INFO mapred.JobClient:     Reduce input groups=10
13/04/05 11:21:31 INFO mapred.JobClient:     Map output materialized
bytes=312
13/04/05 11:21:31 INFO mapred.JobClient:     Combine output records=0
13/04/05 11:21:31 INFO mapred.JobClient:     Map input records=10
13/04/05 11:21:31 INFO mapred.JobClient:     Reduce shuffle bytes=186
13/04/05 11:21:31 INFO mapred.JobClient:     Reduce output records=10
13/04/05 11:21:31 INFO mapred.JobClient:     Spilled Records=20
13/04/05 11:21:31 INFO mapred.JobClient:     Map output bytes=280
13/04/05 11:21:31 INFO mapred.JobClient:     Combine input records=0
13/04/05 11:21:31 INFO mapred.JobClient:     Map output records=10
13/04/05 11:21:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=141
13/04/05 11:21:31 INFO mapred.JobClient:     Reduce input records=10

Here is the exception caught:
org.apache.accumulo.core.client.AccumuloException: Internal error
processing waitForTableOperation

E.getMessage returns:
Internal error processing waitForTableOperation
Exception in thread "main" java.lang.RuntimeException:
org.apache.accumulo.core.client.AccumuloException: Internal error
processing waitForTableOperation
        at
org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample.run(BulkIngestExample.java:151)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at
org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample.main(BulkIngestExample.java:166)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.accumulo.core.client.AccumuloException: Internal
error processing waitForTableOperation
        at
org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:290)
        at
org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:258)
        at
org.apache.accumulo.core.client.admin.TableOperationsImpl.importDirectory(TableOperationsImpl.java:945)
        at
org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample.run(BulkIngestExample.java:146)
        ... 7 more
Caused by: org.apache.thrift.TApplicationException: Internal error
processing waitForTableOperation
        at
org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
        at
org.apache.accumulo.core.master.thrift.MasterClientService$Client.recv_waitForTableOperation(MasterClientService.java:684)
        at
org.apache.accumulo.core.master.thrift.MasterClientService$Client.waitForTableOperation(MasterClientService.java:665)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at
org.apache.accumulo.cloudtrace.instrument.thrift.TraceWrap$2.invoke(TraceWrap.java:84)
        at $Proxy5.waitForTableOperation(Unknown Source)
        at
org.apache.accumulo.core.client.admin.TableOperationsImpl.waitForTableOperation(TableOperationsImpl.java:230)
        at
org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:272)
        ... 10 more
[user@nodebulk]$


Modifications in BulkIngestExample.java (the "line" labels match the
line numbers in the stack trace above):

line 146:  connector.tableOperations().importDirectory(tableName,
           workDir + "/files", workDir + "/failures", false);

    } catch (Exception e) {
      System.out.println("\nHere is the exception caught:\n" + e);
      System.out.println("\nE.getMessage returns:\n" + e.getMessage());
line 151:  throw new RuntimeException(e);
    } finally {
      if (out != null)
        out.close();

line 166:  int res = ToolRunner.run(CachedConfiguration.getInstance(),
           new BulkIngestExample(), args);
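As a side note on the earlier tool.sh attempts with patterns like lib/*.jar:
the shell expands the glob before tool.sh ever runs, so each extra matching
jar becomes a separate argument and the second match is misread as the class
name. A small sketch (the scratch directory and jar names are made up):

```shell
# Scratch directory with two stand-in jars; empty files suffice to show expansion.
mkdir -p /tmp/glob-demo/lib
cd /tmp/glob-demo
touch lib/accumulo-core-1.4.2.jar lib/accumulo-core-1.4.2-sources.jar

# The shell expands lib/*.jar into multiple arguments before any script sees them:
set -- lib/*.jar
echo "number of args: $#"
echo "arg 1 (taken as the jar):          $1"
echo "arg 2 (misread as the class name): $2"
```

This is why a pattern that matches exactly one jar works while the broader
globs do not.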


On Thu, Apr 4, 2013 at 3:51 PM, Billie Rinaldi <billie@apache.org> wrote:

> On Thu, Apr 4, 2013 at 12:26 PM, Aji Janis <aji1705@gmail.com> wrote:
>
>> I haven't tried the classpath option yet, but I executed the command below
>> as the hadoop user ... this seemed to be the command that accumulo was
>> trying to execute anyway, and I would have thought it should avoid the
>> custom classpath issue... Right/Wrong?
>>
>
> No, the jar needs to be both in the libjars and on the classpath.  There
> are classes that need to be accessed on the local machine in the process of
> submitting the MapReduce job, and that process can only see the classpath,
> not the libjars.
>
> The HADOOP_CLASSPATH you have is unusual.  More often, HADOOP_CLASSPATH is
> not set at all in hadoop-env.sh, but if it is it should generally be of the
> form newstuff:$HADOOP_CLASSPATH to avoid this issue.
>
> You will have to restart Hadoop after making the change to hadoop-env.sh.
>
> Billie
>
>
>
>>
>>
>> Got the same error:
>> *[hadoop@node]$ /opt/hadoop/bin/hadoop jar
>> /opt/accumulo/lib/examples-simple-1.4.2.jar
>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>> -libjars
>> "/opt/accumulo/lib/libthrift-0.6.1.jar,/opt/accumulo/lib/accumulo-core-1.4.2.jar,/opt/zookeeper/zookeeper-3.3.3.jar,/opt/accumulo/lib/cloudtrace-1.4.2.jar,/opt/accumulo/lib/commons-collections-3.2.jar,/opt/accumulo/lib/commons-configuration-1.5.jar,/opt/accumulo/lib/commons-io-1.4.jar,/opt/accumulo/lib/commons-jci-core-1.0.jar,/opt/accumulo/lib/commons-jci-fam-1.0.jar,/opt/accumulo/lib/commons-lang-2.4.jar,/opt/accumulo/lib/commons-logging-1.0.4.jar,/opt/accumulo/lib/commons-logging-api-1.0.4.jar"
>> *
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/accumulo/core/client/Instance
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:264)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.accumulo.core.client.Instance
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>         ... 3 more
>>
>>
>>
>> On Thu, Apr 4, 2013 at 2:51 PM, Billie Rinaldi <billie@apache.org> wrote:
>>
>>> On Thu, Apr 4, 2013 at 11:41 AM, Aji Janis <aji1705@gmail.com> wrote:
>>>
>>>> *[accumulo@node accumulo]$ cat /opt/hadoop/conf/hadoop-env.sh | grep
>>>> HADOOP_CLASSPATH*
>>>> export HADOOP_CLASSPATH=./:/conf:/build/*:
>>>>
>>>
>>> To preserve custom HADOOP_CLASSPATHs, this line should be:
>>> export HADOOP_CLASSPATH=./:/conf:/build/*:$HADOOP_CLASSPATH
>>>
>>> Billie
>>>
>>>
>>>
>>>>
>>>> Looks like it is overwriting everything. Isn't this the default
>>>> behavior? Is your hadoop-env.sh missing that line?
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Apr 4, 2013 at 2:25 PM, Billie Rinaldi <billie@apache.org>wrote:
>>>>
>>>>> On Thu, Apr 4, 2013 at 10:27 AM, Aji Janis <aji1705@gmail.com> wrote:
>>>>>
>>>>>> I thought about the permissions issue too. All the accumulo stuff is
>>>>>> under accumulo user so I started running the commands as accumulo ... only
>>>>>> to get the same result.
>>>>>> -The errors happen right away
>>>>>> -the box has both accumulo and hadoop on it
>>>>>> -the jar contains the instance class. But note that the instance
>>>>>> class is part of accumulo-core and not examples-simple-1.4.2.jar .... (can
>>>>>> this be the issue?)
>>>>>>
>>>>>
>>>>> No, that isn't the issue.  tool.sh is finding the accumulo-core jar
>>>>> and putting it on the HADOOP_CLASSPATH and in the libjars.
>>>>>
>>>>> I wonder if your hadoop environment is set up to override the
>>>>> HADOOP_CLASSPATH.  Check in your hadoop-env.sh to see if HADOOP_CLASSPATH
>>>>> is set there.
>>>>>
>>>>> The reason your commands of the form "tool.sh lib/*jar" aren't working
>>>>> is that the regex is finding multiple jars and putting them all on the
>>>>> command line.  tool.sh expects at most one jar followed by a class name, so
>>>>> whatever jar comes second when the regex is expanded is being interpreted
>>>>> as a class name.
>>>>>
>>>>> Billie
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Commands I ran:
>>>>>>
>>>>>> *[accumulo@node accumulo]$ whoami*
>>>>>> accumulo
>>>>>> *[accumulo@node accumulo]$ ls -l*
>>>>>> total 184
>>>>>> drwxr-xr-x 2 accumulo accumulo  4096 Apr  4 10:25 bin
>>>>>> -rwxr-xr-x 1 accumulo accumulo 24263 Oct 22 15:30 CHANGES
>>>>>> drwxr-xr-x 3 accumulo accumulo  4096 Apr  3 10:17 conf
>>>>>> drwxr-xr-x 2 accumulo accumulo  4096 Jan 15 13:35 contrib
>>>>>> -rwxr-xr-x 1 accumulo accumulo   695 Nov 18  2011 DISCLAIMER
>>>>>> drwxr-xr-x 5 accumulo accumulo  4096 Jan 15 13:35 docs
>>>>>> drwxr-xr-x 4 accumulo accumulo  4096 Jan 15 13:35 lib
>>>>>> -rwxr-xr-x 1 accumulo accumulo 56494 Mar 21  2012 LICENSE
>>>>>> drwxr-xr-x 2 accumulo accumulo 12288 Apr  3 14:43 logs
>>>>>> -rwxr-xr-x 1 accumulo accumulo  2085 Mar 21  2012 NOTICE
>>>>>> -rwxr-xr-x 1 accumulo accumulo 27814 Oct 17 08:32 pom.xml
>>>>>> -rwxr-xr-x 1 accumulo accumulo 12449 Oct 17 08:32 README
>>>>>> drwxr-xr-x 9 accumulo accumulo  4096 Nov  8 13:40 src
>>>>>> drwxr-xr-x 5 accumulo accumulo  4096 Nov  8 13:40 test
>>>>>> drwxr-xr-x 2 accumulo accumulo  4096 Apr  4 09:09 walogs
>>>>>> *[accumulo@node accumulo]$ ls bin/*
>>>>>> accumulo           check-slaves  etc_initd_accumulo  start-all.sh
>>>>>> start-server.sh  stop-here.sh    tdown.sh  tup.sh
>>>>>> catapultsetup.acc  config.sh     LogForwarder.sh     start-here.sh
>>>>>>  stop-all.sh      stop-server.sh  tool.sh   upgrade.sh
>>>>>> *[accumulo@node accumulo]$ ls lib/*
>>>>>> accumulo-core-1.4.2.jar            accumulo-start-1.4.2.jar
>>>>>>  commons-collections-3.2.jar    commons-logging-1.0.4.jar
>>>>>>  jline-0.9.94.jar
>>>>>> accumulo-core-1.4.2-javadoc.jar    accumulo-start-1.4.2-javadoc.jar
>>>>>>  commons-configuration-1.5.jar  commons-logging-api-1.0.4.jar
>>>>>>  libthrift-0.6.1.jar
>>>>>> accumulo-core-1.4.2-sources.jar    accumulo-start-1.4.2-sources.jar
>>>>>>  commons-io-1.4.jar             examples-simple-1.4.2.jar
>>>>>>  log4j-1.2.16.jar
>>>>>> accumulo-server-1.4.2.jar          cloudtrace-1.4.2.jar
>>>>>>  commons-jci-core-1.0.jar       examples-simple-1.4.2-javadoc.jar  native
>>>>>> accumulo-server-1.4.2-javadoc.jar  cloudtrace-1.4.2-javadoc.jar
>>>>>>  commons-jci-fam-1.0.jar        examples-simple-1.4.2-sources.jar
>>>>>>  wikisearch-ingest-1.4.2-javadoc.jar
>>>>>> accumulo-server-1.4.2-sources.jar  cloudtrace-1.4.2-sources.jar
>>>>>>  commons-lang-2.4.jar           ext
>>>>>>  wikisearch-query-1.4.2-javadoc.jar
>>>>>>
>>>>>> *[accumulo@node accumulo]$ jar -tf
>>>>>> /opt/accumulo/lib/accumulo-core-1.4.2.jar | grep
>>>>>> org/apache/accumulo/core/client/Instance*
>>>>>> org/apache/accumulo/core/client/Instance.class
>>>>>>
>>>>>> *[accumulo@node accumulo]$ jar -tf
>>>>>> /opt/accumulo/lib/examples-simple-1.4.2.jar | grep
>>>>>> org/apache/accumulo/core/client/Instance*
>>>>>> *[accumulo@node accumulo]$ ./bin/tool.sh lib/*[^cs].jar
>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork*
>>>>>> USERJARS=
>>>>>> CLASSNAME=lib/accumulo-server-1.4.2.jar
>>>>>>
>>>>>> HADOOP_CLASSPATH=/opt/accumulo/lib/libthrift-0.6.1.jar:/opt/accumulo/lib/accumulo-core-1.4.2.jar:/opt/zookeeper/zookeeper-3.3.3.jar:/opt/accumulo/lib/cloudtrace-1.4.2.jar:/opt/accumulo/lib/commons-collections-3.2.jar:/opt/accumulo/lib/commons-configuration-1.5.jar:/opt/accumulo/lib/commons-io-1.4.jar:/opt/accumulo/lib/commons-jci-core-1.0.jar:/opt/accumulo/lib/commons-jci-fam-1.0.jar:/opt/accumulo/lib/commons-lang-2.4.jar:/opt/accumulo/lib/commons-logging-1.0.4.jar:/opt/accumulo/lib/commons-logging-api-1.0.4.jar:
>>>>>> exec /opt/hadoop/bin/hadoop jar lib/accumulo-core-1.4.2.jar
>>>>>> lib/accumulo-server-1.4.2.jar -libjars
>>>>>> "/opt/accumulo/lib/libthrift-0.6.1.jar,/opt/accumulo/lib/accumulo-core-1.4.2.jar,/opt/zookeeper/zookeeper-3.3.3.jar,/opt/accumulo/lib/cloudtrace-1.4.2.jar,/opt/accumulo/lib/commons-collections-3.2.jar,/opt/accumulo/lib/commons-configuration-1.5.jar,/opt/accumulo/lib/commons-io-1.4.jar,/opt/accumulo/lib/commons-jci-core-1.0.jar,/opt/accumulo/lib/commons-jci-fam-1.0.jar,/opt/accumulo/lib/commons-lang-2.4.jar,/opt/accumulo/lib/commons-logging-1.0.4.jar,/opt/accumulo/lib/commons-logging-api-1.0.4.jar"
>>>>>> Exception in thread "main" java.lang.ClassNotFoundException:
>>>>>> lib.accumulo-server-1.4.2.jar
>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>>         at java.lang.Class.forName(Class.java:264)
>>>>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>>>>>
>>>>>> *[accumulo@node accumulo]$ ./bin/tool.sh lib/*.jar
>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork*
>>>>>> USERJARS=
>>>>>> CLASSNAME=lib/accumulo-core-1.4.2-javadoc.jar
>>>>>>
>>>>>> HADOOP_CLASSPATH=/opt/accumulo/lib/libthrift-0.6.1.jar:/opt/accumulo/lib/accumulo-core-1.4.2.jar:/opt/zookeeper/zookeeper-3.3.3.jar:/opt/accumulo/lib/cloudtrace-1.4.2.jar:/opt/accumulo/lib/commons-collections-3.2.jar:/opt/accumulo/lib/commons-configuration-1.5.jar:/opt/accumulo/lib/commons-io-1.4.jar:/opt/accumulo/lib/commons-jci-core-1.0.jar:/opt/accumulo/lib/commons-jci-fam-1.0.jar:/opt/accumulo/lib/commons-lang-2.4.jar:/opt/accumulo/lib/commons-logging-1.0.4.jar:/opt/accumulo/lib/commons-logging-api-1.0.4.jar:
>>>>>> exec /opt/hadoop/bin/hadoop jar lib/accumulo-core-1.4.2.jar
>>>>>> lib/accumulo-core-1.4.2-javadoc.jar -libjars
>>>>>> "/opt/accumulo/lib/libthrift-0.6.1.jar,/opt/accumulo/lib/accumulo-core-1.4.2.jar,/opt/zookeeper/zookeeper-3.3.3.jar,/opt/accumulo/lib/cloudtrace-1.4.2.jar,/opt/accumulo/lib/commons-collections-3.2.jar,/opt/accumulo/lib/commons-configuration-1.5.jar,/opt/accumulo/lib/commons-io-1.4.jar,/opt/accumulo/lib/commons-jci-core-1.0.jar,/opt/accumulo/lib/commons-jci-fam-1.0.jar,/opt/accumulo/lib/commons-lang-2.4.jar,/opt/accumulo/lib/commons-logging-1.0.4.jar,/opt/accumulo/lib/commons-logging-api-1.0.4.jar"
>>>>>> Exception in thread "main" java.lang.ClassNotFoundException:
>>>>>> lib.accumulo-core-1.4.2-javadoc.jar
>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>>         at java.lang.Class.forName(Class.java:264)
>>>>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>>>>>
>>>>>> *[accumulo@node accumulo]$ ./bin/tool.sh lib/*[^c].jar
>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork*
>>>>>>  USERJARS=
>>>>>> CLASSNAME=lib/accumulo-core-1.4.2-sources.jar
>>>>>>
>>>>>> HADOOP_CLASSPATH=/opt/accumulo/lib/libthrift-0.6.1.jar:/opt/accumulo/lib/accumulo-core-1.4.2.jar:/opt/zookeeper/zookeeper-3.3.3.jar:/opt/accumulo/lib/cloudtrace-1.4.2.jar:/opt/accumulo/lib/commons-collections-3.2.jar:/opt/accumulo/lib/commons-configuration-1.5.jar:/opt/accumulo/lib/commons-io-1.4.jar:/opt/accumulo/lib/commons-jci-core-1.0.jar:/opt/accumulo/lib/commons-jci-fam-1.0.jar:/opt/accumulo/lib/commons-lang-2.4.jar:/opt/accumulo/lib/commons-logging-1.0.4.jar:/opt/accumulo/lib/commons-logging-api-1.0.4.jar:
>>>>>> exec /opt/hadoop/bin/hadoop jar lib/accumulo-core-1.4.2.jar
>>>>>> lib/accumulo-core-1.4.2-sources.jar -libjars
>>>>>> "/opt/accumulo/lib/libthrift-0.6.1.jar,/opt/accumulo/lib/accumulo-core-1.4.2.jar,/opt/zookeeper/zookeeper-3.3.3.jar,/opt/accumulo/lib/cloudtrace-1.4.2.jar,/opt/accumulo/lib/commons-collections-3.2.jar,/opt/accumulo/lib/commons-configuration-1.5.jar,/opt/accumulo/lib/commons-io-1.4.jar,/opt/accumulo/lib/commons-jci-core-1.0.jar,/opt/accumulo/lib/commons-jci-fam-1.0.jar,/opt/accumulo/lib/commons-lang-2.4.jar,/opt/accumulo/lib/commons-logging-1.0.4.jar,/opt/accumulo/lib/commons-logging-api-1.0.4.jar"
>>>>>> Exception in thread "main" java.lang.ClassNotFoundException:
>>>>>> lib.accumulo-core-1.4.2-sources.jar
>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>>         at java.lang.Class.forName(Class.java:264)
>>>>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>>>>>
>>>>>> *[accumulo@node accumulo]$ ./bin/tool.sh
>>>>>> lib/examples-simple-*[^c].jar
>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>> default node14.catapult.dev.boozallenet.com:2181 root password
>>>>>> test_aj /user/559599/input tmp/ajbulktest*
>>>>>> USERJARS=
>>>>>> CLASSNAME=lib/examples-simple-1.4.2-sources.jar
>>>>>>
>>>>>> HADOOP_CLASSPATH=/opt/accumulo/lib/libthrift-0.6.1.jar:/opt/accumulo/lib/accumulo-core-1.4.2.jar:/opt/zookeeper/zookeeper-3.3.3.jar:/opt/accumulo/lib/cloudtrace-1.4.2.jar:/opt/accumulo/lib/commons-collections-3.2.jar:/opt/accumulo/lib/commons-configuration-1.5.jar:/opt/accumulo/lib/commons-io-1.4.jar:/opt/accumulo/lib/commons-jci-core-1.0.jar:/opt/accumulo/lib/commons-jci-fam-1.0.jar:/opt/accumulo/lib/commons-lang-2.4.jar:/opt/accumulo/lib/commons-logging-1.0.4.jar:/opt/accumulo/lib/commons-logging-api-1.0.4.jar:
>>>>>> exec /opt/hadoop/bin/hadoop jar lib/examples-simple-1.4.2.jar
>>>>>> lib/examples-simple-1.4.2-sources.jar -libjars
>>>>>> "/opt/accumulo/lib/libthrift-0.6.1.jar,/opt/accumulo/lib/accumulo-core-1.4.2.jar,/opt/zookeeper/zookeeper-3.3.3.jar,/opt/accumulo/lib/cloudtrace-1.4.2.jar,/opt/accumulo/lib/commons-collections-3.2.jar,/opt/accumulo/lib/commons-configuration-1.5.jar,/opt/accumulo/lib/commons-io-1.4.jar,/opt/accumulo/lib/commons-jci-core-1.0.jar,/opt/accumulo/lib/commons-jci-fam-1.0.jar,/opt/accumulo/lib/commons-lang-2.4.jar,/opt/accumulo/lib/commons-logging-1.0.4.jar,/opt/accumulo/lib/commons-logging-api-1.0.4.jar"
>>>>>> Exception in thread "main" java.lang.ClassNotFoundException:
>>>>>> lib.examples-simple-1.4.2-sources.jar
>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>>         at java.lang.Class.forName(Class.java:264)
>>>>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>>>>> *[accumulo@node accumulo]$*
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 4, 2013 at 11:55 AM, Billie Rinaldi <billie@apache.org>wrote:
>>>>>>
>>>>>>> On Thu, Apr 4, 2013 at 7:46 AM, Aji Janis <aji1705@gmail.com> wrote:
>>>>>>>
>>>>>>>> *Billie, I checked the values in tool.sh they match. I uncommented
>>>>>>>> the echo statements and reran the cmd here is what I have:*
>>>>>>>> *$ ./bin/tool.sh ./lib/examples-simple-1.4.2.jar
>>>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>>>> instance zookeeper usr pswd table inputdir tmp/bulk*
>>>>>>>>
>>>>>>>> USERJARS=
>>>>>>>>
>>>>>>>> CLASSNAME=org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>>>>
>>>>>>>> HADOOP_CLASSPATH=/opt/accumulo/lib/libthrift-0.6.1.jar:/opt/accumulo/lib/accumulo-core-1.4.2.jar:/opt/zookeeper/zookeeper-3.3.3.jar:/opt/accumulo/lib/cloudtrace-1.4.2.jar:/opt/accumulo/lib/commons-collections-3.2.jar:/opt/accumulo/lib/commons-configuration-1.5.jar:/opt/accumulo/lib/commons-io-1.4.jar:/opt/accumulo/lib/commons-jci-core-1.0.jar:/opt/accumulo/lib/commons-jci-fam-1.0.jar:/opt/accumulo/lib/commons-lang-2.4.jar:/opt/accumulo/lib/commons-logging-1.0.4.jar:/opt/accumulo/lib/commons-logging-api-1.0.4.jar:
>>>>>>>> exec /opt/hadoop/bin/hadoop jar ./lib/examples-simple-1.4.2.jar
>>>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>>>> -libjars
>>>>>>>> "/opt/accumulo/lib/libthrift-0.6.1.jar,/opt/accumulo/lib/accumulo-core-1.4.2.jar,/opt/zookeeper/zookeeper-3.3.3.jar,/opt/accumulo/lib/cloudtrace-1.4.2.jar,/opt/accumulo/lib/commons-collections-3.2.jar,/opt/accumulo/lib/commons-configuration-1.5.jar,/opt/accumulo/lib/commons-io-1.4.jar,/opt/accumulo/lib/commons-jci-core-1.0.jar,/opt/accumulo/lib/commons-jci-fam-1.0.jar,/opt/accumulo/lib/commons-lang-2.4.jar,/opt/accumulo/lib/commons-logging-1.0.4.jar,/opt/accumulo/lib/commons-logging-api-1.0.4.jar"
>>>>>>>>  Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>>>>> org/apache/accumulo/core/client/Instance
>>>>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>>>>         at java.lang.Class.forName(Class.java:264)
>>>>>>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>> org.apache.accumulo.core.client.Instance
>>>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>>>>         at java.security.AccessController.doPrivileged(Native
>>>>>>>> Method)
>>>>>>>>         at
>>>>>>>> java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>>>>>>         ... 3 more
>>>>>>>>
>>>>>>>>
>>>>>>> The command looks right.  Instance should be packaged in the
>>>>>>> accumulo core jar.  To verify that, you could run:
>>>>>>> jar tf /opt/accumulo/lib/accumulo-core-1.4.2.jar | grep
>>>>>>> org/apache/accumulo/core/client/Instance
>>>>>>>
>>>>>>> I'm not sure what's going on here.  If that error is happening right
>>>>>>> away, it seems like it can't load the jar on the local machine.  If you're
>>>>>>> running multiple machines, and if the error were happening later during the
>>>>>>> MapReduce, I would suggest that you make sure accumulo is present on all
>>>>>>> the machines.
>>>>>>>
>>>>>>> You asked about the user; is the owner of the jars different than
>>>>>>> the user you're running as?  In that case, it could be a permissions
>>>>>>> issue.  Could the permissions be set so that you can list that directory
>>>>>>> but not read the jar?
>>>>>>>
>>>>>>> Billie
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> *org/apache/accumulo/core/client/Instance is located in the
>>>>>>>> src/... folder, which I am not sure is what is packaged in the
>>>>>>>> examples-simple-[^c].jar ? *
>>>>>>>> *Sorry folks for the constant emails... just trying to get this to
>>>>>>>> work but I really appreciate the help.*
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 4, 2013 at 10:18 AM, John Vines <vines@apache.org>wrote:
>>>>>>>>
>>>>>>>>> If you run tool.sh with sh -x, it will step through the script so
>>>>>>>>> you can see what jars it is picking up and perhaps why it's missing them
>>>>>>>>> for you.
>>>>>>>>>
>>>>>>>>> Sent from my phone, please pardon the typos and brevity.
>>>>>>>>> On Apr 4, 2013 10:15 AM, "Aji Janis" <aji1705@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> What user are you running the commands as ?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 4, 2013 at 9:59 AM, Aji Janis <aji1705@gmail.com>wrote:
>>>>>>>>>>
>>>>>>>>>>> Where did you put all your java files?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Apr 4, 2013 at 9:55 AM, Eric Newton <
>>>>>>>>>>> eric.newton@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I was able to run the example, as written in
>>>>>>>>>>>> docs/examples/README.bulkIngest substituting my
>>>>>>>>>>>> instance/zookeeper/user/password information:
>>>>>>>>>>>>
>>>>>>>>>>>> $ pwd
>>>>>>>>>>>> /home/ecn/workspace/1.4.3
>>>>>>>>>>>> $ ls
>>>>>>>>>>>> bin      conf     docs  LICENSE  NOTICE   README  src     test
>>>>>>>>>>>> CHANGES  contrib  lib   logs     pom.xml  target  walogs
>>>>>>>>>>>>
>>>>>>>>>>>> $ ./bin/accumulo
>>>>>>>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.SetupTable test
>>>>>>>>>>>> localhost root secret test_bulk row_00000333 row_00000666
>>>>>>>>>>>>
>>>>>>>>>>>> $ ./bin/accumulo
>>>>>>>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.GenerateTestData 0 1000
>>>>>>>>>>>> bulk/test_1.txt
>>>>>>>>>>>>
>>>>>>>>>>>> $ ./bin/tool.sh lib/examples-simple-*[^cs].jar
>>>>>>>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample test
>>>>>>>>>>>> localhost root secret test_bulk bulk tmp/bulkWork
>>>>>>>>>>>>
>>>>>>>>>>>> $./bin/accumulo
>>>>>>>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.VerifyIngest test
>>>>>>>>>>>> localhost root secret test_bulk 0 1000
>>>>>>>>>>>>
>>>>>>>>>>>> -Eric
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Apr 4, 2013 at 9:33 AM, Aji Janis <aji1705@gmail.com>wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am not sure it's just a regular expression issue. Below is my
>>>>>>>>>>>>> console output. Not sure why this NoClassDefFoundError occurs.
>>>>>>>>>>>>> Has anyone tried to do this successfully? If you did, can you
>>>>>>>>>>>>> please tell me your env setup.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [user@mynode bulk]$ pwd
>>>>>>>>>>>>> /home/user/bulk
>>>>>>>>>>>>> [user@mynode bulk]$ ls
>>>>>>>>>>>>> BulkIngestExample.java  GenerateTestData.java  SetupTable.java
>>>>>>>>>>>>>  test_1.txt  VerifyIngest.java
>>>>>>>>>>>>> [user@mynode bulk]$
>>>>>>>>>>>>> *[user@mynode bulk]$ /opt/accumulo/bin/tool.sh
>>>>>>>>>>>>> /opt/accumulo/lib/examples-simple-1.4.2.jar
>>>>>>>>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>>>>>>>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
>>>>>>>>>>>>> *
>>>>>>>>>>>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>>>>>>>>>> org/apache/accumulo/core/client/Instance
>>>>>>>>>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>>>>>>>>>         at java.lang.Class.forName(Class.java:264)
>>>>>>>>>>>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>>>>>>>>>> org.apache.accumulo.core.client.Instance
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>>>>>>>>>         at java.security.AccessController.doPrivileged(Native
>>>>>>>>>>>>> Method)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>>>>>>>>>>>         at
>>>>>>>>>>>>> java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>>>>>>>>>>>         ... 3 more
>>>>>>>>>>>>> [user@mynode bulk]$ /opt/accumulo/bin/tool.sh
>>>>>>>>>>>>> /opt/accumulo/lib/examples-simple-*[^cs].jar
>>>>>>>>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>>>>>>>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
>>>>>>>>>>>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/accumulo/core/client/Instance
>>>>>>>>>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>>>>>>>>>         at java.lang.Class.forName(Class.java:264)
>>>>>>>>>>>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>>>>>>>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.accumulo.core.client.Instance
>>>>>>>>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>>>>>>>>>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>>>>>>>>>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>>>>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>>>>>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>>>>>>>>>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>>>>>>>>>>>         ... 3 more
>>>>>>>>>>>>> [user@mynode bulk]$ /opt/accumulo/bin/tool.sh
>>>>>>>>>>>>> /opt/accumulo/lib/examples-simple-*[^c].jar
>>>>>>>>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>>>>>>>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
>>>>>>>>>>>>> Exception in thread "main" java.lang.ClassNotFoundException: /opt/accumulo/lib/examples-simple-1/4/2-sources/jar
>>>>>>>>>>>>>         at java.lang.Class.forName0(Native Method)
>>>>>>>>>>>>>         at java.lang.Class.forName(Class.java:264)
>>>>>>>>>>>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>>>>>>>>>>>> [user@mynode bulk]$
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 4:57 PM, Billie Rinaldi <
>>>>>>>>>>>>> billie@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 1:16 PM, Christopher <
>>>>>>>>>>>>>> ctubbsii@apache.org> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Try with -libjars:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> tool.sh automatically adds libjars.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The problem is the regular expression for the examples-simple
>>>>>>>>>>>>>> jar.  It's trying to exclude the javadoc jar with ^c, but it isn't
>>>>>>>>>>>>>> excluding the sources jar. /opt/accumulo/lib/examples-simple-*[^cs].jar may
>>>>>>>>>>>>>> work, or you can just specify the jar exactly,
>>>>>>>>>>>>>> /opt/accumulo/lib/examples-simple-1.4.2.jar
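[Editor's note: Billie's point about the glob can be reproduced with a quick shell experiment. This is a sketch in a scratch directory; the dummy jar names mirror the /opt/accumulo/lib listing later in the thread.]

```shell
# Scratch directory with empty files named like the three examples-simple jars.
cd "$(mktemp -d)"
touch examples-simple-1.4.2.jar \
      examples-simple-1.4.2-javadoc.jar \
      examples-simple-1.4.2-sources.jar

# [^c] only excludes a 'c' immediately before ".jar" (the javadoc jar);
# the sources jar ends in 's', so the glob still expands to TWO files.
ls examples-simple-*[^c].jar

# [^cs] excludes both 'c' (javadoc) and 's' (sources); only the runnable
# jar remains.
ls examples-simple-*[^cs].jar
```

When the glob expands to more than one file, tool.sh receives an extra argument, and RunJar ends up treating a jar path as the main-class name, which would explain the earlier ClassNotFoundException naming the sources jar with dots turned into slashes.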
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /opt/accumulo/bin/tool.sh
>>>>>>>>>>>>>> /opt/accumulo/lib/examples-simple-*[^cs].jar
>>>>>>>>>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>>>>>>>>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Billie
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /opt/accumulo/bin/tool.sh
>>>>>>>>>>>>>>> /opt/accumulo/lib/examples-simple-*[^c].jar
>>>>>>>>>>>>>>> -libjars  /opt/accumulo/lib/examples-simple-*[^c].jar
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>>>>>>>>>>> myinstance zookeepers user pswd tableName inputDir
>>>>>>>>>>>>>>> tmp/bulkWork
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Christopher L Tubbs II
>>>>>>>>>>>>>>> http://gravatar.com/ctubbsii
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Apr 3, 2013 at 4:11 PM, Aji Janis <aji1705@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> > I am trying to run the BulkIngest example (on Accumulo
>>>>>>>>>>>>>>> > 1.4.2) and I am not able to run the following steps. Here is
>>>>>>>>>>>>>>> > the error I get:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > [user@mynode bulk]$ /opt/accumulo/bin/tool.sh
>>>>>>>>>>>>>>> > /opt/accumulo/lib/examples-simple-*[^c].jar
>>>>>>>>>>>>>>> > org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>>>>>>>>>>>> > myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
>>>>>>>>>>>>>>> > Exception in thread "main" java.lang.ClassNotFoundException: /opt/accumulo/lib/examples-simple-1/4/2-sources/jar
>>>>>>>>>>>>>>> >         at java.lang.Class.forName0(Native Method)
>>>>>>>>>>>>>>> >         at java.lang.Class.forName(Class.java:264)
>>>>>>>>>>>>>>> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>>>>>>>>>>>>>> > [user@mynode bulk]$
>>>>>>>>>>>>>>> > [user@mynode bulk]$
>>>>>>>>>>>>>>> > [user@mynode bulk]$
>>>>>>>>>>>>>>> > [user@mynode bulk]$ ls /opt/accumulo/lib/
>>>>>>>>>>>>>>> > accumulo-core-1.4.2.jar
>>>>>>>>>>>>>>> > accumulo-start-1.4.2.jar
>>>>>>>>>>>>>>> > commons-collections-3.2.jar
>>>>>>>>>>>>>>> > commons-logging-1.0.4.jar
>>>>>>>>>>>>>>> > jline-0.9.94.jar
>>>>>>>>>>>>>>> > accumulo-core-1.4.2-javadoc.jar
>>>>>>>>>>>>>>> > accumulo-start-1.4.2-javadoc.jar
>>>>>>>>>>>>>>> > commons-configuration-1.5.jar
>>>>>>>>>>>>>>> > commons-logging-api-1.0.4.jar
>>>>>>>>>>>>>>> > libthrift-0.6.1.jar
>>>>>>>>>>>>>>> > accumulo-core-1.4.2-sources.jar
>>>>>>>>>>>>>>> > accumulo-start-1.4.2-sources.jar
>>>>>>>>>>>>>>> > commons-io-1.4.jar
>>>>>>>>>>>>>>> > examples-simple-1.4.2.jar
>>>>>>>>>>>>>>> > log4j-1.2.16.jar
>>>>>>>>>>>>>>> > accumulo-server-1.4.2.jar
>>>>>>>>>>>>>>> > cloudtrace-1.4.2.jar
>>>>>>>>>>>>>>> > commons-jci-core-1.0.jar
>>>>>>>>>>>>>>> > examples-simple-1.4.2-javadoc.jar
>>>>>>>>>>>>>>> > native
>>>>>>>>>>>>>>> > accumulo-server-1.4.2-javadoc.jar
>>>>>>>>>>>>>>> > cloudtrace-1.4.2-javadoc.jar
>>>>>>>>>>>>>>> > commons-jci-fam-1.0.jar
>>>>>>>>>>>>>>> > examples-simple-1.4.2-sources.jar
>>>>>>>>>>>>>>> > wikisearch-ingest-1.4.2-javadoc.jar
>>>>>>>>>>>>>>> > accumulo-server-1.4.2-sources.jar
>>>>>>>>>>>>>>> > cloudtrace-1.4.2-sources.jar
>>>>>>>>>>>>>>> > commons-lang-2.4.jar
>>>>>>>>>>>>>>> >  ext
>>>>>>>>>>>>>>> > wikisearch-query-1.4.2-javadoc.jar
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > [user@mynode bulk]$
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Clearly, the libraries and source file exist, so I am not
>>>>>>>>>>>>>>> > sure what's going on. I tried putting in
>>>>>>>>>>>>>>> > /opt/accumulo/lib/examples-simple-1.4.2-sources.jar instead,
>>>>>>>>>>>>>>> > but then it complains that BulkIngestExample is not found.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Suggestions?
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On Wed, Apr 3, 2013 at 2:36 PM, Eric Newton <
>>>>>>>>>>>>>>> eric.newton@gmail.com> wrote:
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> You will have to write your own InputFormat class which
>>>>>>>>>>>>>>> will parse your
>>>>>>>>>>>>>>> >> file and pass records to your reducer.
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> -Eric
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >>
>>>>>>>>>>>>>>> >> On Wed, Apr 3, 2013 at 2:29 PM, Aji Janis <
>>>>>>>>>>>>>>> aji1705@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> Looking at the BulkIngestExample, it uses GenerateTestData
>>>>>>>>>>>>>>> >>> to create a .txt file that contains Key: Value pairs, and,
>>>>>>>>>>>>>>> >>> correct me if I am wrong, each new line is a new row, right?
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> I need to know how to include families and qualifiers as
>>>>>>>>>>>>>>> >>> well. In other words:
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> 1) Do I set up a .txt file that can be converted into an
>>>>>>>>>>>>>>> >>> Accumulo RFile using AccumuloFileOutputFormat, which can
>>>>>>>>>>>>>>> >>> then be imported into my table?
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> 2) If yes, what is the format of the .txt file?
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>>
>>>>>>>>>>>>>>> >>> On Wed, Apr 3, 2013 at 2:19 PM, Eric Newton <
>>>>>>>>>>>>>>> eric.newton@gmail.com>
>>>>>>>>>>>>>>> >>> wrote:
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> Your data needs to be in the RFile format, and more
>>>>>>>>>>>>>>> >>>> importantly it needs to be sorted.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> It's handy to use a Map/Reduce job to convert/sort your
>>>>>>>>>>>>>>> >>>> data.  See the BulkIngestExample.
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> -Eric
>>>>>>>>>>>>>>> >>>>
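[Editor's note: the sorting Eric calls out can be illustrated without Accumulo at all. The following Python sketch uses illustrative names only, not the Accumulo API; it orders records from the four-column text format by (row, column family, column qualifier), the component order in which Accumulo keys sort.]

```python
def parse_line(line):
    """Split a 'row family qualifier value' line into a sortable key and a value."""
    row, family, qualifier, value = line.split()
    return (row, family, qualifier), value

lines = [
    "rowid2 columnFamily1 colQualifier1 value",
    "rowid1 columnFamily2 colQualifier1 value",
    "rowid1 columnFamily1 colQualifier2 value",
]

# Bulk-loaded files must hold keys in sorted order; sorting the key tuples
# lexicographically mirrors that requirement for this toy format.
for key, value in sorted(parse_line(l) for l in lines):
    print(key)
```

A MapReduce job achieves the same ordering at scale: the shuffle sorts the keys between the map and reduce phases, which is why the BulkIngestExample leans on it.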
>>>>>>>>>>>>>>> >>>>
>>>>>>>>>>>>>>> >>>> On Wed, Apr 3, 2013 at 2:15 PM, Aji Janis <
>>>>>>>>>>>>>>> aji1705@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> I have some data in a text file in the following
>>>>>>>>>>>>>>> format.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> rowid1 columnFamily1 colQualifier1 value
>>>>>>>>>>>>>>> >>>>> rowid1 columnFamily1 colQualifier2 value
>>>>>>>>>>>>>>> >>>>> rowid1 columnFamily2 colQualifier1 value
>>>>>>>>>>>>>>> >>>>> rowid2 columnFamily1 colQualifier1 value
>>>>>>>>>>>>>>> >>>>> rowid3 columnFamily1 colQualifier1 value
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> I want to import this data into a table in Accumulo. My
>>>>>>>>>>>>>>> >>>>> end goal is to understand how to use the bulk import
>>>>>>>>>>>>>>> >>>>> feature in Accumulo. I tried to log in to the Accumulo
>>>>>>>>>>>>>>> >>>>> shell as root and then run:
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> #table mytable
>>>>>>>>>>>>>>> >>>>> #importdirectory /home/inputDir /home/failureDir true
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> but it didn't work. My data file was saved as data.txt in
>>>>>>>>>>>>>>> >>>>> /home/inputDir. I tried to create the directory/file
>>>>>>>>>>>>>>> >>>>> structure in both HDFS and the local filesystem, but
>>>>>>>>>>>>>>> >>>>> neither worked. When trying locally, it keeps complaining
>>>>>>>>>>>>>>> >>>>> about failureDir not existing:
>>>>>>>>>>>>>>> >>>>> ...
>>>>>>>>>>>>>>> >>>>> java.io.FileNotFoundException: File does not exist: failures
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> When trying with files on HDFS, I get no error on the
>>>>>>>>>>>>>>> >>>>> console, but the logger had the following messages:
>>>>>>>>>>>>>>> >>>>> ...
>>>>>>>>>>>>>>> >>>>> [tableOps.BulkImport] WARN : hdfs://node....//inputDir/data.txt does not have a valid extension, ignoring
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> or,
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> [tableOps.BulkImport] WARN : hdfs://node....//inputDir/data.txt is not a map file, ignoring
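[Editor's note: those WARN lines suggest bulk import screens candidate files by name before loading them. Below is a rough Python sketch of that screening; the accepted-extension set is an assumption, not copied from Accumulo source. In 1.4.x the files produced by AccumuloFileOutputFormat are RFiles, conventionally named *.rf, so a plain .txt file is skipped.]

```python
# Assumed set of extensions bulk import will load; illustrative only.
ACCEPTED = {"rf"}  # RFiles produced by AccumuloFileOutputFormat

def importable(filename):
    """Mimic the extension check implied by the WARN messages above."""
    ext = filename.rsplit(".", 1)[-1]
    return ext in ACCEPTED

print(importable("data.txt"))          # False -> "does not have a valid extension"
print(importable("part-r-00000.rf"))   # True  -> would be considered for import
```

This is why pointing importdirectory at a directory of raw .txt files silently loads nothing: the files are ignored rather than rejected with a hard error.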
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> Suggestions? Am I not setting up the job right? Thank
>>>>>>>>>>>>>>> >>>>> you in advance for your help.
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>>
>>>>>>>>>>>>>>> >>>>> On Wed, Apr 3, 2013 at 2:04 PM, Aji Janis <
>>>>>>>>>>>>>>> aji1705@gmail.com> wrote:
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> I have some data in a text file in the following
>>>>>>>>>>>>>>> format:
>>>>>>>>>>>>>>> >>>>>>
>>>>>>>>>>>>>>> >>>>>> rowid1 columnFamily colQualifier value
>>>>>>>>>>>>>>> >>>>>> rowid1 columnFamily colQualifier value
>>>>>>>>>>>>>>> >>>>>> rowid1 columnFamily colQualifier value