accumulo-user mailing list archives

From Aji Janis <aji1...@gmail.com>
Subject Re: importdirectory in accumulo
Date Thu, 04 Apr 2013 14:14:57 GMT
What user are you running the commands as?


On Thu, Apr 4, 2013 at 9:59 AM, Aji Janis <aji1705@gmail.com> wrote:

> Where did you put all your Java files?
>
>
> On Thu, Apr 4, 2013 at 9:55 AM, Eric Newton <eric.newton@gmail.com> wrote:
>
>> I was able to run the example, as written in
>> docs/examples/README.bulkIngest substituting my
>> instance/zookeeper/user/password information:
>>
>> $ pwd
>> /home/ecn/workspace/1.4.3
>> $ ls
>> bin      conf     docs  LICENSE  NOTICE   README  src     test
>> CHANGES  contrib  lib   logs     pom.xml  target  walogs
>>
>> $ ./bin/accumulo
>> org.apache.accumulo.examples.simple.mapreduce.bulk.SetupTable test
>> localhost root secret test_bulk row_00000333 row_00000666
>>
>> $ ./bin/accumulo
>> org.apache.accumulo.examples.simple.mapreduce.bulk.GenerateTestData 0 1000
>> bulk/test_1.txt
>>
>> $ ./bin/tool.sh lib/examples-simple-*[^cs].jar
>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample test
>> localhost root secret test_bulk bulk tmp/bulkWork
>>
>> $ ./bin/accumulo
>> org.apache.accumulo.examples.simple.mapreduce.bulk.VerifyIngest test
>> localhost root secret test_bulk 0 1000
>>
>> -Eric
>>
>>
>>
>> On Thu, Apr 4, 2013 at 9:33 AM, Aji Janis <aji1705@gmail.com> wrote:
>>
>>> I am not sure it's just a regular expression issue. Below is my console
>>> output. Not sure why this NoClassDefFoundError occurs. Has anyone done this
>>> successfully? If so, can you please share your environment setup?
>>>
>>>
>>> [user@mynode bulk]$ pwd
>>> /home/user/bulk
>>> [user@mynode bulk]$ ls
>>> BulkIngestExample.java  GenerateTestData.java  SetupTable.java
>>>  test_1.txt  VerifyIngest.java
>>> [user@mynode bulk]$
>>> [user@mynode bulk]$ /opt/accumulo/bin/tool.sh
>>> /opt/accumulo/lib/examples-simple-1.4.2.jar
>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/apache/accumulo/core/client/Instance
>>>         at java.lang.Class.forName0(Native Method)
>>>         at java.lang.Class.forName(Class.java:264)
>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.apache.accumulo.core.client.Instance
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>         ... 3 more
>>> [user@mynode bulk]$ /opt/accumulo/bin/tool.sh
>>> /opt/accumulo/lib/examples-simple-*[^cs].jar
>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>> org/apache/accumulo/core/client/Instance
>>>         at java.lang.Class.forName0(Native Method)
>>>         at java.lang.Class.forName(Class.java:264)
>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.apache.accumulo.core.client.Instance
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>>>         ... 3 more
>>> [user@mynode bulk]$ /opt/accumulo/bin/tool.sh
>>> /opt/accumulo/lib/examples-simple-*[^c].jar
>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
>>> Exception in thread "main" java.lang.ClassNotFoundException:
>>> /opt/accumulo/lib/examples-simple-1/4/2-sources/jar
>>>         at java.lang.Class.forName0(Native Method)
>>>         at java.lang.Class.forName(Class.java:264)
>>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>> [user@mynode bulk]$
>>>
>>>
>>>
>>> On Wed, Apr 3, 2013 at 4:57 PM, Billie Rinaldi <billie@apache.org> wrote:
>>>
>>>> On Wed, Apr 3, 2013 at 1:16 PM, Christopher <ctubbsii@apache.org> wrote:
>>>>
>>>>> Try with -libjars:
>>>>>
>>>>
>>>> tool.sh automatically adds libjars.
>>>>
>>>> The problem is the regular expression for the examples-simple jar.
>>>> It's trying to exclude the javadoc jar with ^c, but it isn't excluding the
>>>> sources jar. /opt/accumulo/lib/examples-simple-*[^cs].jar may work, or you
>>>> can just specify the jar exactly,
>>>> /opt/accumulo/lib/examples-simple-1.4.2.jar
>>>>
>>>> /opt/accumulo/bin/tool.sh
>>>> /opt/accumulo/lib/examples-simple-*[^cs].jar
>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
>>>>
>>>> Billie
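[Editor's note: the difference between the two shell glob patterns discussed above can be checked with a quick experiment. This is only a sketch against dummy jar names in a temp directory; the file names mirror the thread, but the directory is hypothetical, not the real /opt/accumulo/lib.]

```shell
# Sketch: recreate the three jar names from the thread in a temp directory
# (hypothetical location) and compare the two glob patterns.
dir=$(mktemp -d)
touch "$dir/examples-simple-1.4.2.jar" \
      "$dir/examples-simple-1.4.2-javadoc.jar" \
      "$dir/examples-simple-1.4.2-sources.jar"

# [^c] excludes the javadoc jar (name ends in "c.jar") but still matches the
# sources jar (name ends in "s.jar"), so tool.sh is handed two jars:
ls "$dir"/examples-simple-*[^c].jar

# [^cs] excludes both javadoc and sources, leaving only the main jar:
ls "$dir"/examples-simple-*[^cs].jar
```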
>>>>
>>>>
>>>>
>>>>>
>>>>> /opt/accumulo/bin/tool.sh /opt/accumulo/lib/examples-simple-*[^c].jar
>>>>> -libjars  /opt/accumulo/lib/examples-simple-*[^c].jar
>>>>> org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>> myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
>>>>>
>>>>> --
>>>>> Christopher L Tubbs II
>>>>> http://gravatar.com/ctubbsii
>>>>>
>>>>>
>>>>> On Wed, Apr 3, 2013 at 4:11 PM, Aji Janis <aji1705@gmail.com> wrote:
>>>>> > I am trying to run the BulkIngest example (on 1.4.2 accumulo) and I am
>>>>> > not able to run the following steps. Here is the error I get:
>>>>> >
>>>>> > [user@mynode bulk]$ /opt/accumulo/bin/tool.sh
>>>>> > /opt/accumulo/lib/examples-simple-*[^c].jar
>>>>> > org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample
>>>>> > myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
>>>>> > Exception in thread "main" java.lang.ClassNotFoundException:
>>>>> > /opt/accumulo/lib/examples-simple-1/4/2-sources/jar
>>>>> >         at java.lang.Class.forName0(Native Method)
>>>>> >         at java.lang.Class.forName(Class.java:264)
>>>>> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>>>>> > [user@mynode bulk]$
>>>>> > [user@mynode bulk]$
>>>>> > [user@mynode bulk]$
>>>>> > [user@mynode bulk]$ ls /opt/accumulo/lib/
>>>>> > accumulo-core-1.4.2.jar
>>>>> > accumulo-start-1.4.2.jar
>>>>> > commons-collections-3.2.jar
>>>>> > commons-logging-1.0.4.jar
>>>>> > jline-0.9.94.jar
>>>>> > accumulo-core-1.4.2-javadoc.jar
>>>>> > accumulo-start-1.4.2-javadoc.jar
>>>>> > commons-configuration-1.5.jar
>>>>> > commons-logging-api-1.0.4.jar
>>>>> > libthrift-0.6.1.jar
>>>>> > accumulo-core-1.4.2-sources.jar
>>>>> > accumulo-start-1.4.2-sources.jar
>>>>> > commons-io-1.4.jar
>>>>> > examples-simple-1.4.2.jar
>>>>> > log4j-1.2.16.jar
>>>>> > accumulo-server-1.4.2.jar
>>>>> > cloudtrace-1.4.2.jar
>>>>> > commons-jci-core-1.0.jar
>>>>> > examples-simple-1.4.2-javadoc.jar
>>>>> > native
>>>>> > accumulo-server-1.4.2-javadoc.jar
>>>>> > cloudtrace-1.4.2-javadoc.jar
>>>>> > commons-jci-fam-1.0.jar
>>>>> > examples-simple-1.4.2-sources.jar
>>>>> > wikisearch-ingest-1.4.2-javadoc.jar
>>>>> > accumulo-server-1.4.2-sources.jar
>>>>> > cloudtrace-1.4.2-sources.jar
>>>>> > commons-lang-2.4.jar
>>>>> >  ext
>>>>> > wikisearch-query-1.4.2-javadoc.jar
>>>>> >
>>>>> > [user@mynode bulk]$
>>>>> >
>>>>> >
>>>>> > Clearly, the libraries and source file exist, so I am not sure what's
>>>>> > going on. I tried putting in
>>>>> > /opt/accumulo/lib/examples-simple-1.4.2-sources.jar instead; then it
>>>>> > complains BulkIngestExample ClassNotFound.
>>>>> >
>>>>> > Suggestions?
>>>>> >
>>>>> >
>>>>> > On Wed, Apr 3, 2013 at 2:36 PM, Eric Newton <eric.newton@gmail.com>
>>>>> wrote:
>>>>> >>
>>>>> >> You will have to write your own InputFormat class which will parse your
>>>>> >> file and pass records to your reducer.
>>>>> >>
>>>>> >> -Eric
>>>>> >>
>>>>> >>
>>>>> >> On Wed, Apr 3, 2013 at 2:29 PM, Aji Janis <aji1705@gmail.com> wrote:
>>>>> >>>
>>>>> >>> Looking at the BulkIngestExample, it uses GenerateTestData and creates
>>>>> >>> a .txt file which contains Key: Value pairs. Correct me if I am wrong,
>>>>> >>> but each new line is a new row, right?
>>>>> >>>
>>>>> >>> I need to know how to include families and qualifiers as well. In
>>>>> >>> other words:
>>>>> >>>
>>>>> >>> 1) Do I set up a .txt file that can be converted into an Accumulo
>>>>> >>> RFile using AccumuloFileOutputFormat, which can then be imported into
>>>>> >>> my table?
>>>>> >>>
>>>>> >>> 2) If yes, what is the format of the .txt file?
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> On Wed, Apr 3, 2013 at 2:19 PM, Eric Newton <eric.newton@gmail.com>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> Your data needs to be in the RFile format, and more importantly it
>>>>> >>>> needs to be sorted.
>>>>> >>>>
>>>>> >>>> It's handy to use a Map/Reduce job to convert/sort your data. See the
>>>>> >>>> BulkIngestExample.
>>>>> >>>>
>>>>> >>>> -Eric
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Wed, Apr 3, 2013 at 2:15 PM, Aji Janis <aji1705@gmail.com> wrote:
>>>>> >>>>>
>>>>> >>>>> I have some data in a text file in the following format:
>>>>> >>>>>
>>>>> >>>>> rowid1 columnFamily1 colQualifier1 value
>>>>> >>>>> rowid1 columnFamily1 colQualifier2 value
>>>>> >>>>> rowid1 columnFamily2 colQualifier1 value
>>>>> >>>>> rowid2 columnFamily1 colQualifier1 value
>>>>> >>>>> rowid3 columnFamily1 colQualifier1 value
>>>>> >>>>>
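[Editor's note: a quick sketch of how such whitespace-delimited lines split into their four fields, using plain awk purely as illustration; the real ingest path discussed below parses them in a MapReduce job instead.]

```shell
# Sketch: split one line of the "rowid family qualifier value" format above
# into its four fields. Illustration only, not the example code itself.
printf 'rowid1 columnFamily1 colQualifier1 value\n' |
awk '{ printf "row=%s cf=%s cq=%s val=%s\n", $1, $2, $3, $4 }'
# → row=rowid1 cf=columnFamily1 cq=colQualifier1 val=value
```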
>>>>> >>>>> I want to import this data into a table in Accumulo. My end goal is
>>>>> >>>>> to understand how to use the BulkImport feature in Accumulo. I tried
>>>>> >>>>> to log in to the Accumulo shell as root and then run:
>>>>> >>>>>
>>>>> >>>>> #table mytable
>>>>> >>>>> #importdirectory /home/inputDir /home/failureDir true
>>>>> >>>>>
>>>>> >>>>> but it didn't work. My data file was saved as data.txt in
>>>>> >>>>> /home/inputDir. I tried to create the dir/file structure in HDFS and
>>>>> >>>>> Linux, but neither worked. When trying locally, it keeps complaining
>>>>> >>>>> about failureDir not existing:
>>>>> >>>>> ...
>>>>> >>>>> java.io.FileNotFoundException: File does not exist: failures
>>>>> >>>>>
>>>>> >>>>> When trying with files on HDFS, I get no error on the console, but
>>>>> >>>>> the logger had the following messages:
>>>>> >>>>> ...
>>>>> >>>>> [tableOps.BulkImport] WARN : hdfs://node....//inputDir/data.txt does
>>>>> >>>>> not have a valid extension, ignoring
>>>>> >>>>>
>>>>> >>>>> or,
>>>>> >>>>>
>>>>> >>>>> [tableOps.BulkImport] WARN : hdfs://node....//inputDir/data.txt is
>>>>> >>>>> not a map file, ignoring
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> Suggestions? Am I not setting up the job right? Thank you in advance
>>>>> >>>>> for your help.
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> On Wed, Apr 3, 2013 at 2:04 PM, Aji Janis <aji1705@gmail.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> I have some data in a text file in the following format:
>>>>> >>>>>>
>>>>> >>>>>> rowid1 columnFamily colQualifier value
>>>>> >>>>>> rowid1 columnFamily colQualifier value
>>>>> >>>>>> rowid1 columnFamily colQualifier value
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>
>>>>> >>>
>>>>> >>
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
