Subject: Re: importdirectory in accumulo
From: Christopher
To: user@accumulo.apache.org
Date: Wed, 3 Apr 2013 16:16:29 -0400

Try with -libjars:

/opt/accumulo/bin/tool.sh /opt/accumulo/lib/examples-simple-*[^c].jar \
    -libjars /opt/accumulo/lib/examples-simple-*[^c].jar \
    org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample \
    myinstance zookeepers user pswd tableName inputDir tmp/bulkWork

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii

On Wed,
Apr 3, 2013 at 4:11 PM, Aji Janis wrote:
> I am trying to run the BulkIngest example (on 1.4.2 Accumulo) and I am
> not able to run the following steps. Here is the error I get:
>
> [user@mynode bulk]$ /opt/accumulo/bin/tool.sh /opt/accumulo/lib/examples-simple-*[^c].jar org.apache.accumulo.examples.simple.mapreduce.bulk.BulkIngestExample myinstance zookeepers user pswd tableName inputDir tmp/bulkWork
> Exception in thread "main" java.lang.ClassNotFoundException: /opt/accumulo/lib/examples-simple-1/4/2-sources/jar
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:264)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>
> [user@mynode bulk]$ ls /opt/accumulo/lib/
> accumulo-core-1.4.2.jar
> accumulo-core-1.4.2-javadoc.jar
> accumulo-core-1.4.2-sources.jar
> accumulo-server-1.4.2.jar
> accumulo-server-1.4.2-javadoc.jar
> accumulo-server-1.4.2-sources.jar
> accumulo-start-1.4.2.jar
> accumulo-start-1.4.2-javadoc.jar
> accumulo-start-1.4.2-sources.jar
> cloudtrace-1.4.2.jar
> cloudtrace-1.4.2-javadoc.jar
> cloudtrace-1.4.2-sources.jar
> commons-collections-3.2.jar
> commons-configuration-1.5.jar
> commons-io-1.4.jar
> commons-jci-core-1.0.jar
> commons-jci-fam-1.0.jar
> commons-lang-2.4.jar
> commons-logging-1.0.4.jar
> commons-logging-api-1.0.4.jar
> examples-simple-1.4.2.jar
> examples-simple-1.4.2-javadoc.jar
> examples-simple-1.4.2-sources.jar
> ext
> jline-0.9.94.jar
> libthrift-0.6.1.jar
> log4j-1.2.16.jar
> native
> wikisearch-ingest-1.4.2-javadoc.jar
> wikisearch-query-1.4.2-javadoc.jar
>
> Clearly, the libraries and source file exist, so I am not sure what's
> going on. I tried putting in
> /opt/accumulo/lib/examples-simple-1.4.2-sources.jar instead; then it
> complains that BulkIngestExample was not found (ClassNotFoundException).
>
> Suggestions?
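A quick way to see why tool.sh choked on the glob: the pattern `examples-simple-*[^c].jar` excludes only names whose last character before `.jar` is `c` (the javadoc jar), so it still matches both the main jar and the sources jar. Sketch with dummy files in a scratch directory (paths hypothetical):

```shell
# Recreate the relevant jar names from the lib listing above
mkdir -p /tmp/globdemo && cd /tmp/globdemo
touch examples-simple-1.4.2.jar \
      examples-simple-1.4.2-sources.jar \
      examples-simple-1.4.2-javadoc.jar

# The javadoc jar is excluded, but TWO jars still match:
echo examples-simple-*[^c].jar
```

Because the glob expands to two paths, tool.sh receives an extra jar argument where a class name is expected, and that path gets mangled into the `examples-simple-1/4/2-sources/jar` "class" in the exception above. Passing a single explicit jar (or adding -libjars, as suggested in the reply) avoids the ambiguity.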
>
> On Wed, Apr 3, 2013 at 2:36 PM, Eric Newton wrote:
>> You will have to write your own InputFormat class which will parse your
>> file and pass records to your reducer.
>>
>> -Eric
>>
>> On Wed, Apr 3, 2013 at 2:29 PM, Aji Janis wrote:
>>> Looking at the BulkIngestExample, it uses GenerateTestData and creates
>>> a .txt file which contains key/value pairs and, correct me if I am
>>> wrong, each new line is a new row, right?
>>>
>>> I need to know how to have family and qualifiers also. In other words:
>>>
>>> 1) Do I set up a .txt file that can be converted into an Accumulo
>>> RFile using AccumuloFileOutputFormat, which can then be imported into
>>> my table?
>>> 2) If yes, what is the format of the .txt file?
>>>
>>> On Wed, Apr 3, 2013 at 2:19 PM, Eric Newton wrote:
>>>> Your data needs to be in the RFile format, and more importantly it
>>>> needs to be sorted.
>>>>
>>>> It's handy to use a Map/Reduce job to convert/sort your data. See the
>>>> BulkIngestExample.
>>>>
>>>> -Eric
>>>>
>>>> On Wed, Apr 3, 2013 at 2:15 PM, Aji Janis wrote:
>>>>> I have some data in a text file in the following format:
>>>>>
>>>>> rowid1 columnFamily1 colQualifier1 value
>>>>> rowid1 columnFamily1 colQualifier2 value
>>>>> rowid1 columnFamily2 colQualifier1 value
>>>>> rowid2 columnFamily1 colQualifier1 value
>>>>> rowid3 columnFamily1 colQualifier1 value
>>>>>
>>>>> I want to import this data into a table in Accumulo. My end goal is
>>>>> to understand how to use the bulk import feature in Accumulo. I
>>>>> tried to log in to the Accumulo shell as root and then run:
>>>>>
>>>>> #table mytable
>>>>> #importdirectory /home/inputDir /home/failureDir true
>>>>>
>>>>> but it didn't work. My data file was saved as data.txt in
>>>>> /home/inputDir. I tried to create the dir/file structure in HDFS and
>>>>> on the local filesystem, but neither worked. When trying locally, it
>>>>> keeps complaining about failureDir not existing:
>>>>> ...
>>>>> java.io.FileNotFoundException: File does not exist: failures
>>>>>
>>>>> When trying with files on HDFS, I get no error on the console, but
>>>>> the logger had the following messages:
>>>>> ...
>>>>> [tableOps.BulkImport] WARN : hdfs://node....//inputDir/data.txt does
>>>>> not have a valid extension, ignoring
>>>>>
>>>>> or:
>>>>>
>>>>> [tableOps.BulkImport] WARN : hdfs://node....//inputDir/data.txt is
>>>>> not a map file, ignoring
>>>>>
>>>>> Suggestions? Am I not setting up the job right? Thank you in advance
>>>>> for the help.
>>>>>
>>>>> On Wed, Apr 3, 2013 at 2:04 PM, Aji Janis wrote:
>>>>>> I have some data in a text file in the following format:
>>>>>>
>>>>>> rowid1 columnFamily colQualifier value
>>>>>> rowid1 columnFamily colQualifier value
>>>>>> rowid1 columnFamily colQualifier value
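Summing up the thread: importdirectory only loads files it recognizes as sorted Accumulo files (RFiles, which carry an .rf extension, or the older map-file directories), which is why a plain data.txt is skipped with the warnings above, and the failure directory must already exist in HDFS before the command runs. Both constraints can be sketched with plain shell (file names hypothetical; the real conversion to RFiles is what the BulkIngestExample M/R job does):

```shell
# 1) Bulk-loaded data must be sorted by (row, family, qualifier).
#    The whitespace-separated text format above sorts correctly with
#    a byte-order sort:
mkdir -p /tmp/bulkdemo
printf '%s\n' \
  'rowid2 columnFamily1 colQualifier1 value' \
  'rowid1 columnFamily1 colQualifier2 value' \
  'rowid1 columnFamily1 colQualifier1 value' > /tmp/bulkdemo/data.txt
LC_ALL=C sort /tmp/bulkdemo/data.txt -o /tmp/bulkdemo/data.sorted.txt

# 2) importdirectory ignores files without a recognized extension,
#    mimicking the "does not have a valid extension" warning:
touch /tmp/bulkdemo/part-00000.rf
for f in /tmp/bulkdemo/*; do
  case "$f" in
    *.rf) echo "would import: $f" ;;   # part-00000.rf would be imported
    *)    echo "ignored: $f" ;;        # the .txt files are skipped
  esac
done
```

The failureDir complaint is a separate issue: the failures directory has to be created beforehand (e.g. with hadoop fs -mkdir) in the same HDFS instance that Accumulo uses, not on the local filesystem.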