accumulo-user mailing list archives

From pdread <paul.r...@siginttech.com>
Subject bulk ingest without mapred
Date Tue, 08 Apr 2014 13:40:08 GMT
Hi

I interface with an Accumulo cloud (hundreds of nodes) that I don't maintain.
I'll try to keep this short. The interface app is used to ingest millions
of docs per week from various streams, some of which are required in near
real time. A problem came up where the tservers would not stay up and our
ingest would halt. The admins are working on fixing this, but I'm not
optimistic. Others who have run into this tell me it's the use of Mutations
that is causing the problem, and that it will go away if I do bulk ingest.
However, MapReduce is way too slow to spin up and does not fit our
architecture.

So here is what I have been trying to do. After much research, I think I
should be able to bulk ingest if I create the RFile and feed it to
TableOperations.importDirectory(). I can create the RFile OK, at least I
think so, and I create the "failure" directory using Hadoop's FileSystem. I
check that the failure directory exists and is a directory, but when I feed
it to the import I get an error in the Accumulo master log saying that it
cannot find the failure directory. The interesting thing is that I have
traced the code through the Accumulo client, and it successfully checks for
the load file and the failure directory. What am I doing wrong?

First, the client error:

org.apache.accumulo.core.client.AccumuloException: Internal error processing waitForTableOperation
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:290)
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.doTableOperation(TableOperationsImpl.java:258)
	at org.apache.accumulo.core.client.admin.TableOperationsImpl.importDirectory(TableOperationsImpl.java:945)
	at airs.medr.accumulo.server.table.EntityTable.writeEntities(EntityTable.java:130)

Now the master log exception:

2014-04-08 08:33:50,609 [thrift.MasterClientService$Processor] ERROR: Internal error processing waitForTableOperation
java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: bulk/entities_fails/failures
        at org.apache.accumulo.server.master.Master$MasterClientServiceHandler.waitForTableOperation(Master.java:1053)
        at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.accumulo.cloudtrace.instrument.thrift.TraceWrap$1.invoke(TraceWrap.java:59)
        at $Proxy6.waitForTableOperation(Unknown Source)
        at org.apache.accumulo.core.master.thrift.MasterClientService$Processor$waitForTableOperation.process(MasterClientService.java:2004)
        at org.apache.accumulo.core.master.thrift.MasterClientService$Processor.process(MasterClientService.java:1472)
        at org.apache.accumulo.server.util.TServerUtils$TimedProcessor.process(TServerUtils.java:154)
        at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:631)
        at org.apache.accumulo.server.util.TServerUtils$THsHaServer$Invocation.run(TServerUtils.java:202)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: File does not exist: bulk/entities_fails/failures
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:528)
        at org.apache.accumulo.server.trace.TraceFileSystem.getFileStatus(TraceFileSystem.java:797)
        at org.apache.accumulo.server.master.tableOps.BulkImport.call(BulkImport.java:157)
        at org.apache.accumulo.server.master.tableOps.BulkImport.call(BulkImport.java:110)
        at org.apache.accumulo.server.master.tableOps.TraceRepo.call(TraceRepo.java:65)
        at org.apache.accumulo.server.fate.Fate$TransactionRunner.run(Fate.java:65)

 
Thoughts?

Thanks

Paul
  



--
View this message in context: http://apache-accumulo.1065345.n5.nabble.com/bulk-ingest-without-mapred-tp8904.html
Sent from the Users mailing list archive at Nabble.com.
