accumulo-user mailing list archives

From Mike Atlas <m...@weft.io>
Subject Re: first time setup: Mkdirs failed to create hdfs directory /accumulo/recovery/
Date Wed, 07 Jan 2015 22:42:10 GMT
Yes, it is a single-node "all-in-one" Accumulo server at the moment, really
a toy server for me to get familiar with the ecosystem needed to run the
GeoMesa iterators (my main interest).

I tracked it down when I noticed that there was an /accumulo directory in
the regular file system instead of in the HDFS store. I suppose the code
that loads the value of instance.dfs.dir could first check whether it is a
relative path or an absolute URI?
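
Something like this rough sketch is what I had in mind (hypothetical, not
the actual Accumulo config-loading code):

    import java.net.URI;

    public class DfsDirCheck {
        // Flag instance.dfs.dir values that have no scheme, since a bare
        // path like "/accumulo" can end up resolved against the local
        // filesystem instead of HDFS.
        public static void main(String[] args) {
            String dfsDir = "/accumulo"; // value as read from accumulo-site.xml
            if (URI.create(dfsDir).getScheme() == null) {
                System.err.println("instance.dfs.dir is not an absolute URI: " + dfsDir);
            }
        }
    }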

Anyway, I'm thinking of going into production for real on EMR
<https://aws.amazon.com/articles/Elastic-MapReduce/2065170233315712> using
the available bootstrap actions
<https://github.com/weftio/emr-bootstrap-actions/tree/master/accumulo>.
I realize there are some performance implications to using S3 as the
backing file store, but durability and low management overhead are higher
priorities for me at the moment.

Thanks again for the responses.

Mike


On Wed, Jan 7, 2015 at 5:19 PM, Christopher <ctubbsii@apache.org> wrote:

> Honestly, I'm surprised that worked. This might cause problems elsewhere,
> because we may assume we can simply concatenate instance.dfs.uri with
> instance.dfs.dir. If you encounter any issues, please let us know.
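>
> For illustration (a hypothetical sketch, not code from Accumulo), naive
> concatenation of the two properties produces a malformed path once
> instance.dfs.dir is itself a full URI:
>
>     public class PathConcat {
>         public static void main(String[] args) {
>             String uri = "hdfs://localhost:9000/";         // instance.dfs.uri
>             String dir = "hdfs://localhost:9000/accumulo"; // instance.dfs.dir as a full URI
>             System.out.println(uri + dir);
>             // prints hdfs://localhost:9000/hdfs://localhost:9000/accumulo -- not a valid path
>         }
>     }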
>
> Also, is this a single-node instance? If not, you should replace
> "localhost" with your actual hostname, else your tservers won't be able to
> communicate with the namenode.
>
>
> --
> Christopher L Tubbs II
> http://gravatar.com/ctubbsii
>
> On Wed, Jan 7, 2015 at 2:49 PM, Mike Atlas <mike@weft.io> wrote:
>
>> I was able to narrow the problem down to my accumulo-site.xml file.
>>
>> I had the following:
>>
>>     <property>
>>       <name>instance.dfs.uri</name>
>>       <value>hdfs://localhost:9000/</value>
>>     </property>
>>     <property>
>>       <name>instance.dfs.dir</name>
>>       <value>/accumulo</value>
>>     </property>
>>
>> I changed the instance.dfs.dir value to be a full URI, and the "Mkdirs"
>> failures no longer happen, even on recovery bootup.
>>
>>     <property>
>>       <name>instance.dfs.dir</name>
>>       <value>hdfs://localhost:9000/accumulo</value>
>>     </property>
>>
>> Thanks for the suggestions everyone.
>>
>> -Mike
>>
>>
>> On Tue, Jan 6, 2015 at 9:48 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>>> Is HDFS actually healthy? Have you checked the namenode status page
>>> (http://$hostname:50070 by default) to make sure the NN is up and out
>>> of safemode, expected number of DNs have reported in, hdfs reports
>>> available space, etc?
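>>>
>>> A quick programmatic probe (just a sketch; adjust the namenode URI
>>> below to match your setup) would also confirm the NN is reachable:
>>>
>>>     import java.net.URI;
>>>     import org.apache.hadoop.conf.Configuration;
>>>     import org.apache.hadoop.fs.FileSystem;
>>>     import org.apache.hadoop.fs.Path;
>>>
>>>     public class NnProbe {
>>>         public static void main(String[] args) throws Exception {
>>>             // Throws if the namenode is down or unreachable
>>>             FileSystem fs = FileSystem.get(
>>>                 URI.create("hdfs://localhost:9000/"), new Configuration());
>>>             // false only if the NN is up but the path is missing
>>>             System.out.println(fs.exists(new Path("/accumulo")));
>>>         }
>>>     }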
>>>
>>> Any other Hadoop details (version, etc) would be helpful too!
>>>
>>> Mike Atlas wrote:
>>>
>>>> Well, I caught the same error again after terminating my machine with a
>>>> hard stop - not a normal way to do things, but I fat-fingered saving an
>>>> AMI image of it, thinking I could boot back up just fine afterward.
>>>>
>>>> The only workaround I found was to blow away the HDFS /accumulo
>>>> directory and re-init my Accumulo instance --- which is fine for playing
>>>> around, but I'm wondering what exactly is going on? I wouldn't want that
>>>> to happen in production with real data.
>>>>
>>>> Thoughts on how to debug?
>>>>
>>>>
>>>> On Tue, Jan 6, 2015 at 10:40 AM, Keith Turner <keith@deenlo.com> wrote:
>>>>
>>>>
>>>>
>>>>     On Mon, Jan 5, 2015 at 6:50 PM, Mike Atlas <mike@weft.io> wrote:
>>>>
>>>>         Hello,
>>>>
>>>>         I'm running Accumulo 1.5.2, trying to test out the GeoMesa
>>>>         <http://www.geomesa.org/2014/05/28/geomesa-quickstart/> family
>>>>         of spatio-temporal iterators using their quickstart
>>>>         demonstration tool. I think my lack of progress is due to my
>>>>         Accumulo setup, though, so can someone validate that everything
>>>>         looks good from here?
>>>>
>>>>         start-all.sh output:
>>>>
>>>>         hduser@accumulo:~$ $ACCUMULO_HOME/bin/start-all.sh
>>>>         Starting monitor on localhost
>>>>         Starting tablet servers .... done
>>>>         Starting tablet server on localhost
>>>>         2015-01-05 21:37:18,523 [server.Accumulo] INFO : Attempting to talk to zookeeper
>>>>         2015-01-05 21:37:18,772 [server.Accumulo] INFO : Zookeeper connected and initialized, attemping to talk to HDFS
>>>>         2015-01-05 21:37:19,028 [server.Accumulo] INFO : Connected to HDFS
>>>>         Starting master on localhost
>>>>         Starting garbage collector on localhost
>>>>         Starting tracer on localhost
>>>>
>>>>         hduser@accumulo:~$
>>>>
>>>>
>>>>         I do believe my HDFS is set up correctly:
>>>>
>>>>         hduser@accumulo:/home/ubuntu/geomesa-quickstart$ hadoop fs -ls /accumulo
>>>>         Found 5 items
>>>>         drwxrwxrwx   - hduser supergroup          0 2014-12-10 01:04 /accumulo/instance_id
>>>>         drwxrwxrwx   - hduser supergroup          0 2015-01-05 21:22 /accumulo/recovery
>>>>         drwxrwxrwx   - hduser supergroup          0 2015-01-05 20:14 /accumulo/tables
>>>>         drwxrwxrwx   - hduser supergroup          0 2014-12-10 01:04 /accumulo/version
>>>>         drwxrwxrwx   - hduser supergroup          0 2014-12-10 01:05 /accumulo/wal
>>>>
>>>>
>>>>         However, when I check the Accumulo monitor logs, I see these
>>>>         errors post-startup:
>>>>
>>>>         java.io.IOException: Mkdirs failed to create directory /accumulo/recovery/15664488-bd10-4d8d-9584-f88d8595a07c/part-r-00000
>>>>                 java.io.IOException: Mkdirs failed to create directory /accumulo/recovery/15664488-bd10-4d8d-9584-f88d8595a07c/part-r-00000
>>>>                         at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:264)
>>>>                         at org.apache.hadoop.io.MapFile$Writer.<init>(MapFile.java:103)
>>>>                         at org.apache.accumulo.server.tabletserver.log.LogSorter$LogProcessor.writeBuffer(LogSorter.java:196)
>>>>                         at org.apache.accumulo.server.tabletserver.log.LogSorter$LogProcessor.sort(LogSorter.java:166)
>>>>                         at org.apache.accumulo.server.tabletserver.log.LogSorter$LogProcessor.process(LogSorter.java:89)
>>>>                         at org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:101)
>>>>                         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>                         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>                         at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
>>>>                         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>>>>                         at java.lang.Thread.run(Thread.java:745)
>>>>
>>>>
>>>>         I don't really understand - I started Accumulo as the hduser,
>>>>         which is the same user that has access to the HDFS directory
>>>>         /accumulo/recovery, and it looks like the directory actually was
>>>>         created, except for the last component (part-r-00000):
>>>>
>>>>         hduser@accumulo:~$ hadoop fs -ls /accumulo/recovery/
>>>>         Found 1 items
>>>>         drwxr-xr-x   - hduser supergroup          0 2015-01-05 22:11 /accumulo/recovery/87fb7aac-0274-4aea-8014-9d53dbbdfbbc
>>>>
>>>>
>>>>         I'm not out of physical disk space:
>>>>
>>>>         hduser@accumulo:~$ df -h
>>>>         Filesystem      Size  Used Avail Use% Mounted on
>>>>         /dev/xvda1     1008G  8.5G  959G   1% /
>>>>
>>>>
>>>>         What could be going on here? Any ideas on something simple I
>>>>         could have missed?
>>>>
>>>>
>>>>     One possibility is that the tserver where the exception occurred had
>>>>     bad or missing config for HDFS.  In this case the Hadoop code may try
>>>>     to create /accumulo/recovery/.../part-r-00000 in the local fs, which
>>>>     would fail.
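>>>>
>>>>     A minimal sketch of that failure mode (assuming no core-site.xml on
>>>>     the classpath, so fs.defaultFS, fs.default.name on older Hadoop,
>>>>     falls back to file:///):
>>>>
>>>>         import org.apache.hadoop.conf.Configuration;
>>>>         import org.apache.hadoop.fs.FileSystem;
>>>>         import org.apache.hadoop.fs.Path;
>>>>
>>>>         public class DefaultFsDemo {
>>>>             public static void main(String[] args) throws Exception {
>>>>                 Configuration conf = new Configuration();
>>>>                 // With no HDFS config found, this is LocalFileSystem
>>>>                 FileSystem fs = FileSystem.get(conf);
>>>>                 System.out.println(fs.getUri()); // file:///
>>>>                 // A bare path gets qualified against the local fs, so the
>>>>                 // recovery dir would be created locally, not in HDFS:
>>>>                 System.out.println(fs.makeQualified(new Path("/accumulo/recovery")));
>>>>             }
>>>>         }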
>>>>
>>>>
>>>>         Thanks,
>>>>         Mike
>>>>
>>>>
>>>>
>>>>
>>
>
