accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Adding a tablet to a tserver
Date Tue, 24 Feb 2015 21:33:33 GMT
Ok, that helps a bit. A few things:

 > "Could not create ServerSocket.." error as it can't connect to the 
tserver.

Note that this is a Server socket. This means that the server (master or 
tabletserver) failed to bind the socket it was going to use for the 
Thrift server. This means that Accumulo will not work as the processes 
can't communicate with each other or clients. The error message should 
make it fairly obvious as to why the exception was thrown. Hopefully, 
the process killed itself too.
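
As a rough illustration of what "failed to bind" means (this is a generic 
Python sketch, not Accumulo or Thrift code, and the helper name can_bind 
is made up for the example): one common cause of a bind failure is that 
another process already holds the port, for instance a server that is 
still running. The sketch below simulates that by holding a port on 
127.0.0.1 and then trying to bind a second server socket to it.

```python
import socket

def can_bind(host, port):
    """Return True if a server socket can bind to (host, port), else False."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
        return True
    except OSError:
        # Typically "address already in use" when another socket holds the port.
        return False
    finally:
        s.close()

# Simulate an already-running server that is holding a port.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
holder.listen(1)
taken_port = holder.getsockname()[1]

# A second bind to the same address and port is expected to fail.
print(can_bind("127.0.0.1", taken_port))
holder.close()
```

If this is what is happening, the failure is on the side of the process 
that is starting up, not a failed connection to some other server.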

 > Hadoop 1.2.1

Hadoop 1 doesn't have the best track record when it comes to ensuring 
that a file is actually written to disk when we request it to be (a big 
part of the reason we suggest moving to Hadoop 2 when you can). A hard 
poweroff can result in bad Accumulo files in HDFS.

You can try adding dfs.datanode.synconclose=true to your hdfs-site.xml, 
which might help protect against this, but I'm not sure how errors are 
handled when the local disk actually runs out of space. HDFS' reserved 
space configuration can help remove this worry by refusing writes once 
HDFS is nearly full, rather than when the underlying file system itself 
fills up.
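
To check whether the property is already set, something like the 
following Python sketch could be used. It assumes the usual Hadoop 
layout of <property> elements with <name> and <value> children under 
<configuration>. The helper name hadoop_properties and the SAMPLE 
content are made up for illustration; in practice you would read your 
real hdfs-site.xml instead.

```python
import xml.etree.ElementTree as ET

def hadoop_properties(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props

# Hypothetical hdfs-site.xml content, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.datanode.synconclose</name>
    <value>true</value>
  </property>
</configuration>
"""

props = hadoop_properties(SAMPLE)
# Treat a missing property as "false" (not enabled).
print(props.get("dfs.datanode.synconclose", "false") == "true")
```

Changes to hdfs-site.xml generally only take effect after the 
datanodes are restarted.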

 > I deleted the wal logs, hoping that it would revert to what was in 
/accumulo/tables

Deleting the WALs also isn't doing what you expect it to :). The WALs, 
especially for the metadata table, are extremely important and are 
needed to ensure that data is not lost (if WALs for the metadata table 
are lost, the table might be in an inconsistent state that Accumulo 
can't automatically recover from).

This is probably why your tables are not coming online.

Recovering your existing instance might not be worth the hassle. It's 
likely easier to just move the RFiles in HDFS out of the way, and then 
reimport them into a reinitialized Accumulo.

An outline of how to do this can be found at 
http://accumulo.apache.org/1.6/accumulo_user_manual.html#_hdfs_failure 
under the *Q* "The metadata (or root) table has references to a corrupt 
WAL". If you need some more guidance than what is listed there, please 
feel free to ask!

Kina Winoto wrote:
> Hi Josh,
>
>  > Versions of Hadoop and Accumulo:
> Hadoop 1.2.1
> Accumulo 1.6.1
>  > Are the accumulo.metadata/!METADATA and/or accumulo.root tables online?
> Nope.. I tried to scan the tables -- it just hangs
>  > Have you checked the logs of the Master and/or TabletServer for any
> exceptions?
> The master log is locked for read operation (an info message). I tried
> to shutdown the master with accumulo admin -f stopMaster, but it's still
> unhappy.
> The tserver log doesn't have any exceptions. However, if I run accumulo
> tserver -a localhost, then I'll get a "Could not create ServerSocket.."
> error as it can't connect to the tserver.
>
> For more context, I ran into all of this because I'm running this on a
> vm and I ran out of disk space so Accumulo could no longer write to the
> wal reliably and then checksums weren't matching up. After I created
> more space on my vm, I deleted the wal logs, hoping that it would revert
> to what was in /accumulo/tables, but then ran into this error where I
> have zero tablets.
>
> Thanks for any suggestions on what to do next!
>
> - Kina
>
> On Tue, Feb 24, 2015 at 11:13 AM, Josh Elser <josh.elser@gmail.com
> <mailto:josh.elser@gmail.com>> wrote:
>
>     Hi Kina,
>
>     Can you share some more information?
>
>     * Versions of Hadoop and Accumulo
>     * Are the accumulo.metadata/!METADATA and/or accumulo.root tables
>     online?
>     * Have you checked the logs of the Master and/or TabletServer for
>     any exceptions?
>
>     - Josh
>
>     Kina Winoto wrote:
>
>         Hi,
>
>         I'm running a local instance of accumulo with just one tablet
>         server. I
>         got into a rut and now I don't have any tablets. There is data
>         still in
>         hdfs but I assume the data is corrupted so the tablets aren't being
>         assigned to the tablet server. Is there a way I can force a
>         tablet to be
>         assigned? I don't mind giving up a portion of my data (or all of
>         it) at
>         this point. I'd just rather not have to reinitialize accumulo and
>         recreate all the users and set up all my tables again. Maybe I
>         can force
>         a tablet assignment and then delete the tables that are corrupted?
>
>         I've encountered a similar issue on a many-node cluster and
>         would like
>         to know if my only option is to reinitialize accumulo.
>
>         Thanks!
>
>         - Kina
>
>         —
>         Sent from Mailbox <https://www.dropbox.com/__mailbox
>         <https://www.dropbox.com/mailbox>>
>
>
