accumulo-user mailing list archives

From Kina Winoto <winoto.kin...@gmail.com>
Subject Re: Adding a tablet to a tserver
Date Tue, 24 Feb 2015 23:21:15 GMT
Thanks Josh. That was helpful; yes, a migration to Hadoop 2 is in our future!

In the end, I decided to start a new instance, as you ended up suggesting,
and bulk import the data.

Thanks for the help!

On Tue, Feb 24, 2015 at 1:33 PM, Josh Elser <josh.elser@gmail.com> wrote:

> Ok, that helps a bit. A few things
>
> > "Could not create ServerSocket.." error as it can't connect to the
> > tserver.
>
> Note that this is a Server socket. This means that the server (master or
> tabletserver) failed to bind the socket it was going to use for the Thrift
> server. This means that Accumulo will not work as the processes can't
> communicate with each other or clients. The error message should make it
> fairly obvious as to why the exception was thrown. Hopefully, the process
> killed itself too.
>
> > Hadoop 1.2.1
>
> Hadoop 1 doesn't have the best track record when it comes to ensuring that a
> file is actually written to disk when we request it to be (a big part of
> the reason we suggest moving to Hadoop 2 when you can). A hard poweroff can
> result in bad Accumulo files in HDFS.
>
> You can try adding dfs.datanode.synconclose=true to your hdfs-site.xml
> which might help protect against this, but I'm not sure of the error
> handling of actually running out of space on the local disk. HDFS' reserved
> space configuration can help remove this worry by preventing writes when
> HDFS is nearing full instead of the actual file system.
>
> > I deleted the wal logs, hoping that it would revert to what was in
> > /accumulo/tables
>
> Deleting the WALs also isn't doing what you expect it to :). The WALs,
> especially for the metadata table, are extremely important and are needed
> to ensure that data is not lost (if WALs for the metadata table are lost,
> the table might be in an inconsistent state that Accumulo can't
> automatically recover from).
>
> This is probably why your tables are not coming online.
>
> Recovering your existing instance might not be worth the hassle. It's
> likely easier to just move the RFiles in HDFS out of the way, and then
> reimport them into a reinitialized Accumulo.
>
> An outline of how to do this can be found at
> http://accumulo.apache.org/1.6/accumulo_user_manual.html#_hdfs_failure
> under the *Q* "The metadata (or root) table has references to a corrupt
> WAL". If you need some more guidance than what is listed there, please
> feel free to ask!
>
> Kina Winoto wrote:
>
>> Hi Josh,
>>
>>  > Versions of Hadoop and Accumulo:
>> Hadoop 1.2.1
>> Accumulo 1.6.1
>>  > Are the accumulo.metadata/!METADATA and/or accumulo.root tables online?
>> Nope.. I tried to scan the tables -- it just hangs
>>  > Have you checked the logs of the Master and/or TabletServer for any
>>  > exceptions?
>> The master log is locked for read operation (an info message). I tried
>> to shutdown the master with accumulo admin -f stopMaster, but it's still
>> unhappy.
>> The tserver log doesn't have any exceptions. However, if I run accumulo
>> tserver -a localhost, then I'll get a "Could not create ServerSocket.."
>> error as it can't connect to the tserver.
>>
>> For more context, I ran into all of this because I'm running this on a
>> vm and I ran out of disk space so Accumulo could no longer write to the
>> wal reliably and then checksums weren't matching up. After I created
>> more space on my vm, I deleted the wal logs, hoping that it would revert
>> to what was in /accumulo/tables, but then ran into this error where I
>> have zero tablets.
>>
>> Thanks for any suggestions on what to do next!
>>
>> - Kina
>>
>> On Tue, Feb 24, 2015 at 11:13 AM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>>     Hi Kina,
>>
>>     Can you share some more information?
>>
>>     * Versions of Hadoop and Accumulo
>>     * Are the accumulo.metadata/!METADATA and/or accumulo.root tables
>>     online?
>>     * Have you checked the logs of the Master and/or TabletServer for
>>     any exceptions?
>>
>>     - Josh
>>
>>     Kina Winoto wrote:
>>
>>         Hi,
>>
>>         I'm running a local instance of accumulo with just one tablet
>>         server. I got into a rut and now I don't have any tablets. There
>>         is data still in hdfs but I assume the data is corrupted so the
>>         tablets aren't being assigned to the tablet server. Is there a
>>         way I can force a tablet to be assigned? I don't mind giving up
>>         a portion of my data (or all of it) at this point. I'd just
>>         rather not have to reinitialize accumulo and recreate all the
>>         users and set up all my tables again. Maybe I can force a tablet
>>         assignment and then delete the tables that are corrupted?
>>
>>         I've encountered a similar issue on a many-node cluster and
>>         would like to know if my only option is to reinitialize accumulo.
>>
>>         Thanks!
>>
>>         - Kina
>>
>>         —
>>         Sent from Mailbox <https://www.dropbox.com/__mailbox
>>         <https://www.dropbox.com/mailbox>>
>>
>>
>>
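
For reference, the two HDFS settings mentioned in Josh's reply might look
roughly like the fragment below in hdfs-site.xml. This is only a sketch and
not part of the original messages: dfs.datanode.synconclose is the property
named in the thread, while dfs.datanode.du.reserved is an assumption about
which "reserved space" setting was meant, and the 1 GiB value is an
arbitrary illustration rather than a recommendation.

```xml
<!-- hdfs-site.xml (fragment; goes inside the existing <configuration> element) -->

<!-- Named in the thread: ask the DataNode to sync block files to disk on close. -->
<property>
  <name>dfs.datanode.synconclose</name>
  <value>true</value>
</property>

<!-- Assumed property for "HDFS' reserved space configuration":
     bytes per volume left free for non-HDFS use.
     1073741824 (1 GiB) is an illustrative value only. -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>1073741824</value>
</property>
```

The DataNodes would need a restart to pick up either change, and a suitable
reserved size depends on the disk and on what else writes to it.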
