Subject: Re: Adding a tablet to a tserver
From: Kina Winoto <winoto.kina.s@gmail.com>
To: user@accumulo.apache.org
Date: Tue, 24 Feb 2015 15:21:15 -0800

Thanks Josh. That was helpful; yes, a migration to Hadoop 2 is in our future!

In the end, I decided to start a new instance like you ended up suggesting
and bulk importing.

Thanks for the help!

On Tue, Feb 24, 2015 at 1:33 PM, Josh Elser wrote:

> Ok, that helps a bit. A few things:
>
> > "Could not create ServerSocket.." error as it can't connect to the
> > tserver.
>
> Note that this is a server socket. This means that the server (master or
> tabletserver) failed to bind the socket it was going to use for the Thrift
> server. This means that Accumulo will not work, as the processes can't
> communicate with each other or with clients. The error message should make
> it fairly obvious why the exception was thrown. Hopefully, the process
> killed itself too.
>
> > Hadoop 1.2.1
>
> Hadoop 1 doesn't have the best track record when it comes to ensuring that
> a file is actually written to disk when we request it to be (a big part of
> the reason we suggest moving to Hadoop 2 when you can). A hard poweroff can
> result in bad Accumulo files in HDFS.
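For readers landing on this thread later: the two HDFS settings Josh discusses both live in hdfs-site.xml. A hypothetical fragment follows, assuming a Hadoop 1.2-style configuration; the reserved-space property is presumably `dfs.datanode.du.reserved` (Josh doesn't name it), and the 1 GB value is a placeholder, not a recommendation:

```xml
<!-- hdfs-site.xml (illustrative fragment; values are example placeholders) -->
<configuration>
  <!-- fsync file contents on close, reducing the chance of bad
       Accumulo files in HDFS after a hard poweroff -->
  <property>
    <name>dfs.datanode.synconclose</name>
    <value>true</value>
  </property>
  <!-- bytes reserved per volume for non-DFS use, so HDFS refuses
       writes before the underlying disk is actually full -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>1073741824</value>
  </property>
</configuration>
```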
> You can try adding dfs.datanode.synconclose=true to your hdfs-site.xml,
> which might help protect against this, but I'm not sure of the error
> handling of actually running out of space on the local disk. HDFS'
> reserved-space configuration can help remove this worry by preventing
> writes when HDFS is nearing full, instead of when the actual file system
> is.
>
> > I deleted the wal logs, hoping that it would revert to what was in
> > /accumulo/tables
>
> Deleting the WALs also isn't doing what you expect it to :). The WALs,
> especially for the metadata table, are extremely important and are needed
> to ensure that data is not lost (if WALs for the metadata table are lost,
> the table might be in an inconsistent state that Accumulo can't
> automatically recover from).
>
> This is probably why your tables are not coming online.
>
> Recovering your existing instance might not be worth the hassle. It's
> likely easier to just move the RFiles in HDFS out of the way, and then
> reimport them into a reinitialized Accumulo.
>
> An outline of how to do this can be found at
> http://accumulo.apache.org/1.6/accumulo_user_manual.html#_hdfs_failure
> under the *Q* "The metadata (or root) table has references to a corrupt
> WAL". If you need some more guidance than what is listed there, please
> feel free to ask!
>
> Kina Winoto wrote:
>
>> Hi Josh,
>>
>> > Versions of Hadoop and Accumulo:
>> Hadoop 1.2.1
>> Accumulo 1.6.1
>>
>> > Are the accumulo.metadata/!METADATA and/or accumulo.root tables online?
>> Nope. I tried to scan the tables -- it just hangs.
>>
>> > Have you checked the logs of the Master and/or TabletServer for any
>> > exceptions?
>> The master log is locked for a read operation (an info message). I tried
>> to shut down the master with accumulo admin -f stopMaster, but it's still
>> unhappy.
>> The tserver log doesn't have any exceptions. However, if I run accumulo
>> tserver -a localhost, then I'll get a "Could not create ServerSocket.."
>> error as it can't connect to the tserver.
>>
>> For more context, I ran into all of this because I'm running this on a
>> VM and I ran out of disk space, so Accumulo could no longer write to the
>> WAL reliably and then checksums weren't matching up. After I created
>> more space on my VM, I deleted the WAL logs, hoping that it would revert
>> to what was in /accumulo/tables, but then ran into this error where I
>> have zero tablets.
>>
>> Thanks for any suggestions on what to do next!
>>
>> - Kina
>>
>> On Tue, Feb 24, 2015 at 11:13 AM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>>     Hi Kina,
>>
>>     Can you share some more information?
>>
>>     * Versions of Hadoop and Accumulo
>>     * Are the accumulo.metadata/!METADATA and/or accumulo.root tables
>>       online?
>>     * Have you checked the logs of the Master and/or TabletServer for
>>       any exceptions?
>>
>>     - Josh
>>
>>     Kina Winoto wrote:
>>
>>         Hi,
>>
>>         I'm running a local instance of Accumulo with just one tablet
>>         server. I got into a rut and now I don't have any tablets. There
>>         is data still in HDFS, but I assume the data is corrupted, so
>>         the tablets aren't being assigned to the tablet server. Is there
>>         a way I can force a tablet to be assigned? I don't mind giving
>>         up a portion of my data (or all of it) at this point. I'd just
>>         rather not have to reinitialize Accumulo and recreate all the
>>         users and set up all my tables again. Maybe I can force a tablet
>>         assignment and then delete the tables that are corrupted?
>>
>>         I've encountered a similar issue on a many-node cluster and
>>         would like to know if my only option is to reinitialize
>>         Accumulo.
>>
>>         Thanks!
>>
>>         - Kina
>>
>>         Sent from Mailbox <https://www.dropbox.com/mailbox>
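The move-RFiles-aside-and-reimport recovery Josh points to (and the reinitialize-and-bulk-import route Kina ultimately took) goes roughly like the sketch below. This is only a hedged outline, not the manual's exact procedure: the HDFS paths, table name, and table id are hypothetical, and `importdirectory` expects the staged files to be valid RFiles for the table currently selected in the shell.

```
# Sketch only -- all paths and names are hypothetical examples.

# 1. Move the damaged instance's files out of the way in HDFS:
hadoop fs -mv /accumulo /accumulo-corrupt

# 2. Reinitialize Accumulo (creates a fresh instance; users and
#    table configuration must be recreated by hand):
accumulo init

# 3. Stage the RFiles of one old table (hypothetical table id "2")
#    into an import directory, plus an empty failure directory:
hadoop fs -mkdir /tmp/bulk /tmp/bulk-failures
hadoop fs -cp '/accumulo-corrupt/tables/2/*/*.rf' /tmp/bulk

# 4. In the Accumulo shell, create the table and bulk import
#    (the trailing "false" means "do not set entry timestamps"):
#      createtable mytable
#      table mytable
#      importdirectory /tmp/bulk /tmp/bulk-failures false
```

Repeat step 3-4 per table; the mapping from table name to table id in the old instance can be recovered from the old instance's records before it is torn down.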