Subject: Re: Waiting for accumulo to be initialized
From: Eric Newton <eric.newton@gmail.com>
To: user@accumulo.apache.org
Date: Wed, 27 Mar 2013 16:19:43 -0400

I should write this up in the user manual. It's not that hard, but it's really not the first thing you want to tackle while learning how to use Accumulo. I just opened ACCUMULO-1217 to do that.

I wrote this from memory: expect errors. Needless to say, you would only want to do this when you are more comfortable with Hadoop, ZooKeeper, and Accumulo.

First, get ZooKeeper up and running, even if you have to delete all its data.
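
(A from-memory sketch of wiping a standalone ZooKeeper, in the same expect-errors spirit; the dataDir below comes from your zoo.cfg, so the path is only an assumption:)

$ ./bin/zkServer.sh stop            # run from the zookeeper install dir
$ rm -rf /var/zookeeper/version-2   # version-2 under dataDir holds the data
$ ./bin/zkServer.sh start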

Next, attempt to determine the mapping of table names to tableIds. You can do this in the shell when your Accumulo instance is healthy. If it isn't healthy, you will have to guess based on the data in the files in HDFS.
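
(While the instance is still healthy you can save that mapping ahead of time; untested sketch, but the shell's -e option runs a single command non-interactively:)

$ ./bin/accumulo shell -u root -p mysecret -e "tables -l" > table-ids.txt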

So, for example, the table "trace" is probably table id "1". You can find the files for trace in /accumulo/tables/1.
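
(If you do have to guess, listing the tables directory at least shows which ids exist; each subdirectory is one table id:)

$ hadoop fs -ls /accumulo/tables
$ hadoop fs -ls /accumulo/tables/1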

Don't worry if you get the names wrong. You can always rename the tables later.
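
(Renaming is a one-liner in the shell; "mytable" is just a made-up target name here:)

shell > renametable table1 mytable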

Move the old files for Accumulo out of the way and re-initialize:

$ hadoop fs -mv /accumulo /accumulo-old
$ ./bin/accumulo init
$ ./bin/start-all.sh

Recreate your tables:

$ ./bin/accumulo shell -u root -p mysecret
= shell > createtable table1

Learn the new table id mapping:
shell > tables -l
!METADATA => !0
trace => 1
table1 => 2
= ...

Bulk import all your data back into the new table ids. Assuming you have determined that "table1" used to be table id "a" and is now "2", you do something like this:

$ hadoop fs -mkdir /tmp/failed
$ ./bin/accumulo shell -u root -p mysecret
shell > table table1
shell table1 > importdirectory /accumulo-old/tables/a/default_tablet /tmp/failed true

There are lots of directories under every table id directory. You will need to import each of them. I suggest creating a script and passing it to the shell on the command line.
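
(A rough sketch of such a script, from memory like everything above: the old id "a", the new table name, and the credentials are placeholders, and the parsing of hadoop fs -ls output may need tweaking for your Hadoop version. The shell's -f option executes commands from a file:)

#!/usr/bin/env bash
# Emit one importdirectory command per directory under the old table id,
# then hand the whole batch to the accumulo shell in one go.
OLD_ID=a          # old table id, from the mapping you worked out above
NEW_TABLE=table1  # the freshly created table it maps to

hadoop fs -mkdir /tmp/failed   # failure directory; must exist and be empty

echo "table $NEW_TABLE" > import-commands.txt
# keep only directory entries: their permission string starts with 'd'
for dir in $(hadoop fs -ls /accumulo-old/tables/$OLD_ID | grep '^d' | awk '{print $NF}'); do
  echo "importdirectory $dir /tmp/failed true" >> import-commands.txt
done

./bin/accumulo shell -u root -p mysecret -f import-commands.txt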

I know of instances in which trillions of entries were recovered and available in a matter of hours.

-Eric


On Wed, Mar 27, 2013 at 3:39 PM, Aji Janis <aji1705@gmail.com> wrote:

> when you say "you can move the files aside in HDFS" .. which files are
> you referring to? I have never set up zookeeper myself so I am not aware
> of all the changes needed.
>
> On Wed, Mar 27, 2013 at 3:33 PM, Eric Newton <eric.newton@gmail.com> wrote:
>
>> If you lose zookeeper, you can move the files aside in HDFS, recreate
>> your instance in zookeeper, and bulk import all of the old files. It's
>> not perfect: you lose table configurations, split points, and user
>> permissions, but you do preserve most of the data.
>>
>> You can back up each of these bits of information periodically if you
>> like. Outside of the files in HDFS, the configuration information is
>> pretty small.
>>
>> -Eric
>>
>> On Wed, Mar 27, 2013 at 3:18 PM, Aji Janis <aji1705@gmail.com> wrote:
>>
>>> Eric and Josh, thanks for all your feedback. We ended up losing all
>>> our accumulo data because I had to reformat hadoop. Here is, in a
>>> nutshell, what I did:
>>>
>>>   1. Stop accumulo
>>>   2. Stop hadoop
>>>   3. On the hadoop master and all datanodes, remove everything under
>>>      the data folder named by dfs.data.dir (hdfs-site.xml)
>>>   4. On the hadoop master, remove everything under the name folder
>>>      named by dfs.name.dir (hdfs-site.xml)
>>>   5. As the hadoop user, execute .../hadoop/bin/hadoop namenode -format
>>>   6. As the hadoop user, execute .../hadoop/bin/start-all.sh ==> should
>>>      populate the data/ and name/ dirs that were erased in steps 3 and 4
>>>   7. Initialize Accumulo - as the accumulo user, run
>>>      .../accumulo/bin/accumulo init (I created a new instance)
>>>   8. Start accumulo
>>>
>>> I was wondering if anyone had suggestions or thoughts on how I could
>>> have solved the original issue of accumulo waiting to be initialized
>>> without losing my accumulo data? Is it possible to do so?