accumulo-user mailing list archives

From Krishmin Rai <kr...@missionfoc.us>
Subject Re: Waiting for accumulo to be initialized
Date Wed, 27 Mar 2013 21:00:27 GMT
Hi Aji,
I wrote the original question linked below (about re-initing Accumulo over an existing installation).
 For what it's worth, I believe that my ZooKeeper data loss was related to the linux+java
leap second bug -- not likely to be affecting you now (I did not go back and attempt to re-create
the issue, so it's also possible there were other compounding issues). We have not encountered
any ZK data-loss problems since. 

At the time, I did some basic experiments to understand the process better, and successfully
followed (essentially) the steps Eric has described. The only real difficulty I had was identifying
which directories corresponded to which tables; I ended up iterating over individual RFiles
and manually identifying tables based on expected data. This was a somewhat painful process,
but at least made me confident that it would be possible in production.
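
A sketch of that kind of inspection, using Accumulo's RFile PrintInfo utility and the example paths from Eric's message below (the dump option name can vary between versions, and the file name here is just a placeholder):

$ hadoop fs -ls /accumulo-old/tables/a/default_tablet
$ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d /accumulo-old/tables/a/default_tablet/F0000abc.rf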

It's also important to note that, at least according to my understanding, this procedure still
potentially loses data: mutations written after the last minor compaction will only have reached
the write-ahead-logs and will not be available in the raw RFiles you're importing from.

-Krishmin

On Mar 27, 2013, at 4:45 PM, Aji Janis wrote:

> Eric, really appreciate you jotting this down. Too late to try it out this time, but I will give this a try if (hopefully not) there is a next time.
> 
> Thanks again.
> 
> 
> 
> On Wed, Mar 27, 2013 at 4:19 PM, Eric Newton <eric.newton@gmail.com> wrote:
> I should write this up in the user manual.  It's not that hard, but it's really not the
first thing you want to tackle while learning how to use accumulo.  I just opened ACCUMULO-1217
to do that.
> 
> I wrote this from memory: expect errors.  Needless to say, you would only want to do
this when you are more comfortable with hadoop, zookeeper and accumulo. 
> 
> First, get zookeeper up and running, even if you have to delete all its data.  
> 
> Next, attempt to determine the mapping of table names to tableIds.  You can do this in
the shell when your accumulo instance is healthy.  If it isn't healthy, you will have to guess
based on the data in the files in HDFS.
> 
> So, for example, the table "trace" is probably table id "1".  You can find the files
for trace in /accumulo/tables/1.
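> 
> For example, listing the table id directories in HDFS will show the candidates:
> 
> $ hadoop fs -ls /accumulo/tables
> $ hadoop fs -ls /accumulo/tables/1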
> 
> Don't worry if you get the names wrong.  You can always rename the tables later. 
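> 
> For example (hypothetical names), if the table you created as "table1" turns out to hold what used to be "mytable":
> 
> shell > renametable table1 mytable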
> 
> Move the old files for accumulo out of the way and re-initialize:
> 
> $ hadoop fs -mv /accumulo /accumulo-old
> $ ./bin/accumulo init
> $ ./bin/start-all.sh
> 
> Recreate your tables:
> 
> $ ./bin/accumulo shell -u root -p mysecret
> shell > createtable table1
> 
> Learn the new table id mapping:
> shell > tables -l
> !METADATA => !0
> trace => 1
> table1 => 2
> ...
> 
> Bulk import all your data back into the new table ids:
> Assuming you have determined that "table1" used to be table id "a" and is now "2",
> you do something like this:
> 
> $ hadoop fs -mkdir /tmp/failed
> $ ./bin/accumulo shell -u root -p mysecret
> shell > table table1
> shell table1 > importdirectory /accumulo-old/tables/a/default_tablet /tmp/failed true
> 
> There are lots of directories under every table id directory.  You will need to import
each of them.  I suggest creating a script and passing it to the shell on the command line.
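> 
> A rough sketch of such a script (assuming the shell's -f/--execute-file option, and that "table1" was previously table id "a"):
> 
> $ echo "table table1" > import-commands.txt
> $ hadoop fs -ls /accumulo-old/tables/a | awk '{print $NF}' | grep '^/' | sed 's|^|importdirectory |; s|$| /tmp/failed true|' >> import-commands.txt
> $ ./bin/accumulo shell -u root -p mysecret -f import-commands.txt
> 
> Note that each importdirectory call expects the failures directory to exist and be empty, so in practice you may want a separate failure directory per import.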
> 
> I know of instances in which trillions of entries were recovered and available in a matter
of hours.
> 
> -Eric
> 
> 
> 
> On Wed, Mar 27, 2013 at 3:39 PM, Aji Janis <aji1705@gmail.com> wrote:
> When you say "you can move the files aside in HDFS"... which files are you referring to? I have never set up zookeeper myself, so I am not aware of all the changes needed.
> 
> 
> 
> On Wed, Mar 27, 2013 at 3:33 PM, Eric Newton <eric.newton@gmail.com> wrote:
> If you lose zookeeper, you can move the files aside in HDFS, recreate your instance in
zookeeper and bulk import all of the old files.  It's not perfect: you lose table configurations,
split points and user permissions, but you do preserve most of the data.
> 
> You can back up each of these bits of information periodically if you like.  Outside
of the files in HDFS, the configuration information is pretty small.
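> 
> For example (assuming the shell's -e/--execute-command option; the table and user names here are placeholders), a periodic dump of that information is straightforward:
> 
> $ ./bin/accumulo shell -u root -p mysecret -e "config -t table1" > table1-config.txt
> $ ./bin/accumulo shell -u root -p mysecret -e "getsplits -t table1" > table1-splits.txt
> $ ./bin/accumulo shell -u root -p mysecret -e "userpermissions -u someuser" > someuser-perms.txt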
> 
> -Eric
> 
> 
> 
> On Wed, Mar 27, 2013 at 3:18 PM, Aji Janis <aji1705@gmail.com> wrote:
> Eric and Josh, thanks for all your feedback. We ended up losing all our accumulo data
because I had to reformat hadoop. Here is, in a nutshell, what I did:
> 
> 1. Stop accumulo
> 2. Stop hadoop
> 3. On the hadoop master and all datanodes, remove everything under the data folder given by dfs.data.dir (hdfs-site.xml)
> 4. On the hadoop master, remove everything under the name folder given by dfs.name.dir (hdfs-site.xml)
> 5. As the hadoop user, execute .../hadoop/bin/hadoop namenode -format
> 6. As the hadoop user, execute .../hadoop/bin/start-all.sh ==> should repopulate the data/ and name/ dirs that were erased in steps 3 and 4
> 7. Initialize Accumulo - as the accumulo user, ../accumulo/bin/accumulo init (I created a new instance)
> 8. Start accumulo
> I was wondering if anyone had suggestions or thoughts on how I could have solved the
original issue of accumulo waiting to be initialized without losing my accumulo data? Is it
possible to do so?
> 
> 
> 
> 

