accumulo-user mailing list archives

From Aji Janis <>
Subject Re: Waiting for accumulo to be initialized
Date Thu, 28 Mar 2013 12:56:37 GMT

Thank you for the response. It's always great to hear from someone who has
tried out the steps (even if you had a different issue). Like you said, I am
not really sure what caused the crash in our env in the first place, but
having a plan is always good...

Thanks again all,

On Wed, Mar 27, 2013 at 5:00 PM, Krishmin Rai <> wrote:

> Hi Aji,
> I wrote the original question linked below (about re-initing Accumulo over
> an existing installation).  For what it's worth, I believe that my
> ZooKeeper data loss was related to the linux+java leap second bug<> -- not
> likely to be affecting you now (I did not go back and attempt to re-create
> the issue, so it's also possible there were other compounding issues). We
> have not encountered any ZK data-loss problems since.
> At the time, I did some basic experiments to understand the process
> better, and successfully followed (essentially) the steps Eric has
> described. The only real difficulty I had was identifying which directories
> corresponded to which tables; I ended up iterating over individual RFiles
> and manually identifying tables based on expected data. This was a somewhat
> painful process, but at least made me confident that it would be possible
> in production.
> It's also important to note that, at least according to my understanding,
> this procedure still potentially loses data: mutations written after the
> last minor compaction will only have reached the write-ahead-logs and will
> not be available in the raw RFiles you're importing from.
> -Krishmin
> On Mar 27, 2013, at 4:45 PM, Aji Janis wrote:
> Eric, Really appreciate you jotting this down. Too late to try it out this
> time but will give this a try (if, hopefully not) there is a next time to
> be had.
> Thanks again.
> On Wed, Mar 27, 2013 at 4:19 PM, Eric Newton <> wrote:
>> I should write this up in the user manual.  It's not that hard, but it's
>> really not the first thing you want to tackle while learning how to use
>> accumulo.  I just opened ACCUMULO-1217<> to
>> do that.
>> I wrote this from memory: expect errors.  Needless to say, you would only
>> want to do this when you are more comfortable with hadoop, zookeeper and
>> accumulo.
>> First, get zookeeper up and running, even if you have to delete all its
>> data.
>> Next, attempt to determine the mapping of table names to tableIds.  You
>> can do this in the shell when your accumulo instance is healthy.  If it
>> isn't healthy, you will have to guess based on the data in the files in
>> HDFS.
>> So, for example, the table "trace" is probably table id "1".  You can
>> find the files for trace in /accumulo/tables/1.
>> Don't worry if you get the names wrong.  You can always rename the tables
>> later.
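The directory inspection Eric describes can be sketched as follows. The PrintInfo class name matches Accumulo of this era, but note the dump flag may differ between versions, and the binary location and RFile name below are assumptions for illustration:

```shell
# List the table-id directories under Accumulo's HDFS root: one
# directory per table id (e.g. /accumulo/tables/1 for "trace"):
#   hadoop fs -ls /accumulo/tables

# Dump the keys of a single RFile so its contents can be matched to a
# table by eye. Assumes "accumulo" is on the PATH; the -d (dump) flag
# may differ between versions.
inspect_rfile() {
  accumulo org.apache.accumulo.core.file.rfile.PrintInfo -d "$1"
}

# Example (hypothetical file name):
#   inspect_rfile /accumulo/tables/1/default_tablet/F000001.rf
```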
>> Move the old files for accumulo out of the way and re-initialize:
>> $ hadoop fs -mv /accumulo /accumulo-old
>> $ ./bin/accumulo init
>> $ ./bin/
>> Recreate your tables:
>> $ ./bin/accumulo shell -u root -p mysecret
>> shell > createtable table1
>> Learn the new table id mapping:
>> shell > tables -l
>> !METADATA => !0
>> trace => 1
>> table1 => 2
>> ...
>> Bulk import all your data back into the new table ids:
>> Assuming you have determined that "table1" used to be table id "a" and is
>> now "2",
>> you do something like this:
>> $ hadoop fs -mkdir /tmp/failed
>> $ ./bin/accumulo shell -u root -p mysecret
>> shell > table table1
>> shell table1 > importdirectory /accumulo-old/tables/a/default_tablet
>> /tmp/failed true
>> There are lots of directories under every table id directory.  You will
>> need to import each of them.  I suggest creating a script and passing it to
>> the shell on the command line.
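A small generator along these lines could produce the script Eric suggests; the table name, paths, and credentials are illustrative:

```shell
# Turn a list of tablet directories (one HDFS path per line on stdin)
# into an Accumulo shell script of importdirectory commands.
make_import_script() {
  table=$1; fail_dir=$2
  echo "table $table"
  while read -r dir; do
    [ -n "$dir" ] && echo "importdirectory $dir $fail_dir true"
  done
}

# Illustrative usage: list the old table's directories, keep the path
# column, generate the script, and feed it to the shell:
#   hadoop fs -ls /accumulo-old/tables/a | awk 'NR>1 {print $NF}' \
#     | make_import_script table1 /tmp/failed > import-table1.txt
#   ./bin/accumulo shell -u root -p mysecret -f import-table1.txt
```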
>> I know of instances in which trillions of entries were recovered and
>> available in a matter of hours.
>> -Eric
>> On Wed, Mar 27, 2013 at 3:39 PM, Aji Janis <> wrote:
>>> when you say " you can move the files aside in HDFS" .. which files are
>>> you referring to? I have never set up zookeeper myself so I am not aware of
>>> all the changes needed.
>>> On Wed, Mar 27, 2013 at 3:33 PM, Eric Newton <> wrote:
>>>> If you lose zookeeper, you can move the files aside in HDFS, recreate
>>>> your instance in zookeeper and bulk import all of the old files.  It's not
>>>> perfect: you lose table configurations, split points and user permissions,
>>>> but you do preserve most of the data.
>>>> You can back up each of these bits of information periodically if you
>>>> like.  Outside of the files in HDFS, the configuration information is
>>>> pretty small.
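Eric's suggestion to back up that small configuration periodically could look something like this. The shell commands (`tables -l`, `users`, `config -t`, `getsplits -t`) are standard Accumulo shell commands; the binary name, credentials, table names, and backup location are placeholders:

```shell
# Periodically save the state that re-initializing Accumulo loses:
# the table-name/id mapping, users, per-table configuration, and
# split points. Credentials and paths are placeholders.
backup_accumulo_meta() {
  backup_dir=$1; shift
  mkdir -p "$backup_dir"
  accumulo shell -u root -p mysecret -e "tables -l" > "$backup_dir/table-ids.txt"
  accumulo shell -u root -p mysecret -e "users" > "$backup_dir/users.txt"
  for table in "$@"; do
    accumulo shell -u root -p mysecret -e "config -t $table" > "$backup_dir/$table.config"
    accumulo shell -u root -p mysecret -e "getsplits -t $table" > "$backup_dir/$table.splits"
  done
}

# e.g. from cron, once a day:
#   backup_accumulo_meta /backup/accumulo/$(date +%F) table1 table2
```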
>>>> -Eric
>>>> On Wed, Mar 27, 2013 at 3:18 PM, Aji Janis <> wrote:
>>>>> Eric and Josh, thanks for all your feedback. We ended up *losing all
>>>>> our accumulo data* because I had to reformat hadoop. Here is, in a
>>>>> nutshell, what I did:
>>>>>    1. Stop accumulo
>>>>>    2. Stop hadoop
>>>>>    3. On the hadoop master and all datanodes, remove everything under
>>>>>    the data folder (as configured in hdfs-site.xml)
>>>>>    4. On the hadoop master, remove everything under the name folder
>>>>>    (as configured in hdfs-site.xml)
>>>>>    5. As hadoop user, execute .../hadoop/bin/hadoop namenode -format
>>>>>    6. As hadoop user, execute .../hadoop/bin/ ==> should
>>>>>    populate the data/ and name/ dirs that were erased in steps 3 and 4.
>>>>>    7. Initialize Accumulo - as accumulo user,
>>>>>    ../accumulo/bin/accumulo init (I created a new instance)
>>>>>    8. Start accumulo
>>>>> I was wondering if anyone had suggestions or thoughts on how I could
>>>>> have solved the original issue of accumulo waiting for initialization
>>>>> without losing my accumulo data? Is it possible to do so?
