accumulo-user mailing list archives

From Michael Wall <mjw...@gmail.com>
Subject Re: Fix "Table x has a hole" [SEC=UNOFFICIAL]
Date Wed, 01 Mar 2017 12:56:52 GMT
Matt,

This sentence concerns me: "I've always removed the referenced
tablet in the metadata table to fix this and had no issues in the past."  I
rarely edit the metadata table and am very, very cautious when I
do.  This should not be part of normal operating procedures.  Can you
provide more context?

Mike

On Wed, Mar 1, 2017 at 12:23 AM, Dickson, Matt MR <
matt.dickson@defence.gov.au> wrote:

> UNOFFICIAL
>
> Thanks for that Keith,
>
> That's got it working again.  As for the cause, I had an error in the logs
> stating a tablet was hosted and assigned.  I've always removed the
> referenced tablet in the metadata table to fix this and had no issues in
> the past.  It looks like I fat-fingered the deletion and removed the
> wrong entry, so this is not an issue with Accumulo.
>
> Thanks.
>
> -----Original Message-----
> From: Keith Turner [mailto:keith@deenlo.com]
> Sent: Wednesday, 1 March 2017 03:36
> To: user@accumulo.apache.org
> Subject: Re: Fix "Table x has a hole" [SEC=UNOFFICIAL]
>
> Below are some commands that show how to recreate this problem and how
> to fix it.   Each tablet in the metadata table has a pointer to the
> previous tablet.  Adding and removing splits to a table changes these
> pointers.
>
>   root@uno> createtable test
>
> Get the table's ID below; we will need it later.
>
>   root@uno test> tables -l
>   accumulo.metadata    =>        !0
>   accumulo.replication =>      +rep
>   accumulo.root        =>        +r
>   test                 =>         3
>   trace                =>         1
>
> Add some splits and then scan the metadata table.  The pointers to the
> previous tablet are in the ~tab:~pr column.  The scan below uses the table
> id above.
>
>   root@uno test> addsplits 11111111 3333333
>   root@uno test> scan -t accumulo.metadata -c ~tab:~pr -b 3; -e 3<
>   3;11111111 ~tab:~pr []    \x00
>   3;3333333 ~tab:~pr []    \x0111111111
>   3< ~tab:~pr []    \x013333333
>
> Add another split and rescan the metadata table.
>
>   root@uno test> addsplits 2222222
>   root@uno test> scan -t accumulo.metadata -c ~tab:~pr -b 3; -e 3<
>   3;11111111 ~tab:~pr []    \x00
>   3;2222222 ~tab:~pr []    \x0111111111
>   3;3333333 ~tab:~pr []    \x012222222
>   3< ~tab:~pr []    \x013333333
>
> Grant permission to write to the metadata table and then recreate the
> problem you have.
>
>   root@uno test> grant Table.WRITE -u root -t accumulo.metadata
>   root@uno test> table accumulo.metadata
>   root@uno accumulo.metadata> insert 3;3333333 ~tab ~pr \x0111111111
>   root@uno accumulo.metadata> scan -t accumulo.metadata -c ~tab:~pr -b 3; -e 3<
>   3;11111111 ~tab:~pr []    \x00
>   3;2222222 ~tab:~pr []    \x0111111111
>   3;3333333 ~tab:~pr []    \x0111111111
>   3< ~tab:~pr []    \x013333333
>
> If you ran the check for metadata problems here, you should see the error
> message you saw.  Below, the pointer is fixed and write permission is
> revoked (to prevent accidental writes in the future).
>
>   root@uno accumulo.metadata> insert 3;3333333 ~tab ~pr \x012222222
>   root@uno accumulo.metadata> revoke Table.WRITE -u root -t accumulo.metadata
>   root@uno accumulo.metadata>
>
> After running the command above to fix the pointer, the check for metadata
> problems should be happy.
>
> It would be nice to try to track down the cause of this.  Splitting a
> tablet involves three metadata operations.  For fault tolerance, the
> columns ~tab:oldprevrow and ~tab:splitRatio are temporarily written.
> If a tablet server dies in the middle of splitting a tablet, then Accumulo
> will see these temporary columns and attempt to continue the split.  So I
> am curious if you see these columns?
>
> On Sun, Feb 26, 2017 at 6:49 PM, Dickson, Matt MR <
> matt.dickson@defence.gov.au> wrote:
> > UNOFFICIAL
> >
> > Running CheckForMetadataProblems on Accumulo reports:
> >
> > Table xxx has a hole 11111111 != 2222222
> >
> > Is there a correct way to repair this?
> >
> > Thanks in advance.
>

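For readers following the thread, here is a minimal sketch of the
consistency rule Keith's scans illustrate: walking a table's tablets in
metadata order, each ~tab:~pr value should point at the end row of the
tablet before it.  This is not Accumulo's code.  The function names and
the in-memory row lists are hypothetical, and the value layout (\x00 for
"no previous row", \x01 followed by the row otherwise) is assumed from the
scan output quoted above.

```python
from typing import List, Optional, Tuple

# One tablet: (end_row, raw ~tab:~pr value). end_row is None for the
# default tablet, i.e. the "3<" row in the scans quoted above.
Tablet = Tuple[Optional[bytes], bytes]


def decode_prev_row(value: bytes) -> Optional[bytes]:
    """Decode a ~tab:~pr value (layout assumed from the scan output)."""
    if value[:1] == b"\x00":
        return None          # first tablet: no previous end row
    if value[:1] == b"\x01":
        return value[1:]     # marker byte followed by the previous end row
    raise ValueError("unexpected ~tab:~pr value: %r" % (value,))


def find_holes(tablets: List[Tablet]) -> List[Tuple[Optional[bytes], Optional[bytes]]]:
    """Return (found_prev_row, expected_prev_row) for every broken pointer."""
    holes = []
    expected = None  # the first tablet should have no previous row
    for end_row, pr_value in tablets:
        found = decode_prev_row(pr_value)
        if found != expected:
            holes.append((found, expected))
        expected = end_row  # the next tablet should point back at this one
    return holes


# The broken state recreated in the thread (table id 3).
BROKEN: List[Tablet] = [
    (b"11111111", b"\x00"),
    (b"2222222", b"\x01" + b"11111111"),
    (b"3333333", b"\x01" + b"11111111"),  # should point at 2222222
    (None, b"\x01" + b"3333333"),
]

# The same rows after the corrected insert.
FIXED: List[Tablet] = [
    (b"11111111", b"\x00"),
    (b"2222222", b"\x01" + b"11111111"),
    (b"3333333", b"\x01" + b"2222222"),
    (None, b"\x01" + b"3333333"),
]

if __name__ == "__main__":
    for found, expected in find_holes(BROKEN):
        print("hole:", found, "!=", expected)
    print("holes after fix:", find_holes(FIXED))
```

Against the broken rows this flags the 3;3333333 tablet, the same pair of
rows named in the "has a hole" message; against the fixed rows it finds
nothing.  It only models the pointer chain and does not look at the
~tab:oldprevrow or ~tab:splitRatio columns.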