accumulo-dev mailing list archives

From "Eric Newton (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-315) Hole in metadata table occurred during random walk test
Date Tue, 17 Jan 2012 21:25:41 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188015#comment-13188015
] 

Eric Newton commented on ACCUMULO-315:
--------------------------------------

The master was performing a range-delete.  It split a tablet into three sections so that the
center section could be removed with tablet operations.

While the master was working through the online->chop->offline state transition, the last tablet
of the three split.  This caused the main loop to miss a tablet and to compute bad counts.
The master then mistakenly believed that all the tablets that needed to be offline had been taken
offline.  It then updated the prevRow of the last tablet while that tablet was still
online, which caused the hole in the metadata table.
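
The resulting inconsistency can be seen in the ~tab:~pr entries quoted in the issue description
below.  The following is a minimal Python sketch of that check, not Accumulo code: it walks the
tablets of one table in end-row order and verifies that each prevRow equals the end row of the
preceding tablet.  The function name find_chain_breaks and the tuple layout are hypothetical, and
the sketch assumes the leading \x01 byte in each value is an encoding marker rather than part of
the row.

```python
def find_chain_breaks(entries):
    """Check that each tablet's prevRow equals the end row of the tablet before it.

    entries: list of (end_row, prev_row) pairs for one table, sorted by end_row.
    Returns a list of (kind, end_row, prev_row, expected_prev_row) tuples.
    """
    breaks = []
    for (expected_prev, _), (end_row, prev_row) in zip(entries, entries[1:]):
        if prev_row == expected_prev:
            continue
        # A prevRow beyond the previous end row leaves rows uncovered (a hole);
        # a prevRow before it means two tablets claim the same rows (an overlap).
        kind = "hole" if prev_row > expected_prev else "overlap"
        breaks.append((kind, end_row, prev_row, expected_prev))
    return breaks


# (end_row, prev_row) pairs copied from the ~tab:~pr entries in the issue
# description, with the table id prefix "4ct;" and the leading \x01 byte dropped.
entries = [
    ("262249211a62cd6f", "1819e56edae21302"),
    ("27b693c626c2d4ef", "262249211a62cd6f"),
    ("43422047c78fa52b", "41ea825af0f262d9"),
    ("4d2d3be2823b0bf4", "27b693c626c2d4ef"),
    ("4f89df61392bb311", "4d2d3be2823b0bf4"),
]

for kind, end_row, prev_row, expected in find_chain_breaks(entries):
    print(f"{kind}: tablet ending at {end_row} has prevRow {prev_row}, expected {expected}")
```

On the quoted entries this flags two tablets.  The tablet ending at 43422047c78fa52b points back
to 41ea825af0f262d9, for which the excerpt has no ~pr entry, and the tablet ending at
4d2d3be2823b0bf4 still points back to 27b693c626c2d4ef, so it overlaps its neighbor.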

                
> Hole in metadata table occurred during random walk test
> -------------------------------------------------------
>
>                 Key: ACCUMULO-315
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-315
>             Project: Accumulo
>          Issue Type: Bug
>          Components: master, tserver
>         Environment: Running 1.4.0 SNAPSHOT on 10 node cluster.
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>            Priority: Critical
>             Fix For: 1.4.0
>
>
> While running the random walk test, a hole in the metadata table occurred.  A client tried
> to delete the table with the hole and the fate op got stuck.  The master logs continually
> showed the following.
> {noformat}
> 14 00:02:11,273 [tableOps.CleanUp] DEBUG: Still waiting for table to be deleted: 4ct
locationState: 4ct;4d2d3be2823b0bf4;27b693c626c2d4ef@(null,xxx.xxx.xxx.xxx:9997[134d7425fc503e1],null)
> {noformat}
> The metadata table contained the following.  Tablet 4ct;4d2d3be2823b0bf4 had a location.
> {noformat}
> 4ct;262249211a62cd6f ~tab:~pr []    \x011819e56edae21302
> 4ct;27b693c626c2d4ef ~tab:~pr []    \x01262249211a62cd6f
> 4ct;43422047c78fa52b ~tab:~pr []    \x0141ea825af0f262d9
> 4ct;4d2d3be2823b0bf4 ~tab:~pr []    \x0127b693c626c2d4ef
> 4ct;4f89df61392bb311 ~tab:~pr []    \x014d2d3be2823b0bf4
> {noformat}
> Found the following events on a tablet server.
> {noformat}
> 21:36:04,369 [tabletserver.Tablet] TABLET_HIST: 4ct;4d2d3be2823b0bf4;27b693c626c2d4ef
split 4ct;41ea825af0f262d9;27b693c626c2d4ef 4ct;4d2d3be2823b0bf4;41ea825af0f262d9
> 21:36:06,351 [tabletserver.Tablet] TABLET_HIST: 4ct;4d2d3be2823b0bf4;41ea825af0f262d9
split 4ct;43422047c78fa52b;41ea825af0f262d9 4ct;4d2d3be2823b0bf4;43422047c78fa52b
> {noformat}
> Saw the following on the tablet server serving the metadata tablet at around the time
of the splits.  Not sure if this is related.
> {noformat}
> 13 21:36:10,956 [server.TNonblockingServer] WARN : Got an IOException in internalRead!
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcher.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:171)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
>         at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>         at org.apache.thrift.server.TNonblockingServer$FrameBuffer.internalRead(TNonblockingServer.java:668)
>         at org.apache.thrift.server.TNonblockingServer$FrameBuffer.read(TNonblockingServer.java:457)
>         at org.apache.thrift.server.TNonblockingServer$SelectThread.handleRead(TNonblockingServer.java:358)
>         at org.apache.thrift.server.TNonblockingServer$SelectThread.select(TNonblockingServer.java:303)
>         at org.apache.thrift.server.TNonblockingServer$SelectThread.run(TNonblockingServer.java:242)
> {noformat}
> Not sure what caused the metadata problem.  Further investigation is needed.  Also, while
> debugging, the master started assigning and unassigning metadata tablets rapidly.  Did not
> get a chance to investigate this; it stopped when I stopped the random walk test.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
