hbase-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-7339) Splitting a hfilelink causes region servers to go down.
Date Wed, 12 Dec 2012 20:08:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530267#comment-13530267 ]

Jonathan Hsieh edited comment on HBASE-7339 at 12/12/12 8:06 PM:
-----------------------------------------------------------------

This was encountered when testing online snapshots, but will affect offline snapshots as well.

Suggested solutions:
1) Make opening the hfile-link daughter reference more robust by falling back to treating
the file as a reference if treating it as a link fails.  Hacky but "should" work.
2) Make the regexes used to differentiate references and hfilelinks stricter so that
the two can be told apart. Hacky, not sure if it will work.
3) Change the daughter reference file name to be more robust.  Currently '<hfile>.<parentregion>',
maybe change to '<hfile>@<parentregion>'. This would then allow '<hfile>-<region>-<table>@<parentregion>'
to be interpreted correctly.  This is the "right way" but breaks compatibility.
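
As a rough illustration of options 1 and 3 (not HBase code): the sketch below assumes '@' can never appear in an hfile, region, or table name, and the names split_daughter_ref, open_store_file, open_link, and open_reference are all hypothetical.

```python
def split_daughter_ref(name, sep="@"):
    """Option 3: split '<target>@<parentregion>' on the last separator.

    Returns (target, parent_region), or None if the separator is absent.
    Assumes the separator cannot occur inside the parent region name.
    """
    target, found, parent = name.rpartition(sep)
    if not found:
        return None
    return target, parent


def open_store_file(name, open_link, open_reference):
    """Option 1: try the name as a link first, then fall back to a reference.

    open_link and open_reference are hypothetical callables supplied by the
    caller; open_link is expected to raise IOError when the name does not
    resolve as an hfile link.
    """
    try:
        return open_link(name)
    except IOError:
        return open_reference(name)
```

With a dedicated separator the link part keeps its own '-' groupings and the parent region is read from the tail, so no regex has to guess where the table name ends.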

Other follow-ons -- ideally we would be more robust and quarantine a bad region or its hfiles/link files
once it has killed a few nodes in the cluster.
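
One possible shape for the quarantine idea, as a sketch only: QuarantineTracker, its methods, and the threshold of 3 servers are assumptions for illustration and do not correspond to any existing HBase component.

```python
from collections import defaultdict


class QuarantineTracker:
    """Tracks which servers aborted while opening a given region."""

    def __init__(self, max_servers=3):
        # Hypothetical threshold for "a few nodes".
        self.max_servers = max_servers
        self._killed = defaultdict(set)

    def record_abort(self, region, server):
        # A set is used so repeated aborts of one server count once.
        self._killed[region].add(server)

    def is_quarantined(self, region):
        # .get avoids creating empty entries for regions never seen.
        return len(self._killed.get(region, ())) >= self.max_servers
```

A master-side assigner could consult is_quarantined before handing the region to yet another server, which would stop the abort from spreading across the cluster.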
                
> Splitting a hfilelink causes region servers to go down.
> -------------------------------------------------------
>
>                 Key: HBASE-7339
>                 URL: https://issues.apache.org/jira/browse/HBASE-7339
>             Project: HBase
>          Issue Type: Sub-task
>          Components: snapshots
>    Affects Versions: hbase-6055
>            Reporter: Jonathan Hsieh
>            Assignee: Jonathan Hsieh
>            Priority: Blocker
>             Fix For: hbase-6055
>
>
> Steps:
> - Have a single region table with 15 hfiles in it.
> - Snapshot it. (was done using online snapshot from HBASE-7321)
> - Clone the snapshot.
> - region post-open task attempts to compact region.  policy does not compact all files.
>   (default seems to be 10)
> - after compaction we have hfile links and real hfiles mixed in the region
> - it starts splitting
> - creating split references, opening daughters fails
> - hfile links are "split", creating hfile link daughter refs.  {{<<hfile>-<region>-<table>>.<parentregion>}}
> - these "split" hfile links are interpreted as hfile links with table {{<table>.<parentregion>}}
>   -> {{<<hfile>-<region>>-<<table>.<parentregion>>}}
>   (groupings interpreted incorrectly)
> - Since this is after the splitting point of no return (PONR), this aborts the server.  It then
>   spreads to the next server.
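
The mis-grouping in the steps above can be reproduced with a small sketch. The patterns here are hypothetical stand-ins, not the regexes HBase actually uses: hfile and region names are assumed to be hex strings, and table names are assumed to allow letters, digits, '_', '-' and '.'.

```python
import re

# Hypothetical link pattern: <hfile>-<region>-<table>
LINK_RE = re.compile(
    r"^(?P<hfile>[0-9a-f]+)-(?P<region>[0-9a-f]+)-(?P<table>[\w.\-]+)$"
)
# Hypothetical reference pattern: <target>.<parentregion>
REF_RE = re.compile(r"^(?P<target>.+)\.(?P<parent>[0-9a-f]+)$")


def parse_as_link(name):
    """Return the link groups if the name matches LINK_RE, else None."""
    m = LINK_RE.match(name)
    return m.groupdict() if m else None


def parse_as_reference(name):
    """Return the reference groups if the name matches REF_RE, else None."""
    m = REF_RE.match(name)
    return m.groupdict() if m else None


# A "split" hfile link: <<hfile>-<region>-<table>>.<parentregion>
split_link = "abc123-def456-mytable.0123beef"

as_link = parse_as_link(split_link)
as_ref = parse_as_reference(split_link)
```

Both patterns accept the same name. Read as a link, the table comes out as "mytable.0123beef", which is the wrong grouping described above; read as a reference, the parent region "0123beef" is separated correctly.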

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
