hbase-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7419) revisit hfilelink file name format.
Date Fri, 04 Jan 2013 17:58:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544072#comment-13544072
] 

Jonathan Hsieh commented on HBASE-7419:
---------------------------------------

I'm actually not clear on what the concerns are about special characters outside of the hdfs
context, i.e. as they pertain to a windows or web context.  I assume that since we are on hdfs,
characters valid there would be valid regardless of whether they are valid in the underlying
file system.

In the web case, I buy having to escape filenames for web lookups (no &%\/@).
Are these problems at the tooling level (hdfs dfs -ls), the web level, or in testing?  What
are the characters that are valid in hdfs that we should avoid, and more importantly, why?

[~eclark], [~stack] you brought up the concerns about web stuff, comments?

[~enis] any quick pointers for hdfs file name and windows compat issues?
                
> revisit hfilelink file name format.
> -----------------------------------
>
>                 Key: HBASE-7419
>                 URL: https://issues.apache.org/jira/browse/HBASE-7419
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Client, master, regionserver, snapshots, Zookeeper
>            Reporter: Jonathan Hsieh
>            Assignee: Matteo Bertozzi
>             Fix For: hbase-6055, 0.96.0
>
>         Attachments: HBASE-7419-v0.patch
>
>
> A valid table name concatenated with a '.' to a valid region name is also a valid table
> name, which leads to incorrect interpretation.
> {code}
> true hfile name constraints: [0-9]+(?:_SeqID_[0-9]+)?
> region name constraints    : [a-f0-9]{16}  (but we currently just use [a-f0-9]+.)
> table name constraints     : [a-zA-Z0-9_][a-zA-Z0-9_.-]*
> {code}
> Notice that the table name constraints completely cover the region name constraints
> and the true hfile name constraints (a valid hfile name is a valid part of a table name, and
> a valid encoded region name is a valid part of a table name).
> Currently the hfilelink filename convention is <hfile>-<region>-<table>.
> Unfortunately, making a ref to this uses the name <hfile>-<region>-<table>.<parentregion>
> -- the concatenation <table>.<parentregion> is a valid table name and used to get
> interpreted as such.  The fix in HBASE-7339 requires a FileNotFoundException before going
> down the hfile link resolution path.
> Regardless of what we do, we need to add some char invalid for table names to the hfilelink
or reference filename convention.
> Suggestion: if we changed the order of the hfile-link name we could avoid some of the
> confusion -- <table>@<region>-<hfile>.<parentregion> (or some other
> separator char than '@') could be used to avoid handling the initial FileNotFoundException,
> but I think we'd still need a good chunk of the logic to handle opening a half-storefile reader
> through an hfilelink.
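
The ambiguity described in the quoted issue can be reproduced with the three
regular expressions it lists. The sketch below is illustrative only: it is not
HBase code, the names CURRENT_LINK, REORDERED_LINK and REORDERED_REF and the
sample table, region and hfile values are made up, and it uses the looser
[a-f0-9]+ region pattern that the description says is currently in use. It
shows that a reference name in the current order also parses as a plain link
with table "<table>.<parentregion>", while the reordered form with '@' (a char
that is not valid in a table name) parses only one way.

```python
import re

# Constraints quoted in the issue description.
HFILE = r"[0-9]+(?:_SeqID_[0-9]+)?"
REGION = r"[a-f0-9]+"  # looser form the description says is used today
TABLE = r"[a-zA-Z0-9_][a-zA-Z0-9_.-]*"

# Current link convention: <hfile>-<region>-<table>
CURRENT_LINK = re.compile(rf"({HFILE})-({REGION})-({TABLE})")
# Suggested reordering: <table>@<region>-<hfile>
REORDERED_LINK = re.compile(rf"({TABLE})@({REGION})-({HFILE})")
# Reference to a reordered link: <table>@<region>-<hfile>.<parentregion>
REORDERED_REF = re.compile(rf"({TABLE})@({REGION})-({HFILE})\.({REGION})")

# Made-up sample values that satisfy the constraints above.
table = "usertable"
region = "0123456789abcdef"
hfile = "1234567890"
parent = "fedcba9876543210"

# A reference in the current order is also a well-formed link name,
# because "<table>.<parentregion>" is itself a valid table name.
old_ref = f"{hfile}-{region}-{table}.{parent}"
as_link = CURRENT_LINK.fullmatch(old_ref)
misread_table = as_link.group(3) if as_link else None

# With '@' as separator, the reference no longer parses as a link,
# and the dedicated reference pattern recovers all four parts.
new_link = f"{table}@{region}-{hfile}"
new_ref = f"{new_link}.{parent}"
ref_as_link = REORDERED_LINK.fullmatch(new_ref)
ref_parts = REORDERED_REF.fullmatch(new_ref)

print("old ref read as link, table =", misread_table)
print("new ref read as link?", ref_as_link is not None)
print("new ref parts:", ref_parts.groups() if ref_parts else None)
```

Any separator outside the table-name character class would behave the same way
in this sketch; whether '@' is acceptable in hdfs, windows and web contexts is
the open question raised in the comment above.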

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
