hive-issues mailing list archives

From "liubangchen (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-18582) MSCK REPAIR TABLE Throw MetaException
Date Wed, 31 Jan 2018 07:55:02 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346386#comment-16346386 ]

liubangchen edited comment on HIVE-18582 at 1/31/18 7:54 AM:
-------------------------------------------------------------

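We can add a validation method and call it from findUnknownPartitions in class HiveMetaStoreChecker, so that leaf directories that do not name every partition column are skipped instead of being reported as partitions missing from the metastore: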

 
{code:java}

void findUnknownPartitions(Table table, Set<Path> partPaths,
      CheckResult result) throws IOException, HiveException {

    Path tablePath = table.getPath();
    // now check the table folder and see if we find anything
    // that isn't in the metastore
    Set<Path> allPartDirs = new HashSet<Path>();
    getAllLeafDirs(tablePath, allPartDirs);
    // don't want the table dir
    allPartDirs.remove(tablePath);

    // remove the partition paths we know about
    allPartDirs.removeAll(partPaths);

    // we should now only have the unexpected folders left
    for (Path partPath : allPartDirs) {
      FileSystem fs = partPath.getFileSystem(conf);
      Path qualifiedTablePath = fs.makeQualified(tablePath);
      // skip leaf dirs that do not name every partition column, e.g. an
      // empty log_date=... dir of a table partitioned by (log_date, vgameid)
      if (!isValidPartitionPath(table, qualifiedTablePath, partPath)) {
        LOG.warn("Invalid partition path, skipping: " + partPath);
        continue;
      }
      String partitionName = getPartitionName(qualifiedTablePath, partPath);

      if (partitionName != null) {
        PartitionResult pr = new PartitionResult();
        pr.setPartitionName(partitionName);
        pr.setTableName(table.getTableName());

        result.getPartitionsNotInMs().add(pr);
      }
    }
  }

  /**
   * Returns true only if partPath, relative to the qualified tablePath,
   * contains a "name=value" component for every partition column of table.
   */
  boolean isValidPartitionPath(Table table, Path tablePath, Path partPath) {
    String tableDir = tablePath.toString();
    String partDir = partPath.toString();
    // guard the substring below: partPath must lie under tablePath
    // (both are qualified, so their prefixes are comparable)
    if (!partDir.startsWith(tableDir + Path.SEPARATOR)) {
      return false;
    }
    String relative = partDir.substring(tableDir.length() + 1);
    if (relative.isEmpty()) {
      return false;
    }
    // collect the partition column names present in the relative path
    Set<String> partNames = new HashSet<String>();
    for (String part : relative.split(Path.SEPARATOR)) {
      int index = part.indexOf('=');
      if (index > 0) {
        partNames.add(part.substring(0, index));
      }
    }
    // every partition column of the table must appear
    for (FieldSchema field : table.getPartCols()) {
      if (!partNames.contains(field.getName())) {
        return false;
      }
    }
    return true;
  }
{code}

Let me submit the patch.
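The proposed partition-path check can be tried without a cluster. The sketch below is a hypothetical, standalone reimplementation that works on plain strings instead of Hadoop `Path` and Hive `Table` objects; the class `PartitionPathCheck` and its method are illustrative names, not part of Hive, and it assumes the path has already been made relative to the table directory.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical standalone version of the proposed check (not Hive code): a leaf
// directory is accepted only if it has a "name=value" component for every partition column.
final class PartitionPathCheck {

  public static boolean isValidPartitionPath(List<String> partCols, String relativePath) {
    if (relativePath == null || relativePath.isEmpty()) {
      return false;
    }
    // Collect the column names that appear as "name=value" components.
    Set<String> names = new HashSet<>();
    for (String component : relativePath.split("/")) {
      int eq = component.indexOf('=');
      if (eq > 0) {
        names.add(component.substring(0, eq));
      }
    }
    // Every partition column must be present.
    return names.containsAll(partCols);
  }

  public static void main(String[] args) {
    List<String> partCols = Arrays.asList("log_date", "vgameid");
    // The empty directory from the report names only one of two columns.
    System.out.println(isValidPartitionPath(partCols, "log_date=2015063023")); // prints: false
    // A complete partition directory is accepted.
    System.out.println(isValidPartitionPath(partCols, "log_date=2015121309/vgameid=lyjt")); // prints: true
  }
}
```

Like the proposal, this only checks that each column is present; it does not check component order or reject extra components.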
 



>  MSCK REPAIR TABLE Throw MetaException
> --------------------------------------
>
>                 Key: HIVE-18582
>                 URL: https://issues.apache.org/jira/browse/HIVE-18582
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.1.1
>            Reporter: liubangchen
>            Priority: Major
>
> While executing MSCK REPAIR TABLE tablename, I got this exception:
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Expected 1 components, got 2 (log_date=2015121309/vgameid=lyjt))
> at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1847)
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:402)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2073)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1744)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1453)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1171)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
> --
> Caused by: MetaException(message:Expected 1 components, got 2 (log_date=2015121309/vgameid=lyjt))
> at org.apache.hadoop.hive.metastore.Warehouse.makeValsFromName(Warehouse.java:385)
> at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1845)
> {code}
> The table is PARTITIONED BY (log_date, vgameid).
> The data directories on HDFS are:
>  
> {code:java}
> /usr/hive/warehouse/a.db/tablename/log_date=2015063023
> drwxr-xr-x - root supergroup 0 2018-01-26 09:41 /usr/hive/warehouse/a.db/tablename/log_date=2015121309/vgameid=lyjt
> {code}
> The directory log_date=2015063023 is empty (it has no vgameid subdirectory).
> If I set hive.msck.path.validation=ignore, MSCK REPAIR TABLE executes successfully.
> Looking into DDLTask.msck, I found this code:
> {code:java}
> private int msck(Hive db, MsckDesc msckDesc) {
>   CheckResult result = new CheckResult();
>   List<String> repairOutput = new ArrayList<String>();
>   try {
>     HiveMetaStoreChecker checker = new HiveMetaStoreChecker(db);
>     String[] names = Utilities.getDbTableName(msckDesc.getTableName());
>     checker.checkMetastore(names[0], names[1], msckDesc.getPartSpecs(), result);
>     List<CheckResult.PartitionResult> partsNotInMs = result.getPartitionsNotInMs();
>     if (msckDesc.isRepairPartitions() && !partsNotInMs.isEmpty()) {
>      //I think bug is here
>       AbstractList<String> vals = null;
>       String settingStr = HiveConf.getVar(conf, HiveConf.ConfVars.HIVE_MSCK_PATH_VALIDATION);
>       boolean doValidate = !("ignore".equals(settingStr));
>       boolean doSkip = doValidate && "skip".equals(settingStr);
>       // The default setting is "throw"; assume doValidate && !doSkip means throw.
>       if (doValidate) {
>         // Validate that we can add partition without escaping. Escaping was originally intended
>         // to avoid creating invalid HDFS paths; however, if we escape the HDFS path (that we
>         // deem invalid but HDFS actually supports - it is possible to create HDFS paths with
>         // unprintable characters like ASCII 7), metastore will create another directory instead
>         // of the one we are trying to "repair" here.
>         Iterator<CheckResult.PartitionResult> iter = partsNotInMs.iterator();
>         while (iter.hasNext()) {
>           CheckResult.PartitionResult part = iter.next();
>           try {
>             vals = Warehouse.makeValsFromName(part.getPartitionName(), vals);
>           } catch (MetaException ex) {
>             throw new HiveException(ex);
>           }
>           for (String val : vals) {
>             String escapedPath = FileUtils.escapePathName(val);
>             assert escapedPath != null;
>             if (escapedPath.equals(val)) continue;
>             String errorMsg = "Repair: Cannot add partition " + msckDesc.getTableName()
>                 + ':' + part.getPartitionName() + " due to invalid characters in the name";
>             if (doSkip) {
>               repairOutput.add(errorMsg);
>               iter.remove();
>             } else {
>               throw new HiveException(errorMsg);
>             }
>           }
>         }
>       }
> {code}
> I think it will work if AbstractList<String> vals = null; is moved inside the while (iter.hasNext()) { loop, so that each partition name starts with a fresh list.
>  
>  
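The failure the reporter describes can be reproduced outside Hive. The sketch below uses `makeVals`, a simplified, hypothetical stand-in for `Warehouse.makeValsFromName` (the class `ValsReuseDemo` and its methods are illustrative, not Hive code); its size check is modeled on the "Expected N components, got M" message in the stack trace. It contrasts reusing one list across partition names with starting from a fresh list for each name, as suggested in the report.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo; makeVals is a simplified stand-in, not Hive's implementation.
final class ValsReuseDemo {

  // A list that is passed in must already have one slot per "name=value" component.
  public static AbstractList<String> makeVals(String name, AbstractList<String> result) {
    String[] parts = name.split("/");
    if (result == null) {
      result = new ArrayList<>(Arrays.asList(new String[parts.length]));
    } else if (result.size() != parts.length) {
      throw new IllegalStateException("Expected " + result.size()
          + " components, got " + parts.length + " (" + name + ")");
    }
    for (int i = 0; i < parts.length; i++) {
      // keep only the value after '='
      result.set(i, parts[i].substring(parts[i].indexOf('=') + 1));
    }
    return result;
  }

  // Pattern from the quoted code: vals is declared once, outside the loop.
  public static String reuseAcrossLoop(List<String> names) {
    AbstractList<String> vals = null;
    try {
      for (String name : names) {
        vals = makeVals(name, vals);
      }
      return "ok";
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  // Suggested pattern: vals is reset for every partition name.
  public static List<List<String>> freshPerIteration(List<String> names) {
    List<List<String>> all = new ArrayList<>();
    for (String name : names) {
      AbstractList<String> vals = null;
      vals = makeVals(name, vals);
      all.add(vals);
    }
    return all;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList(
        "log_date=2015063023", "log_date=2015121309/vgameid=lyjt");
    // prints: Expected 1 components, got 2 (log_date=2015121309/vgameid=lyjt)
    System.out.println(reuseAcrossLoop(names));
    // prints: [[2015063023], [2015121309, lyjt]]
    System.out.println(freshPerIteration(names));
  }
}
```

Resetting the list removes the exception, but the one-component name log_date=2015063023 then passes through the validation loop unchanged; this is why the comment above also proposes filtering such directories in findUnknownPartitions.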



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
