hbase-issues mailing list archives

From "Nick Dimiduk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10385) ImportTsv to parse date time from typical loader formats
Date Mon, 03 Feb 2014 21:51:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889961#comment-13889961 ]

Nick Dimiduk commented on HBASE-10385:
--------------------------------------

Sorry [~ericavijay], it looks like your heroic patch slipped through the cracks. After attaching
a patch file against trunk, the next step is to click the "Submit Patch" button. This queues
it up for our QA bot to do its due diligence. Patches are generally not accepted unless they
pass the precommit verification that the bot performs. I've clicked the button; we'll see whether
your patch still applies cleanly.

As for exposing ParsedLine, I vaguely remember running into something similar in a previous
life. My workaround was to place my custom mapper in the org.apache.hadoop.hbase namespace.
That is tedious and should not be necessary, so I'm fine with exposing it as a public class;
it should make general reuse easier. I think that's a separate issue, though, right?

> ImportTsv to parse date time from typical loader formats
> --------------------------------------------------------
>
>                 Key: HBASE-10385
>                 URL: https://issues.apache.org/jira/browse/HBASE-10385
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapreduce
>    Affects Versions: 0.96.1.1
>            Reporter: Vijay Sarvepali
>            Priority: Minor
>              Labels: importtsv
>         Attachments: HBASE-10385.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Simple patch to enable parsing of standard date-time fields from TSV files into HBase.
> ***************
> *** 57,62 ****
> --- 57,70 ----
>   import com.google.common.base.Splitter;
>   import com.google.common.collect.Lists;
>   
> + //2013-08-19T04:39:07
> + import java.text.DateFormat;
> + import java.util.*;
> + import java.text.SimpleDateFormat;
> + import java.text.ParseException;
> + 
> + 
> + 
>   /**
>    * Tool to import data from a TSV file.
>    *
> ***************
> *** 220,229 ****
>               getColumnOffset(timestampKeyColumnIndex),
>               getColumnLength(timestampKeyColumnIndex));
>           try {
> !           return Long.parseLong(timeStampStr);
>           } catch (NumberFormatException nfe) {
>             // treat this record as bad record
> !           throw new BadTsvLineException("Invalid timestamp " + timeStampStr);
>           }
>         }
>         
> --- 228,239 ----
>               getColumnOffset(timestampKeyColumnIndex),
>               getColumnLength(timestampKeyColumnIndex));
>           try {
> ! 	    return Long.parseLong(timeStampStr);
>           } catch (NumberFormatException nfe) {
> + 	    // Try this record with string to date in mseconds long
> + 	    return extractTimestampInput(timeStampStr);
>             // treat this record as bad record
> !           //throw new BadTsvLineException("Invalid timestamp " + timeStampStr);
>           }
>         }
>         
> ***************
> *** 243,248 ****
> --- 253,274 ----
>           return lineBytes;
>         }
>       }
> +  public static long extractTimestampInput(String strDate) throws BadTsvLineException{
> +     final List<String> dateFormats = Arrays.asList("yyyy-MM-dd HH:mm:ss.SSS", "yyyy-MM-dd'T'HH:mm:ss");
> + 
> +     for(String format: dateFormats){
> +         SimpleDateFormat sdf = new SimpleDateFormat(format);
> +         try{
> +             Date d= sdf.parse(strDate);
> + 	    long msecs = d.getTime();
> + 	    return msecs;
> +         } catch (ParseException e) {
> + 	    //intentionally empty
> +         }
> +     }
> +     // If we come here we have a problem with converting timestamps for this row.
> +     throw new BadTsvLineException("Invalid timestamp " + strDate); 
> +  } 
>   
>       public static class BadTsvLineException extends Exception {
>         public BadTsvLineException(String err) {
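The fallback logic in the patch (try Long.parseLong first, then each date pattern in turn, and reject the row if nothing matches) can be sketched as a standalone class. This is an illustration only, not the committed HBase code: TimestampParser, parseTimestamp and InvalidTimestampException are hypothetical names, the latter standing in for ImportTsv's BadTsvLineException. The sketch also deliberately departs from the patch in three places worth discussing in review: it turns off lenient parsing, it requires the whole field to be consumed (plain SimpleDateFormat.parse(String) accepts a matching prefix), and it takes an explicit TimeZone rather than relying on the JVM default. Like the patch, it creates a fresh SimpleDateFormat per call because that class is not thread-safe.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.TimeZone;

/** Hypothetical sketch of the HBASE-10385 timestamp fallback; not HBase API. */
final class TimestampParser {

  /** Hypothetical stand-in for ImportTsv's BadTsvLineException. */
  static class InvalidTimestampException extends Exception {
    InvalidTimestampException(String msg) {
      super(msg);
    }
  }

  // Same two patterns as the patch, tried in this order.
  private static final List<String> DATE_FORMATS =
      Arrays.asList("yyyy-MM-dd HH:mm:ss.SSS", "yyyy-MM-dd'T'HH:mm:ss");

  /** Epoch milliseconds first, then each date pattern, interpreted in {@code zone}. */
  static long parseTimestamp(String s, TimeZone zone) throws InvalidTimestampException {
    try {
      return Long.parseLong(s);
    } catch (NumberFormatException nfe) {
      // Not a plain number: fall through to the date patterns.
    }
    for (String format : DATE_FORMATS) {
      // SimpleDateFormat is not thread-safe, so build one per attempt.
      SimpleDateFormat sdf = new SimpleDateFormat(format);
      sdf.setLenient(false);   // assumption: reject out-of-range fields such as month 13
      sdf.setTimeZone(zone);   // assumption: explicit zone instead of the JVM default
      ParsePosition pos = new ParsePosition(0);
      Date d = sdf.parse(s, pos); // returns null instead of throwing on failure
      // Accept only if the entire field matched, not just a prefix of it.
      if (d != null && pos.getIndex() == s.length()) {
        return d.getTime();
      }
    }
    throw new InvalidTimestampException("Invalid timestamp " + s);
  }

  public static void main(String[] args) throws Exception {
    TimeZone utc = TimeZone.getTimeZone("UTC");
    System.out.println(parseTimestamp("1376887147000", utc));           // 1376887147000
    System.out.println(parseTimestamp("2013-08-19T04:39:07", utc));     // 1376887147000
    System.out.println(parseTimestamp("2013-08-19 04:39:07.250", utc)); // 1376887147250
  }
}
```

With these choices, a value such as "2013-08-19T04:39:07Z" is rejected rather than silently truncated, and the same input yields the same cell timestamp regardless of the mapper host's default time zone.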



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
