phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-1142) Improve CsvBulkLoadTool to parse different Date formats
Date Mon, 02 Feb 2015 21:51:35 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301993#comment-14301993
] 

James Taylor commented on PHOENIX-1142:
---------------------------------------

Let's let [~gabriel.reid] review too. Here's some additional concerns:
- The SimpleDateFormat is not thread safe for parsing, so you can't store it in a local like
this in DateUtil (but maybe I'm missing something?):
{code}
+    private static DateTimeParser defaultDateTimeParser;
{code}
- There's a facility to set the date timezone and date format on a config property which might
occur at connection time. This code seems to disable that?
{code}
diff --git phoenix-core/src/main/java/org/apache/phoenix/parse/ToDateParseNode.java phoenix-core/src/main/java/org/apache/phoenix/parse/ToDateParseNode.java
index 46bca63..ab1d76f 100644
--- phoenix-core/src/main/java/org/apache/phoenix/parse/ToDateParseNode.java
+++ phoenix-core/src/main/java/org/apache/phoenix/parse/ToDateParseNode.java
@@ -38,10 +38,10 @@ public class ToDateParseNode extends FunctionParseNode {
 
     @Override
     public FunctionExpression create(List<Expression> children, StatementContext context)
throws SQLException {
-        Format dateParser;
+        DateUtil.DateTimeParser dateParser;
         String dateFormat = (String) ((LiteralExpression) children.get(1)).getValue();
         String timeZoneId = (String) ((LiteralExpression) children.get(2)).getValue();
-        TimeZone parserTimeZone = context.getDateFormatTimeZone();
+        TimeZone parserTimeZone = null;
         if (dateFormat == null) {
             dateFormat = context.getDateFormat();
         }
{code}
- It appears that the default date parsing is being set as a static variable (JVM-wide), while
ConnectionQueryServicesImpl is a per cluster object. This doesn't seem to be correct. I think
we should consider where the DateTimeParser should live/be scoped to:
{code}
@@ -249,6 +238,8 @@ public class ConnectionQueryServicesImpl extends DelegateQueryServices
implement
                 .maximumSize(MAX_TABLE_STATS_CACHE_ENTRIES)
                 .expireAfterWrite(halfStatsUpdateFreq, TimeUnit.MILLISECONDS)
                 .build();
+        // Initialize DateUtil DefaultDatetimeParser
+        DateUtil.init(this.config);
     }
{code}
- In general, we use the QueryServices.getProps() method to access property values rather
than go through Configuration. The reason is that we've seen this impact performance because
Configuration does locking while ReadOnlyProps does not. The locking is not necessary, because
the Configuration is not changing in-place.
- What will the impact be of changing the DATE_TIME_PARSER (now ISO_DATE_TIME_PARSER) on CSV
bulk loading? We append the ' ' literal because we use a YYYY-MM-DD HH:MM:SS.zzz format (which
isn't ISO standard, I believe, as ISO wants a 'T' instead).
{code}
+    private static final DateTimeFormatter ISO_DATE_TIME_PARSER = new DateTimeFormatterBuilder()
             .append(ISODateTimeFormat.dateParser())
             .appendOptional(new DateTimeFormatterBuilder()
-                    .appendLiteral(' ')
+                    .appendLiteral(' ').toParser())
+            .appendOptional(new DateTimeFormatterBuilder()
                     .append(ISODateTimeFormat.timeParser()).toParser())
             .toFormatter()
+            .withZoneUTC()
             .withChronology(ISOChronology.getInstanceUTC());
{code}

> Improve CsvBulkLoadTool to parse different Date formats
> -------------------------------------------------------
>
>                 Key: PHOENIX-1142
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1142
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: David Kjerrumgaard
>            Assignee: Jeffrey Zhong
>         Attachments: PHOENIX-1142.patch
>
>
> Currently, the CsvBulkLoadTool uses a single 'default' format to parse dates from the
source CSV file. This can sometimes cause issues when the date fields don't match this default
format.
> Would it be possible to add a method for specifying / overriding the 'default' date format?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message