Return-Path: X-Original-To: apmail-phoenix-commits-archive@minotaur.apache.org Delivered-To: apmail-phoenix-commits-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 05B0E10440 for ; Wed, 5 Mar 2014 22:57:38 +0000 (UTC) Received: (qmail 90130 invoked by uid 500); 5 Mar 2014 22:56:24 -0000 Delivered-To: apmail-phoenix-commits-archive@phoenix.apache.org Received: (qmail 89661 invoked by uid 500); 5 Mar 2014 22:56:03 -0000 Mailing-List: contact commits-help@phoenix.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@phoenix.incubator.apache.org Delivered-To: mailing list commits@phoenix.incubator.apache.org Received: (qmail 88854 invoked by uid 99); 5 Mar 2014 22:55:41 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 05 Mar 2014 22:55:41 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=5.0 tests=ALL_TRUSTED,RP_MATCHES_RCVD X-Spam-Check-By: apache.org Received: from [140.211.11.3] (HELO mail.apache.org) (140.211.11.3) by apache.org (qpsmtpd/0.29) with SMTP; Wed, 05 Mar 2014 22:55:37 +0000 Received: (qmail 86630 invoked by uid 99); 5 Mar 2014 22:54:49 -0000 Received: from tyr.zones.apache.org (HELO tyr.zones.apache.org) (140.211.11.114) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 05 Mar 2014 22:54:49 +0000 Received: by tyr.zones.apache.org (Postfix, from userid 65534) id B11CA937E01; Wed, 5 Mar 2014 22:54:47 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: jeffreyz@apache.org To: commits@phoenix.incubator.apache.org Date: Wed, 05 Mar 2014 22:55:11 -0000 Message-Id: In-Reply-To: <6712aa6a6ff5432cb9414eaff469087b@git.apache.org> References: <6712aa6a6ff5432cb9414eaff469087b@git.apache.org> X-Mailer: ASF-Git Admin Mailer Subject: [25/50] [abbrv] git commit: PHOENIX-53 Replace CSV loader with Apache Commons CSV loader (JamesViolette) X-Virus-Checked: Checked by ClamAV on apache.org PHOENIX-53 Replace CSV loader with Apache Commons CSV loader (JamesViolette) Project: http://git-wip-us.apache.org/repos/asf/incubator-phoenix/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-phoenix/commit/738619db Tree: http://git-wip-us.apache.org/repos/asf/incubator-phoenix/tree/738619db Diff: http://git-wip-us.apache.org/repos/asf/incubator-phoenix/diff/738619db Branch: refs/heads/4.0 Commit: 738619db474c79fd37a6648914b2257602fbd528 Parents: 3ed0f61 Author: James Taylor Authored: Tue Mar 4 11:23:54 2014 -0800 Committer: James Taylor Committed: Tue Mar 4 11:23:54 2014 -0800 ---------------------------------------------------------------------- phoenix-core/lib/commons-csv-1.0-SNAPSHOT.jar | Bin 32446 -> 0 bytes phoenix-core/pom.xml | 7 - .../java/org/apache/commons/csv/Assertions.java | 36 + .../java/org/apache/commons/csv/CSVFormat.java | 884 +++++++++++++++++++ .../java/org/apache/commons/csv/CSVParser.java | 465 ++++++++++ .../java/org/apache/commons/csv/CSVPrinter.java | 427 +++++++++ .../java/org/apache/commons/csv/CSVRecord.java | 224 +++++ .../java/org/apache/commons/csv/Constants.java | 68 ++ .../commons/csv/ExtendedBufferedReader.java | 178 ++++ .../main/java/org/apache/commons/csv/Lexer.java | 431 +++++++++ .../main/java/org/apache/commons/csv/Quote.java | 48 + .../main/java/org/apache/commons/csv/Token.java | 75 ++ .../org/apache/commons/csv/package-info.java | 82 ++ .../apache/phoenix/util/CSVCommonsLoader.java | 5 - .../phoenix/end2end/CSVCommonsLoaderTest.java | 1 - 15 files changed, 2918 insertions(+), 13 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-phoenix/blob/738619db/phoenix-core/lib/commons-csv-1.0-SNAPSHOT.jar ---------------------------------------------------------------------- diff --git a/phoenix-core/lib/commons-csv-1.0-SNAPSHOT.jar b/phoenix-core/lib/commons-csv-1.0-SNAPSHOT.jar deleted file mode 100644 index b83be56..0000000 Binary files a/phoenix-core/lib/commons-csv-1.0-SNAPSHOT.jar and /dev/null differ http://git-wip-us.apache.org/repos/asf/incubator-phoenix/blob/738619db/phoenix-core/pom.xml ---------------------------------------------------------------------- diff --git a/phoenix-core/pom.xml b/phoenix-core/pom.xml index bcd1cff..1e83ad6 100644 --- a/phoenix-core/pom.xml +++ b/phoenix-core/pom.xml @@ -197,13 +197,6 @@ org.antlr antlr-runtime - - org.apache.commons.csv - commons-csv - 1.0-SNAPSHOT - system - ${project.basedir}/lib/commons-csv-1.0-SNAPSHOT.jar - jline jline http://git-wip-us.apache.org/repos/asf/incubator-phoenix/blob/738619db/phoenix-core/src/main/java/org/apache/commons/csv/Assertions.java ---------------------------------------------------------------------- diff --git a/phoenix-core/src/main/java/org/apache/commons/csv/Assertions.java b/phoenix-core/src/main/java/org/apache/commons/csv/Assertions.java new file mode 100644 index 0000000..63c330a --- /dev/null +++ b/phoenix-core/src/main/java/org/apache/commons/csv/Assertions.java @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.commons.csv; + +/** + * Utility class for input parameter validation + * + * @version $Id: Assertions.java 1559908 2014-01-21 02:44:30Z ggregory $ + */ +final class Assertions { + + private Assertions() { + // can not be instantiated + } + + public static void notNull(final Object parameter, final String parameterName) { + if (parameter == null) { + throw new IllegalArgumentException("Parameter '" + parameterName + "' must not be null!"); + } + } +} http://git-wip-us.apache.org/repos/asf/incubator-phoenix/blob/738619db/phoenix-core/src/main/java/org/apache/commons/csv/CSVFormat.java ---------------------------------------------------------------------- diff --git a/phoenix-core/src/main/java/org/apache/commons/csv/CSVFormat.java b/phoenix-core/src/main/java/org/apache/commons/csv/CSVFormat.java new file mode 100644 index 0000000..88c2a7f --- /dev/null +++ b/phoenix-core/src/main/java/org/apache/commons/csv/CSVFormat.java @@ -0,0 +1,884 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.commons.csv; + +import static org.apache.commons.csv.Constants.BACKSLASH; +import static org.apache.commons.csv.Constants.COMMA; +import static org.apache.commons.csv.Constants.CR; +import static org.apache.commons.csv.Constants.CRLF; +import static org.apache.commons.csv.Constants.DOUBLE_QUOTE_CHAR; +import static org.apache.commons.csv.Constants.LF; +import static org.apache.commons.csv.Constants.TAB; + +import java.io.IOException; +import java.io.Reader; +import java.io.Serializable; +import java.io.StringWriter; +import java.util.Arrays; +import java.util.HashSet; +import java.util.Set; + +/** + * Specifies the format of a CSV file and parses input. + * + *

Using predefined formats

+ * + *

+ * You can use one of the predefined formats: + *

+ * + *
    + *
  • {@link #DEFAULT}
  • + *
  • {@link #EXCEL}
  • + *
  • {@link #MYSQL}
  • + *
  • {@link #RFC4180}
  • + *
  • {@link #TDF}
  • + *
+ * + *

+ * For example: + *

+ * + *
+ * CSVParser parser = CSVFormat.EXCEL.parse(reader);
+ * 
+ * + *

+ * The {@link CSVRecord} provides static methods to parse other input types, for example: + *

+ * + *
CSVParser parser = CSVFormat.parseFile(file, CSVFormat.EXCEL);
+ * + *

Defining formats

+ * + *

+ * You can extend a format by calling the {@code with} methods. For example: + *

+ * + *
+ * CSVFormat.EXCEL
+ *   .withNullString("N/A")
+ *   .withIgnoreSurroundingSpaces(true);
+ * 
+ * + *

Defining column names

+ * + *

+ * To define the column names you want to use to access records, write: + *

+ * + *
+ * CSVFormat.EXCEL.withHeader("Col1", "Col2", "Col3");
+ * 
+ * + *

+ * Calling {@link #withHeader(String...)} let's you use the given names to address values in a {@link CSVRecord}, and + * assumes that your CSV source does not contain a first record that also defines column names. + * + * If it does, then you are overriding this metadata with your names and you should skip the first record by calling + * {@link #withSkipHeaderRecord(boolean)} with {@code true}. + *

+ * + *

Parsing

+ * + *

+ * You can use a format directly to parse a reader. For example, to parse an Excel file with columns header, write: + *

+ * + *
+ * Reader in = ...;
+ * CSVFormat.EXCEL.withHeader("Col1", "Col2", "Col3").parse(in);
+ * 
+ * + *

+ * For other input types, like resources, files, and URLs, use the static methods on {@link CSVParser}. + *

+ * + *

Referencing columns safely

+ * + *

+ * If your source contains a header record, you can simplify your code and safely reference columns, + * by using {@link #withHeader(String...)} with no arguments: + *

+ * + *
+ * CSVFormat.EXCEL.withHeader();
+ * 
+ * + *

+ * This causes the parser to read the first record and use its values as column names. + * + * Then, call one of the {@link CSVRecord} get method that takes a String column name argument: + *

+ * + *
+ * String value = record.get("Col1");
+ * 
+ * + *

+ * This makes your code impervious to changes in column order in the CSV file. + *

+ * + *

Notes

+ * + *

+ * This class is immutable. + *

+ * + * @version $Id: CSVFormat.java 1559908 2014-01-21 02:44:30Z ggregory $ + */ +public final class CSVFormat implements Serializable { + + private static final long serialVersionUID = 1L; + + private final char delimiter; + private final Character quoteChar; // null if quoting is disabled + private final Quote quotePolicy; + private final Character commentStart; // null if commenting is disabled + private final Character escape; // null if escaping is disabled + private final boolean ignoreSurroundingSpaces; // Should leading/trailing spaces be ignored around values? + private final boolean ignoreEmptyLines; + private final String recordSeparator; // for outputs + private final String nullString; // the string to be used for null values + private final String[] header; + private final boolean skipHeaderRecord; + + /** + * Standard comma separated format, as for {@link #RFC4180} but allowing empty lines. + *

RFC 4180:

+ *
    + *
  • withDelimiter(',')
  • + *
  • withQuoteChar('"')
  • + *
  • withRecordSeparator(CRLF)
  • + *
+ *

Additional:

+ *
    + *
  • withIgnoreEmptyLines(true)
  • + *
+ */ + public static final CSVFormat DEFAULT = new CSVFormat(COMMA, DOUBLE_QUOTE_CHAR, null, null, null, + false, true, CRLF, null, null, false); + + /** + * Comma separated format as defined by RFC 4180. + *

RFC 4180:

+ *
    + *
  • withDelimiter(',')
  • + *
  • withQuoteChar('"')
  • + *
  • withRecordSeparator(CRLF)
  • + *
+ */ + public static final CSVFormat RFC4180 = DEFAULT.withIgnoreEmptyLines(false); + + /** + * Excel file format (using a comma as the value delimiter). Note that the actual value delimiter used by Excel is + * locale dependent, it might be necessary to customize this format to accommodate to your regional settings. + *

+ * For example for parsing or generating a CSV file on a French system the following format will be used: + * + *

+     * CSVFormat fmt = CSVFormat.newBuilder(EXCEL).withDelimiter(';');
+     * 
+ * Settings are: + *
    + *
  • withDelimiter(',')
  • + *
  • withQuoteChar('"')
  • + *
  • withRecordSeparator(CRLF)
  • + *
+ * Note: this is currently the same as RFC4180 + */ + public static final CSVFormat EXCEL = DEFAULT.withIgnoreEmptyLines(false); + + /** Tab-delimited format, with quote; leading and trailing spaces ignored. */ + public static final CSVFormat TDF = + DEFAULT + .withDelimiter(TAB) + .withIgnoreSurroundingSpaces(true); + + /** + * Default MySQL format used by the SELECT INTO OUTFILE and LOAD DATA INFILE operations. This is + * a tab-delimited format with a LF character as the line separator. Values are not quoted and special characters + * are escaped with '\'. + * + * @see + * http://dev.mysql.com/doc/refman/5.1/en/load-data.html + */ + public static final CSVFormat MYSQL = + DEFAULT + .withDelimiter(TAB) + .withEscape(BACKSLASH) + .withIgnoreEmptyLines(false) + .withQuoteChar(null) + .withRecordSeparator(LF); + + /** + * Returns true if the given character is a line break character. + * + * @param c + * the character to check + * + * @return true if c is a line break character + */ + private static boolean isLineBreak(final char c) { + return c == LF || c == CR; + } + + /** + * Returns true if the given character is a line break character. + * + * @param c + * the character to check, may be null + * + * @return true if c is a line break character (and not null) + */ + private static boolean isLineBreak(final Character c) { + return c != null && isLineBreak(c.charValue()); + } + + /** + * Creates a new CSV format with the specified delimiter. + * + * @param delimiter + * the char used for value separation, must not be a line break character + * @return a new CSV format. + * @throws IllegalArgumentException if the delimiter is a line break character + */ + public static CSVFormat newFormat(final char delimiter) { + return new CSVFormat(delimiter, null, null, null, null, false, false, null, null, null, false); + } + + /** + * Creates a customized CSV format. + * + * @param delimiter + * the char used for value separation, must not be a line break character + * @param quoteChar + * the Character used as value encapsulation marker, may be {@code null} to disable + * @param quotePolicy + * the quote policy + * @param commentStart + * the Character used for comment identification, may be {@code null} to disable + * @param escape + * the Character used to escape special characters in values, may be {@code null} to disable + * @param ignoreSurroundingSpaces + * true when whitespaces enclosing values should be ignored + * @param ignoreEmptyLines + * true when the parser should skip empty lines + * @param recordSeparator + * the line separator to use for output + * @param nullString + * the line separator to use for output + * @param header + * the header + * @param skipHeaderRecord TODO + * @throws IllegalArgumentException if the delimiter is a line break character + */ + // package protected to give access without needing a synthetic accessor + CSVFormat(final char delimiter, final Character quoteChar, + final Quote quotePolicy, final Character commentStart, + final Character escape, final boolean ignoreSurroundingSpaces, + final boolean ignoreEmptyLines, final String recordSeparator, + final String nullString, final String[] header, final boolean skipHeaderRecord) { + if (isLineBreak(delimiter)) { + throw new IllegalArgumentException("The delimiter cannot be a line break"); + } + this.delimiter = delimiter; + this.quoteChar = quoteChar; + this.quotePolicy = quotePolicy; + this.commentStart = commentStart; + this.escape = escape; + this.ignoreSurroundingSpaces = ignoreSurroundingSpaces; + this.ignoreEmptyLines = ignoreEmptyLines; + this.recordSeparator = recordSeparator; + this.nullString = nullString; + this.header = header == null ? null : header.clone(); + this.skipHeaderRecord = skipHeaderRecord; + } + + @Override + public boolean equals(final Object obj) { + if (this == obj) { + return true; + } + if (obj == null) { + return false; + } + if (getClass() != obj.getClass()) { + return false; + } + + final CSVFormat other = (CSVFormat) obj; + if (delimiter != other.delimiter) { + return false; + } + if (quotePolicy != other.quotePolicy) { + return false; + } + if (quoteChar == null) { + if (other.quoteChar != null) { + return false; + } + } else if (!quoteChar.equals(other.quoteChar)) { + return false; + } + if (commentStart == null) { + if (other.commentStart != null) { + return false; + } + } else if (!commentStart.equals(other.commentStart)) { + return false; + } + if (escape == null) { + if (other.escape != null) { + return false; + } + } else if (!escape.equals(other.escape)) { + return false; + } + if (!Arrays.equals(header, other.header)) { + return false; + } + if (ignoreSurroundingSpaces != other.ignoreSurroundingSpaces) { + return false; + } + if (ignoreEmptyLines != other.ignoreEmptyLines) { + return false; + } + if (recordSeparator == null) { + if (other.recordSeparator != null) { + return false; + } + } else if (!recordSeparator.equals(other.recordSeparator)) { + return false; + } + return true; + } + + /** + * Formats the specified values. + * + * @param values + * the values to format + * @return the formatted values + */ + public String format(final Object... values) { + final StringWriter out = new StringWriter(); + try { + new CSVPrinter(out, this).printRecord(values); + return out.toString().trim(); + } catch (final IOException e) { + // should not happen because a StringWriter does not do IO. + throw new IllegalStateException(e); + } + } + + /** + * Returns the character marking the start of a line comment. + * + * @return the comment start marker, may be {@code null} + */ + public Character getCommentStart() { + return commentStart; + } + + /** + * Returns the character delimiting the values (typically ';', ',' or '\t'). + * + * @return the delimiter character + */ + public char getDelimiter() { + return delimiter; + } + + /** + * Returns the escape character. + * + * @return the escape character, may be {@code null} + */ + public Character getEscape() { + return escape; + } + + /** + * Returns a copy of the header array. + * + * @return a copy of the header array + */ + public String[] getHeader() { + return header != null ? header.clone() : null; + } + + /** + * Specifies whether empty lines between records are ignored when parsing input. + * + * @return true if empty lines between records are ignored, false if they are turned into empty + * records. + */ + public boolean getIgnoreEmptyLines() { + return ignoreEmptyLines; + } + + /** + * Specifies whether spaces around values are ignored when parsing input. + * + * @return true if spaces around values are ignored, false if they are treated as part of the + * value. + */ + public boolean getIgnoreSurroundingSpaces() { + return ignoreSurroundingSpaces; + } + + /** + * Gets the String to convert to and from {@code null}. + *
    + *
  • + * Reading: Converts strings equal to the given {@code nullString} to {@code null} when reading + * records. + *
  • + *
  • + * Writing: Writes {@code null} as the given {@code nullString} when writing records.
  • + *
+ * + * @return the String to convert to and from {@code null}. No substitution occurs if {@code null} + */ + public String getNullString() { + return nullString; + } + + /** + * Returns the character used to encapsulate values containing special characters. + * + * @return the quoteChar character, may be {@code null} + */ + public Character getQuoteChar() { + return quoteChar; + } + + /** + * Returns the quote policy output fields. + * + * @return the quote policy + */ + public Quote getQuotePolicy() { + return quotePolicy; + } + + /** + * Returns the line separator delimiting output records. + * + * @return the line separator + */ + public String getRecordSeparator() { + return recordSeparator; + } + + /** + * Returns whether to skip the header record. + * + * @return whether to skip the header record. + */ + public boolean getSkipHeaderRecord() { + return skipHeaderRecord; + } + + @Override + public int hashCode() + { + final int prime = 31; + int result = 1; + + result = prime * result + delimiter; + result = prime * result + ((quotePolicy == null) ? 0 : quotePolicy.hashCode()); + result = prime * result + ((quoteChar == null) ? 0 : quoteChar.hashCode()); + result = prime * result + ((commentStart == null) ? 0 : commentStart.hashCode()); + result = prime * result + ((escape == null) ? 0 : escape.hashCode()); + result = prime * result + (ignoreSurroundingSpaces ? 1231 : 1237); + result = prime * result + (ignoreEmptyLines ? 1231 : 1237); + result = prime * result + ((recordSeparator == null) ? 0 : recordSeparator.hashCode()); + result = prime * result + Arrays.hashCode(header); + return result; + } + + /** + * Specifies whether comments are supported by this format. + * + * Note that the comment introducer character is only recognized at the start of a line. + * + * @return true is comments are supported, false otherwise + */ + public boolean isCommentingEnabled() { + return commentStart != null; + } + + /** + * Returns whether escape are being processed. + * + * @return {@code true} if escapes are processed + */ + public boolean isEscaping() { + return escape != null; + } + + /** + * Returns whether a nullString has been defined. + * + * @return {@code true} if a nullString is defined + */ + public boolean isNullHandling() { + return nullString != null; + } + + /** + * Returns whether a quoteChar has been defined. + * + * @return {@code true} if a quoteChar is defined + */ + public boolean isQuoting() { + return quoteChar != null; + } + + /** + * Parses the specified content. + * + *

+ * See also the various static parse methods on {@link CSVParser}. + *

+ * + * @param in + * the input stream + * @return a parser over a stream of {@link CSVRecord}s. + * @throws IOException + * If an I/O error occurs + */ + public CSVParser parse(final Reader in) throws IOException { + return new CSVParser(in, this); + } + + @Override + public String toString() { + final StringBuilder sb = new StringBuilder(); + sb.append("Delimiter=<").append(delimiter).append('>'); + if (isEscaping()) { + sb.append(' '); + sb.append("Escape=<").append(escape).append('>'); + } + if (isQuoting()) { + sb.append(' '); + sb.append("QuoteChar=<").append(quoteChar).append('>'); + } + if (isCommentingEnabled()) { + sb.append(' '); + sb.append("CommentStart=<").append(commentStart).append('>'); + } + if (isNullHandling()) { + sb.append(' '); + sb.append("NullString=<").append(nullString).append('>'); + } + if(recordSeparator != null) { + sb.append(' '); + sb.append("RecordSeparator=<").append(recordSeparator).append('>'); + } + if (getIgnoreEmptyLines()) { + sb.append(" EmptyLines:ignored"); + } + if (getIgnoreSurroundingSpaces()) { + sb.append(" SurroundingSpaces:ignored"); + } + sb.append(" SkipHeaderRecord:").append(skipHeaderRecord); + if (header != null) { + sb.append(' '); + sb.append("Header:").append(Arrays.toString(header)); + } + return sb.toString(); + } + + /** + * Verifies the consistency of the parameters and throws an IllegalStateException if necessary. + * + * @throws IllegalStateException + */ + void validate() throws IllegalStateException { + if (quoteChar != null && delimiter == quoteChar.charValue()) { + throw new IllegalStateException( + "The quoteChar character and the delimiter cannot be the same ('" + quoteChar + "')"); + } + + if (escape != null && delimiter == escape.charValue()) { + throw new IllegalStateException( + "The escape character and the delimiter cannot be the same ('" + escape + "')"); + } + + if (commentStart != null && delimiter == commentStart.charValue()) { + throw new IllegalStateException( + "The comment start character and the delimiter cannot be the same ('" + commentStart + "')"); + } + + if (quoteChar != null && quoteChar.equals(commentStart)) { + throw new IllegalStateException( + "The comment start character and the quoteChar cannot be the same ('" + commentStart + "')"); + } + + if (escape != null && escape.equals(commentStart)) { + throw new IllegalStateException( + "The comment start and the escape character cannot be the same ('" + commentStart + "')"); + } + + if (escape == null && quotePolicy == Quote.NONE) { + throw new IllegalStateException("No quotes mode set but no escape character is set"); + } + + if (header != null) { + final Set set = new HashSet(header.length); + set.addAll(Arrays.asList(header)); + if (set.size() != header.length) { + throw new IllegalStateException("The header contains duplicate names: " + Arrays.toString(header)); + } + } + } + + /** + * Sets the comment start marker of the format to the specified character. + * + * Note that the comment start character is only recognized at the start of a line. + * + * @param commentStart + * the comment start marker + * @return A new CSVFormat that is equal to this one but with the specified character as the comment start marker + * @throws IllegalArgumentException + * thrown if the specified character is a line break + */ + public CSVFormat withCommentStart(final char commentStart) { + return withCommentStart(Character.valueOf(commentStart)); + } + + /** + * Sets the comment start marker of the format to the specified character. + * + * Note that the comment start character is only recognized at the start of a line. + * + * @param commentStart + * the comment start marker, use {@code null} to disable + * @return A new CSVFormat that is equal to this one but with the specified character as the comment start marker + * @throws IllegalArgumentException + * thrown if the specified character is a line break + */ + public CSVFormat withCommentStart(final Character commentStart) { + if (isLineBreak(commentStart)) { + throw new IllegalArgumentException("The comment start character cannot be a line break"); + } + return new CSVFormat(delimiter, quoteChar, quotePolicy, commentStart, escape, + ignoreSurroundingSpaces, ignoreEmptyLines, recordSeparator, nullString, header, skipHeaderRecord); + } + + /** + * Sets the delimiter of the format to the specified character. + * + * @param delimiter + * the delimiter character + * @return A new CSVFormat that is equal to this with the specified character as delimiter + * @throws IllegalArgumentException + * thrown if the specified character is a line break + */ + public CSVFormat withDelimiter(final char delimiter) { + if (isLineBreak(delimiter)) { + throw new IllegalArgumentException("The delimiter cannot be a line break"); + } + return new CSVFormat(delimiter, quoteChar, quotePolicy, commentStart, escape, + ignoreSurroundingSpaces, ignoreEmptyLines, recordSeparator, nullString, header, skipHeaderRecord); + } + + /** + * Sets the escape character of the format to the specified character. + * + * @param escape + * the escape character + * @return A new CSVFormat that is equal to his but with the specified character as the escape character + * @throws IllegalArgumentException + * thrown if the specified character is a line break + */ + public CSVFormat withEscape(final char escape) { + return withEscape(Character.valueOf(escape)); + } + + /** + * Sets the escape character of the format to the specified character. + * + * @param escape + * the escape character, use {@code null} to disable + * @return A new CSVFormat that is equal to this but with the specified character as the escape character + * @throws IllegalArgumentException + * thrown if the specified character is a line break + */ + public CSVFormat withEscape(final Character escape) { + if (isLineBreak(escape)) { + throw new IllegalArgumentException("The escape character cannot be a line break"); + } + return new CSVFormat(delimiter, quoteChar, quotePolicy, commentStart, escape, + ignoreSurroundingSpaces, ignoreEmptyLines, recordSeparator, nullString, header, skipHeaderRecord); + } + + /** + * Sets the header of the format. The header can either be parsed automatically from the input file with: + * + *
+     * CSVFormat format = aformat.withHeader();
+ * + * or specified manually with: + * + *
+     * CSVFormat format = aformat.withHeader("name", "email", "phone");
+ * + * @param header + * the header, null if disabled, empty if parsed automatically, user specified otherwise. + * + * @return A new CSVFormat that is equal to this but with the specified header + * @see #withSkipHeaderRecord(boolean) + */ + public CSVFormat withHeader(final String... header) { + return new CSVFormat(delimiter, quoteChar, quotePolicy, commentStart, escape, + ignoreSurroundingSpaces, ignoreEmptyLines, recordSeparator, nullString, header, skipHeaderRecord); + } + + /** + * Sets the empty line skipping behavior of the format. + * + * @param ignoreEmptyLines + * the empty line skipping behavior, true to ignore the empty lines between the records, + * false to translate empty lines to empty records. + * @return A new CSVFormat that is equal to this but with the specified empty line skipping behavior. + */ + public CSVFormat withIgnoreEmptyLines(final boolean ignoreEmptyLines) { + return new CSVFormat(delimiter, quoteChar, quotePolicy, commentStart, escape, + ignoreSurroundingSpaces, ignoreEmptyLines, recordSeparator, nullString, header, skipHeaderRecord); + } + + /** + * Sets the trimming behavior of the format. + * + * @param ignoreSurroundingSpaces + * the trimming behavior, true to remove the surrounding spaces, false to leave the + * spaces as is. + * @return A new CSVFormat that is equal to this but with the specified trimming behavior. + */ + public CSVFormat withIgnoreSurroundingSpaces(final boolean ignoreSurroundingSpaces) { + return new CSVFormat(delimiter, quoteChar, quotePolicy, commentStart, escape, + ignoreSurroundingSpaces, ignoreEmptyLines, recordSeparator, nullString, header, skipHeaderRecord); + } + + /** + * Performs conversions to and from null for strings on input and output. + *
    + *
  • + * Reading: Converts strings equal to the given {@code nullString} to {@code null} when reading + * records.
  • + *
  • + * Writing: Writes {@code null} as the given {@code nullString} when writing records.
  • + *
+ * + * @param nullString + * the String to convert to and from {@code null}. No substitution occurs if {@code null} + * + * @return A new CSVFormat that is equal to this but with the specified null conversion string. + */ + public CSVFormat withNullString(final String nullString) { + return new CSVFormat(delimiter, quoteChar, quotePolicy, commentStart, escape, + ignoreSurroundingSpaces, ignoreEmptyLines, recordSeparator, nullString, header, skipHeaderRecord); + } + + /** + * Sets the quoteChar of the format to the specified character. + * + * @param quoteChar + * the quoteChar character + * @return A new CSVFormat that is equal to this but with the specified character as quoteChar + * @throws IllegalArgumentException + * thrown if the specified character is a line break + */ + public CSVFormat withQuoteChar(final char quoteChar) { + return withQuoteChar(Character.valueOf(quoteChar)); + } + + /** + * Sets the quoteChar of the format to the specified character. + * + * @param quoteChar + * the quoteChar character, use {@code null} to disable + * @return A new CSVFormat that is equal to this but with the specified character as quoteChar + * @throws IllegalArgumentException + * thrown if the specified character is a line break + */ + public CSVFormat withQuoteChar(final Character quoteChar) { + if (isLineBreak(quoteChar)) { + throw new IllegalArgumentException("The quoteChar cannot be a line break"); + } + return new CSVFormat(delimiter, quoteChar, quotePolicy, commentStart, escape, + ignoreSurroundingSpaces, ignoreEmptyLines, recordSeparator, nullString, header, skipHeaderRecord); + } + + /** + * Sets the output quote policy of the format to the specified value. + * + * @param quotePolicy + * the quote policy to use for output. + * + * @return A new CSVFormat that is equal to this but with the specified quote policy + */ + public CSVFormat withQuotePolicy(final Quote quotePolicy) { + return new CSVFormat(delimiter, quoteChar, quotePolicy, commentStart, escape, + ignoreSurroundingSpaces, ignoreEmptyLines, recordSeparator, nullString, header, skipHeaderRecord); + } + + /** + * Sets the record separator of the format to the specified character. + * + * @param recordSeparator + * the record separator to use for output. + * + * @return A new CSVFormat that is equal to this but with the the specified output record separator + */ + public CSVFormat withRecordSeparator(final char recordSeparator) { + return withRecordSeparator(String.valueOf(recordSeparator)); + } + + /** + * Sets the record separator of the format to the specified String. + * + * @param recordSeparator + * the record separator to use for output. + * + * @return A new CSVFormat that is equal to this but with the the specified output record separator + */ + public CSVFormat withRecordSeparator(final String recordSeparator) { + return new CSVFormat(delimiter, quoteChar, quotePolicy, commentStart, escape, + ignoreSurroundingSpaces, ignoreEmptyLines, recordSeparator, nullString, header, skipHeaderRecord); + } + + /** + * Sets whether to skip the header record. + * + * @param skipHeaderRecord + * whether to skip the header record. + * + * @return A new CSVFormat that is equal to this but with the the specified skipHeaderRecord setting. + * @see #withHeader(String...) + */ + public CSVFormat withSkipHeaderRecord(final boolean skipHeaderRecord) { + return new CSVFormat(delimiter, quoteChar, quotePolicy, commentStart, escape, + ignoreSurroundingSpaces, ignoreEmptyLines, recordSeparator, nullString, header, skipHeaderRecord); + } +} http://git-wip-us.apache.org/repos/asf/incubator-phoenix/blob/738619db/phoenix-core/src/main/java/org/apache/commons/csv/CSVParser.java ---------------------------------------------------------------------- diff --git a/phoenix-core/src/main/java/org/apache/commons/csv/CSVParser.java b/phoenix-core/src/main/java/org/apache/commons/csv/CSVParser.java new file mode 100644 index 0000000..1903bb9 --- /dev/null +++ b/phoenix-core/src/main/java/org/apache/commons/csv/CSVParser.java @@ -0,0 +1,465 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.commons.csv; + +import static org.apache.commons.csv.Token.Type.TOKEN; + +import java.io.Closeable; +import java.io.File; +import java.io.FileReader; +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.Reader; +import java.io.StringReader; +import java.net.URL; +import java.nio.charset.Charset; +import java.util.ArrayList; +import java.util.Iterator; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.NoSuchElementException; + +/** + * Parses CSV files according to the specified format. + * + * Because CSV appears in many different dialects, the parser supports many formats by allowing the + * specification of a {@link CSVFormat}. + * + * The parser works record wise. It is not possible to go back, once a record has been parsed from the input stream. + * + *

Creating instances

+ * There are several static factory methods that can be used to create instances for various types of resources: + *

+ *

    + *
  • {@link #parse(java.io.File, CSVFormat)}
  • + *
  • {@link #parse(String, CSVFormat)}
  • + *
  • {@link #parse(java.net.URL, java.nio.charset.Charset, CSVFormat)}
  • + *
+ *

+ *

+ * Alternatively parsers can also be created by passing a {@link Reader} directly to the sole constructor. + * + * For those who like fluent APIs, parsers can be created using {@link CSVFormat#parse(java.io.Reader)} as a shortcut: + *

+ *
+ * for(CSVRecord record : CSVFormat.EXCEL.parse(in)) {
+ *     ...
+ * }
+ * 
+ * + *

Parsing record wise

+ *

+ * To parse a CSV input from a file, you write: + *

+ * + *
+ * File csvData = new File("/path/to/csv");
+ * CSVParser parser = CSVParser.parse(csvData, CSVFormat.RFC4180);
+ * for (CSVRecord csvRecord : parser) {
+ *     ...
+ * }
+ * 
+ * + *

+ * This will read the parse the contents of the file using the + * RFC 4180 format. + *

+ * + *

+ * To parse CSV input in a format like Excel, you write: + *

+ * + *
+ * CSVParser parser = CSVParser.parse(csvData, CSVFormat.EXCEL);
+ * for (CSVRecord csvRecord : parser) {
+ *     ...
+ * }
+ * 
+ * + *

+ * If the predefined formats don't match the format at hands, custom formats can be defined. More information about + * customising CSVFormats is available in {@link CSVFormat CSVFormat JavaDoc}. + *

+ * + *

Parsing into memory

+ *

+ * If parsing record wise is not desired, the contents of the input can be read completely into memory. + *

+ * + *
+ * Reader in = new StringReader("a;b\nc;d");
+ * CSVParser parser = new CSVParser(in, CSVFormat.EXCEL);
+ * List<CSVRecord> list = parser.getRecords();
+ * 
+ * + *

+ * There are two constraints that have to be kept in mind: + *

+ * + *

+ *

    + *
  1. Parsing into memory starts at the current position of the parser. If you have already parsed records from + * the input, those records will not end up in the in memory representation of your CSV data.
  2. + *
  3. Parsing into memory may consume a lot of system resources depending on the input. For example if you're + * parsing a 150MB file of CSV data the contents will be read completely into memory.
  4. + *
+ *

+ * + *

Notes

+ *

+ * Internal parser state is completely covered by the format and the reader-state. + *

+ * + * @version $Id: CSVParser.java 1559908 2014-01-21 02:44:30Z ggregory $ + * + * @see package documentation for more details + */ +public final class CSVParser implements Iterable, Closeable { + + /** + * Creates a parser for the given {@link File}. + * + * @param file + * a CSV file. Must not be null. + * @param format + * the CSVFormat used for CSV parsing. Must not be null. + * @return a new parser + * @throws IllegalArgumentException + * If the parameters of the format are inconsistent or if either file or format are null. + * @throws IOException + * If an I/O error occurs + */ + public static CSVParser parse(final File file, final CSVFormat format) throws IOException { + Assertions.notNull(file, "file"); + Assertions.notNull(format, "format"); + + return new CSVParser(new FileReader(file), format); + } + + /** + * Creates a parser for the given {@link String}. + * + * @param string + * a CSV string. Must not be null. + * @param format + * the CSVFormat used for CSV parsing. Must not be null. + * @return a new parser + * @throws IllegalArgumentException + * If the parameters of the format are inconsistent or if either string or format are null. + * @throws IOException + * If an I/O error occurs + */ + public static CSVParser parse(final String string, final CSVFormat format) throws IOException { + Assertions.notNull(string, "string"); + Assertions.notNull(format, "format"); + + return new CSVParser(new StringReader(string), format); + } + + /** + * Creates a parser for the given URL. + * + *

+ * If you do not read all records from the given {@code url}, you should call {@link #close()} on the parser, unless + * you close the {@code url}. + *

+ * + * @param url + * a URL. Must not be null. + * @param charset + * the charset for the resource. Must not be null. + * @param format + * the CSVFormat used for CSV parsing. Must not be null. + * @return a new parser + * @throws IllegalArgumentException + * If the parameters of the format are inconsistent or if either url, charset or format are null. + * @throws IOException + * If an I/O error occurs + */ + public static CSVParser parse(final URL url, final Charset charset, final CSVFormat format) throws IOException { + Assertions.notNull(url, "url"); + Assertions.notNull(charset, "charset"); + Assertions.notNull(format, "format"); + + return new CSVParser(new InputStreamReader(url.openStream(), + charset == null ? Charset.forName("UTF-8") : charset), format); + } + + // the following objects are shared to reduce garbage + + private final CSVFormat format; + + /** A mapping of column names to column indices */ + private final Map headerMap; + + private final Lexer lexer; + + /** A record buffer for getRecord(). Grows as necessary and is reused. */ + private final List record = new ArrayList(); + + private long recordNumber; + + private final Token reusableToken = new Token(); + + /** + * Customized CSV parser using the given {@link CSVFormat} + * + *

+ * If you do not read all records from the given {@code reader}, you should call {@link #close()} on the parser, + * unless you close the {@code reader}. + *

+ * + * @param reader + * a Reader containing CSV-formatted input. Must not be null. + * @param format + * the CSVFormat used for CSV parsing. Must not be null. + * @throws IllegalArgumentException + * If the parameters of the format are inconsistent or if either reader or format are null. + * @throws IOException + * If an I/O error occurs + */ + public CSVParser(final Reader reader, final CSVFormat format) throws IOException { + Assertions.notNull(reader, "reader"); + Assertions.notNull(format, "format"); + + format.validate(); + this.format = format; + this.lexer = new Lexer(format, new ExtendedBufferedReader(reader)); + this.headerMap = this.initializeHeader(); + } + + private void addRecordValue() { + final String input = this.reusableToken.content.toString(); + final String nullString = this.format.getNullString(); + if (nullString == null) { + this.record.add(input); + } else { + this.record.add(input.equalsIgnoreCase(nullString) ? null : input); + } + } + + /** + * Closes resources. + * + * @throws IOException + * If an I/O error occurs + */ + public void close() throws IOException { + if (this.lexer != null) { + this.lexer.close(); + } + } + + /** + * Returns the current line number in the input stream. + *

+ * ATTENTION: If your CSV input has multi-line values, the returned number does not correspond to the record number. + * + * @return current line number + */ + public long getCurrentLineNumber() { + return this.lexer.getCurrentLineNumber(); + } + + /** + * Returns a copy of the header map that iterates in column order. + *

+ * The map keys are column names. The map values are 0-based indices. + *

+ * @return a copy of the header map that iterates in column order. + */ + public Map getHeaderMap() { + return this.headerMap == null ? null : new LinkedHashMap(this.headerMap); + } + + /** + * Returns the current record number in the input stream. + *

+ * ATTENTION: If your CSV input has multi-line values, the returned number does not correspond to the line number. + * + * @return current line number + */ + public long getRecordNumber() { + return this.recordNumber; + } + + /** + * Parses the CSV input according to the given format and returns the content as a list of + * {@link CSVRecord CSVRecords}. + *

+ * The returned content starts at the current parse-position in the stream. + * + * @return list of {@link CSVRecord CSVRecords}, may be empty + * @throws IOException + * on parse error or input read-failure + */ + public List getRecords() throws IOException { + final List records = new ArrayList(); + CSVRecord rec; + while ((rec = this.nextRecord()) != null) { + records.add(rec); + } + return records; + } + + /** + * Initializes the name to index mapping if the format defines a header. + * + * @return null if the format has no header. + */ + private Map initializeHeader() throws IOException { + Map hdrMap = null; + final String[] formatHeader = this.format.getHeader(); + if (formatHeader != null) { + hdrMap = new LinkedHashMap(); + + String[] header = null; + if (formatHeader.length == 0) { + // read the header from the first line of the file + final CSVRecord nextRecord = this.nextRecord(); + if (nextRecord != null) { + header = nextRecord.values(); + } + } else { + if (this.format.getSkipHeaderRecord()) { + this.nextRecord(); + } + header = formatHeader; + } + + // build the name to index mappings + if (header != null) { + for (int i = 0; i < header.length; i++) { + hdrMap.put(header[i], Integer.valueOf(i)); + } + } + } + return hdrMap; + } + + public boolean isClosed() { + return this.lexer.isClosed(); + } + + /** + * Returns an iterator on the records. + * + *

IOExceptions occurring during the iteration are wrapped in a + * RuntimeException. + * If the parser is closed a call to {@code next()} will throw a + * NoSuchElementException.

+ */ + public Iterator iterator() { + return new Iterator() { + private CSVRecord current; + + private CSVRecord getNextRecord() { + try { + return CSVParser.this.nextRecord(); + } catch (final IOException e) { + // TODO: This is not great, throw an ISE instead? + throw new RuntimeException(e); + } + } + + public boolean hasNext() { + if (CSVParser.this.isClosed()) { + return false; + } + if (this.current == null) { + this.current = this.getNextRecord(); + } + + return this.current != null; + } + + public CSVRecord next() { + if (CSVParser.this.isClosed()) { + throw new NoSuchElementException("CSVParser has been closed"); + } + CSVRecord next = this.current; + this.current = null; + + if (next == null) { + // hasNext() wasn't called before + next = this.getNextRecord(); + if (next == null) { + throw new NoSuchElementException("No more CSV records available"); + } + } + + return next; + } + + public void remove() { + throw new UnsupportedOperationException(); + } + }; + } + + /** + * Parses the next record from the current point in the stream. + * + * @return the record as an array of values, or null if the end of the stream has been reached + * @throws IOException + * on parse error or input read-failure + */ + CSVRecord nextRecord() throws IOException { + CSVRecord result = null; + this.record.clear(); + StringBuilder sb = null; + do { + this.reusableToken.reset(); + this.lexer.nextToken(this.reusableToken); + switch (this.reusableToken.type) { + case TOKEN: + this.addRecordValue(); + break; + case EORECORD: + this.addRecordValue(); + break; + case EOF: + if (this.reusableToken.isReady) { + this.addRecordValue(); + } + break; + case INVALID: + throw new IOException("(line " + this.getCurrentLineNumber() + ") invalid parse sequence"); + case COMMENT: // Ignored currently + if (sb == null) { // first comment for this record + sb = new StringBuilder(); + } else { + sb.append(Constants.LF); + } + sb.append(this.reusableToken.content); + this.reusableToken.type = TOKEN; // Read another token + break; + } + } while (this.reusableToken.type == TOKEN); + + if (!this.record.isEmpty()) { + this.recordNumber++; + final String comment = sb == null ? null : sb.toString(); + result = new CSVRecord(this.record.toArray(new String[this.record.size()]), this.headerMap, comment, + this.recordNumber); + } + return result; + } + +} http://git-wip-us.apache.org/repos/asf/incubator-phoenix/blob/738619db/phoenix-core/src/main/java/org/apache/commons/csv/CSVPrinter.java ---------------------------------------------------------------------- diff --git a/phoenix-core/src/main/java/org/apache/commons/csv/CSVPrinter.java b/phoenix-core/src/main/java/org/apache/commons/csv/CSVPrinter.java new file mode 100644 index 0000000..8b98e50 --- /dev/null +++ b/phoenix-core/src/main/java/org/apache/commons/csv/CSVPrinter.java @@ -0,0 +1,427 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.commons.csv; + +import static org.apache.commons.csv.Constants.COMMENT; +import static org.apache.commons.csv.Constants.CR; +import static org.apache.commons.csv.Constants.LF; +import static org.apache.commons.csv.Constants.SP; + +import java.io.Closeable; +import java.io.Flushable; +import java.io.IOException; +import java.sql.ResultSet; +import java.sql.SQLException; + +/** + * Prints values in a CSV format. + * + * @version $Id: CSVPrinter.java 1560384 2014-01-22 15:27:35Z ggregory $ + */ +public final class CSVPrinter implements Flushable, Closeable { + + /** The place that the values get written. */ + private final Appendable out; + private final CSVFormat format; + + /** True if we just began a new record. */ + private boolean newRecord = true; + + /** + * Creates a printer that will print values to the given stream following the CSVFormat. + *

+ * Currently, only a pure encapsulation format or a pure escaping format is supported. Hybrid formats + * (encapsulation and escaping with a different character) are not supported. + * + * @param out + * stream to which to print. Must not be null. + * @param format + * the CSV format. Must not be null. + * @throws IllegalArgumentException + * thrown if the parameters of the format are inconsistent or if either out or format are null. + */ + public CSVPrinter(final Appendable out, final CSVFormat format) { + Assertions.notNull(out, "out"); + Assertions.notNull(format, "format"); + + this.out = out; + this.format = format; + this.format.validate(); + } + + // ====================================================== + // printing implementation + // ====================================================== + + public void close() throws IOException { + if (out instanceof Closeable) { + ((Closeable) out).close(); + } + } + + /** + * Flushes the underlying stream. + * + * @throws IOException + * If an I/O error occurs + */ + public void flush() throws IOException { + if (out instanceof Flushable) { + ((Flushable) out).flush(); + } + } + + /** + * Prints the string as the next value on the line. The value will be escaped or encapsulated as needed. + * + * @param value + * value to be output. + * @throws IOException + * If an I/O error occurs + */ + public void print(final Object value) throws IOException { + // null values are considered empty + String strValue; + if (value == null) { + final String nullString = format.getNullString(); + strValue = nullString == null ? Constants.EMPTY : nullString; + } else { + strValue = value.toString(); + } + this.print(value, strValue, 0, strValue.length()); + } + + private void print(final Object object, final CharSequence value, + final int offset, final int len) throws IOException { + if (!newRecord) { + out.append(format.getDelimiter()); + } + if (format.isQuoting()) { + // the original object is needed so can check for Number + printAndQuote(object, value, offset, len); + } else if (format.isEscaping()) { + printAndEscape(value, offset, len); + } else { + out.append(value, offset, offset + len); + } + newRecord = false; + } + + /* + * Note: must only be called if escaping is enabled, otherwise will generate NPE + */ + private void printAndEscape(final CharSequence value, final int offset, final int len) throws IOException { + int start = offset; + int pos = offset; + final int end = offset + len; + + final char delim = format.getDelimiter(); + final char escape = format.getEscape().charValue(); + + while (pos < end) { + char c = value.charAt(pos); + if (c == CR || c == LF || c == delim || c == escape) { + // write out segment up until this char + if (pos > start) { + out.append(value, start, pos); + } + if (c == LF) { + c = 'n'; + } else if (c == CR) { + c = 'r'; + } + + out.append(escape); + out.append(c); + + start = pos + 1; // start on the current char after this one + } + + pos++; + } + + // write last segment + if (pos > start) { + out.append(value, start, pos); + } + } + + /* + * Note: must only be called if quoting is enabled, otherwise will generate NPE + */ + // the original object is needed so can check for Number + private void printAndQuote(final Object object, final CharSequence value, + final int offset, final int len) throws IOException { + boolean quote = false; + int start = offset; + int pos = offset; + final int end = offset + len; + + final char delimChar = format.getDelimiter(); + final char quoteChar = format.getQuoteChar().charValue(); + + Quote quotePolicy = format.getQuotePolicy(); + if (quotePolicy == null) { + quotePolicy = Quote.MINIMAL; + } + switch (quotePolicy) { + case ALL: + quote = true; + break; + case NON_NUMERIC: + quote = !(object instanceof Number); + break; + case NONE: + // Use the existing escaping code + printAndEscape(value, offset, len); + return; + case MINIMAL: + if (len <= 0) { + // always quote an empty token that is the first + // on the line, as it may be the only thing on the + // line. If it were not quoted in that case, + // an empty line has no tokens. + if (newRecord) { + quote = true; + } + } else { + char c = value.charAt(pos); + + // Hmmm, where did this rule come from? + if (newRecord && (c < '0' || (c > '9' && c < 'A') || (c > 'Z' && c < 'a') || (c > 'z'))) { + quote = true; + // } else if (c == ' ' || c == '\f' || c == '\t') { + } else if (c <= COMMENT) { + // Some other chars at the start of a value caused the parser to fail, so for now + // encapsulate if we start in anything less than '#'. We are being conservative + // by including the default comment char too. + quote = true; + } else { + while (pos < end) { + c = value.charAt(pos); + if (c == LF || c == CR || c == quoteChar || c == delimChar) { + quote = true; + break; + } + pos++; + } + + if (!quote) { + pos = end - 1; + c = value.charAt(pos); + // if (c == ' ' || c == '\f' || c == '\t') { + // Some other chars at the end caused the parser to fail, so for now + // encapsulate if we end in anything less than ' ' + if (c <= SP) { + quote = true; + } + } + } + } + + if (!quote) { + // no encapsulation needed - write out the original value + out.append(value, start, end); + return; + } + break; + } + + if (!quote) { + // no encapsulation needed - write out the original value + out.append(value, start, end); + return; + } + + // we hit something that needed encapsulation + out.append(quoteChar); + + // Pick up where we left off: pos should be positioned on the first character that caused + // the need for encapsulation. + while (pos < end) { + final char c = value.charAt(pos); + if (c == quoteChar) { + // write out the chunk up until this point + + // add 1 to the length to write out the encapsulator also + out.append(value, start, pos + 1); + // put the next starting position on the encapsulator so we will + // write it out again with the next string (effectively doubling it) + start = pos; + } + pos++; + } + + // write the last segment + out.append(value, start, pos); + out.append(quoteChar); + } + + /** + * Prints a comment on a new line among the delimiter separated values. Comments will always begin on a new line + * and occupy a least one full line. The character specified to start comments and a space will be inserted at the + * beginning of each new line in the comment. + *

+ * If comments are disabled in the current CSV format this method does nothing. + * + * @param comment + * the comment to output + * @throws IOException + * If an I/O error occurs + */ + public void printComment(final String comment) throws IOException { + if (!format.isCommentingEnabled()) { + return; + } + if (!newRecord) { + println(); + } + out.append(format.getCommentStart().charValue()); + out.append(SP); + for (int i = 0; i < comment.length(); i++) { + final char c = comment.charAt(i); + switch (c) { + case CR: + if (i + 1 < comment.length() && comment.charAt(i + 1) == LF) { + i++; + } + //$FALL-THROUGH$ break intentionally excluded. + case LF: + println(); + out.append(format.getCommentStart().charValue()); + out.append(SP); + break; + default: + out.append(c); + break; + } + } + println(); + } + + /** + * Outputs the record separator. + * + * @throws IOException + * If an I/O error occurs + */ + public void println() throws IOException { + out.append(format.getRecordSeparator()); + newRecord = true; + } + + /** + * Prints a single line of delimiter separated values. The values will be quoted if needed. Quotes and newLine + * characters will be escaped. + * + * @param values + * values to output. + * @throws IOException + * If an I/O error occurs + */ + public void printRecord(final Iterable values) throws IOException { + for (final Object value : values) { + print(value); + } + println(); + } + + /** + * Prints a single line of delimiter separated values. The values will be quoted if needed. Quotes and newLine + * characters will be escaped. + * + * @param values + * values to output. + * @throws IOException + * If an I/O error occurs + */ + public void printRecord(final Object... values) throws IOException { + for (final Object value : values) { + print(value); + } + println(); + } + + /** + * Prints all the objects in the given collection. + * + * @param values + * the values to print. + * @throws IOException + * If an I/O error occurs + */ + public void printRecords(final Iterable values) throws IOException { + for (final Object value : values) { + if (value instanceof Object[]) { + this.printRecord((Object[]) value); + } else if (value instanceof Iterable) { + this.printRecord((Iterable) value); + } else { + this.printRecord(value); + } + } + } + + /** + * Prints all the objects in the given array. + * + * @param values + * the values to print. + * @throws IOException + * If an I/O error occurs + */ + public void printRecords(final Object[] values) throws IOException { + for (final Object value : values) { + if (value instanceof Object[]) { + this.printRecord((Object[]) value); + } else if (value instanceof Iterable) { + this.printRecord((Iterable) value); + } else { + this.printRecord(value); + } + } + } + + /** + * Prints all the objects in the given JDBC result set. + * + * @param resultSet result set + * the values to print. + * @throws IOException + * If an I/O error occurs + * @throws SQLException if a database access error occurs + */ + public void printRecords(final ResultSet resultSet) throws SQLException, IOException { + final int columnCount = resultSet.getMetaData().getColumnCount(); + while (resultSet.next()) { + for (int i = 1; i <= columnCount; i++) { + print(resultSet.getString(i)); + } + println(); + } + } + + /** + * Gets the target Appendable. + * + * @return the target Appendable. + */ + public Appendable getOut() { + return this.out; + } +} http://git-wip-us.apache.org/repos/asf/incubator-phoenix/blob/738619db/phoenix-core/src/main/java/org/apache/commons/csv/CSVRecord.java ---------------------------------------------------------------------- diff --git a/phoenix-core/src/main/java/org/apache/commons/csv/CSVRecord.java b/phoenix-core/src/main/java/org/apache/commons/csv/CSVRecord.java new file mode 100644 index 0000000..52aee96 --- /dev/null +++ b/phoenix-core/src/main/java/org/apache/commons/csv/CSVRecord.java @@ -0,0 +1,224 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.commons.csv; + +import java.io.Serializable; +import java.util.Arrays; +import java.util.HashMap; +import java.util.Iterator; +import java.util.List; +import java.util.Map; +import java.util.Map.Entry; + +/** + * A CSV record parsed from a CSV file. + * + * @version $Id: CSVRecord.java 1560399 2014-01-22 16:07:23Z ggregory $ + */ +public final class CSVRecord implements Serializable, Iterable { + + private static final String[] EMPTY_STRING_ARRAY = new String[0]; + + private static final long serialVersionUID = 1L; + + /** The accumulated comments (if any) */ + private final String comment; + + /** The column name to index mapping. */ + private final Map mapping; + + /** The record number. */ + private final long recordNumber; + + /** The values of the record */ + private final String[] values; + + CSVRecord(final String[] values, final Map mapping, + final String comment, final long recordNumber) { + this.recordNumber = recordNumber; + this.values = values != null ? values : EMPTY_STRING_ARRAY; + this.mapping = mapping; + this.comment = comment; + } + + /** + * Returns a value by {@link Enum}. + * + * @param e + * an enum + * @return the String at the given enum String + */ + public String get(final Enum e) { + return get(e.toString()); + } + + /** + * Returns a value by index. + * + * @param i + * a column index (0-based) + * @return the String at the given index + */ + public String get(final int i) { + return values[i]; + } + + /** + * Returns a value by name. + * + * @param name + * the name of the column to be retrieved. + * @return the column value, maybe null depending on {@link CSVFormat#getNullString()}. + * @throws IllegalStateException + * if no header mapping was provided + * @throws IllegalArgumentException + * if {@code name} is not mapped or if the record is inconsistent + * @see #isConsistent() + * @see CSVFormat#withNullString(String) + */ + public String get(final String name) { + if (mapping == null) { + throw new IllegalStateException( + "No header mapping was specified, the record values can't be accessed by name"); + } + final Integer index = mapping.get(name); + if (index == null) { + throw new IllegalArgumentException(String.format("Mapping for %s not found, expected one of %s", name, + mapping.keySet())); + } + try { + return values[index.intValue()]; + } catch (final ArrayIndexOutOfBoundsException e) { + throw new IllegalArgumentException(String.format( + "Index for header '%s' is %d but CSVRecord only has %d values!", name, index, + Integer.valueOf(values.length))); + } + } + + /** + * Returns the comment for this record, if any. + * + * @return the comment for this record, or null if no comment for this + * record is available. + */ + public String getComment() { + return comment; + } + + /** + * Returns the number of this record in the parsed CSV file. + * + * @return the number of this record. + */ + public long getRecordNumber() { + return recordNumber; + } + + /** + * Returns true if this record is consistent, false if not. Currently, the only check is matching the record size to + * the header size. Some programs can export files that fails this test but still produce parsable files. + * + * @return true of this record is valid, false if not + */ + public boolean isConsistent() { + return mapping == null ? true : mapping.size() == values.length; + } + + /** + * Checks whether a given column is mapped, i.e. its name has been defined to the parser. + * + * @param name + * the name of the column to be retrieved. + * @return whether a given column is mapped. + */ + public boolean isMapped(final String name) { + return mapping != null ? mapping.containsKey(name) : false; + } + + /** + * Checks whether a given columns is mapped and has a value. + * + * @param name + * the name of the column to be retrieved. + * @return whether a given columns is mapped and has a value + */ + public boolean isSet(final String name) { + return isMapped(name) && mapping.get(name).intValue() < values.length; + } + + /** + * Returns an iterator over the values of this record. + * + * @return an iterator over the values of this record. + */ + public Iterator iterator() { + return toList().iterator(); + } + + /** + * Puts all values of this record into the given Map. + * + * @param map The Map to populate. + * @return the given map. + */ + > M putIn(final M map) { + for (final Entry entry : mapping.entrySet()) { + map.put(entry.getKey(), values[entry.getValue().intValue()]); + } + return map; + } + + /** + * Returns the number of values in this record. + * + * @return the number of values. + */ + public int size() { + return values.length; + } + + /** + * Converts the values to a List. + * + * TODO: Maybe make this public? + * @return a new List + */ + private List toList() { + return Arrays.asList(values); + } + + /** + * Copies this record into a new Map. The new map is not connect + * + * @return A new Map. The map is empty if the record has no headers. + */ + public Map toMap() { + return putIn(new HashMap(values.length)); + } + + @Override + public String toString() { + return Arrays.toString(values); + } + + String[] values() { + return values; + } + + +} http://git-wip-us.apache.org/repos/asf/incubator-phoenix/blob/738619db/phoenix-core/src/main/java/org/apache/commons/csv/Constants.java ---------------------------------------------------------------------- diff --git a/phoenix-core/src/main/java/org/apache/commons/csv/Constants.java b/phoenix-core/src/main/java/org/apache/commons/csv/Constants.java new file mode 100644 index 0000000..9817158 --- /dev/null +++ b/phoenix-core/src/main/java/org/apache/commons/csv/Constants.java @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.commons.csv; + +/** + * Constants for this package. + * + * @version $Id: Constants.java 1509069 2013-08-01 02:04:27Z ggregory $ + */ +final class Constants { + + static final char BACKSPACE = '\b'; + static final char COMMA = ','; + + /** + * Starts a comment, the remainder of the line is the comment. + */ + static final char COMMENT = '#'; + + static final char CR = '\r'; + static final Character DOUBLE_QUOTE_CHAR = Character.valueOf('"'); + static final char BACKSLASH = '\\'; + static final char FF = '\f'; + static final char LF = '\n'; + static final char SP = ' '; + static final char TAB = '\t'; + static final String EMPTY = ""; + + /** The end of stream symbol */ + static final int END_OF_STREAM = -1; + + /** Undefined state for the lookahead char */ + static final int UNDEFINED = -2; + + /** According to RFC 4180, line breaks are delimited by CRLF */ + static final String CRLF = "\r\n"; + + /** + * Unicode line separator. + */ + static final String LINE_SEPARATOR = "\u2028"; + + /** + * Unicode paragraph separator. + */ + static final String PARAGRAPH_SEPARATOR = "\u2029"; + + /** + * Unicode next line. + */ + static final String NEXT_LINE = "\u0085"; + +} http://git-wip-us.apache.org/repos/asf/incubator-phoenix/blob/738619db/phoenix-core/src/main/java/org/apache/commons/csv/ExtendedBufferedReader.java ---------------------------------------------------------------------- diff --git a/phoenix-core/src/main/java/org/apache/commons/csv/ExtendedBufferedReader.java b/phoenix-core/src/main/java/org/apache/commons/csv/ExtendedBufferedReader.java new file mode 100644 index 0000000..c50d339 --- /dev/null +++ b/phoenix-core/src/main/java/org/apache/commons/csv/ExtendedBufferedReader.java @@ -0,0 +1,178 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.commons.csv; + +import static org.apache.commons.csv.Constants.CR; +import static org.apache.commons.csv.Constants.END_OF_STREAM; +import static org.apache.commons.csv.Constants.LF; +import static org.apache.commons.csv.Constants.UNDEFINED; + +import java.io.BufferedReader; +import java.io.IOException; +import java.io.Reader; + +/** + * A special buffered reader which supports sophisticated read access. + *

+ * In particular the reader supports a look-ahead option, which allows you to see the next char returned by + * {@link #read()}. + * + * @version $Id: ExtendedBufferedReader.java 1512625 2013-08-10 11:07:15Z britter $ + */ +final class ExtendedBufferedReader extends BufferedReader { + + /** The last char returned */ + private int lastChar = UNDEFINED; + + /** The count of EOLs (CR/LF/CRLF) seen so far */ + private long eolCounter = 0; + + private boolean closed; + + /** + * Created extended buffered reader using default buffer-size + */ + ExtendedBufferedReader(final Reader reader) { + super(reader); + } + + @Override + public int read() throws IOException { + final int current = super.read(); + if (current == CR || (current == LF && lastChar != CR)) { + eolCounter++; + } + lastChar = current; + return lastChar; + } + + /** + * Returns the last character that was read as an integer (0 to 65535). This will be the last character returned by + * any of the read methods. This will not include a character read using the {@link #lookAhead()} method. If no + * character has been read then this will return {@link Constants#UNDEFINED}. If the end of the stream was reached + * on the last read then this will return {@link Constants#END_OF_STREAM}. + * + * @return the last character that was read + */ + int getLastChar() { + return lastChar; + } + + @Override + public int read(final char[] buf, final int offset, final int length) throws IOException { + if (length == 0) { + return 0; + } + + final int len = super.read(buf, offset, length); + + if (len > 0) { + + for (int i = offset; i < offset + len; i++) { + final char ch = buf[i]; + if (ch == LF) { + if (CR != (i > 0 ? buf[i - 1] : lastChar)) { + eolCounter++; + } + } else if (ch == CR) { + eolCounter++; + } + } + + lastChar = buf[offset + len - 1]; + + } else if (len == -1) { + lastChar = END_OF_STREAM; + } + + return len; + } + + /** + * Calls {@link BufferedReader#readLine()} which drops the line terminator(s). This method should only be called + * when processing a comment, otherwise information can be lost. + *

+ * Increments {@link #eolCounter} + *

+ * Sets {@link #lastChar} to {@link Constants#END_OF_STREAM} at EOF, otherwise to LF + * + * @return the line that was read, or null if reached EOF. + */ + @Override + public String readLine() throws IOException { + final String line = super.readLine(); + + if (line != null) { + lastChar = LF; // needed for detecting start of line + eolCounter++; + } else { + lastChar = END_OF_STREAM; + } + + return line; + } + + /** + * Returns the next character in the current reader without consuming it. So the next call to {@link #read()} will + * still return this value. Does not affect line number or last character. + * + * @return the next character + * + * @throws IOException + * if there is an error in reading + */ + int lookAhead() throws IOException { + super.mark(1); + final int c = super.read(); + super.reset(); + + return c; + } + + /** + * Returns the current line number + * + * @return the current line number + */ + long getCurrentLineNumber() { + // Check if we are at EOL or EOF or just starting + if (lastChar == CR || lastChar == LF || lastChar == UNDEFINED || lastChar == END_OF_STREAM) { + return eolCounter; // counter is accurate + } + return eolCounter + 1; // Allow for counter being incremented only at EOL + } + + public boolean isClosed() { + return closed; + } + + /** + * Closes the stream. + * + * @throws IOException + * If an I/O error occurs + */ + @Override + public void close() throws IOException { + // Set ivars before calling super close() in case close() throws an IOException. + closed = true; + lastChar = END_OF_STREAM; + super.close(); + } + +}