spark-reviews mailing list archives

From HyukjinKwon <...@git.apache.org>
Subject [GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...
Date Wed, 12 Sep 2018 08:32:30 GMT
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22227#discussion_r216941249
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -2546,15 +2546,51 @@ object functions {
       def soundex(e: Column): Column = withExpr { SoundEx(e.expr) }
     
       /**
    -   * Splits str around pattern (pattern is a regular expression).
    +   * Splits str around matches of the given regex.
        *
    -   * @note Pattern is a string representation of the regular expression.
    +   * @param str a string expression to split
    +   * @param regex a string representing a regular expression. The regex string should be
    +   *              a Java regular expression.
        *
        * @group string_funcs
        * @since 1.5.0
        */
    -  def split(str: Column, pattern: String): Column = withExpr {
    -    StringSplit(str.expr, lit(pattern).expr)
    +  def split(str: Column, regex: String): Column = withExpr {
    +    StringSplit(str.expr, Literal(regex), Literal(-1))
    +  }
    +
    +  /**
    +   * Splits str around matches of the given regex.
    +   *
    +   * @param str a string expression to split
    +   * @param regex a string representing a regular expression. The regex string should be
    +   *              a Java regular expression.
    +   * @param limit an integer expression which controls the number of times the regex is applied.
    +   *        <ul>
    +   *          <li>limit greater than 0
    +   *            <ul>
    +   *              <li>
    +   *                The resulting array's length will not be more than limit,
    +   *                and the resulting array's last entry will contain all input
    +   *                beyond the last matched regex.
    +   *             </li>
    +   *            </ul>
    +   *          </li>
    +   *          <li>limit less than or equal to 0
    +   *            <ul>
    +   *              <li>
    +   *                `regex` will be applied as many times as possible,
    +   *                and the resulting array can be of any size.
    +   *              </li>
    +   *            </ul>
    +   *          </li>
    +   *        </ul>
    --- End diff --
    
    I think you can just write it more compactly:
    
    ```
       *        <ul>
       *          <li>limit greater than 0: The resulting array's length will not be more than limit,
       *          and the resulting array's last entry will contain all input
       *          beyond the last matched regex.</li>
       *          <li>limit less than or equal to 0: `regex` will be applied as many times as possible,
       *          and the resulting array can be of any size.</li>
       *        </ul>
    ```
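The two `limit` cases described in the Scaladoc can be illustrated with a minimal sketch that uses plain `java.lang.String.split` rather than Spark itself. `SplitLimitDemo` and `splitWithLimit` are hypothetical names, and mapping every non-positive limit to -1 is an assumption made so that a limit of 0 behaves like the other non-positive values (Java's own `split` with limit 0 would additionally drop trailing empty strings).

```scala
// Hypothetical helper illustrating the documented `limit` contract with
// java.lang.String.split; this is NOT Spark's StringSplit implementation.
object SplitLimitDemo {
  // Assumption: any non-positive limit means "apply the regex as many times
  // as possible", so it is normalized to -1, which keeps trailing empty strings.
  def splitWithLimit(str: String, regex: String, limit: Int): Array[String] =
    str.split(regex, if (limit > 0) limit else -1)

  def main(args: Array[String]): Unit = {
    val input = "oneAtwoBthreeC"

    // limit > 0: at most `limit` entries; the last entry holds the rest of the input.
    println(splitWithLimit(input, "[ABC]", 2).mkString("[", ", ", "]"))  // prints [one, twoBthreeC]

    // limit <= 0: split on every match; the trailing empty string is kept.
    println(splitWithLimit(input, "[ABC]", -1).mkString("[", ", ", "]")) // prints [one, two, three, ]
    println(splitWithLimit(input, "[ABC]", 0).mkString("[", ", ", "]"))  // prints [one, two, three, ]
  }
}
```

Normalizing 0 to -1 in one place keeps the "less than or equal to 0" case uniform, instead of letting a limit of 0 inherit Java's special trailing-empty-string trimming.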



