spark-reviews mailing list archives

From viirya <...@git.apache.org>
Subject [GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...
Date Mon, 11 Jun 2018 09:44:25 GMT
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21501#discussion_r194346431
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/StopWordsRemoverSuite.scala ---
    @@ -65,6 +65,56 @@ class StopWordsRemoverSuite extends MLTest with DefaultReadWriteTest {
         testStopWordsRemover(remover, dataSet)
       }
     
    +  test("StopWordsRemover with localed input (case insensitive)") {
    +    val stopWords = Array("milk", "cookie")
    +    val remover = new StopWordsRemover()
    +      .setInputCol("raw")
    +      .setOutputCol("filtered")
    +      .setStopWords(stopWords)
    +      .setLocale("tr")  // Turkish alphabet: has no Q, W, X but has dotted and dotless 'I's.
    --- End diff --
    
    Let's explicitly call `.setCaseSensitive(false)` here.
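
    For context, here is a minimal sketch of why the locale matters once matching is case insensitive. It uses only `java.util.Locale` and `String.toLowerCase`; `TurkishCaseDemo` and `isStopWord` are hypothetical names for illustration and are not Spark's implementation. It assumes case-insensitive matching is done by lowercasing both the stop words and the token under the configured locale.

```scala
import java.util.Locale

object TurkishCaseDemo {
  // Hypothetical helper, not part of Spark: lowercases the stop words and the
  // token under the given locale, then checks membership.
  def isStopWord(token: String, stopWords: Seq[String], locale: Locale): Boolean = {
    val lowered = stopWords.map(_.toLowerCase(locale)).toSet
    lowered.contains(token.toLowerCase(locale))
  }

  def main(args: Array[String]): Unit = {
    val turkish = Locale.forLanguageTag("tr")
    val stopWords = Seq("milk", "cookie")

    // Under the root locale, 'I' lowercases to 'i', so "MILK" matches "milk".
    println(isStopWord("MILK", stopWords, Locale.ROOT))

    // Under the Turkish locale, 'I' lowercases to dotless 'ı' (U+0131),
    // so "MILK" becomes "mılk" and no longer matches "milk".
    println(isStopWord("MILK", stopWords, turkish))

    // Dotted capital 'İ' (U+0130) lowercases to 'i' under the Turkish locale,
    // so "MİLK" does match "milk".
    println(isStopWord("M\u0130LK", stopWords, turkish))
  }
}
```

    Because the outcome of this test depends entirely on case folding, stating `.setCaseSensitive(false)` in the test makes that dependency visible instead of leaving it to the parameter's default.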

