spark-reviews mailing list archives

From MaxGekk <...@git.apache.org>
Subject [GitHub] spark issue #20937: [SPARK-23094][SPARK-23723][SPARK-23724][SQL] Support cus...
Date Mon, 09 Apr 2018 09:27:45 GMT
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/20937
  
    @HyukjinKwon Let's sync.
    
    > Automatic encoding detection doesn't work for newlines and schema inference when multiLine is disabled
    
    I don't know about you, but I take "doesn't work" to mean that it doesn't work in ALL cases. Some of your statements are only partially correct, and some are incorrect. For this statement, here are counterexamples:
    1. File in UTF-8, multiLine disabled: are newlines and the schema inferred correctly? Yes.
    2. File in ISO 8859-1, multiLine disabled: does it work? Yes.
    3. File in CP1251, multiLine disabled: the same.
    
    All of these examples show that your statement is wrong in the mathematical sense.
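
One plausible reading of why these three cases behave well is that UTF-8, ISO 8859-1 and CP1251 all encode the line feed as the single byte 0x0A, while wider encodings such as UTF-16 and UTF-32 do not. The sketch below is plain Python, not Spark's implementation; the helper names `newline_bytes` and `parse_lines` and the sample data are made up for illustration.

```python
import json

# Encodings from the counterexamples above, plus two wide encodings for contrast.
ASCII_COMPATIBLE = ("utf-8", "iso-8859-1", "cp1251")
WIDE = ("utf-16-le", "utf-32-le")


def newline_bytes(encoding):
    """Return the byte sequence that represents a line feed in `encoding`."""
    return "\n".encode(encoding)


def parse_lines(raw, encoding):
    """Decode `raw` with an explicitly given encoding, then split into JSON records.

    Hypothetical helper: decoding happens before splitting, so the line
    separator is found correctly regardless of its byte width.
    """
    text = raw.decode(encoding)
    return [json.loads(line) for line in text.split("\n") if line]


if __name__ == "__main__":
    for enc in ASCII_COMPATIBLE + WIDE:
        print(enc, newline_bytes(enc))

    sample = '{"a": 1}\n{"a": 2}\n'
    for enc in ASCII_COMPATIBLE + WIDE:
        print(enc, parse_lines(sample.encode(enc), enc))
```

For the three encodings in the list, searching the raw bytes for 0x0A finds exactly the record boundaries. For UTF-16LE the line feed is the two bytes 0x0A 0x00, so a byte-level split on 0x0A alone leaves a stray 0x00 at the start of the next record, which is the kind of case where knowing the encoding up front matters.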
    
    > I thought this PR targets to add the **explicit encoding** support mainly
    
    EXACTLY. I don't know why you are pushing me to do something about auto-detection. The PR doesn't change the behavior when `encoding` is not specified. The PR is not about supporting every encoding in every case. It is about the cases where `encoding` is specified explicitly by the user.



