spark-issues mailing list archives

From "Andrew Davidson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23878) unable to import col() or lit()
Date Fri, 06 Apr 2018 03:25:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427883#comment-16427883 ]

Andrew Davidson commented on SPARK-23878:
-----------------------------------------

Hi Hyukjin

You are correct. Most IDEs are primarily language-aware editors and builders. For example,
consider Eclipse or IntelliJ for developing a JavaScript website or a Java servlet. The editor
knows the syntax of the language you are working with, along with the libraries
and packages you are using. Often the IDE does some sort of continuous build or code analysis
to help you find bugs without having to deploy.

Often the IDE also makes it easy to build, package, and deploy to some sort of test server,
and to debug and/or run unit tests.

So if pyspark is generating functions at run time, that is going to cause problems for the IDE:
the functions are not defined in the edit session.
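
To illustrate the suspicion above, here is a minimal sketch of how a module could create
functions at import time by assigning into its own globals. It assumes pyspark does something
of this general shape (I have not confirmed that against the pyspark source); the names
_functions and _create_function and the placeholder function body are hypothetical and only
for illustration.

```python
# Hypothetical stand-in for a module that builds its public functions at import time.
# No "def col" or "def lit" statement appears anywhere in this file.
_functions = {
    "col": "Returns a column based on the given name.",
    "lit": "Creates a column of literal value.",
}


def _create_function(name, doc):
    """Build a function called `name`; the body is a placeholder for illustration."""
    def _(value):
        return (name, value)
    _.__name__ = name
    _.__doc__ = doc
    return _


# The names only come into existence when this loop runs.
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)

print(col("key"))  # works at run time even though "col" is never written as a def
print(lit(1))
```

An editor that only reads the source text sees no definition of col or lit here, which would
be consistent with the "Unresolved import" errors, while the interpreter finds both names once
the loop has run.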

[http://www.learn4master.com/algorithms/pyspark-unit-test-set-up-sparkcontext] describes
how to write unit tests for pyspark that you can run from the command line and/or from
within Eclipse. I think a side effect is that they might cause the functions lit() and col()
to be generated?

 

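I could not find a workaround for col() and lit().

    # col and lit are the two names the IDE reports as unresolved imports
    from pyspark.sql.functions import col, lit

    ret = df.select(
        col(columnName).cast("string").alias("key"),
        lit(value).alias("source")
    )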

 

Kind regards,

Andy

> unable to import col() or lit()
> -------------------------------
>
>                 Key: SPARK-23878
>                 URL: https://issues.apache.org/jira/browse/SPARK-23878
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.3.0
>         Environment: eclipse 4.7.3
> pyDev 6.3.2
> pyspark==2.3.0
>            Reporter: Andrew Davidson
>            Priority: Major
>
> I have some code I am moving from a Jupyter notebook to separate Python modules. My notebook
> uses col() and lit() and works fine.
> When I try to work with module files in my IDE I get the following errors. I am also
> not able to run my unit tests.
>
> Description             Resource  Path                                         Location  Type
> Unresolved import: lit  load.py   /adt_pyDevProj/src/automatedDataTranslation  line 22   PyDev Problem
> Unresolved import: col  load.py   /adt_pyDevProj/src/automatedDataTranslation  line 21   PyDev Problem
>
> I suspect that when you run pyspark it is generating the col and lit functions?
> I found a description of the problem at
> [https://stackoverflow.com/questions/40163106/cannot-find-col-function-in-pyspark]. I
> do not understand how to make this work in my IDE. I am not running pyspark, just an editor.
> Is there some sort of workaround or replacement for these missing functions?




