livy-user mailing list archives

From David Hu <hood...@gmail.com>
Subject Re: How to use multiple programming languages in the same Spark context in Livy?
Date Mon, 11 Dec 2017 06:34:47 GMT
Ah, thanks for the clarification. I also had a feeling that what I asked
might be too good to be true :D Anyway, I have a better understanding of how
Livy works now. I appreciate your help, and have a good day!


Regards, David

2017-12-11 14:16 GMT+08:00 Jeff Zhang <zjffdu@gmail.com>:

>
> No, you cannot refer to a variable in Scala when it is defined in Python.
> You need to register the table in Python, then get it from the
> SparkSession/SparkContext. The following is what you can do:
>
> Python:
>
> df_in_pyspark = spark.read.json("examples/src/main/resources/people.json")
>
> df_in_pyspark.registerTempTable("mytable")
>
> Scala:
>          val df_in_pyspark = spark.table("mytable")
>          val dfInScala: DataFrame = df_in_pyspark.where("age > 35")
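> Over Livy's REST API, the two snippets above would be submitted as two
> separate statements to the same session, each tagged with its own kind
> (the /sessions/1/statements endpoint and session id follow the example
> later in this thread). A minimal Python sketch of the two request bodies;
> the payload structure follows Livy's statements API, and the variable
> names are only illustrative:

```python
import json

# Two statements posted to the same Livy session (session id 1, as in the
# thread); the per-statement "kind" field selects the interpreter.
pyspark_stmt = {
    "kind": "pyspark",
    "code": ('df_in_pyspark = spark.read.json('
             '"examples/src/main/resources/people.json")\n'
             'df_in_pyspark.registerTempTable("mytable")'),
}
scala_stmt = {
    "kind": "spark",
    "code": ('val df_in_pyspark = spark.table("mytable")\n'
             'val dfInScala = df_in_pyspark.where("age > 35")'),
}

# Both would be POSTed as JSON to the same endpoint:
#   POST /sessions/1/statements
payloads = [json.dumps(s) for s in (pyspark_stmt, scala_stmt)]
```

> Because both statements run against the same SparkContext, the temp table
> registered by the first is visible to the second.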
>
>
>
> David Hu <hoodavy@gmail.com> wrote on Mon, Dec 11, 2017 at 12:09 PM:
>
>> Hi Jeff,
>>
>> That's great to know! I've heard of Zeppelin and roughly know what it
>> does, but I haven't had a chance to use it myself. So, to confirm that
>> what you are saying matches my understanding, I'd like to walk through a
>> scenario.
>>
>> I first send a POST request to /sessions/1/statements with 'kind' set to
>> 'pyspark' and the following code:
>>
>> df_in_pyspark = spark.read.json("examples/src/main/resources/people.json")
>>
>> The above code defines a DataFrame variable `df_in_pyspark` in Python; it
>> will then be used in a second POST request to /sessions/1/statements, whose
>> 'kind' is 'spark' (Scala), with the following code:
>>
>> val dfInScala: DataFrame = df_in_pyspark.where("age > 35")
>>
>> So basically you were saying that the above code would run without any
>> issues, is that correct? If so, I assume the same also applies to other
>> types of variables, like Estimator/Model/Pipeline? And how about methods?
>> Is it OK if I define a method in Scala and later use it in Python/R code,
>> and vice versa?
>>
>> Sorry for so many questions, but knowing the answers would make me much
>> more confident about upgrading to the latest HDP and enabling this awesome
>> feature. Thanks!
>>
>> Regards, David
>>
>> 2017-12-11 11:07 GMT+08:00 Jeff Zhang <zjffdu@gmail.com>:
>>
>>>
>>> You can use a DataFrame in Scala if that DataFrame is registered in
>>> Python, because both interpreters share the same SparkContext.
>>>
>>> I believe Livy can meet your requirement. If you know Zeppelin, the
>>> behavior of Livy is now very similar to Zeppelin's: you can run one
>>> paragraph in Scala and another paragraph in Python or R, and they run in
>>> the same Spark application and can share data via the SparkContext.
>>>
>>>
>>>
>>>
>>>
>>>
>>> David Hu <hoodavy@gmail.com> wrote on Mon, Dec 11, 2017 at 10:44 AM:
>>>
>>>> Hi Jeff & Saisai,
>>>>
>>>> Thank you so much for the explanations; they are very helpful. Also,
>>>> sorry for not replying sooner.
>>>>
>>>> I have read all the links you provided, and the impression I got is
>>>> that, correct me if I am wrong, this feature does not allow different
>>>> session kinds to interact with each other? What I mean is: if I ran one
>>>> Scala statement and one Python statement in the same context, then
>>>> without some kind of persistence it would not be possible to refer in
>>>> Python code to a DataFrame variable that was defined in Scala, right?
>>>>
>>>> The goal I want to achieve is to mix different languages together and
>>>> run them as one integrated Spark job, within which variables/methods
>>>> defined in one language can be referred to and used in another, because
>>>> our users have different programming backgrounds. It might sound silly,
>>>> but I am keen to know whether that is possible under the current Livy
>>>> infrastructure. I would appreciate it if anyone could answer. Thanks in
>>>> advance!
>>>>
>>>> Regards, Dawei
>>>>
>>>> 2017-12-04 8:30 GMT+08:00 Saisai Shao <sai.sai.shao@gmail.com>:
>>>>
>>>>> This feature is targeted for the Livy 0.5.0 community version, but we
>>>>> have already back-ported it in HDP 2.6.3, so you can try this feature
>>>>> in HDP 2.6.3.
>>>>>
>>>>> You can check this doc (https://github.com/apache/incubator-livy/blob/master/docs/rest-api.md)
>>>>> to see the API difference for this feature.
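>>>>> For concreteness, creating one multi-language session and then posting
>>>>> statements of different kinds might look like the sketch below,
>>>>> assuming the 0.5-style API from that doc, where a session can be
>>>>> created with the "shared" kind and each statement carries its own
>>>>> "kind". The base URL, session id, and code strings are assumptions for
>>>>> the example:

```python
import json

# Illustrative sketch of the 0.5-style request bodies. Endpoint paths
# follow rest-api.md; the base URL and code strings are assumed values.
base_url = "http://localhost:8998"

# POST {base_url}/sessions -- a "shared" session is not tied to one language
create_session = {"kind": "shared"}

# POST {base_url}/sessions/<id>/statements -- each statement picks its kind
statements = [
    {"kind": "spark",   "code": "sc.parallelize(1 to 10).count()"},
    {"kind": "pyspark", "code": "sc.parallelize(range(10)).count()"},
    {"kind": "sparkr",  "code": "length(1:10)"},
]

bodies = [json.dumps(p) for p in [create_session] + statements]
```

>>>>> All three statements would then execute in the same Spark application
>>>>> and can share data through the SparkContext, as discussed above.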
>>>>>
>>>>> 2017-12-03 9:55 GMT+08:00 Jeff Zhang <zjffdu@gmail.com>:
>>>>>
>>>>>>
>>>>>> It is implemented in https://issues.apache.org/jira/browse/LIVY-194,
>>>>>> but it is not yet released in the Apache version; HDP back-ported it
>>>>>> into their distribution.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Dawei Hu (David) <hoodavy@gmail.com> wrote on Sat, Dec 2, 2017 at 10:58 AM:
>>>>>>
>>>>>>> I forgot to add the link reference and here it is.
>>>>>>>
>>>>>>> https://hortonworks.com/blog/hdp-2-6-3-dataplane-service/
>>>>>>>
>>>>>>> Regards, Dawei
>>>>>>>
>>>>>>> On 2 Dec 2017, at 8:24 AM, Dawei Hu (David) <hoodavy@gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I was reading the HDP 2.6.3 release notes, and they mention that the
>>>>>>> Livy service is able to support multiple programming languages in the
>>>>>>> same Spark context. I went through all the Livy documentation and
>>>>>>> examples I could find, but so far I haven't found out how to get it
>>>>>>> working. Currently I am using the latest Livy 0.4 to submit Scala
>>>>>>> code only, and it would be awesome to mix it with Python or R code in
>>>>>>> the same session. I would much appreciate it if anyone could give me
>>>>>>> some clue about this.
>>>>>>>
>>>>>>> Thanks in advance and have a good day :)
>>>>>>>
>>>>>>> Regards, Dawei
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
