spark-user mailing list archives

From hemant singh <hemant2...@gmail.com>
Subject Re: pyspark execution
Date Tue, 17 Apr 2018 07:32:53 GMT
If the file contains only SQL, you can use a function like the one below:

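import subprocess

def run_sql(sql_file_path, your_db_name, location):
    # Each --hivevar takes one key=value argument; "DBName" stands for the
    # variable name your SQL file references.
    subprocess.call(["spark-sql", "-S",
                     "--hivevar", "DBName=" + your_db_name,
                     "--hivevar", "LOCATION=" + location,
                     "-f", sql_file_path])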

If that file has other pieces, such as Spark code, and not only SQL:

Write a parse function that parses your SQL, replaces the placeholders
(DB name, etc.) with their values, and then executes the newly formed SQL.
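
The parse-and-replace step could be sketched roughly as below. This is only
an illustration under stated assumptions: the ${NAME} / ${hivevar:NAME}
placeholder syntax, the function name render_sql, and the example table and
values are all hypothetical and not taken from the original file.

```python
import re

# Matches ${NAME} or ${hivevar:NAME}; this placeholder syntax is an assumption.
_PLACEHOLDER = re.compile(r"\$\{(?:hivevar:)?(\w+)\}")


def render_sql(sql_text, variables):
    """Return sql_text with every placeholder replaced by its value."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            # Fail loudly rather than run SQL with an unresolved placeholder.
            raise KeyError("no value supplied for placeholder: " + name)
        return str(variables[name])

    return _PLACEHOLDER.sub(substitute, sql_text)


# Hypothetical template and values, for illustration only.
template = "SELECT * FROM ${hivevar:DBName}.events WHERE region = '${REGION}'"
rendered = render_sql(template, {"DBName": "sales_db", "REGION": "emea"})
print(rendered)
```

The rendered string can then be passed to the sql() method of your Hive
context or Spark session. Plain text substitution does no escaping, so it
should only be used with trusted values.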

Maintaining your SQL in a separate file, though, decouples the code from
the SQL and makes it easier to maintain.
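
With the SQL in its own file, the values can be supplied when the job is
launched: spark-submit passes any arguments placed after the script path on
to the script. A minimal sketch of reading them, assuming hypothetical
option names --db-name and --sql-file:

```python
import argparse


def parse_args(argv=None):
    """Parse the job's command-line arguments (option names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Run a SQL file with values substituted at launch time")
    parser.add_argument("--db-name", required=True,
                        help="database name to substitute into the SQL")
    parser.add_argument("--sql-file", required=True,
                        help="path to the file holding the SQL")
    return parser.parse_args(argv)


# In the real job, call parse_args() with no argument so it reads sys.argv.
# An explicit list is used here only to keep the example self-contained.
args = parse_args(["--db-name", "sales_db", "--sql-file", "queries.sql"])
print(args.db_name, args.sql_file)
```

The job would then be started as, for example,
spark-submit job.py --db-name sales_db --sql-file queries.sql
where job.py is a placeholder for your own script name.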

On Tue, Apr 17, 2018 at 8:11 AM, anudeep <anud33p@gmail.com> wrote:

> Hi All,
>
> I have a python file which I am executing directly with spark-submit
> command.
>
> Inside the python file, I have sql written using hive context. I created a
> generic variable for the database name inside the sql.
>
> The problem is: how can I pass the value for this variable dynamically,
> just as we do in hive with the --hivevar parameter?
>
> Thanks!
> Anudeep
