spark-user mailing list archives

From Ankur Srivastava <ankur.srivast...@gmail.com>
Subject Re: Get variable into Spark's foreachRDD function
Date Tue, 29 Sep 2015 00:26:48 GMT
Hi,

You are creating the logger instance on the driver and then trying to use that
instance inside a function that Spark must serialize and ship out; the Logger
object (with its handlers and locks) cannot be pickled, which is what the
cloudpickle traceback is telling you.

You should create the logger instance inside the function itself, but note that
the logs will then end up in separate files on each worker node rather than in
one place.
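In other words, something like this (a minimal sketch; the logger name
"my_app" and the `process_batch` helper are hypothetical, not from the
original code):

```python
import logging
import pickle

# Hypothetical per-batch handler. The logger is looked up *inside* the
# function, so the serialized closure only needs to carry the logger name
# (a plain string), never a Logger object with its handlers and locks.
def process_batch(time, rdd):
    logger = logging.getLogger("my_app")  # resolved wherever this code runs
    logger.info("batch at %s", time)
    return logger.name

# A module-level function like this pickles by reference, so Spark's
# cloudpickle can serialize it without ever touching Logger internals.
restored = pickle.loads(pickle.dumps(process_batch))
print(restored("t0", None))  # → my_app
```

With a DStream you would then pass it directly:
`someData.foreachRDD(process_batch)`.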

Hope this helps.

Thanks
Ankur

On Mon, Sep 28, 2015 at 4:06 PM, markluk <mark@juicero.com> wrote:

> I have a streaming Spark process and I need to do some logging in the
> `foreachRDD` function, but I'm having trouble accessing the logger as a
> variable in the `foreachRDD` function
>
> I would like to do the following
>
>     import logging
>
>     myLogger = logging.getLogger(LOGGER_NAME)
>     ...
>     ...
>     someData = <STREAMING DATA>
>
>     someData.foreachRDD(lambda now, rdds: myLogger.info(<SOMETHING ABOUT RDD>))
>
> Inside the lambda, I can't access `myLogger`; I get a giant stack trace.
> Here is a snippet:
>
>
>       File "/juicero/press-mgmt/spark-1.5.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 537, in save_reduce
>         save(state)
>       File "/usr/lib/python2.7/pickle.py", line 286, in save
>         f(self, obj) # Call unbound method with explicit self
>       File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
>         save(element)
>       File "/usr/lib/python2.7/pickle.py", line 286, in save
>         f(self, obj) # Call unbound method with explicit self
>       File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
>         self._batch_setitems(obj.iteritems())
>       File "/usr/lib/python2.7/pickle.py", line 681, in _batch_setitems
>         save(v)
>       File "/usr/lib/python2.7/pickle.py", line 286, in save
>         f(self, obj) # Call unbound method with explicit self
>       File "/juicero/press-mgmt/spark-1.5.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 315, in save_builtin_function
>         return self.save_function(obj)
>       File "/juicero/press-mgmt/spark-1.5.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 191, in save_function
>         if islambda(obj) or obj.__code__.co_filename == '<stdin>' or themodule is None:
>     AttributeError: 'builtin_function_or_method' object has no attribute '__code__'
>
>
>
> I don't understand why I can't access `myLogger`. Does it have something to
> do with Spark not being able to serialize this logger object?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Get-variable-into-Spark-s-foreachRDD-function-tp24852.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
