spark-issues mailing list archives

From "Jan Gorecki (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26433) Tail method for spark DataFrame
Date Sun, 30 Dec 2018 05:07:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16730887#comment-16730887 ]

Jan Gorecki commented on SPARK-26433:
-------------------------------------

[~hyukjin.kwon] Thank you for your comment, but I'm not sure I understood it correctly. Do you mean
I should first collect the data to the client and then extract the last few rows of the DataFrame?
If so, that doesn't seem to be a feasible solution, as data in Spark is likely not to fit on the
client machine. `Tail` is exactly the operation one would want to perform BEFORE collecting
data to the client. Could you confirm?
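
The point above, that a tail does not require collecting the whole dataset, can be illustrated with a minimal plain-Python sketch. It assumes the data is split into ordered partitions (modeled here as a list of Python iterables); `partition_tail` and `distributed_tail` are hypothetical names for illustration, not part of the PySpark API. Each partition is reduced to its own last n rows, and only those small tails are combined on the client, so the client holds at most n rows per partition.

```python
from collections import deque
from itertools import chain
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def partition_tail(rows: Iterable[T], n: int) -> List[T]:
    """Return the last n rows of one partition (hypothetical helper).

    A deque with maxlen=n streams through the rows and keeps at most
    n of them in memory, however large the partition is.
    """
    if n <= 0:
        return []
    return list(deque(rows, maxlen=n))


def distributed_tail(partitions: Sequence[Iterable[T]], n: int) -> List[T]:
    """Sketch of a tail over ordered partitions (hypothetical helper).

    Step 1 (would run on the workers): reduce each partition to its tail.
    Step 2 (would run on the client): concatenate those tails in partition
    order and keep the last n rows overall.
    """
    tails = [partition_tail(p, n) for p in partitions]
    return partition_tail(chain.from_iterable(tails), n)


if __name__ == "__main__":
    parts = [range(0, 4), range(4, 6), range(6, 10)]
    print(distributed_tail(parts, 3))  # [7, 8, 9]
```

In PySpark the same two steps could presumably be expressed as `df.rdd.mapPartitions(lambda it: [partition_tail(it, n)]).collect()` followed by `partition_tail(chain.from_iterable(...), n)` on the client; that mapping is an untested assumption. Note also that "last rows" is only well defined when the DataFrame has a meaningful order, for example after `orderBy`.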

> Tail method for spark DataFrame
> -------------------------------
>
>                 Key: SPARK-26433
>                 URL: https://issues.apache.org/jira/browse/SPARK-26433
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Jan Gorecki
>            Priority: Major
>
> There is a head method for Spark DataFrames which works fine, but there doesn't seem to
> be a tail method.
> ```
> >>> ans
> DataFrame[v1: bigint]
> >>> ans.head(3)
> [Row(v1=299443), Row(v1=299493), Row(v1=300751)]
> >>> ans.tail(3)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/jan/git/db-benchmark/spark/py-spark/lib/python3.6/site-packages/pyspark/sql/dataframe.py", line 1300, in __getattr__
>     "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
> AttributeError: 'DataFrame' object has no attribute 'tail'
> ```
> I would like to request a tail method for Spark DataFrame as a new feature.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

