spark-dev mailing list archives

From "Evan R. Sparks" <evan.spa...@gmail.com>
Subject Re: Pandas' Shift in Dataframe
Date Wed, 29 Apr 2015 20:34:23 GMT
In general there's a tension between ordered data and the set-oriented data
model underlying DataFrames. You can force a total ordering on the data,
but it may come at a high performance cost.

It would be good to get a sense of the use case you're trying to support,
but I can imagine achieving a similar result by applying a
datetime.timedelta (in Python terms) to a time attribute (your "axis") and
then performing a join between the base table and this derived table to
merge the data back together. This type of join could then be optimized if
the use case is frequent enough to warrant it.
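
As a rough illustration of the offset-and-join idea, here is a minimal
sketch in plain Python rather than the Spark API. The function name
shift_by_join, the "time" and "value" field names, and the sample rows are
all hypothetical. It assumes timestamps are unique and regularly spaced, so
that an offset key lands exactly on another row's key; irregular data would
need a different join condition.

```python
from datetime import datetime, timedelta


def shift_by_join(rows, offset, time_key="time", value_key="value"):
    """Emulate a shift by offsetting the time key and self-joining.

    Hypothetical helper: assumes unique, regularly spaced timestamps.
    """
    # Derived table: the same values, each re-keyed to (time + offset).
    derived = {row[time_key] + offset: row[value_key] for row in rows}
    # Left join of the base table onto the derived table on the time key.
    # Rows without a match get None, like the gap a shift leaves behind.
    return [
        {**row, "shifted_" + value_key: derived.get(row[time_key])}
        for row in rows
    ]


base = [
    {"time": datetime(2015, 4, 29, 0), "value": 10},
    {"time": datetime(2015, 4, 29, 1), "value": 20},
    {"time": datetime(2015, 4, 29, 2), "value": 30},
]

# Each row picks up the value observed one hour earlier.
shifted = shift_by_join(base, timedelta(hours=1))
```

Note that the result does not depend on the order of the rows: the join is
driven entirely by the time key, which is what lets it fit a set-oriented
model.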

- Evan

On Wed, Apr 29, 2015 at 1:25 PM, Reynold Xin <rxin@databricks.com> wrote:

> In this case it's fine to discuss whether this would fit in Spark
> DataFrames' high level direction before putting it in JIRA. Otherwise we
> might end up creating a lot of tickets just for querying whether something
> might be a good idea.
>
> About this specific feature -- I'm not sure what it means in general given
> that we don't have an axis in Spark DataFrames. But I think it'd probably be
> good to be able to shift a column by one so we can support the end time /
> begin time case, although it'd require two passes over the data.
>
>
>
> On Wed, Apr 29, 2015 at 1:08 PM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
> > I can't comment on the direction of the DataFrame API (that's more for
> > Reynold or Michael I guess), but I just wanted to point out that the JIRA
> > would be the recommended way to create a central place for discussing a
> > feature add like that.
> >
> > Nick
> >
> > On Wed, Apr 29, 2015 at 3:43 PM Olivier Girardot <
> > o.girardot@lateral-thoughts.com> wrote:
> >
> > > Hi Nicholas,
> > > yes, I've already checked, and I've just created
> > > https://issues.apache.org/jira/browse/SPARK-7247
> > > I'm not even sure why this would be a good feature to add, except for
> > > the fact that some of the data scientists I'm working with are using
> > > it, and it would therefore be useful for me when translating Pandas
> > > code to Spark...
> > >
> > > Isn't the goal of Spark DataFrames to offer all the features of
> > > Pandas/R DataFrames using Spark?
> > >
> > > Regards,
> > >
> > > Olivier.
> > >
> > > On Wed, Apr 29, 2015 at 21:09, Nicholas Chammas <
> > > nicholas.chammas@gmail.com> wrote:
> > >
> > >> You can check JIRA for any existing plans. If there aren't any, then
> > >> feel free to create a JIRA and make the case there for why this would
> > >> be a good feature to add.
> > >>
> > >> Nick
> > >>
> > >> On Wed, Apr 29, 2015 at 7:30 AM Olivier Girardot <
> > >> o.girardot@lateral-thoughts.com> wrote:
> > >>
> > >>> Hi,
> > >>> Is there any plan to add the "shift" method from Pandas to Spark
> > >>> Dataframe,
> > >>> not that I think it's an easy task...
> > >>>
> > >>> c.f.
> > >>>
> > >>> http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.shift.html
> > >>>
> > >>> Regards,
> > >>>
> > >>> Olivier.
> > >>>
> > >>
> >
>
