crunch-dev mailing list archives

From David Whiting <davidwhit...@gmail.com>
Subject Use of Iterable with combine in Scrunch
Date Wed, 30 Jul 2014 13:52:38 GMT
The Scrunch version of combine accepts a function Iterable[V] => V. This
causes a lot of unexpected behaviour because the Iterable that gets wrapped
is actually a SingleUseIterable, while many of Scala's collection method
implementations call the underlying iterator more than once, since an
Iterable's contract says repeated traversal is allowed. As a result, you
often have to write code like this:

...
.groupByKey()
.combine { _.iterator reduce { _ + _ } }

This is a silly example, of course, because there's already an Aggregator
for summation. But if your reduce function is more complex, you have to go
through .iterator like this to get correct behaviour.
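To make the failure mode concrete without a Crunch dependency, here is a minimal plain-Scala sketch. OneShotIterable is a hypothetical stand-in for SingleUseIterable (not the real class), and it is assumed to throw on a second iterator() call. longest is an illustrative "more complex" reduce that has no ready-made Aggregator. The wrapper produced by asScala asks the Java Iterable for a fresh iterator on every .iterator call, so taking the iterator once is what keeps the reduce safe.

```scala
import scala.collection.JavaConverters._
import scala.util.Try

// Hypothetical stand-in for Crunch's SingleUseIterable (not the real class):
// a java.lang.Iterable whose iterator() may be called only once. We assume a
// second call throws, which makes the failure mode visible.
final class OneShotIterable[T](items: Seq[T]) extends java.lang.Iterable[T] {
  private var used = false
  override def iterator(): java.util.Iterator[T] = {
    if (used) throw new IllegalStateException("iterator() may only be called once")
    used = true
    items.iterator.asJava
  }
}

object SingleUseDemo {
  // Illustrative reduce with no ready-made Aggregator: keep the longest
  // string, preferring the earlier value on ties.
  def longest(a: String, b: String): String = if (b.length > a.length) b else a

  // Safe pattern: take the iterator exactly once and reduce over it.
  def combineViaIterator(values: Iterable[String]): String =
    values.iterator.reduce((a, b) => longest(a, b))

  // Each .iterator call on the asScala wrapper requests a new Java iterator,
  // so a second call fails. Any collection method that internally calls
  // iterator more than once hits the same error.
  def secondIteratorFails(values: Iterable[String]): Boolean = {
    values.iterator
    Try(values.iterator).isFailure
  }

  def main(args: Array[String]): Unit = {
    println(combineViaIterator(new OneShotIterable(Seq("a", "ccc", "bb")).asScala)) // ccc
    println(secondIteratorFails(new OneShotIterable(Seq("a", "ccc")).asScala))      // true
  }
}
```

On Scala 2.13 and later, scala.jdk.CollectionConverters._ is the preferred import; JavaConverters still compiles there but with a deprecation warning.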

Possible fixes:
a) Change combine to accept a function TraversableOnce[V] => V or
Iterator[V] => V, better reflecting the single-use nature of the underlying
Iterable
b) Given that most custom combines will in fact be folds over monoids, we
could promote the notion of reduce or fold up to PGroupedTable itself,
so you can do .groupByKey().foldValues(_+_)
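The two proposals can be sketched in plain Scala, modelling a grouped table as an in-memory Seq of (key, java.lang.Iterable) pairs. combineWith, foldValues and foldValuesByKey are hypothetical helpers for illustration only; foldValues borrows the name from proposal (b), but its signature here is a guess, not an existing Scrunch API. Option (a) shows up as the Iterator[V] => V parameter, whose type already says "traverse at most once", and option (b) is layered on top of it.

```scala
import scala.collection.JavaConverters._

object ProposedCombineSketch {
  // Option (a), hypothetical helper: the user function receives an Iterator,
  // and iterator() is called on the underlying Iterable exactly once here.
  def combineWith[V](values: java.lang.Iterable[V])(f: Iterator[V] => V): V =
    f(values.iterator().asScala)

  // Option (b), hypothetical helper: a reduce over one group's values.
  // reduce (rather than a fold with a zero) is safe because every group
  // produced by grouping on a key holds at least one value.
  def foldValues[V](values: java.lang.Iterable[V])(op: (V, V) => V): V =
    combineWith(values)(_.reduce(op))

  // Applies foldValues per key over the in-memory stand-in for grouped data.
  def foldValuesByKey[K, V](groups: Seq[(K, java.lang.Iterable[V])])(op: (V, V) => V): Seq[(K, V)] =
    groups.map { case (k, vs) => k -> foldValues(vs)(op) }

  def main(args: Array[String]): Unit = {
    val groups: Seq[(String, java.lang.Iterable[Int])] =
      Seq("a" -> Seq(1, 2, 3).asJava, "b" -> Seq(10).asJava)
    println(foldValuesByKey(groups)(_ + _)) // List((a,6), (b,10))
  }
}
```

Building (b) on (a) means a user-supplied reduce never sees an Iterable at all, so it cannot accidentally traverse the values twice.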
