mahout-dev mailing list archives

From Anand Avati <av...@gluster.org>
Subject Re: Mahout DSL vs Spark
Date Wed, 30 Apr 2014 05:06:44 GMT
OK, that's fine. Those which produce $directional concatenation, are
$directional blocks. The pdf has to be updated in that case.

Thanks
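
For anyone reading along in the archive, here is a minimal sketch of the two slice orientations and the two concatenations discussed in this thread. It is plain Python on lists of rows, not the Mahout Scala DSL; the function names (row_slice, col_slice, vertical_concat, horizontal_concat) are made up for illustration, and the inclusive ranges are meant to mirror Scala's "100 to 200".

```python
# Illustrative stand-in only: plain Python lists, not Mahout code.
# A small 4x3 matrix stored as a list of rows.
A = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [10, 11, 12],
]


def row_slice(m, start, stop):
    """Rows start..stop (inclusive), all columns: analogous to A(start to stop, ::)."""
    return [row[:] for row in m[start:stop + 1]]


def col_slice(m, start, stop):
    """All rows, columns start..stop (inclusive): analogous to A(::, start to stop)."""
    return [row[start:stop + 1] for row in m]


def vertical_concat(top, bottom):
    """Stack one block on top of the other (what R calls rbind)."""
    if top and bottom and len(top[0]) != len(bottom[0]):
        raise ValueError("blocks must have the same number of columns")
    return [row[:] for row in top] + [row[:] for row in bottom]


def horizontal_concat(left, right):
    """Place blocks side by side (what R calls cbind)."""
    if len(left) != len(right):
        raise ValueError("blocks must have the same number of rows")
    return [l + r for l, r in zip(left, right)]


# Row slices are full-width blocks; stacking them vertically rebuilds A.
upper = row_slice(A, 0, 1)
lower = row_slice(A, 2, 3)
rebuilt_from_rows = vertical_concat(upper, lower)

# Column slices are full-height blocks; joining them horizontally rebuilds A.
left = col_slice(A, 0, 0)
right = col_slice(A, 1, 2)
rebuilt_from_cols = horizontal_concat(left, right)
```

Under the naming agreed above, upper and lower would be the "vertical" blocks, since vertical concatenation is what puts them back together.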

On Tue, Apr 29, 2014 at 10:01 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> at least it seems to be consistent with matlab.
>
> In R these operations are called rbind (row-bind) and cbind (column-bind)
> respectively
>
>
> On Tue, Apr 29, 2014 at 9:57 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
>> No, i think my take is more common. i can't immediately find an
>> authoritative reference, but there's a definition of vertical and horizontal
>> concatenation. So i assume it is intuitive to call blocks producing
>> vertical concatenation vertical blocks. [1]
>>
>> [1]
>> http://www-rohan.sdsu.edu/doc/matlab/toolbox/simulink/slref/matrixconcatenation.html
>>
>>
>> On Tue, Apr 29, 2014 at 9:52 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>>> hm. i really did not think of it. i thought vertical blocks are those
>>> that sit one on top of the other. As if one is building a vertical tower.
>>>
>>> let me check what official math terminology is.
>>>
>>>
>>> On Tue, Apr 29, 2014 at 9:47 PM, Anand Avati <avati@gluster.org> wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Tue, Apr 29, 2014 at 9:20 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>>>
>>>>> actually I imply vertical slicing as A(100 to 200, ::). if it is the
>>>>> other way around it is a typo.
>>>>>
>>>>
>>>> Isn't that counter-intuitive? The syntax is A(row,col), so
>>>> A(100 to 200, ::) means all columns of rows 100 through 200. That makes
>>>> them horizontal slices, no?
>>>>
>>>>
>>>>
>>>>>
>>>>> strictly speaking this doc is working notes, not a manual (i.e. i just
>>>>> filled it in as i went with design so i don't forget myself). i guess
>>>>> there's a gap between it and an actual doc. I suggested to keep it for
>>>>> reference (since it exists) but rather create html-based wiki/cms doc
>>>>> pages. this is todo.
>>>>>
>>>>>
>>>>> On Tue, Apr 29, 2014 at 7:19 PM, Anand Avati <avati@gluster.org> wrote:
>>>>>
>>>>>>
>>>>>> On Mon, Apr 28, 2014 at 11:15 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 28, 2014 at 7:23 PM, Anand Avati <avati@gluster.org> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm not sure I completely understand mapBlock. Can you please give
>>>>>>>> a concrete example (with a simple 2x3 matrix) of how mapBlock works?
>>>>>>>> I have a reasonable understanding of how Spark partitions and
>>>>>>>> distributes data of its RDD. Based on that, and knowing how H2O
>>>>>>>> distributes data, I feel it is a matter of providing thin logic and a
>>>>>>>> wrapper to make something built on Spark be built on H2O. That being
>>>>>>>> said, I want to make sure I do not misunderstand or make wrong
>>>>>>>> assumptions about mapBlock, hence the request for a concrete example.
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>> Anand,
>>>>>>>
>>>>>>> concrete examples are given and explained in scala/spark bindings
>>>>>>> documentation on Mahout website.
>>>>>>>
>>>>>>> Also, there's a talk and slides from last Mahout meetup that also
>>>>>>> discuss Mahout DRM structure and access to it in case of sparkbindings.
>>>>>>>
>>>>>>> Come back if you still have questions after that (along with
>>>>>>> suggestions what can be improved in the docs to make things easier).
>>>>>>>
>>>>>>
>>>>>> Dmitry,
>>>>>> Thanks for the link, now I understand what's happening with
>>>>>> mapBlock(), and it is exactly how I had understood initially (before
>>>>>> un-understanding :p). I don't see it being a huge problem to provide a
>>>>>> mapBlock() over H2O. The part which confused me (both in your email and
>>>>>> in ScalaSparkBindings.pdf) is this -
>>>>>>
>>>>>> page 17:
>>>>>>
>>>>>> ...
>>>>>> Vertical block
>>>>>>   A(::, 100 to 200)
>>>>>> ...
>>>>>> mapBlock provides ... "vertical blockified tuples of the matrix"
>>>>>>
>>>>>> The term "Vertical block" describing A(::, 100 to 200) is intuitive
>>>>>> and feels "right".
>>>>>>
>>>>>> But then when mapBlock is described as presenting "vertical
>>>>>> block"ified tuples, maybe it is just me, it sounds as if mapBlock gives
>>>>>> you a subset of full columns in the form of a Matrix (while it actually
>>>>>> provides a subset of full rows in the form of a Matrix). It was this
>>>>>> interpretation of orthogonal orientation associated with "vertical
>>>>>> block"(ified tuples) which caused my confusion.
>>>>>>
>>>>>> It would be very helpful if the documentation on that page explicitly
>>>>>> stated that mapBlock presents a subset of full rows. It feels obvious
>>>>>> looking backwards, but the terminology was confusing initially. It is
>>>>>> somewhat implied in a later statement "...should not change the height of
>>>>>> the block, in order to provide correct total matrix row count ...", but
>>>>>> that wasn't good enough in the first parse.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> PS: It might be helpful if
>>>>>> http://mahout.apache.org/users/sparkbindings/ScalaSparkBindings.pdf is
>>>>>> made available under doc/ in the repository for future code inspectors.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
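
As a rough illustration of the mapBlock behaviour described in the quoted thread (each block is a subset of full rows, and the block function should not change the block height), here is a sketch on the 2x3 matrix that was asked about. It is plain Python, not the Mahout or Spark API: split_into_row_blocks, map_block and the (keys, block) tuple layout are assumptions made for this example only.

```python
# Illustrative stand-in only: models row-wise blocks with plain Python lists.

def split_into_row_blocks(matrix, rows_per_block):
    """Partition a matrix into (keys, block) tuples; every block holds full rows."""
    blocks = []
    for start in range(0, len(matrix), rows_per_block):
        block = [row[:] for row in matrix[start:start + rows_per_block]]
        keys = list(range(start, start + len(block)))  # row indices in this block
        blocks.append((keys, block))
    return blocks


def map_block(blocks, fn):
    """Apply fn to each (keys, block) tuple, rejecting changes to block height."""
    result = []
    for keys, block in blocks:
        new_keys, new_block = fn(keys, block)
        if len(new_block) != len(block):
            # Changing the height would break the total matrix row count.
            raise ValueError("block function must not change the block height")
        result.append((new_keys, new_block))
    return result


def add_one(keys, block):
    """Example block function: add 1 to every element, keep keys and height."""
    return keys, [[x + 1 for x in row] for row in block]


A = [
    [1, 2, 3],
    [4, 5, 6],
]

blocks = split_into_row_blocks(A, 1)   # two blocks, one full row each
mapped = map_block(blocks, add_one)

# Stacking the mapped blocks back together gives the full result matrix.
collected = [row for _, block in mapped for row in block]
```

The block function may change the number of columns in this sketch, but never the number of rows, which matches the "should not change the height of the block" statement quoted from the pdf.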
