flink-user mailing list archives

From Saliya Ekanayake <esal...@gmail.com>
Subject Re: Mapping two datasets
Date Thu, 25 Feb 2016 17:11:12 GMT
Thank you. Any thoughts on the ParallelIteratorInputFormat in Flink?
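
For readers following the thread: the "dummy indexed data set" idea quoted below amounts to giving each parallel task a contiguous slice of an index range and letting it load its own data. As far as I know, ParallelIteratorInputFormat is what ExecutionEnvironment.fromParallelCollection uses to hand out the pieces of a SplittableIterator, and generateSequence builds on the same mechanism with a NumberSequenceIterator. The block below is not Flink code. It is a minimal plain-Java sketch of the splitting arithmetic only, and the class name IndexRangeSplit and method rangeFor are hypothetical names chosen for illustration.

```java
/** Plain-Java illustration (not Flink API): split [0, total) across parallel tasks. */
final class IndexRangeSplit {

    /**
     * Returns {start, endExclusive} for one task. The first (total % parallelism)
     * tasks each receive one extra index, so slice sizes differ by at most one.
     */
    static long[] rangeFor(long total, int parallelism, int task) {
        if (parallelism <= 0 || task < 0 || task >= parallelism) {
            throw new IllegalArgumentException("invalid task/parallelism");
        }
        long base = total / parallelism;
        long remainder = total % parallelism;
        long start = task * base + Math.min(task, remainder);
        long length = base + (task < remainder ? 1 : 0);
        return new long[] {start, start + length};
    }

    public static void main(String[] args) {
        int parallelism = 3;
        for (int task = 0; task < parallelism; task++) {
            long[] r = rangeFor(10, parallelism, task);
            // Each task would load the data for indices r[0] .. r[1]-1 here.
            System.out.println("task " + task + ": [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Inside a Flink rich function, the task index and parallelism could presumably come from getRuntimeContext().getIndexOfThisSubtask() and getRuntimeContext().getNumberOfParallelSubtasks(), so a map over a small seed data set could compute its own slice this way.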

On Thu, Feb 25, 2016 at 12:07 PM, Márton Balassi <balassi.marton@gmail.com>
wrote:

> Hey Saliya,
>
> I recommend using DataSetUtils.zipWithIndex for this task. [1] It comes
> with flink-java.
>
> [1]
> https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/utils/DataSetUtils.java#L77
>
> On Thu, Feb 25, 2016 at 5:52 PM, Saliya Ekanayake <esaliya@gmail.com>
> wrote:
>
>> Thank you, Marton. That seems doable.
>>
>> However, is there a way I can create a dummy indexed data set? Like a way
>> to partition the index range without data across parallel tasks. For
>> example, if I could have something like,
>>
>> DataSet<IndexedSet> ds = ...
>>
>> then I can implement a custom method to load required data for a split
>> within a map operation, which will be less expensive than a join for my
>> case.
>>
>> Thank you,
>> Saliya
>>
>> On Thu, Feb 25, 2016 at 11:45 AM, Márton Balassi <
>> balassi.marton@gmail.com> wrote:
>>
>>> Hey Saliya,
>>>
>>> I would add a unique ID to both the DataSets, the variable you referred
>>> to as 'i'. Then you can join the two DataSets on the field containing 'i'
>>> and do the mapping on the joined result.
>>>
>>> Hope this helps,
>>>
>>> Marton
>>>
>>> On Thu, Feb 25, 2016 at 5:38 PM, Saliya Ekanayake <esaliya@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I've two data sets like,
>>>>
>>>> DataSet<T> a = ...
>>>> DataSet<T> b = ...
>>>>
>>>> They have the same type and same decomposition. I want to apply a map
>>>> operator that needs both *a* and *b*. For example,
>>>>
>>>> a.map( i -> OP)
>>>>
>>>> within this OP I need the corresponding (*i*-th) element of *b* as
>>>> well. Is there a way to do this?
>>>>
>>>> Thank you,
>>>> Saliya
>>>>
>>>> --
>>>> Saliya Ekanayake
>>>> Ph.D. Candidate | Research Assistant
>>>> School of Informatics and Computing | Digital Science Center
>>>> Indiana University, Bloomington
>>>> Cell 812-391-4914
>>>> http://saliya.org
>>>>
>>>
>>>
>>
>>
>>
>
>
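
The approach suggested in the quoted replies (attach an index 'i' to both data sets, join on it, then map over the joined pairs) can be illustrated without a cluster. The block below is a plain-Java stand-in, not Flink code: in Flink the indexing step would be DataSetUtils.zipWithIndex, which yields a DataSet of Tuple2<Long, T>, and the pairing step would be a join on field 0. The names IndexedJoinSketch, zipWithIndex and joinAndMap here are hypothetical helpers written for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/** Plain-Java illustration (not Flink API) of "index both sides, join on the index, then map". */
final class IndexedJoinSketch {

    /** Attaches a consecutive index to every element, mirroring the role of the id 'i'. */
    static <T> Map<Long, T> zipWithIndex(List<T> items) {
        Map<Long, T> indexed = new HashMap<>();
        long i = 0;
        for (T item : items) {
            indexed.put(i++, item);
        }
        return indexed;
    }

    /** Inner-joins a and b on the index and applies op to each matching pair. */
    static <T, R> List<R> joinAndMap(List<T> a, List<T> b, BiFunction<T, T, R> op) {
        Map<Long, T> left = zipWithIndex(a);
        Map<Long, T> right = zipWithIndex(b);
        List<R> out = new ArrayList<>();
        for (long i = 0; i < a.size(); i++) {
            T match = right.get(i);
            if (match != null) {
                // Indices without a partner in b are dropped (inner join).
                out.add(op.apply(left.get(i), match));
            }
        }
        return out;
    }
}
```

One caveat when carrying this over to Flink: the sketch relies on list order, whereas a distributed data set only pairs up the "same" elements if both inputs are indexed consistently, which is an assumption the original question makes ("same decomposition") but that zipWithIndex does not by itself guarantee.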


