hadoop-mapreduce-user mailing list archives

From Anty <anty....@gmail.com>
Subject Re: how to write this MapReduce
Date Tue, 27 Oct 2009 02:55:23 GMT
@Thomas
Thanks. My input files are sorted.
@Jingkei
Thanks. I will have a look at the instructions for join.
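
The ordering concern raised further down in this thread (random file names do not sort into the wanted value order) can be handled by ranking each source file explicitly instead of comparing the names. What follows is only a minimal sketch in plain Java with no Hadoop dependency: it simulates the grouping by key and the per-key ordering that a reduce-side join with secondary sort would perform. The class TaggedJoinSketch, the Tagged type, and the join() helper are hypothetical names, and the file names are the example ones from the thread. In a real job the file name would presumably come from the mapper (for instance via 'map.input.file', as Thomas mentions) and the grouping and sorting would be done by the framework.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaggedJoinSketch {

    /** One record plus the name of the file it was read from (hypothetical type). */
    static final class Tagged {
        final String file;
        final String key;
        final String value;

        Tagged(String file, String key, String value) {
            this.file = file;
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Groups records by key and orders each group's values by the position of
     * their source file in fileOrder, not by the file name itself.
     * Files missing from fileOrder get rank -1 and therefore sort first.
     */
    static List<String> join(List<Tagged> records, List<String> fileOrder) {
        // Stand-in for the shuffle: group by key, with keys kept in sorted order.
        Map<String, List<Tagged>> groups = new TreeMap<>();
        for (Tagged r : records) {
            groups.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r);
        }

        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> group : groups.entrySet()) {
            List<Tagged> tagged = group.getValue();
            // Stand-in for the secondary sort: compare by explicit file rank.
            tagged.sort(Comparator.comparingInt((Tagged r) -> fileOrder.indexOf(r.file)));

            List<String> values = new ArrayList<>();
            for (Tagged r : tagged) {
                values.add(r.value);
            }
            out.add(group.getKey() + "-(" + String.join(",", values) + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        // The wanted order is bbbb, aaaa, ccccc, which is not alphabetical.
        List<String> fileOrder = Arrays.asList("bbbb", "aaaa", "ccccc");
        List<Tagged> records = Arrays.asList(
                new Tagged("aaaa", "key1", "value1B"),
                new Tagged("ccccc", "key1", "value1C"),
                new Tagged("bbbb", "key1", "value1A"));
        // Prints: key1-(value1A,value1B,value1C)
        join(records, fileOrder).forEach(System.out::println);
    }
}
```

Because the comparison uses the rank in fileOrder, the random file names no longer matter; only the order listed there decides where each value ends up.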

On Tue, Oct 27, 2009 at 12:39 AM, Thomas Thevis <thomas.thevis@vionto.com> wrote:

> Hey Anty,
>
> there exists a config key 'map.input.file' which should return the name of
> the input file the mapper gets its input values from.
> In the pre-hadoop-0.20.0 era, one would have to implement the configure()
> method to have access to the configuration. Since then, it could be possible
> to use the configuration from the context object.
> However, if your input files aren't sorted in any way, this approach won't
> work.
>
> Best Regards
> Thomas
>
>
> Anty wrote:
>
>> Thanks very much for your reply, Thomas.
>> I looked at the Mapper.map() method, but I still can't find a way to
>> retrieve the source file name of the input data. Can you describe it in
>> more detail?
>> I also have some doubts about your suggestion.
>> The names of the three files are random, so sorting the values by file
>> name will not reproduce the order (value1A,value1B,value1C). For example:
>>
>> "bbbb"            "aaaa"            "ccccc"
>> key1-value1A      key1-value1B      key1-value1C
>>
>> If we then sort the values by file name, the result will be
>> "key1-(value1B,value1A,value1C)" or "key1-(value1C,value1A,value1B)".
>> Maybe I should use some particular rule to sort the values.
>> Thanks, Thomas.
>>
>>
>> On Mon, Oct 26, 2009 at 11:36 PM, Anty <anty.rao@gmail.com> wrote:
>>
>>    Up to now I don't know how to retrieve the source file name of the
>>    input data within the Mapper.map() method. Anyway, I have some doubts
>>    about your proposed suggestion.
>>
>>
>>    On Mon, Oct 26, 2009 at 8:59 PM, Thomas Thevis
>>    <thomas.thevis@vionto.com> wrote:
>>
>>        Hi Anty,
>>
>>        as far as I know, it is possible to retrieve the source file
>>        name of the input data within the Mapper's map() method.
>>        If so, you could use secondary sort on values (have a look at
>>        the Hadoop wiki pages) to propagate the values sorted first by
>>        key and second by filename to the Reducer which could aggregate
>>        them in any particular way.
>>
>>        Hope that helps
>>        Thomas
>>
>>
>>        Anty wrote:
>>
>>            Would MultipleInputs handle this situation?
>>            Does anyone have any idea about this?
>>
>>            On Mon, Oct 26, 2009 at 7:44 PM, Anty <anty.rao@gmail.com> wrote:
>>
>>               Hi all,
>>               I have the following use case: I have three files, each
>>               containing key-value pairs:
>>               file1:                 file2:                 file3:
>>               key1-value1A           key1-value1B           key1-value1C
>>               key2-value2A           key2-value2B           key2-value2C
>>               key3-value3A           key3-value3B           key3-value3C
>>               .....                  .....                  .....
>>               Now I want to write an MR job to generate a file:
>>               file4:
>>               key1-(value1A,value1B,value1C)
>>               key2-(value2A,value2B,value2C)
>>               key3-(value3A,value3B,value3C)
>>               ..........
>>               Any suggestions will be appreciated.
>>               --
>>               Best Regards
>>               Anty Rao
>>
>>
>>
>>
>>            --
>>            Best Regards
>>            Anty Rao
>>
>>
>>
>>
>>
>>    --
>>    Best Regards
>>    Anty Rao
>>
>>
>>
>>
>> --
>> Best Regards
>> Anty Rao
>>
>
>
> --
> Thomas Thevis
> Software Developer
> ------------------------------------------------------------
> vionto GmbH
> Karl-Marx-Allee 90a, D-10243 Berlin
>
> fon   +49 30 40 20 3 29 - 28
> fax   +49 30 40 20 3 29 - 29
> web   http://www.vionto.com
> ------------------------------------------------------------
> Managing directors: Ralf von Grafenstein, Dr. Martin C. Hirsch
> Registered office: Berlin
> District court: Amtsgericht Berlin Charlottenburg, HRB 108054B
> ------------------------------------------------------------
>



-- 
Best Regards
Anty Rao
