hadoop-common-user mailing list archives

From Lac Trung <trungnb3...@gmail.com>
Subject Re: Determine the key of Map function
Date Tue, 24 Apr 2012 12:37:36 GMT
Thanks so much !


On 24 April 2012 at 12:21, Devaraj k <devaraj.k@huawei.com> wrote:

> Hi Lac,
>
>  As I understand your problem description, you need to do the following:
>
> 1. Mapper : Write a mapper which reads records from the input files and
> converts them into keys and values. The key should contain the teacher id,
> class id and no of students; the value can be empty (or null).
> 2. Partitioner : Write a custom partitioner to send all the records for a
> teacher id to one reducer.
> 3. Grouping Comparator : Write a comparator to group the records based on
> teacher id.
> 4. Sorting Comparator : Write a comparator to sort the records based on
> teacher id and no of students.
> 5. Reducer : In the reducer, you will get the records for all teachers one
> after the other, sorted by no of students within each teacher id. You can
> keep as many top records as you want in the reducer and write them out at
> the end.
>
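A minimal sketch of the five steps above, simulated in plain Python rather than written against the Hadoop API. The function names (`map_record`, `partition`, `sort_key`, `group_key`, `reduce_group`, `run_job`) are hypothetical, and the input format "teacherId, classId, numberOfStudents" is assumed from the example later in this thread. In a real Hadoop job, each function would correspond to a Mapper, Partitioner, sort comparator, grouping comparator and Reducer class.

```python
import zlib
from collections import defaultdict
from itertools import groupby


def map_record(line):
    """Mapper: turn 'teacherId, classId, numberOfStudents' into a composite key."""
    teacher_id, class_id, students = [field.strip() for field in line.split(",")]
    # Composite key (teacher, students, class); the value is left empty.
    return (teacher_id, int(students), class_id), None


def partition(key, num_reducers):
    """Partitioner: route on teacher id only, so one teacher goes to one reducer."""
    return zlib.crc32(key[0].encode("utf-8")) % num_reducers


def sort_key(key):
    """Sorting comparator: teacher id ascending, then student count descending."""
    teacher_id, students, _class_id = key
    return (teacher_id, -students)


def group_key(key):
    """Grouping comparator: keys with the same teacher id share one reduce call."""
    return key[0]


def reduce_group(teacher_id, keys, top_n):
    """Reducer: keys arrive sorted, so the first top_n are the largest classes."""
    return [(teacher_id, class_id, students)
            for _teacher, students, class_id in keys[:top_n]]


def run_job(lines, top_n=3, num_reducers=2):
    # Shuffle: assign every mapped key to a reducer partition.
    partitions = defaultdict(list)
    for line in lines:
        key, _value = map_record(line)
        partitions[partition(key, num_reducers)].append(key)

    output = []
    for reducer_id in sorted(partitions):
        # Sort within the partition, then group consecutive keys by teacher.
        keys = sorted(partitions[reducer_id], key=sort_key)
        for teacher_id, group in groupby(keys, key=group_key):
            output.extend(reduce_group(teacher_id, list(group), top_n))
    return output
```

Calling `run_job(lines, top_n=30)` would give the top 30 classes per teacher. The order of teachers in the output depends on which partition each teacher lands in, so sort the result if a fixed order is needed.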
> You can refer to this document:
> http://www.inf.ed.ac.uk/publications/thesis/online/IM100859.pdf
>
> Thanks
> Devaraj
>
> ________________________________________
> From: Lac Trung [trungnb3535@gmail.com]
> Sent: Tuesday, April 24, 2012 10:11 AM
> To: common-user@hadoop.apache.org
> Subject: Re: Determine the key of Map function
>
> Ah, as I said before, I have no experience programming MapReduce. So,
> can you point me to some documents or websites about programming
> the things you described above? ("Thousand things start hard" - VietNam)
> Thanks so much ^^!
>
> On 24 April 2012 at 10:54, Lac Trung <trungnb3535@gmail.com> wrote:
>
> > Thanks Jay so much !
> > I will try this.
> > ^^
> >
> > On 24 April 2012 at 10:52, Jay Vyas <jayunit100@gmail.com> wrote:
> >
> >> Ahh... Well then, the key will be the teacher, and the value will simply be
> >>
> >> <-1 * # students, class_id> .
> >>
> >> Then, you will see in the reducer that the first 3 entries will always be
> >> the ones you wanted.
> >>
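A small sketch of this key/value choice in plain Python, not Hadoop code. The names `map_record`, `reduce_teacher` and `top_classes` are hypothetical. One assumption is made explicit: Hadoop does not order the values within a key unless a secondary sort is configured, so this sketch sorts the values inside the reducer before taking the first entries.

```python
from collections import defaultdict


def map_record(line):
    """Emit key = teacher, value = (-students, class_id)."""
    teacher_id, class_id, students = [field.strip() for field in line.split(",")]
    return teacher_id, (-int(students), class_id)


def reduce_teacher(teacher_id, values, top_n=3):
    """Sort the negated counts ascending, so the largest classes come first."""
    # Assumption: values are sorted here, not by the framework.
    return [(teacher_id, class_id, -neg_students)
            for neg_students, class_id in sorted(values)[:top_n]]


def top_classes(lines, top_n=3):
    # Stand-in for the shuffle phase: collect all values per teacher.
    grouped = defaultdict(list)
    for line in lines:
        teacher_id, value = map_record(line)
        grouped[teacher_id].append(value)
    return {teacher_id: reduce_teacher(teacher_id, values, top_n)
            for teacher_id, values in grouped.items()}
```

Negating the student count means an ascending sort puts the largest classes first, which is why the first 3 entries are the wanted ones.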
> >> On Mon, Apr 23, 2012 at 10:17 PM, Lac Trung <trungnb3535@gmail.com>
> >> wrote:
> >>
> >> > Hi Jay !
> >> > I think it's a bit different here. For each teacherId, I want to get the
> >> > 30 classIds that have the most students.
> >> > For example, getting 3 classIds:
> >> > (File1)
> >> > 1) Teacher1, Class11, 30
> >> > 2) Teacher1, Class12, 29
> >> > 3) Teacher1, Class13, 28
> >> > 4) Teacher1, Class14, 27
> >> > ... n ...
> >> >
> >> > n+1) Teacher2, Class21, 45
> >> > n+2) Teacher2, Class22, 44
> >> > n+3) Teacher2, Class23, 43
> >> > n+4) Teacher2, Class24, 42
> >> > ... n+m ...
> >> >
> >> > => return lines 1, 2, 3 for Teacher1 and lines n+1, n+2, n+3 for Teacher2
> >> >
> >> >
> >> > On 24 April 2012 at 09:52, Jay Vyas <jayunit100@gmail.com> wrote:
> >> >
> >> > > It's somewhat tricky to understand exactly what you need from your
> >> > > explanation, but I believe you want the teachers who have the most
> >> > > students in a given class. So for English, I have 10 teachers teaching
> >> > > the class, and I want the ones with the highest # of students.
> >> > >
> >> > > You can output key=<classid>, value=<-1*#ofstudent,teacherid>.
> >> > >
> >> > > The values will then be sorted by # of students. You can thus pick the
> >> > > teacher in the first value of your reducer, and that will be the
> >> > > teacher for class id = xyz with the highest number of students.
> >> > >
> >> > > You can also be smart in your mapper by running a combiner to remove
> >> > > the teacherids who are clearly not maximal.
> >> > >
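The combiner idea can be sketched in plain Python as follows. This is an illustration under assumptions, not Hadoop code: `combine` and `reduce_values` are hypothetical names, and each value is assumed to be a `(-students, id)` pair as in the suggestion above. The combiner trims each mapper's local output to its best `top_n` entries, and the reducer applies the same selection to the merged partial lists.

```python
import heapq


def combine(values, top_n):
    """Combiner: keep only the top_n smallest (-students, id) pairs locally."""
    # The smallest negated counts are the largest student counts.
    return heapq.nsmallest(top_n, values)


def reduce_values(partial_lists, top_n):
    """Reducer: merge the partial lists and keep the overall top_n."""
    merged = [value for partial in partial_lists for value in partial]
    return heapq.nsmallest(top_n, merged)
```

This works because the global top N is always contained in the union of the per-mapper top N lists, so dropping the rest early does not change the result.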
> >> > > On Mon, Apr 23, 2012 at 9:38 PM, Lac Trung <trungnb3535@gmail.com> wrote:
> >> > >
> >> > > > Hello everyone !
> >> > > >
> >> > > > I have a problem with MapReduce [:(] :
> >> > > > I have 4 input files with 3 fields : teacherId, classId, numberOfStudent
> >> > > > (numberOfStudent is in descending order for each teacher).
> >> > > > The output should be the top 30 classIds by numberOfStudent for each
> >> > > > teacher.
> >> > > > My approach is MapReduce, like the Wordcount example, but I don't know
> >> > > > how to determine the key for the map function.
> >> > > > I ran the Wordcount example and understand its code, but I have no
> >> > > > experience programming MapReduce.
> >> > > >
> >> > > > Can anyone help me to resolve this problem ?
> >> > > > Thanks so much !
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Lạc Trung
> >> > > > 20083535
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Jay Vyas
> >> > > MMSB/UCHC
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Lạc Trung
> >> > 20083535
> >> >
> >>
> >>
> >>
> >> --
> >> Jay Vyas
> >> MMSB/UCHC
> >>
> >
> >
> >
> > --
> > Lạc Trung
> > 20083535
> >
> >
>
>
> --
> Lạc Trung
> 20083535
>



-- 
Lạc Trung
20083535
