hadoop-common-user mailing list archives

From "Kleegrewe, Christian" <christian.kleegr...@siemens.com>
Subject AW: Best number of mappers and reducers when processing data to and from HBase?
Date Mon, 20 Oct 2014 14:17:42 GMT
Hello Rolf,

in the last week of October, but I don't think I need to be there the whole time.

Kind regards
Christian Kleegrewe

Siemens AG
Corporate Technology
Research and Technology Center
Otto-Hahn-Ring 6
81739 München, Deutschland
Tel.: +49 89 636-633785

-----Original Message-----
From: peterm_second [mailto:regestrer@gmail.com]
Sent: Monday, 20 October 2014 16:09
To: user@hadoop.apache.org
Subject: Best number of mappers and reducers when processing data to and from HBase?

Hi guys,
I have a somewhat abstract question. I am reading data from HBase and I am wondering how
to determine the best mapper and reducer counts, that is, which criteria need to be taken
into consideration when choosing them. My MR job reads data from an HBase table; the data
is processed in the mapper, and the reducer takes the result and writes output to another
HBase table. I want to be able to dynamically deduce the right number of mappers to
initially process the data (actually, map it to a specific criterion) and the right number
of reducers to later do some further processing on it and output a new dataset, which is
then saved to a new HBase table. I've read that when reading data from files I should have
something like 10 mappers per DFS block, but I have no clue how to translate that to my
case, where the input is an HBase table. Any ideas would be appreciated, even if it's a
book or an article I should read.
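
To make the question concrete, here is a minimal sketch of the kind of sizing calculation being asked about. It is not an authoritative answer. It rests on two assumptions: that the job uses the default TableInputFormat, which creates one input split (and so one map task) per table region, and that the reducer count follows the rule of thumb from the Hadoop MapReduce tutorial of 0.95 * (nodes * maximum containers per node). The class name, method names, and the cluster numbers in main are hypothetical.

```java
public class HBaseJobSizing {

    // Assumption: with the default TableInputFormat, each region of the
    // input table becomes one input split, so mappers == regions.
    static int estimateMappers(int regionCount) {
        if (regionCount < 1) {
            throw new IllegalArgumentException("regionCount must be >= 1");
        }
        return regionCount;
    }

    // Assumption: the 0.95 * (nodes * containers per node) rule of thumb,
    // so that all reducers can start in a single wave.
    // Integer arithmetic avoids floating-point rounding surprises.
    static int estimateReducers(int nodes, int containersPerNode) {
        if (nodes < 1 || containersPerNode < 1) {
            throw new IllegalArgumentException("nodes and containersPerNode must be >= 1");
        }
        int slots = nodes * containersPerNode;
        return Math.max(1, (slots * 95) / 100);
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 40 regions in the input table,
        // 5 worker nodes with 8 containers each.
        int mappers = estimateMappers(40);
        int reducers = estimateReducers(5, 8);
        System.out.println("mappers=" + mappers + ", reducers=" + reducers);
    }
}
```

Under these assumptions the mapper count is not set directly; it changes only when the input table's regions are split or merged, while the reducer estimate could be passed to the job via Job.setNumReduceTasks.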
