hive-user mailing list archives

From "Bejoy KS" <bejoy...@yahoo.com>
Subject Re: Hive mapper creation
Date Thu, 28 Jun 2012 19:07:53 GMT
Hi Mohammed

Splits are a concept of the MapReduce framework, not of Hive itself. A split is the chunk
of data processed by a single mapper. Based on your InputFormat and the min and max split size
properties, the MR framework decides which HDFS blocks a mapper should process. (A split can be
just one block, or several blocks if CombineFileInputFormat is used.) The choice of which HDFS
blocks form a split takes data locality into account. The number of mappers (map tasks) created
by a job equals the number of splits thus determined, i.e. one map task per split.
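
To make the arithmetic concrete, here is a rough sketch of how a plain FileInputFormat sizes
splits for one splittable file. The function names are made up for illustration; the formula
and the 10% slop mirror what FileInputFormat does, but the sketch ignores data locality,
non-splittable compressed files and the block combining done by CombineFileInputFormat.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may run up to 10% over the split size


def compute_split_size(block_size, min_size, max_size):
    """Split size = max(min split size, min(max split size, block size))."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Number of splits (and so map tasks) for one splittable file.

    Illustrative helper, assumes file_size > 0.
    """
    splits = 0
    remaining = file_size
    # Carve off full-size splits while more than 1.1 split sizes remain.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left (at most 1.1 split sizes) becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    split_size = compute_split_size(64 * MB, 1, 2**63 - 1)
    print(count_splits(200 * MB, split_size))
```

With 64 MB blocks and default min/max sizes, a 200 MB file gives 4 splits and so 4 mappers.
Lowering the max split size below the block size gives more, smaller splits; raising the min
split size above it gives fewer, larger ones.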

Hope that is clear. Feel free to reply if you still have any questions.


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Mohammad Tariq <dontariq@gmail.com>
Date: Fri, 29 Jun 2012 00:29:13 
To: <user@hive.apache.org>; <bejoy_ks@yahoo.com>
Reply-To: user@hive.apache.org
Subject: Re: Hive mapper creation

Hello Nitin, Bejoy,

        Thanks a lot for the quick response. Could you please tell me
what the default criterion for split creation is? How are the splits
for a Hive query created? (Pardon my ignorance.)

Regards,
    Mohammad Tariq


On Fri, Jun 29, 2012 at 12:22 AM, Bejoy KS <bejoy_ks@yahoo.com> wrote:
> Hi Mohammed
>
> Internally, Hive does its processing using MapReduce. So, as in MapReduce, the splits
> are calculated on job submission and one mapper is assigned per split. A mapper therefore
> processes a split, not a row.
>
> You can store data in various formats such as text, sequence files, RC files etc. Tables
> are not restricted to text files.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <dontariq@gmail.com>
> Date: Fri, 29 Jun 2012 00:17:05
> To: user<user@hive.apache.org>
> Reply-To: user@hive.apache.org
> Subject: Hive mapper creation
>
> Hello list,
>
>         Since Hive tables are assumed to be of text input format, is
> it right to assume that a mapper is created per row of a particular
> table? Please correct me if my understanding is wrong. Also, let me
> know how mappers are created for a Hive query. Many
> thanks.
>
> Regards,
>     Mohammad Tariq