hadoop-mapreduce-user mailing list archives

From Alok Kumar <alok...@gmail.com>
Subject Re: Reading fields from a Text line
Date Thu, 02 Aug 2012 10:52:23 GMT
Hi Tariq,

Is your file splittable? If it's not, a single Mapper will process the
entire file in one go!
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html#isSplitable%28org.apache.hadoop.mapreduce.JobContext,%20org.apache.hadoop.fs.Path%29

How many mappers are being created? See if that helps.
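To sanity-check the mapper count, here is a simplified model of how the number of splits relates to splittability. This is an assumption-laden sketch, not Hadoop's actual implementation: the real FileInputFormat also accounts for block boundaries and allows a last split up to 10% oversized.

```java
// Simplified model (assumption: ignores block boundaries and the "slop"
// the real FileInputFormat allows) of how many splits - and hence
// mappers - a single input file yields.
class SplitMath {
    static long numSplits(long fileSizeBytes, long splitSizeBytes, boolean splitable) {
        if (fileSizeBytes == 0) return 0;
        if (!splitable) return 1; // whole file goes to one mapper
        // Ceiling division: every splitSizeBytes chunk becomes one split.
        return (fileSizeBytes + splitSizeBytes - 1) / splitSizeBytes;
    }
}
```

If this reports one split for a large file, a non-splittable format (e.g. a gzipped input) is the likely reason only one mapper runs.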

Regards,
Alok

On Thu, Aug 2, 2012 at 3:48 PM, Mohammad Tariq <dontariq@gmail.com> wrote:
> Thanks for the response, Harsh and Sri. Actually, I was trying to
> prepare a template for my application that reads one line at a time,
> extracts the first field from it, and emits the extracted value from
> the mapper. I have these few lines of code for that:
>
> public static class XPTMapper extends Mapper<IntWritable, Text,
> LongWritable, Text>{
>
>     public void map(LongWritable key, Text value, Context context)
>             throws IOException, InterruptedException{
>
>         Text word = new Text();
>         String line = value.toString();
>         if (!line.startsWith("TT")){
>             context.setStatus("INVALID LINE..SKIPPING........");
>         }else{
>             String stdid = line.substring(0, 7);
>             word.set(stdid);
>             context.write(key, word);
>         }
>     }
> }
>
> But the output file contains all the rows of the input file, including
> the lines I was expecting to be skipped. Also, I was expecting only
> the fields I am emitting, but the file contains entire lines. Could
> you please point out the mistake I might have made? (Pardon my
> ignorance, as I am not very good at MapReduce.) Many thanks.
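One thing worth checking in the snippet above: the Mapper is declared with IntWritable as its input key type, while map() takes a LongWritable key. Because the parameter types differ, Java treats that map() as an overload rather than an override, so Hadoop falls back to the default identity map() and every input line passes through unchanged — which matches the symptom described. A plain-Java illustration of the pitfall (class and method names here are illustrative, not Hadoop's):

```java
// Plain-Java illustration (names are illustrative, not Hadoop's) of how a
// mismatched parameter type silently fails to override a superclass method.
class BaseMapper {
    // Stands in for Hadoop's default identity map(): pass input through.
    String map(Long key) { return "identity:" + key; }
}

class XptMapper extends BaseMapper {
    // Parameter type differs (Integer vs Long): this OVERLOADS, it does
    // not override, so callers holding a BaseMapper never reach it.
    String map(Integer key) { return "custom:" + key; }
}

public class OverrideDemo {
    public static void main(String[] args) {
        BaseMapper m = new XptMapper();
        System.out.println(m.map(42L)); // the base "identity" version runs
    }
}
```

Annotating the intended override with @Override turns this silent mix-up into a compile-time error, which is why it is worth adding to every map() and reduce().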
>
> Regards,
>     Mohammad Tariq
>
>
> On Thu, Aug 2, 2012 at 10:58 AM, Sriram Ramachandrasekaran
> <sri.rams85@gmail.com> wrote:
>> Wouldn't it be better if you could skip those unwanted lines
>> upfront (preprocess) and have a file that is ready to be processed by
>> the MR system? In any case, more details are needed.
>>
>>
>> On Thu, Aug 2, 2012 at 8:23 AM, Harsh J <harsh@cloudera.com> wrote:
>>>
>>> Mohammad,
>>>
>>> > But it seems I am not doing things the correct way. Need some guidance.
>>>
>>> What do you mean by the above? What exactly is your code expected
>>> to do, and what is it not doing? Since you're asking a code question
>>> here, perhaps you can share the code with us (pastebin, gists, etc.)?
>>>
>>> For skipping 8 lines, if you are using splits, you need to detect
>>> within the mapper or your record reader whether the map task's
>>> filesplit has an offset of 0, and skip 8 line reads if so (because
>>> that is the first split of the file).
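The detection Harsh describes can be sketched with plain offset arithmetic. Because the lines are fixed-length, the byte offset that TextInputFormat supplies as the map key converts directly to a line number — assuming a one-byte '\n' terminator, which is an assumption to verify against the actual file:

```java
// Sketch of header-line detection by byte offset (assumes 107 data bytes
// plus a one-byte '\n' terminator per line; adjust RECORD_LEN for '\r\n').
class HeaderSkip {
    static final int RECORD_LEN = 107 + 1;
    static final int HEADER_LINES = 8;

    // True when the record starting at byteOffset is one of the first
    // 8 lines of the file.
    static boolean isHeaderLine(long byteOffset) {
        return byteOffset / RECORD_LEN < HEADER_LINES;
    }
}
```

Inside map(), returning early when isHeaderLine(key.get()) is true drops the header lines; since offsets are absolute within the file, this works whichever split the record lands in.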
>>>
>>> On Thu, Aug 2, 2012 at 1:54 AM, Mohammad Tariq <dontariq@gmail.com> wrote:
>>> > Hello list,
>>> >
>>> >        I have a flat file in which data is stored as lines of 107
>>> > bytes each. I need to skip the first 8 lines (as they don't contain
>>> > any valuable info). Thereafter, I have to read each line and extract
>>> > the information from it, but not the line as a whole. Each line is
>>> > composed of several fields without any delimiter between them. For
>>> > example, the first field is 8 bytes, the second 2 bytes, and so on. I
>>> > was trying to read each line as a Text value, convert it into a
>>> > string, and use the String.substring() method to extract the value of
>>> > each field. But it seems I am not doing things the correct way. Need
>>> > some guidance. Many thanks.
>>> >
>>> > Regards,
>>> >     Mohammad Tariq
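The fixed-width extraction described here can be sketched in plain Java. Only the first two widths (8 and 2 bytes) come from the thread; the rest would be filled in from the real record layout. One detail to watch: substring's end index is exclusive, so an 8-byte field starting at position 0 is substring(0, 8) — substring(0, 7) stops one byte short.

```java
// Sketch of delimiter-free, fixed-width field extraction. Only the first
// two widths (8 and 2 bytes) come from the thread; add the real ones.
class FixedWidthParser {
    static String[] parseRecord(String line, int[] widths) {
        String[] fields = new String[widths.length];
        int pos = 0;
        for (int i = 0; i < widths.length; i++) {
            // substring's end index is exclusive: an 8-byte field at
            // position 0 is substring(0, 8), not substring(0, 7).
            fields[i] = line.substring(pos, pos + widths[i]);
            pos += widths[i];
        }
        return fields;
    }
}
```

Calling parseRecord(line, new int[]{8, 2}) on each non-header line inside map() would yield the individual fields to emit, rather than the whole line.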
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>>
>> --
>> It's just about how deep your longing is!
>>



-- 
Alok Kumar
