hbase-user mailing list archives

From "Jean-Daniel Cryans" <jdcry...@gmail.com>
Subject Re: newbie - map reduce not distributing
Date Fri, 01 Aug 2008 00:42:23 GMT
Dru,

There is something truly weird with your setup. I would advise running your
code (the simple one that only logs the rows) with DEBUG on. See the
FAQ <http://wiki.apache.org/hadoop/Hbase/FAQ#5> on how to do it. Then
get back with the syslog and stdout output. This way we will have
more information on how the scanners are handling this.
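
If it helps, the usual way to turn DEBUG on (assuming the stock
conf/log4j.properties layout; the FAQ entry has the exact steps) is to raise
the HBase loggers to DEBUG on the nodes running the tasks and rerun the job:

# conf/log4j.properties
# bump the HBase classes to DEBUG so scanner activity shows up in syslog
log4j.logger.org.apache.hadoop.hbase=DEBUG

The extra scanner output will then appear in the syslog section of each task's
logs.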

Also FYI, I ran the same code as yours with 0.2.0 on my setup and had no
problems.

J-D

On Thu, Jul 31, 2008 at 7:06 PM, Dru Jensen <drujensen@gmail.com> wrote:

> UPDATE:  I modified the RowCounter example and verified that it is sending
> the same row to multiple map tasks also. Is this a known bug or am I doing
> something truly as(s)inine?  Any help is appreciated.
>
>
> On Jul 30, 2008, at 3:02 PM, Dru Jensen wrote:
>
>  J-D,
>>
>> Again, thank you for your help on this.
>>
>> Hitting the HBase Master web UI on port 60010:
>> System 1 - 2 regions
>> System 2 - 1 region
>> System 3 - 3 regions
>>
>> In order to demonstrate the behavior I'm seeing, I wrote a test class.
>>
>> public class Test extends Configured implements Tool {
>>
>>    public static class Map extends TableMap {
>>
>>        @Override
>>        public void map(ImmutableBytesWritable key, RowResult row,
>> OutputCollector output, Reporter r) throws IOException {
>>
>>            String key_str = new String(key.get());
>>            System.out.println("map: key = " + key_str);
>>        }
>>
>>    }
>>
>>    public static class Reduce extends TableReduce {
>>
>>        @Override
>>        public void reduce(WritableComparable key, Iterator values,
>> OutputCollector output, Reporter r) throws IOException {
>>
>>        }
>>
>>    }
>>
>>    public int run(String[] args) throws Exception {
>>        JobConf job = new JobConf(getConf(), Test.class);
>>        job.setJobName("Test");
>>
>>        job.setNumMapTasks(4);
>>        job.setNumReduceTasks(1);
>>
>>        Map.initJob("test", "content:", Map.class, HStoreKey.class,
>> HbaseMapWritable.class, job);
>>        Reduce.initJob("test", Reduce.class, job);
>>
>>        JobClient.runJob(job);
>>        return 0;
>>    }
>>
>>    public static void main(String[] args) throws Exception {
>>        int res = ToolRunner.run(new Configuration(), new Test(), args);
>>        System.exit(res);
>>    }
>> }
>>
>> In hbase shell:
>> create 'test','content'
>> put 'test','test','content:test','testing'
>> put 'test','test2','content:test','testing2'
>>
>>
>> The Hadoop log results:
>> Task Logs: 'task_200807301447_0001_m_000000_0'
>>
>>
>>
>> stdout logs
>> map: key = test
>> map: key = test2
>>
>>
>> stderr logs
>>
>>
>> syslog logs
>> 2008-07-30 14:51:16,410 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=MAP, sessionId=
>> 2008-07-30 14:51:16,507 INFO org.apache.hadoop.mapred.MapTask:
>> numReduceTasks: 1
>> 2008-07-30 14:51:17,120 INFO org.apache.hadoop.mapred.TaskRunner: Task
>> 'task_200807301447_0001_m_000000_0' done.
>>
>> Task Logs: 'task_200807301447_0001_m_000001_0'
>>
>>
>>
>> stdout logs
>> map: key = test
>> map: key = test2
>>
>>
>> stderr logs
>>
>>
>> syslog logs
>> 2008-07-30 14:51:16,410 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> Initializing JVM Metrics with processName=MAP, sessionId=
>> 2008-07-30 14:51:16,509 INFO org.apache.hadoop.mapred.MapTask:
>> numReduceTasks: 1
>> 2008-07-30 14:51:17,118 INFO org.apache.hadoop.mapred.TaskRunner: Task
>> 'task_200807301447_0001_m_000001_0' done.
>>
>> Tasks 3 and 4 are the same.
>>
>> Each map task is seeing the same rows.  Any help to prevent this is
>> appreciated.
>>
>> Thanks,
>> Dru
>>
>>
>> On Jul 30, 2008, at 2:22 PM, Jean-Daniel Cryans wrote:
>>
>>  Dru,
>>>
>>> It is not supposed to process the same rows multiple times. Can I see the
>>> logs you're talking about? Also, how many regions do you have in your table?
>>> (The info is available in the web UI.)
>>>
>>> thx
>>>
>>> J-D
>>>
>>> On Wed, Jul 30, 2008 at 5:04 PM, Dru Jensen <drujensen@gmail.com> wrote:
>>>
>>>  J-D,
>>>>
>>>> thanks for your quick response.   I have 4 mapping processes running on
>>>> 3
>>>> systems.
>>>>
>>>> Are the same rows being processed 4 times by each mapping processor?
>>>> According to the logs they are.
>>>>
>>>> When I run a map/reduce against a file, only one row gets logged per
>>>> mapper.  Why would this be different for hbase tables?
>>>>
>>>> I would think only one mapping process would process that one row and it
>>>> would only show up once in only one log.
>>>> Preferably it would be the same system that has the region.
>>>>
>>>> I only want each row to be processed once.  Is there any way to change this
>>>> behavior without running only 1 mapper?
>>>>
>>>> thanks,
>>>> Dru
>>>>
>>>>
>>>> On Jul 30, 2008, at 1:44 PM, Jean-Daniel Cryans wrote:
>>>>
>>>> Dru,
>>>>
>>>>>
>>>>> The regions will split when they reach a certain size threshold, so if you
>>>>> want your computation to be distributed, you will have to have more data.
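>>>>>
>>>>> The threshold I am referring to is the maximum region size (if I remember
>>>>> the name and default correctly, hbase.hregion.max.filesize, 256MB). If you
>>>>> just want to see splitting on a small test table, one option is to lower
>>>>> it in hbase-site.xml and restart HBase, roughly like this:
>>>>>
>>>>>  <property>
>>>>>    <name>hbase.hregion.max.filesize</name>
>>>>>    <value>1048576</value> <!-- 1MB, only to force splits while testing -->
>>>>>  </property>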
>>>>>
>>>>> Regards,
>>>>>
>>>>> J-D
>>>>>
>>>>> On Wed, Jul 30, 2008 at 4:36 PM, Dru Jensen <drujensen@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>>>
>>>>>> I created a map/reduce process by extending the TableMap and TableReduce
>>>>>> APIs, but for some reason when I run multiple mappers, the logs show that
>>>>>> the same rows are being processed by each Mapper.
>>>>>>
>>>>>> When I say logs, I mean the Hadoop task tracker (localhost:50030) and
>>>>>> drilling down into the task logs.
>>>>>>
>>>>>> Do I need to manually perform a TableSplit or is this supposed to be done
>>>>>> automatically?
>>>>>>
>>>>>> If it's something I need to do manually, can someone point me to some
>>>>>> sample code?
>>>>>>
>>>>>> If it's supposed to be automatic and each mapper was supposed to get its
>>>>>> own set of rows, should I write up a bug for this?  I'm using trunk 0.2.0
>>>>>> on Hadoop trunk 0.17.2.
>>>>>>
>>>>>> thanks,
>>>>>> Dru
>>>>>>
>>>>>>
>>>>>>
>>>>
>>
>
