hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-559) MR example job to count table rows
Date Fri, 04 Apr 2008 18:09:24 GMT

     [ https://issues.apache.org/jira/browse/HBASE-559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-559:
------------------------

    Attachment: 559-0.1-v2.patch

Attached is a tested patch.  It also makes hbase.jar a hadoop job jar.  There's
a Driver under mapred.  Add MR jobs to its list so that you can do:

{code}
./bin/hadoop jar hbase.jar
{code}

... and you'll get a list of the hbase MR jobs.
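
For anyone unfamiliar with the pattern, a driver like this is essentially a table mapping job names to entry points. Below is a minimal, dependency-free sketch of that idea. It is not the Driver from the patch (which is not shown in this message); `JobRegistry`, `register`, `usage`, and `run` are hypothetical names chosen for illustration, and the usage text is made up.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a job driver: maps a job name to an entry point. */
class JobRegistry {
    /** Minimal entry-point shape; a real MR job would configure and submit itself here. */
    interface Job {
        int run(String[] args);
    }

    // LinkedHashMap keeps jobs in registration order for the usage listing.
    private final Map<String, Job> jobs = new LinkedHashMap<>();
    private final Map<String, String> descriptions = new LinkedHashMap<>();

    void register(String name, String description, Job job) {
        jobs.put(name, job);
        descriptions.put(name, description);
    }

    /** Builds the listing shown when no (or an unknown) job name is given. */
    String usage() {
        StringBuilder sb = new StringBuilder("Available jobs:\n");
        for (Map.Entry<String, String> e : descriptions.entrySet()) {
            sb.append("  ").append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Dispatches on args[0] and passes the remaining arguments to the job. */
    int run(String[] args) {
        if (args.length == 0 || !jobs.containsKey(args[0])) {
            System.err.print(usage());
            return -1;
        }
        return jobs.get(args[0]).run(Arrays.copyOfRange(args, 1, args.length));
    }
}
```

Under these assumptions, adding a new job is one `register` call, and running the driver with no arguments prints the list of registered jobs.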

Here is how our dumb rowcounter looks:

{code}
durruti$ ./bin/hbase org.apache.hadoop.hbase.mapred.Driver rowcounter /tmp/output x x:
08/04/04 10:59:58 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker,
sessionId=
08/04/04 10:59:58 WARN mapred.JobClient: No job jar file set.  User classes may not be found.
See JobConf(Class) or JobConf#setJar(String).
08/04/04 10:59:58 INFO mapred.JobClient: Running job: job_local_1
08/04/04 10:59:58 INFO mapred.MapTask: numReduceTasks: 1
08/04/04 10:59:58 INFO hbase.HTable: Creating scanner over x starting at key 
08/04/04 10:59:58 INFO mapred.LocalJobRunner: 
08/04/04 10:59:58 INFO mapred.TaskRunner: Task 'job_local_1_map_0000' done.
08/04/04 10:59:58 INFO mapred.LocalJobRunner: reduce > reduce
08/04/04 10:59:58 INFO mapred.TaskRunner: Task 'reduce_qv8ybc' done.
08/04/04 10:59:58 INFO mapred.TaskRunner: Saved output of task 'reduce_qv8ybc' to file:/tmp/output
08/04/04 10:59:59 INFO mapred.JobClient: Job complete: job_local_1
08/04/04 10:59:59 INFO mapred.JobClient: Counters: 10
08/04/04 10:59:59 INFO mapred.JobClient:   RowCounter
08/04/04 10:59:59 INFO mapred.JobClient:     Rows=1
08/04/04 10:59:59 INFO mapred.JobClient:   Map-Reduce Framework
08/04/04 10:59:59 INFO mapred.JobClient:     Map input records=1
08/04/04 10:59:59 INFO mapred.JobClient:     Map output records=1
08/04/04 10:59:59 INFO mapred.JobClient:     Map input bytes=0
08/04/04 10:59:59 INFO mapred.JobClient:     Map output bytes=7
08/04/04 10:59:59 INFO mapred.JobClient:     Combine input records=0
08/04/04 10:59:59 INFO mapred.JobClient:     Combine output records=0
08/04/04 10:59:59 INFO mapred.JobClient:     Reduce input groups=1
08/04/04 10:59:59 INFO mapred.JobClient:     Reduce input records=1
08/04/04 10:59:59 INFO mapred.JobClient:     Reduce output records=1
{code}

Here is the commit comment:

{code}
M  build.xml
    (Jar target): Add copying of any properties files under src/java
    Also added Main-Class to manifest.
M  src/java/org/apache/hadoop/hbase/mapred/RowCounter_Counters.properties
    Added resource so MR can print out counters for RowCounter MR job
M  src/java/org/apache/hadoop/hbase/mapred/RowCounter.java
    Example, simple MR job that counts non-empty rows.
M  src/java/org/apache/hadoop/hbase/mapred/Driver.java
    Driver class. General entry point for hbase MR jobs.
{code}
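
To make "counts non-empty rows" concrete, here is a rough sketch of the counting logic with the MapReduce plumbing stripped out. It assumes "non-empty" means a row has at least one cell whose value has non-zero length; that reading is an assumption, not something this message spells out. `NonEmptyRowCount`, `hasContent`, and `countRows` are hypothetical names, and plain maps stand in for table rows.

```java
import java.util.Map;

/** Hypothetical sketch: count rows that contain at least one non-empty cell value. */
class NonEmptyRowCount {
    /** True if any cell in the row (column name -> value) has a value of non-zero length. */
    static boolean hasContent(Map<String, byte[]> row) {
        for (byte[] value : row.values()) {
            if (value != null && value.length > 0) {
                return true;
            }
        }
        return false;
    }

    /**
     * Counts rows with content. In an MR job, the loop body would be the map
     * call for one row, and the increment would be a job counter update.
     */
    static long countRows(Map<String, Map<String, byte[]>> table) {
        long rows = 0;
        for (Map<String, byte[]> row : table.values()) {
            if (hasContent(row)) {
                rows++;
            }
        }
        return rows;
    }
}
```

In the real job each mapper would see only its share of the table, so the per-task counts are summed by the framework rather than in a single loop.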

Will apply to branch after 0.1.1 goes out.  Will apply to TRUNK when MR API settles.

> MR example job to count table rows
> ----------------------------------
>
>                 Key: HBASE-559
>                 URL: https://issues.apache.org/jira/browse/HBASE-559
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>         Attachments: 559-0.1-v2.patch, 559.patch
>
>
> Lars' import is a little messy; he's not sure how many records were imported.  Running
a select takes a couple of hours.  He happens to have an idle MR cluster standing by.  An
example MR job that just counted records would be generally useful.  It could even output
row keys so you'd have a list of what made it in.  Later, if this tool becomes popular and
spawns derivatives and similar jobs, we can bundle a jar of MR jobs that run against your
tables, answer common queries, and are amenable to subclassing/modification.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.