hbase-user mailing list archives

From "Geoff Hendrey" <ghend...@decarta.com>
Subject RE: getSplits question
Date Thu, 10 Feb 2011 07:46:09 GMT
Oh, I definitely don't *need* my own getSplits() to run MapReduce. However, if I want to control
the number of records handled by each mapper (splitsize) as well as the start row and end row,
I thought I had to write my own getSplits(). Is there another way to accomplish this? I do need
the combination of a controlled split size and a start/end row.

-geoff

-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com] 
Sent: Wednesday, February 09, 2011 11:43 PM
To: user@hbase.apache.org
Cc: hbase-user@hadoop.apache.org
Subject: Re: getSplits question

You shouldn't need to write your own getSplits() method to run a MapReduce
job; I never did, at least...

-ryan
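For context on Ryan's point: as far as I know, when getSplits() is not overridden, the stock `TableInputFormat` (via `TableInputFormatBase`) creates one split per region, clipped to the scan's start and stop rows. The scan range is therefore honored, but the number of rows per mapper follows the region layout rather than a configurable split size. The sketch below is a simplified, dependency-free model of that clipping written for illustration, not HBase source; `RegionSplitModel`, `Range`, `bytes` and `clip` are hypothetical names, and an empty byte array stands for "unbounded" as it does for HBase row keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model (hypothetical, not HBase source) of per-region splits clipped
// to a scan range. Keys compare as unsigned bytes; an empty array means "unbounded".
final class RegionSplitModel {

    static final class Range {
        final byte[] start; // inclusive; empty = start of table
        final byte[] stop;  // exclusive; empty = end of table

        Range(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + new String(start, StandardCharsets.US_ASCII) + ", "
                    + new String(stop, StandardCharsets.US_ASCII) + ")";
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Returns one split per region that overlaps [scanStart, scanStop).
    static List<Range> clip(List<Range> regions, byte[] scanStart, byte[] scanStop) {
        List<Range> splits = new ArrayList<>();
        for (Range r : regions) {
            boolean overlaps =
                    (scanStart.length == 0 || r.stop.length == 0
                            || Arrays.compareUnsigned(scanStart, r.stop) < 0)
                    && (scanStop.length == 0 || Arrays.compareUnsigned(scanStop, r.start) > 0);
            if (!overlaps) {
                continue; // region lies entirely outside the scan range
            }
            // Start at whichever is later: the region start or the scan start.
            byte[] start = (scanStart.length == 0 || Arrays.compareUnsigned(r.start, scanStart) >= 0)
                    ? r.start : scanStart;
            // Stop at whichever is earlier: the region end or the scan stop.
            byte[] stop = (scanStop.length == 0
                    || (r.stop.length > 0 && Arrays.compareUnsigned(r.stop, scanStop) <= 0))
                    ? r.stop : scanStop;
            splits.add(new Range(start, stop));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Range> regions = Arrays.asList(
                new Range(bytes(""), bytes("g")),
                new Range(bytes("g"), bytes("p")),
                new Range(bytes("p"), bytes("")));
        // Three regions, but a scan over [c, m) only touches the first two.
        System.out.println(clip(regions, bytes("c"), bytes("m")));
    }
}
```

With three regions and a scan over [c, m), this model yields two splits, [c, g) and [g, m): one per overlapping region, however many rows each holds.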

On Wed, Feb 9, 2011 at 11:36 PM, Geoff Hendrey <ghendrey@decarta.com> wrote:
> Are end rows inclusive or exclusive? The docs say exclusive, but then the
> question arises of how to form the last split in getSplits(). The code
> below runs fine, but I believe it is omitting some rows, perhaps because
> of the exclusive end row. For the final split, should the end row be
> null? I tried that, and got what appeared to be a final split with no
> end row at all. I would appreciate a pointer to a correct implementation
> of getSplits() in which I can provide a start row, end row, and split
> size. Apparently this isn't it :)
>
> int splitSize = context.getConfiguration().getInt("splitsize", 1000);
> byte[] splitStop = null;
> String hostname = null;
>
> while ((results = resultScanner.next(splitSize)).length > 0) {
>     // System.out.println("results :-------------------------- " + results);
>     byte[] splitStart = results[0].getRow();
>     // I think this is a problem... we don't actually include this row in the
>     // split since it's exclusive. Revisit this and correct.
>     splitStop = results[results.length - 1].getRow();
>     HRegionLocation location = table.getRegionLocation(splitStart);
>     hostname = location.getServerAddress().getHostname();
>     InputSplit split = new TableSplit(table.getTableName(), splitStart, splitStop, hostname);
>     splits.add(split);
>     System.out.println("initializing splits: " + split.toString());
> }
>
> resultScanner.close();
>
> -g
>
>
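One possible way around the exclusive end row, sketched under stated assumptions: instead of stopping each split at the last row of its batch, stop it at the first row of the *next* batch, and give only the final split the job's own end row (or an empty array, which for an HBase scan means "to the end of the table"). `SplitPlanner`, `KeyRange`, `plan`, `bytes` and `inclusiveStop` are hypothetical names; the sketch works on plain row keys (such as what `results[i].getRow()` returns) so it compiles without HBase. If the caller's end row should itself be included, `inclusiveStop` appends a single 0x00 byte, which gives the smallest key that sorts after it.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns scanned row keys into [start, stop) ranges of at most
// splitSize rows each, treating every stop row as exclusive (as HBase scans do).
final class SplitPlanner {

    static final class KeyRange {
        final byte[] start; // inclusive
        final byte[] stop;  // exclusive; empty = run to end of table

        KeyRange(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + new String(start, StandardCharsets.US_ASCII) + ", "
                    + new String(stop, StandardCharsets.US_ASCII) + ")";
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Smallest key that sorts strictly after `row` (row followed by one 0x00 byte).
    // Using it as an exclusive stop row makes `row` itself part of the range.
    static byte[] inclusiveStop(byte[] row) {
        return Arrays.copyOf(row, row.length + 1); // the extra byte is zero-filled
    }

    // rowKeys: ascending keys as returned by a scan of [startRow, endRow).
    // endRow:  the job's exclusive stop row, or an empty array for end of table.
    static List<KeyRange> plan(List<byte[]> rowKeys, int splitSize, byte[] endRow) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<KeyRange> splits = new ArrayList<>();
        for (int i = 0; i < rowKeys.size(); ) {
            int next = (int) Math.min((long) i + splitSize, rowKeys.size());
            // Stop at the first row of the NEXT batch, so this batch's last row is kept.
            // Only the final batch stops at the job's own end row.
            byte[] stop = next < rowKeys.size() ? rowKeys.get(next) : endRow;
            splits.add(new KeyRange(rowKeys.get(i), stop));
            i = next;
        }
        return splits;
    }

    public static void main(String[] args) {
        List<byte[]> rows = Arrays.asList(bytes("a"), bytes("b"), bytes("c"), bytes("d"), bytes("e"));
        System.out.println(plan(rows, 2, bytes("z")));
    }
}
```

In a getSplits() like the one quoted above, each range would then be wrapped in a TableSplit with the region location of its start row, as the original code already does. Holding on to the previous batch's first row until the next batch arrives would produce the same ranges without keeping every key in memory.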
