Subject: RE: getSplits question
Date: Wed, 9 Feb 2011 23:46:09 -0800
From: "Geoff Hendrey" <ghendrey@decarta.com>
To: <user@hbase.apache.org>
Cc: <hbase-user@hadoop.apache.org>

Oh, I definitely don't *need* my own getSplits() to run MapReduce.
However, if I want to control the number of records handled by each
mapper (splitsize) as well as the startrow and endrow, then I thought I
had to write my own getSplits(). Is there another way to accomplish
this? I do need the combination of controlled splitsize and
start/endrow.
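
To make the boundary question concrete, here is a rough, HBase-free
sketch of how the ranges could be formed if end rows really are
exclusive: each range stops at the first key of the next chunk, so no
row falls between two splits, and the last range stops at whatever stop
row the caller supplies. The names SplitBoundaries, Range and chunk are
made up for illustration, and using an empty byte array to mean "no
upper bound" on the final split is an assumption, not something
confirmed in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: computes half-open [start, stop) key ranges.
public class SplitBoundaries {

    /** One half-open key range [start, stop). Assumption: an empty stop means "no upper bound". */
    public static final class Range {
        public final byte[] start;
        public final byte[] stop;

        Range(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /**
     * Chunks already-sorted row keys into ranges of at most splitSize rows.
     * Every range except the last stops at the first key of the next chunk,
     * so an exclusive stop row does not drop the chunk's own last row.
     * The last range stops at finalStop.
     */
    public static List<Range> chunk(List<byte[]> sortedKeys, int splitSize, byte[] finalStop) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < sortedKeys.size(); i += splitSize) {
            int next = i + splitSize;
            byte[] stop = next < sortedKeys.size() ? sortedKeys.get(next) : finalStop;
            ranges.add(new Range(sortedKeys.get(i), stop));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<byte[]> keys = new ArrayList<>();
        for (String k : new String[] {"a", "b", "c", "d", "e"}) {
            keys.add(k.getBytes(StandardCharsets.UTF_8));
        }
        // Prints one "start -> stop" line per range; the last stop is empty.
        for (Range r : chunk(keys, 2, new byte[0])) {
            System.out.println(new String(r.start, StandardCharsets.UTF_8) + " -> "
                    + new String(r.stop, StandardCharsets.UTF_8));
        }
    }
}
```

In a real getSplits() the keys would come from the scanner rather than a
list, which means reading one row past each chunk to learn the next
start key before closing the current split.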

-geoff

-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com]
Sent: Wednesday, February 09, 2011 11:43 PM
To: user@hbase.apache.org
Cc: hbase-user@hadoop.apache.org
Subject: Re: getSplits question

You shouldn't need to write your own getSplits() method to run a
MapReduce job; I never did, at least...

-ryan

On Wed, Feb 9, 2011 at 11:36 PM, Geoff Hendrey <ghendrey@decarta.com> wrote:
> Are endrows inclusive or exclusive? The docs say exclusive, but then the
> question arises as to how to form the last split for getSplits(). The
> code below runs fine, but I believe it is omitting some rows, perhaps
> b/c of the exclusive end row. For the final split, should the endrow be
> null? I tried that, and got what appeared to be a final split without an
> endrow at all. Would appreciate a pointer to the correct implementation
> of getSplits in which I desire to provide a startrow, endrow, and
> splitsize. Apparently this isn't it :)
>
>
>
> int splitSize = context.getConfiguration().getInt("splitsize", 1000);
> byte[] splitStop = null;
> String hostname = null;
> while ((results = resultScanner.next(splitSize)).length > 0) {
>     // System.out.println("results :-------------------------- " + results);
>     byte[] splitStart = results[0].getRow();
>     // I think this is a problem... we don't actually include this row in
>     // the split since it's exclusive... revisit this and correct
>     splitStop = results[results.length - 1].getRow();
>     HRegionLocation location = table.getRegionLocation(splitStart);
>     hostname = location.getServerAddress().getHostname();
>     InputSplit split = new TableSplit(table.getTableName(), splitStart, splitStop, hostname);
>     splits.add(split);
>     System.out.println("initializing splits: " + split.toString());
> }
> resultScanner.close();
>
>
>
>
>
> -g
>
>