hbase-user mailing list archives

From Allan Yang <allan...@apache.org>
Subject Re: Difference between ResultScanner and initTableMapperJob
Date Tue, 11 Jul 2017 05:27:36 GMT
You should pay more attention to the RetriesExhaustedException itself.
As you have already found out, the important difference between a normal
scan and a MapReduce job is that the MapReduce job fans out into many
concurrent tasks (1200 maps, as you pointed out). Those 1200 tasks running
in parallel may put a heavy load on the client/server side and end in the
retries-exhausted exception. In any case, we still need more logs to locate
the root problem.
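
If that load theory holds, one mitigation is to loosen the client retry and
timeout settings on the job's Configuration before calling
initTableMapperJob. A minimal sketch (the values are only placeholders,
assuming the HBase 1.x config keys):

    Configuration conf = HBaseConfiguration.create();
    // Placeholder values, tune for your cluster:
    conf.setInt("hbase.client.retries.number", 50);             // 1.x default is 35
    conf.setInt("hbase.rpc.timeout", 300000);                   // per-RPC timeout, ms
    conf.setInt("hbase.client.scanner.timeout.period", 300000); // scanner timeout, ms
    Job job = Job.getInstance(conf, "scan-job");                // hypothetical job name

Lowering scan.setCaching() below 500 would also shrink each scanner RPC and
make it less likely to hit the 120000 ms call timeout shown in your error.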

Best Regards
Allan Yang

2017-07-11 11:50 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:

> bq. for hbase-client/hbase-server version 1.0.1 and 1.2.0-cdh5.7.2.
>
> Do you mean the error occurred for both versions, or that the client is on
> 1.0.1 and the server is on 1.2.0 ?
>
> There should be more to the RetriesExhaustedException.
> Can you pastebin the full stack trace ?
>
> Cheers
>
> On Mon, Jul 10, 2017 at 2:21 PM, S L <slouie.at.work@gmail.com> wrote:
>
> > I hope someone can tell me what the difference between these two API
> > calls is.  I'm getting inconsistent results between the two of them.  This
> > is happening for hbase-client/hbase-server versions 1.0.1 and
> > 1.2.0-cdh5.7.2.
> >
> > First off, my rowkeys are in the format hash_name_timestamp,
> > e.g. 100_servername_1234567890.  The HBase table has a TTL of 30 days, so
> > anything older than 30 days should disappear after compaction.
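> >
> > For reference, here is a minimal sketch of how that 30-day TTL would be
> > set on the column family (a reconstruction using the HBase 1.x admin API,
> > not the actual table-creation code):
> >
> >     // Hypothetical table setup, shown only to illustrate the TTL:
> >     HColumnDescriptor cf = new HColumnDescriptor("raw_data");
> >     cf.setTimeToLive(30 * 24 * 60 * 60);  // TTL in seconds = 30 days
> >     HTableDescriptor desc = new HTableDescriptor(TableName.valueOf(tableName));
> >     desc.addFamily(cf);
> >     admin.createTable(desc);  // 'admin' is an org.apache.hadoop.hbase.client.Admin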
> >
> > The following is the code that uses ResultScanner.  It doesn't use
> > MapReduce, so it takes a very long time to complete.  I can't run my job
> > this way because it takes too long.  For debugging purposes, though, I
> > don't have any problems with this method.  It lists all the keys for the
> > specified time range, and they look valid to me: all the timestamps of
> > the returned keys are within the past 30 days and within the specified
> > time range:
> >
> >     Scan scan = new Scan();
> >     scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> >     scan.setCaching(500);
> >     scan.setCacheBlocks(false);
> >     scan.setTimeRange(start, end);
> >
> >     Connection fConnection = ConnectionFactory.createConnection(conf);
> >     Table table = fConnection.getTable(TableName.valueOf(tableName));
> >     ResultScanner scanner = table.getScanner(scan);
> >     for (Result result = scanner.next(); result != null; result = scanner.next()) {
> >         System.out.println("Found row: " + Bytes.toString(result.getRow()));
> >     }
> >     scanner.close();      // release the scanner, table, and connection
> >     table.close();
> >     fConnection.close();
> >
> >
> > The following code doesn't work, but it uses MapReduce, which runs much
> > faster than the ResultScanner approach since it divides the work into
> > 1200 maps.  The problem is that I'm getting back rowkeys that should have
> > disappeared because their TTL expired:
> >
> >     Scan scan = new Scan();
> >     scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> >     scan.setCaching(500);
> >     scan.setCacheBlocks(false);
> >     scan.setTimeRange(start, end);
> >     TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
> >         Text.class, IntWritable.class, job);
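> >
> > Roughly, the driver around that call looks like this (a simplified
> > sketch; the job name and map-only setup here are illustrative):
> >
> >     Configuration conf = HBaseConfiguration.create();
> >     Job job = Job.getInstance(conf, "scan-job");  // hypothetical name
> >     job.setJarByClass(MTTRMapper.class);
> >     TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
> >         Text.class, IntWritable.class, job);
> >     job.setNumReduceTasks(0);                     // assuming a map-only job
> >     System.exit(job.waitForCompletion(true) ? 0 : 1);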
> >
> > Here is the error that I get; it eventually kills the whole MR job
> > because over 25% of the mappers failed.
> >
> > > Error: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> > > Failed after attempts=36, exceptions: Wed Jun 28 13:46:57 PDT 2017,
> > > null, java.net.SocketTimeoutException: callTimeout=120000,
> > > callDuration=120301: row '65_app129041.iad1.mydomain.com_1476641940'
> > > on table 'server_based_data' at region=server_based_data
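> >
> > As a stopgap I could let the job tolerate a percentage of failed map
> > tasks instead of dying (a sketch, assuming the Hadoop 2 config key; it
> > would not fix the underlying timeouts):
> >
> >     // Hypothetical mitigation, tolerate up to 30% failed maps:
> >     job.getConfiguration().setInt("mapreduce.map.failures.maxpercent", 30);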
> >
> > I'll try to study the code in the hbase-client and hbase-server jars, but
> > hopefully someone will know offhand what the difference between the two
> > methods is and what is causing the initTableMapperJob call to fail.
> >
>
