predictionio-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Trouble with HBase and PEventStore.aggregateProperties
Date Fri, 28 Oct 2016 15:30:45 GMT
The use of PEventStore.aggregateProperties as shown below causes errors when a large cluster
is making the query. It seems to cause a full DB scan, which results in timeouts. This may
be because nearly 400 threads (the parallelism of the cluster) are making requests. But should
this result in a full scan? Is there a better way to get all "item" properties? Is it possible
to index a column to make this more efficient? Any ideas would be appreciated. This
makes frequent or fast training on near-terabyte-scale data impossible.

    val fieldsRDD: RDD[(ItemID, PropertyMap)] = PEventStore.aggregateProperties(
      appName = dsp.appName,
      entityType = "item")(sc)
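Whether a full scan is expected depends on the row-key layout. HBase can only narrow a scan by row-key range; a filter on a column value such as entityType still reads every row in the range it is given. The toy model below is plain Scala, not PredictionIO or HBase code, and both key layouts are hypothetical: layout A assumes keys start with a hash of entity type and ID (so one type's rows are scattered across the table), layout B assumes keys start with the entity type itself (so a prefix range scan is enough).

```scala
import java.security.MessageDigest
import scala.collection.immutable.TreeMap

// Toy model of an HBase-style table: row key -> value of the entityType column.
// Keys are kept sorted, as HBase keeps row keys sorted.
object RowKeyScanSketch {
  // Hex MD5 digest, standing in for a hashed row-key prefix (hypothetical layout).
  def md5Hex(s: String): String =
    MessageDigest.getInstance("MD5")
      .digest(s.getBytes("UTF-8"))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  // Column filter: every row is read, only matching rows are returned.
  // Returns (rows read, rows returned).
  def filterScan(table: TreeMap[String, String], entityType: String): (Int, Int) =
    (table.size, table.count { case (_, t) => t == entityType })

  // Prefix range scan: only keys in [entityType + "|", entityType + "}") are read.
  // '}' is the ASCII character right after '|', so the range covers the whole prefix.
  def prefixScan(table: TreeMap[String, String], entityType: String): (Int, Int) = {
    val slice = table.range(entityType + "|", entityType + "}")
    (slice.size, slice.size)
  }

  // Hypothetical data: 1000 user rows and 100 item rows.
  val rows: Seq[(String, String)] =
    (1 to 1000).map(i => ("user", s"u$i")) ++ (1 to 100).map(i => ("item", s"i$i"))

  // Layout A: hash of "type-id", so rows of one type are scattered.
  val hashed: TreeMap[String, String] =
    TreeMap(rows.map { case (t, id) => md5Hex(s"$t-$id") -> t }: _*)

  // Layout B: entity type as a plain prefix, so rows of one type are contiguous.
  val prefixed: TreeMap[String, String] =
    TreeMap(rows.map { case (t, id) => s"$t|$id" -> t }: _*)

  def main(args: Array[String]): Unit = {
    println(filterScan(hashed, "item"))   // (1100,100)
    println(prefixScan(prefixed, "item")) // (100,100)
  }
}
```

With layout A, returning the 100 item rows means reading all 1,100 rows; with layout B only the item rows are touched. In this sense "indexing" entityType would mean either a key layout like B or a separate table keyed by entity type; both are hypothetical design options here.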

BTW: If I reduce parallelism in Spark it slows other parts of the algorithm unacceptably.
I have also experimented with very large RPC/scanner timeouts of many minutes, to no avail.
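A rough model may help explain why longer timeouts alone do not always fix this. Each scanner next() call asks the region server for up to hbase.client.scanner.caching rows, and the call has to finish within the client scanner timeout (hbase.client.scanner.timeout.period). If a filter keeps only one row in N, the server may have to read roughly caching × N rows to fill a single batch. The sketch below is plain Scala arithmetic under that assumption, not HBase's exact behaviour; the selectivity and per-row cost are hypothetical inputs, not measurements from this cluster.

```scala
// Rough model of one scanner next() call (assumption, not HBase's exact behaviour).
object ScannerBatchSketch {
  // Estimated server time in ms to return `caching` matching rows when only
  // one row in `matchEvery` passes the filter and each row read costs `perRowMicros`.
  def batchMillis(caching: Long, matchEvery: Long, perRowMicros: Long): Long =
    caching * matchEvery * perRowMicros / 1000L

  // Largest caching value whose estimated batch time fits within `timeoutMillis`.
  def maxSafeCaching(timeoutMillis: Long, matchEvery: Long, perRowMicros: Long): Long =
    timeoutMillis * 1000L / (matchEvery * perRowMicros)

  def main(args: Array[String]): Unit = {
    // Hypothetical inputs: 1 row in 100 is an "item" row, 50 microseconds per row under load.
    println(batchMillis(caching = 1000, matchEvery = 100, perRowMicros = 50))           // 5000
    println(maxSafeCaching(timeoutMillis = 60000, matchEvery = 100, perRowMicros = 50)) // 12000
  }
}
```

Under this model batch time grows with caching × N, so lowering hbase.client.scanner.caching shortens each call, while many concurrent scanners hitting the same region servers are likely to push the effective per-row cost up.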

    Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times, most recent failure:
    Lost task 44.3 in stage 147.0 (TID 24833, ip-172-16-3-9.eu-central-1.compute.internal): org.apache.hadoop.hbase.DoNotRetryIOException:
    Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
        at ...
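The "was there a rpc timeout?" wording reflects how, as I understand it, the HBase client and region server keep a scanner in step: each next() call carries a call sequence number, and the server rejects a call whose number does not match the one it expects. If the server has already answered a call but the reply arrives after the client's RPC timeout, the client retries with the old number and gets OutOfOrderScannerNextException; the trace's "Failed after retry" suggests the client had already retried once before giving up. The toy model below is plain Scala with hypothetical class names, not HBase's implementation.

```scala
// Toy server-side scanner that tracks the expected call sequence number.
final class ToyRegionScanner(rows: Vector[Int], batch: Int) {
  private var expectedSeq = 0L
  private var pos = 0

  // Returns the next batch, or Left(error) if the sequence number is stale.
  def next(callSeq: Long): Either[String, Vector[Int]] =
    if (callSeq != expectedSeq)
      Left(s"OutOfOrderScannerNext: expected $expectedSeq, got $callSeq")
    else {
      val out = rows.slice(pos, pos + batch)
      pos += out.size
      expectedSeq += 1
      Right(out)
    }
}

object OutOfOrderSketch {
  def main(args: Array[String]): Unit = {
    val scanner = new ToyRegionScanner((1 to 10).toVector, batch = 4)
    var clientSeq = 0L
    // 1. Normal call: server and client both advance.
    println(scanner.next(clientSeq)) // Right(Vector(1, 2, 3, 4))
    clientSeq += 1
    // 2. Server handles the call, but the reply arrives after the client's
    //    RPC timeout, so the client never advances its own counter.
    scanner.next(clientSeq)
    // 3. Client retries with the same sequence number and is rejected.
    println(scanner.next(clientSeq)) // Left(OutOfOrderScannerNext: expected 2, got 1)
  }
}
```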

