predictionio-user mailing list archives

From Sachin Kamkar <sachinkam...@gmail.com>
Subject Hardware Configuration for Binary Classification using PIO
Date Thu, 26 Oct 2017 08:00:17 GMT
Hi Team,

First, if I am posting to the wrong group, please direct me to the right
forum or mailing list. Thanks in advance.

Problem: Binary Classification
Number of Features: 10K - 20K
Number of documents to be trained: 1 Million
Model: https://github.com/EmergentOrder/template-scala-probabilistic-classifier-batch-lbfgs
Recommended PIO version: 0.9.2

I am new to PredictionIO. So far I have only run small predictions, with ~100
features and a 10K training set, and I was able to run those on a server with
2 cores and 16 GB of RAM.

Now that my actual dataset is much larger, I don't know where to even start
in terms of configuration.
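
For a sense of scale, the raw size of the training matrix can be estimated with a back-of-envelope calculation. This is only a sketch: it assumes 8-byte doubles and 4-byte indices, the 1% non-zero density is an assumed placeholder rather than a measured value, and the function names are hypothetical. It ignores JVM and Spark overhead, so real memory needs would likely be higher.

```python
# Back-of-envelope size of the training matrix (illustrative only).
BYTES_PER_DOUBLE = 8  # assumption: features stored as 64-bit floats
BYTES_PER_INDEX = 4   # assumption: 32-bit column index per non-zero entry


def dense_gib(rows, cols):
    """Size in GiB if every feature value is stored explicitly."""
    return rows * cols * BYTES_PER_DOUBLE / 2**30


def sparse_gib(rows, cols, density):
    """Size in GiB if only non-zero values (plus their indices) are stored."""
    nonzeros = rows * cols * density
    return nonzeros * (BYTES_PER_DOUBLE + BYTES_PER_INDEX) / 2**30


rows = 1_000_000  # number of training documents
for cols in (10_000, 20_000):  # feature-count range from the problem statement
    print(
        f"{cols} features: dense ~{dense_gib(rows, cols):.1f} GiB, "
        f"sparse at 1% density ~{sparse_gib(rows, cols, 0.01):.1f} GiB"
    )
```

Stored densely, the matrix alone would be on the order of 75 to 150 GiB, so whether the features are sparse probably matters a great deal for the hardware choice.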

I need suggestions on three points:

   - For my problem, have I chosen the correct model? This model only
   runs on 0.9.2, and 0.12 is the latest release, so am I spending energy
   on the wrong model?
      - Should I consider changing the code to be compatible with 0.12?
   - What is the hardware that I should choose?
      - Should I have a dedicated Spark cluster? If yes, what config
      should I start off with?
      - How much memory should I set for the driver and executor?
   - How much time can I expect this training to take?


With Regards,

     Sachin
⚜KTBFFH⚜
