predictionio-user mailing list archives

From Tom Chan <yukhei.c...@gmail.com>
Subject Re: Setup PredictionIO for large events
Date Tue, 06 Sep 2016 17:58:58 GMT
It's been a while since I last did that, so I might be wrong or out of date. I
think it is in the HBase web UI (port 60010 by default; 16010 in HBase 1.0 and
later), and from there you can split the table into more regions. If that's
not the case, you can look up ways to do it with "hbase split region".
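
A minimal sketch of the arithmetic behind this advice. It assumes that Spark gets one input partition per region when it reads the event table (as with HBase's TableInputFormat), and that running `split '<table>'` in the HBase shell asks every region to split roughly in half. The table name `pio_event:events_1` is hypothetical. It follows PredictionIO's usual `events_<appId>` naming in the default `pio_event` namespace, with an assumed app id of 1, so check your own table with `list` in the HBase shell.

```python
# Hypothetical table name: PredictionIO usually stores app events in
# pio_event:events_<appId>; app id 1 is an assumption, check with `list`.
TABLE = "pio_event:events_1"


def utilization(regions: int, cores: int) -> float:
    """Fraction of cores busy while scanning, assuming one Spark task per region."""
    return min(regions, cores) / cores


def split_rounds(regions: int, target: int) -> int:
    """Whole-table split rounds needed, assuming each round doubles the region count."""
    if regions < 1:
        raise ValueError("a table always has at least one region")
    rounds = 0
    while regions < target:
        regions *= 2
        rounds += 1
    return rounds


def split_commands(table: str, regions: int, target: int) -> list[str]:
    """HBase shell commands, one per round. Splits run asynchronously, so
    wait until each round has finished (e.g. in the web UI) before the next."""
    return [f"split '{table}'"] * split_rounds(regions, target)


if __name__ == "__main__":
    regions, cores = 2, 40  # numbers reported in this thread
    print(f"core utilization while scanning: {utilization(regions, cores):.0%}")
    for cmd in split_commands(TABLE, regions, cores):
        print(cmd)
```

With the numbers from this thread (2 regions, 40 logical CPUs), the sketch reports 5% utilization and five split rounds (2, 4, 8, 16, 32, 64 regions). Doubling is used because a whole-table split is the simplest operation to repeat; explicit split keys give more control if the row keys are unevenly distributed.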

On Mon, Sep 5, 2016 at 11:08 PM, Digambar Bhat <digambarbhat14@gmail.com>
wrote:

> Thanks for the reply, Tom.
>
> I checked the number of cores. There are two CPUs with 10 cores each. Also,
> virtualization is enabled, so we get 40 CPUs in total. The number of regions
> for the app table is 2.
>
> So how can I increase the number of regions for the app table?
>
> On 06-Sep-2016 10:40 am, "Tom Chan" <yukhei.chan@gmail.com> wrote:
>
>> One quick thing to check is the number of regions in the HBase table for
>> your app. If it's less than the number of cores you have, then you won't be
>> utilizing all of your computing power. Hope this helps.
>>
>> Tom
>>
>> On Sep 5, 2016 9:05 PM, "Digambar Bhat" <digambarbhat14@gmail.com> wrote:
>>
>>> Any update, please?
>>>
>>> On 30-Aug-2016 8:06 pm, "Digambar Bhat" <digambarbhat14@gmail.com>
>>> wrote:
>>>
>>>> I am using Universal Recommender.
>>>>
>>>> On 30-Aug-2016 8:05 pm, "Pat Ferrel" <pat@occamsmachete.com> wrote:
>>>>
>>>>> Training time is also template dependent, what template are you using?
>>>>>
>>>>> On Aug 30, 2016, at 12:21 AM, Digambar Bhat <digambarbhat14@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I have been using PredictionIO for the last year. It's working fine for
>>>>> me.
>>>>>
>>>>> Earlier, importing and training worked flawlessly. But now that the number
>>>>> of events has increased, training is very slow: it takes almost 9-10 hours.
>>>>>
>>>>> Currently, there are about 15 million events and about 10 million items.
>>>>>
>>>>> The architecture is as follows:
>>>>> Spark and Elasticsearch are on two machines. Hadoop and HBase are on
>>>>> another two separate machines.
>>>>>
>>>>> Each machine has the following configuration:
>>>>> 160 GB RAM, 40 CPUs, 10 cores per socket, CPU clock 3000 MHz
>>>>>
>>>>> So please let me know the right configuration for this many
>>>>> events. Also, what should I consider as my events grow to billions?
>>>>> Will it work for such a large data set?
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Thanks,
>>>>> Digambar
>>>>>
>>>>>
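
As a rough starting point for the Spark side of the original question, the sketch below applies a commonly cited executor-sizing rule of thumb to the two Spark machines described in the thread. The rule is about 5 cores per executor, roughly 10% of executor memory added as off-heap overhead, and some cores and memory reserved per node. Everything here is an assumption rather than advice from this thread. In particular, the 2 cores and 32 GB reserved per node for the OS and Elasticsearch (which shares those machines) are hypothetical values. `--executor-cores`, `--executor-memory`, `--num-executors` (YARN) and `--total-executor-cores` (standalone) are standard `spark-submit` options, which `pio train` passes through to Spark when they follow `--`.

```python
from dataclasses import dataclass


@dataclass
class ExecutorPlan:
    executors_per_node: int
    total_executors: int
    executor_cores: int
    executor_memory_gb: int  # heap size passed to --executor-memory


def plan_executors(nodes: int, vcpus_per_node: int, mem_gb_per_node: int,
                   reserved_cores: int = 1, reserved_mem_gb: int = 1,
                   cores_per_executor: int = 5,
                   overhead_fraction: float = 0.10) -> ExecutorPlan:
    """Rule-of-thumb executor sizing; all defaults are assumptions."""
    usable_cores = vcpus_per_node - reserved_cores
    usable_mem_gb = mem_gb_per_node - reserved_mem_gb
    per_node = usable_cores // cores_per_executor
    if per_node < 1 or usable_mem_gb <= 0:
        raise ValueError("not enough resources left for one executor per node")
    # Memory available to each executor container on a node.
    slot_gb = usable_mem_gb / per_node
    # The slot must hold the heap plus off-heap overhead (heap * overhead_fraction).
    heap_gb = int(slot_gb / (1 + overhead_fraction))
    return ExecutorPlan(per_node, per_node * nodes, cores_per_executor, heap_gb)


def spark_submit_args(plan: ExecutorPlan, yarn: bool = True) -> str:
    """Format the plan as spark-submit flags."""
    args = [f"--executor-cores {plan.executor_cores}",
            f"--executor-memory {plan.executor_memory_gb}g"]
    if yarn:
        # --num-executors applies to YARN.
        args.append(f"--num-executors {plan.total_executors}")
    else:
        # Standalone mode caps the application's total cores instead.
        args.append(f"--total-executor-cores "
                    f"{plan.total_executors * plan.executor_cores}")
    return " ".join(args)


if __name__ == "__main__":
    # Hardware from the thread: 2 Spark nodes, 40 logical CPUs, 160 GB each.
    # Reserving 2 cores and 32 GB per node for the OS and Elasticsearch is hypothetical.
    plan = plan_executors(2, 40, 160, reserved_cores=2, reserved_mem_gb=32)
    print(plan)
    print("pio train -- " + spark_submit_args(plan, yarn=False))
```

With those inputs the sketch suggests 7 executors per node (14 in total), each with 5 cores and 16 GB of heap. Treat this as a starting point to tune against real training runs, not a tested configuration.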
