mahout-user mailing list archives

From Manuel Blechschmidt <>
Subject Re: Which database should I use with Mahout
Date Sun, 19 May 2013 18:01:37 GMT
Hi Tevfik,
one request to the recommender can turn into more than 1000 queries to the database, depending
on which recommender you use and the number of preferences for the given user.

The problem is not whether you use SQL, NoSQL, or any other query language. The problem is
the latency of the answers.

An average TCP packet round trip within the same data center takes about 500 µs. A main memory
reference takes about 0.1 µs. This means that the main memory of your Java process can be accessed
roughly 5000 times faster than any other process, such as a database, that is reached via TCP/IP.
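
The arithmetic behind these figures can be written down as a small sketch. It assumes that the
queries of one recommender request are issued one after another and that each costs one round trip;
the class and method names are made up for illustration and are not part of Mahout.

```java
// Back-of-the-envelope latency estimate using the figures from this message.
// Assumption: queries run sequentially, one network round trip per query.
final class LatencyEstimate {
    static final long TCP_ROUND_TRIP_NANOS = 500_000L; // about 500 µs in one data center
    static final long MEMORY_REFERENCE_NANOS = 100L;   // about 0.1 µs

    // How many memory references fit into one TCP round trip.
    static long speedupFactor() {
        return TCP_ROUND_TRIP_NANOS / MEMORY_REFERENCE_NANOS;
    }

    // Total time in microseconds for a number of sequential lookups.
    static long sequentialLatencyMicros(long lookups, long perLookupNanos) {
        return lookups * perLookupNanos / 1_000L;
    }

    public static void main(String[] args) {
        System.out.println("speedup factor: " + speedupFactor());
        System.out.println("1000 lookups over TCP (µs): "
                + sequentialLatencyMicros(1000, TCP_ROUND_TRIP_NANOS));
        System.out.println("1000 lookups in memory (µs): "
                + sequentialLatencyMicros(1000, MEMORY_REFERENCE_NANOS));
    }
}
```

Under these assumptions, a request that needs 1000 lookups spends about 500 ms waiting on the
network, compared with about 0.1 ms when the same data is already in the Java heap.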

Here you can see a screenshot that shows that database communication is by far (99%) the slowest
component of a recommender request:

If you do not want to cache your data in your Java process, you can use a fully in-memory
database technology like SAP HANA or EXASOL.

Nevertheless, if you use one of these, you do not need Mahout anymore.

An architecture of a Mahout system can be seen here:

Hope that helps

On 19.05.2013 at 19:20, Sean Owen wrote:

> I'm first saying that you really don't want to use the database as a
> data model directly. It is far too slow.
> Instead you want to use a data model implementation that reads all of
> the data, once, serially, into memory. And in that case, it makes no
> difference where the data is being read from, because it is read just
> once, serially. A file is just as fine as a fancy database. In fact
> it's probably easier and faster.
> On Sun, May 19, 2013 at 10:14 AM, Tevfik Aytekin
> <> wrote:
>> Thanks Sean, but I could not get your answer. Can you please explain it again?
>> On Sun, May 19, 2013 at 8:00 PM, Sean Owen <> wrote:
>>> It doesn't matter, in the sense that it is never going to be fast
>>> enough for real-time at any reasonable scale if actually run off a
>>> database directly. One operation results in thousands of queries. It's
>>> going to read data into memory anyway and cache it there. So, whatever
>>> is easiest for you. The simplest solution is a file.
>>> On Sun, May 19, 2013 at 9:52 AM, Ahmet Ylmaz
>>> <> wrote:
>>>> Hi,
>>>> I would like to use Mahout to make recommendations on my web site. Since
>>>> the data is going to be big, hopefully, I plan to use hadoop implementations of the recommender
>>>> I'm currently storing the data in mysql. Should I continue with it or should
>>>> I switch to a nosql database such as mongodb or something else?
>>>> Thanks
>>>> Ahmet
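
Sean's suggestion quoted above, reading all preferences once, serially, into memory and answering
every lookup from there, could be sketched roughly as follows. This is a plain-Java illustration and
not Mahout's own data model: the class name, the method names, and the `userID,itemID,value` line
format are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory preference store: the source is read once,
// afterwards every lookup is a main memory access.
final class InMemoryPreferences {
    private final Map<Long, Map<Long, Float>> byUser = new HashMap<>();

    // Reads lines of the assumed form "userID,itemID,value" in one serial pass.
    static InMemoryPreferences load(Reader source) {
        InMemoryPreferences prefs = new InMemoryPreferences();
        new BufferedReader(source).lines()
            .map(String::trim)
            .filter(line -> !line.isEmpty())
            .forEach(line -> {
                String[] fields = line.split(",");
                long userId = Long.parseLong(fields[0].trim());
                long itemId = Long.parseLong(fields[1].trim());
                float value = Float.parseFloat(fields[2].trim());
                prefs.byUser.computeIfAbsent(userId, k -> new HashMap<>()).put(itemId, value);
            });
        return prefs;
    }

    // All preferences of one user, served from memory without any query.
    Map<Long, Float> preferencesOf(long userId) {
        return Collections.unmodifiableMap(
                byUser.getOrDefault(userId, Collections.emptyMap()));
    }

    int userCount() {
        return byUser.size();
    }

    public static void main(String[] args) {
        // Inline sample data stands in for a file or a one-time database export.
        String sample = "1,10,4.0\n1,11,3.5\n2,10,5.0\n";
        InMemoryPreferences prefs = load(new StringReader(sample));
        System.out.println(prefs.preferencesOf(1L));
    }
}
```

Because `load` takes any `Reader`, the same pass works for a file or for a one-time export from
MySQL, which is why the choice of storage matters little here. Mahout ships its own file-backed
data model (`FileDataModel`), which would normally be used instead of a hand-written class like this.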

Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
