From: Ted Dunning
Date: Tue, 16 Jul 2013 11:45:52 -0700
Subject: Re: Regarding Online Recommenders
To: Mahout Dev List

Netflix is a small dataset. 12G for that seems quite excessive.

Note also that this is before you have done any work.

Ideally, 100 million observations should take << 1GB.

On Tue, Jul 16, 2013 at 8:19 AM, Peng Cheng wrote:

> The second idea is indeed splendid: we should separate the
> time-complexity-first and space-complexity-first implementations. What I'm
> not quite sure about is whether we really need to create two interfaces
> instead of one. Personally, I think 12G of heap space is not that high,
> right? Most new laptops can already handle that (emphasis on laptop). And
> if we replace the hash map (the culprit of the high memory consumption)
> with a list/LinkedList, it would simply degrade lookup to a linear search,
> O(n), which is not too bad either. The current DataModel is the result of
> careful thought and has undergone extensive testing; it is easier to build
> on top of it than to subvert it.
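To make the "<< 1GB" figure concrete, here is a rough back-of-the-envelope sketch. It is not Mahout code and not the layout anyone in the thread proposed: the class `CompactRatings`, its fields, and the method `estimateBytes` are hypothetical names, and the user count (480,189) and the assumption that item IDs fit in a `short` come from the commonly cited Netflix Prize figures, not from this message. It stores each user's ratings as a slice of parallel primitive arrays and swaps in binary search over sorted item IDs, so lookup is O(log n) rather than the O(n) linear scan mentioned above. Array object headers are ignored in the estimate.

```java
import java.util.Arrays;

public class CompactRatings {
    // offsets[u] .. offsets[u + 1] delimits user u's slice of itemIds/ratings.
    private final int[] offsets;
    // Item IDs, sorted in ascending order within each user's slice.
    private final short[] itemIds;
    // One small integer rating per observation, parallel to itemIds.
    private final byte[] ratings;

    public CompactRatings(int[] offsets, short[] itemIds, byte[] ratings) {
        this.offsets = offsets;
        this.itemIds = itemIds;
        this.ratings = ratings;
    }

    /** Returns the user's rating for the item, or -1 if there is none. */
    public int rating(int user, short item) {
        // Binary search restricted to this user's slice of the item array.
        int idx = Arrays.binarySearch(itemIds, offsets[user], offsets[user + 1], item);
        return idx >= 0 ? ratings[idx] : -1;
    }

    /**
     * Rough payload size in bytes: one item ID and one rating per observation,
     * plus one int offset per user (and one terminating offset).
     */
    public static long estimateBytes(long observations, long users,
                                     int bytesPerItemId, int bytesPerRating) {
        return observations * (bytesPerItemId + bytesPerRating)
                + (users + 1) * Integer.BYTES;
    }

    public static void main(String[] args) {
        long observations = 100_000_000L; // figure from the message above
        long users = 480_189L;            // assumed, commonly cited Netflix Prize user count

        // int item IDs + float ratings
        System.out.println(estimateBytes(observations, users, Integer.BYTES, Float.BYTES));
        // short item IDs + byte ratings
        System.out.println(estimateBytes(observations, users, Short.BYTES, Byte.BYTES));
    }
}
```

Under these assumptions the estimate comes to about 0.8 GB with `int` IDs and `float` ratings, and about 0.3 GB with `short` IDs and `byte` ratings, both well under the 12G heap discussed in the thread.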