From: Ted Dunning
Date: Tue, 31 May 2011 23:01:13 -0700
Subject: Re: Why do userid & itemid have to be long?
To: user@mahout.apache.org

Are you putting the translations into Mongo?

On Tue, May 31, 2011 at 9:51 PM, Mike Khristo wrote:

> Using the 0.6 snapshot + patch 705 (mongodatamodel) from JIRA
> (https://issues.apache.org/jira/browse/MAHOUT-705), and a test data set with
> ~300k rows like:
>
> "4cec0a2934ac9fbd2b040000","4d065d5434ac9f5227a12f00",118
>
> It's slowly doing the translations:
> INFO: [+++][MONGO-MAP] Adding Translation Item ID: 4d57d54434ac9fd3570005a2 long_value: 145367
>
> It's doing about 30,000 per hour (and getting slower). That's 8.3/sec.
> 8 GB RAM, 4 virtual cores.
>
> With a test data set of 3M preferences, that would take >5 days, just for
> the translation.
>
> Open to ideas/suggestions/"a-ha"-moments. Thanks!
>
> On Tue, May 31, 2011 at 9:15 PM, Ted Dunning wrote:
>
> > It makes the internals much cleaner to not repeat this conversion.
> >
> > But how is it that this is taking a long time? String -> lookup should not
> > be much longer than an array access, especially if you use the Mahout
> > collections or one of the dictionary types.
> >
> > On Tue, May 31, 2011 at 7:50 PM, Mike Khristo wrote:
> >
> > > Rather, how can I use string-based user IDs/item IDs without having to
> > > deal with the slowness associated with mapping them to a long?
> > >
> > > In the MongoDataModel, for example, significant time/overhead goes into
> > > converting the unique IDs to long... I'm still getting my head wrapped
> > > around Mahout, but this seems like a significant limitation. I have to
> > > assume there's some logic behind the decision to restrict them to long,
> > > but I didn't find anything about it in Mahout in Action or the list.
> > >
> > > Thanks.
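Ted's point is that the String-to-long translation itself should cost about one hash probe per ID; his question suggests the ~8/sec rate comes from persisting each translation to Mongo as it is created. The IDs in the sample rows are 24 hex characters (96 bits), so they cannot be packed losslessly into a 64-bit long and some dictionary is needed. Below is a minimal sketch of such a dictionary kept entirely in memory, assuming the full ID set fits in RAM. `IdDictionary`, `toLong`, and `toStringId` are hypothetical names for illustration, not Mahout or MongoDataModel API, and the sketch uses only JDK collections (the Mahout collections Ted mentions could replace `HashMap` to avoid boxing).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory String <-> long ID dictionary (not a Mahout class).
// Forward lookup is one hash-map probe; reverse lookup is a list index.
final class IdDictionary {
    private final Map<String, Long> idsByString = new HashMap<>();
    private final List<String> stringsById = new ArrayList<>();

    // Returns the long ID for s, assigning the next sequential ID on first sight.
    long toLong(String s) {
        Long id = idsByString.get(s);
        if (id == null) {
            id = (long) stringsById.size(); // dense IDs: 0, 1, 2, ...
            idsByString.put(s, id);
            stringsById.add(s);
        }
        return id;
    }

    // Maps a long ID back to the original string ID (IDs are list positions).
    String toStringId(long id) {
        return stringsById.get((int) id);
    }

    int size() {
        return stringsById.size();
    }

    public static void main(String[] args) {
        // Separate dictionaries for users and items, as in a user/item/value CSV.
        IdDictionary users = new IdDictionary();
        IdDictionary items = new IdDictionary();
        String line = "\"4cec0a2934ac9fbd2b040000\",\"4d065d5434ac9f5227a12f00\",118";
        String[] fields = line.split(",");
        long userId = users.toLong(fields[0].replace("\"", ""));
        long itemId = items.toLong(fields[1].replace("\"", ""));
        float pref = Float.parseFloat(fields[2]);
        System.out.println(userId + "," + itemId + "," + pref); // prints 0,0,118.0
    }
}
```

If the mapping must survive restarts, one option under the same assumption is to write the whole dictionary out in a single bulk pass after loading, rather than issuing one database write per newly seen ID.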