Date: Wed, 20 Apr 2016 15:47:25 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: dev@mahout.apache.org
Subject: [jira] [Commented] (MAHOUT-1833) Enhance svec function to accepting cardinality as parameter

[ https://issues.apache.org/jira/browse/MAHOUT-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15250119#comment-15250119 ]

ASF GitHub Bot commented on MAHOUT-1833:
----------------------------------------

Github user smarthi commented on the pull request:

    https://github.com/apache/mahout/pull/224#issuecomment-212484167

    Ignore the Travis failures, it's still WIP. Thanks for the PR, we'll merge it soon for the next release.

> Enhance svec function to accepting cardinality as parameter
> ------------------------------------------------------------
>
> Key: MAHOUT-1833
> URL: https://issues.apache.org/jira/browse/MAHOUT-1833
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.12.0
> Environment: Mahout Spark Shell 0.12.0,
> Spark 1.6.0 Cluster on Hadoop YARN 2.7.1,
> CentOS 7 64-bit
> Reporter: Edmond Luo
>
> It would be nice to add one more wrapper function, like the one below, to org.apache.mahout.math.scalabindings:
> {code}
> import org.apache.mahout.math.RandomAccessSparseVector
>
> /**
>  * Create a sparse vector out of a list of Tuple2's with a specific cardinality (size).
>  * Throws IllegalArgumentException if cardinality is smaller than the cardinality required by sdata.
>  * @param cardinality the cardinality (size) of the created vector
>  * @param sdata (index, value) pairs of the non-zero elements
>  * @return a RandomAccessSparseVector with the given cardinality
>  */
> def svec(cardinality: Int, sdata: TraversableOnce[(Int, AnyVal)]) = {
>   // materialize once: a TraversableOnce (e.g. an Iterator) may only be traversed a single time
>   val data = sdata.toIndexedSeq
>   val required = if (data.nonEmpty) data.map(_._1).max + 1 else 0
>   if (cardinality < required) {
>     throw new IllegalArgumentException(s"Cardinality [$cardinality] must be at least the required cardinality [$required]!")
>   }
>   val sv = new RandomAccessSparseVector(cardinality, data.size)
>   data.foreach(t ⇒ sv.setQuick(t._1, t._2.asInstanceOf[Number].doubleValue()))
>   sv
> }
> {code}
> This lets the user specify the cardinality of the created sparse vector.
> This is useful when the user wants to create a DRM from many sparse vectors whose actual sizes differ but whose logical size is the same (e.g. the rows of a sparse matrix): the existing svec sizes each vector from its largest index (max index + 1), so such rows would otherwise end up with different cardinalities.
> The code below demonstrates the case:
> {code}
> // Assumes the Mahout Spark shell (`sc` and the Mahout math/DRM imports in scope) plus the proposed svec overload
> val cardinality = 20
> // each input line is "rowIndex,columnIndex"; collect the column entries of each row
> val rdd = sc.textFile("/some/file.txt")
>   .map(_.split(","))
>   .map(line => (line(0).toInt, Array((line(1).toInt, 1))))
>   .reduceByKey((v1, v2) => v1 ++ v2)
>   .map(row => (row._1, svec(cardinality, row._2)))
> val drm = drmWrap(rdd.map(row => (row._1, row._2.asInstanceOf[Vector])))
> // All element-wise operations fail on a DRM whose sparse vectors do not share a consistent cardinality
> val drm2 = drm + drm
> val drm3 = drm - drm
> val drm4 = drm * drm
> val drm5 = drm / drm
> {code}
> Note that in the last map, svec accepts the extra cardinality parameter, so all of the created sparse vectors have a consistent cardinality.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
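
A minimal, dependency-free Scala sketch of the two sizing strategies discussed in this issue may make the difference concrete. `SvecSketch`, `SparseVec`, `svecInferred` and `svecSized` are hypothetical names used only for illustration, and `SparseVec` is not Mahout's `RandomAccessSparseVector`: it only records a cardinality and the non-zero entries. `svecInferred` sizes a vector from its largest index, the way `svec` currently does, while `svecSized` mimics the check in the proposed overload.

```scala
// Hypothetical, dependency-free sketch; not Mahout's API.
object SvecSketch {
  // Stand-in for a sparse vector: a cardinality plus its non-zero entries.
  final case class SparseVec(cardinality: Int, entries: Map[Int, Double])

  // Smallest cardinality that can hold every index in sdata (largest index + 1).
  private def requiredCardinality(sdata: Seq[(Int, Double)]): Int =
    if (sdata.nonEmpty) sdata.map(_._1).max + 1 else 0

  // Size inferred from the data, as the existing svec does.
  def svecInferred(sdata: Seq[(Int, Double)]): SparseVec =
    SparseVec(requiredCardinality(sdata), sdata.toMap)

  // Caller-supplied size; require(...) throws IllegalArgumentException if it is too small.
  def svecSized(cardinality: Int, sdata: Seq[(Int, Double)]): SparseVec = {
    val required = requiredCardinality(sdata)
    require(cardinality >= required,
      s"Cardinality [$cardinality] must be at least the required cardinality [$required]")
    SparseVec(cardinality, sdata.toMap)
  }
}
```

For example, the rows `Seq(0 -> 1.0, 3 -> 1.0)` and `Seq(7 -> 1.0)` get cardinalities 4 and 8 from `svecInferred`, but both get cardinality 20 from `svecSized(20, row)`; `svecSized(7, Seq(7 -> 1.0))` fails because index 7 needs a cardinality of at least 8. Using `require` keeps that failure an `IllegalArgumentException`, the exception named in the proposal.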
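
The Spark pipeline in the demo can also be mirrored with plain Scala collections, which makes it easy to try without a cluster. This is a simplified model under stated assumptions: `LocalPipelineSketch`, `Row`, `parse`, `toRow` and `plus` are hypothetical, input lines are assumed to be `rowIndex,columnIndex`, and `plus` only imitates the size check an element-wise matrix operation needs; it does not reproduce how Mahout assembles DRM blocks.

```scala
// Hypothetical local (non-Spark) mirror of the demo pipeline; not Mahout's API.
object LocalPipelineSketch {
  // Stand-in for one DRM row: a cardinality plus its non-zero entries.
  final case class Row(cardinality: Int, entries: Map[Int, Double])

  // "rowIndex,columnIndex" lines -> row index -> (column, 1.0) pairs,
  // playing the role of the map/reduceByKey steps.
  def parse(lines: Seq[String]): Map[Int, Seq[(Int, Double)]] =
    lines
      .map(_.split(","))
      .map(f => (f(0).trim.toInt, (f(1).trim.toInt, 1.0)))
      .groupBy(_._1)
      .map { case (row, pairs) => row -> pairs.map(_._2) }

  // Fixed-cardinality row, with the same check as the proposed svec overload.
  def toRow(cardinality: Int, sdata: Seq[(Int, Double)]): Row = {
    val required = if (sdata.nonEmpty) sdata.map(_._1).max + 1 else 0
    require(cardinality >= required,
      s"Cardinality [$cardinality] must be at least the required cardinality [$required]")
    Row(cardinality, sdata.toMap)
  }

  // Element-wise sum; an entry missing from one row counts as 0.0.
  // Rows of different cardinality are rejected, as a matrix operation would.
  def plus(a: Row, b: Row): Row = {
    require(a.cardinality == b.cardinality,
      s"Cardinality mismatch: ${a.cardinality} vs ${b.cardinality}")
    val keys = a.entries.keySet ++ b.entries.keySet
    Row(a.cardinality,
      keys.map(k => k -> (a.entries.getOrElse(k, 0.0) + b.entries.getOrElse(k, 0.0))).toMap)
  }
}
```

Here `parse` stands in for the `map`/`reduceByKey` steps and `toRow` for `svec(cardinality, row._2)`. Because every row is built with the same cardinality, `plus(r, r)` succeeds for any row `r`, while combining a cardinality-20 row with a cardinality-8 row is rejected.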