From: Frank Scholten
To: user@mahout.apache.org, Suneel Marthi
Date: Wed, 22 Jan 2014 23:36:41 +0100
Subject: Re: MAHOUT 0.9 Release - New URL

Updated trunk and Streaming K-Means works in sequential
mode:

Average distance in cluster 0 [45]: 15835.645959
Average distance in cluster 1 [2]: 12655.384293
Cluster 2 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
Average distance in cluster 3 [12]: 16639.304306
Average distance in cluster 4 [12466]: 1765.051250
Average distance in cluster 5 [613]: 7968.987864
Average distance in cluster 6 [453]: 11678.351990
Average distance in cluster 7 [7848]: 3475.257237
Average distance in cluster 8 [137]: 14040.611024
Cluster 9 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
Num clusters: 10; maxDistance: 111156.247816
[Dunn Index] First: 0.002786
[Davies-Bouldin Index] First: 53.915866
Jan 22, 2014 11:29:51 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 33654 ms (Minutes: 0.5609)
cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
0,15835.645959,4305.418183,-6094.066658,13494.967198,14908.353662,19013.936496,25229.795662,45,train
1,12655.384293,12655.384293,-12655.384293,0.000000,12655.384293,25310.768587,37966.152880,2,train
3,16639.304306,8137.858191,76.596986,23652.805218,17032.177174,21993.360359,26116.861135,12,train
4,1765.051250,1912.041786,53.833129,665.221968,1398.456928,2116.252442,91200.149803,12466,train
5,7968.987864,3283.509392,3106.173001,5444.653631,7154.854277,9475.107969,20961.123807,613,train
6,11678.351990,3986.046231,80.428556,8688.530291,10657.331417,13992.879084,25697.590999,453,train
7,3475.257237,2849.263422,244.613872,1701.937225,2645.839526,4362.384712,111156.247816,7848,train
8,14040.611024,3847.956007,-4400.223235,11295.103900,13063.847142,16227.853884,22973.712042,137,train

On Wed, Jan 22, 2014 at 10:45 PM, Suneel Marthi wrote:

> Thanks Andrew. I'll put a Release out soon.
>
> On Wednesday, January 22, 2014 3:52 PM, Andrew Palumbo wrote:
>
> Everything seems to run well on my local machine:
>
> Checked out revision 1560364.
>
> CentOS 6
> Apache Maven 3.1.2-SNAPSHOT
> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> Java home: /usr/java/jdk1.6.0_45/jre
> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
> Hadoop 2.2.0
>
> mvn clean compile -DSkipTests [OK-Several Warnings]
> mvn clean test [PASSED ALL]
> mvn clean install -DskipTests [OK]
>
> $MAHOUT_LOCAL=true
>
> classify-20newsgroups.sh->1 [Accuracy 89.3529%]
> classify-20newsgroups.sh->2 [Accuracy 90.8317%]
> classify-20newsgroups.sh->3 [Accuracy 76.2746%]
> classify-20newsgroups.sh->4 [cleans up]
>
> cluster-reuters.sh->1 [20 clusters] -kmeans
> cluster-reuters.sh->2 [INFO: 20 clusters] -fkmeans
> cluster-reuters.sh->3 [OK] -lda
> cluster-reuters.sh->4 [10 (9) clusters- see attached] -streaming kmeans
>
> ./cluster-syntheticcontrol.sh->1 [INFO: Wrote 6 clusters]
> ./cluster-syntheticcontrol.sh->2 [INFO: Wrote 6 clusters]
> ./cluster-syntheticcontrol.sh->3 [INFO: Wrote 6 clusters]
>
> ./factorize-movielens-1M.sh /home/andy/test_data/ml-1m/ratings.dat [RMSE is: 0.851264570339848]
>
> Attached is full output of cluster-reuters.sh->4 Streaming K-Means.
>
> From cluster-reuters.sh->4 Streaming K-Means:
>
> Cluster 0 is has 1 data point. Need atleast 2 data points in a cluster for OnlineSummarizer.
> Average distance in cluster 1 [2816]: 3438.913758
> Average distance in cluster 2 [112]: 20617.345993
> Average distance in cluster 3 [4]: 32504.085379
> Average distance in cluster 4 [435]: 18476.579935
> Average distance in cluster 5 [27]: 21153.167574
> Average distance in cluster 6 [15480]: 2040.864416
> Average distance in cluster 7 [1711]: 5281.742482
> Average distance in cluster 8 [964]: 15762.976239
> Average distance in cluster 9 [28]: 19762.109632
> Num clusters: 10; maxDistance: 107106.379648
>
> [Dunn Index] First: 0.002272
> [Davies-Bouldin Index] First: 57.871266
> Jan 22, 2014 12:14:47 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 77695 ms (Minutes: 1.2949166666666667)
> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> 1,3438.913758,2430.072640,250.635051,1793.254765,2908.356638,4444.702564,22173.892767,2816,train
> 2,20617.345993,3978.577827,-8306.835555,17787.685767,19584.319120,22864.637511,37305.829397,112,train
> 3,32504.085379,29250.558538,-12174.296092,12174.296092,36522.888276,26372.137172,107106.379648,4,train
> 4,18476.579935,3600.742072,-7212.729374,15841.995992,17431.838259,20066.610494,40205.090209,435,train
> 5,21153.167574,4963.661797,-8880.583978,19729.348269,21251.400944,24588.743549,27926.248558,27,train
> 6,2040.864416,2007.719699,53.622493,841.033934,1571.121917,2396.407672,18967.768820,15480,train
> 7,5281.742482,3083.071478,1933.759989,3216.929268,4074.689928,6371.577109,20292.193673,1711,train
> 8,15762.976239,3158.956443,65.031208,13511.867700,14744.029626,17287.006957,31483.809655,964,train
> 9,19762.109632,4355.120345,-8902.814641,18669.317253,20712.227220,21602.660490,27452.910312,28,train
>
> > From: ap.dev@outlook.com
> > To: dev@mahout.apache.org; user@mahout.apache.org
> > Subject: RE: MAHOUT 0.9 Release - New URL
> > Date: Wed, 22 Jan 2014 09:37:06 -0500
> >
> > will do!
> >
> > > Date: Wed, 22 Jan 2014 01:24:05 -0800
> > > From: suneel_marthi@yahoo.com
> > > Subject: Re: MAHOUT 0.9 Release - New URL
> > > To: dev@mahout.apache.org; user@mahout.apache.org
> > >
> > > Andrew M., Andrew P. and others,
> > >
> > > Sebastian and I fixed a few issues today (for 0.9):
> > >
> > > a) Removed the asf-email-examples.sh script and a few other scripts that should have been removed. Also removed references/invocations to algorithms that have been removed from the codebase.
> > > b) Fixed the issue with Streaming Kmeans clustering and checked in the code.
> > > c) Resurrected Frequent Pattern Mining implementation for 0.9.
> > >
> > > Please check out the latest code from trunk, run a build locally and run thru the example scripts.
> > >
> > > Thanks and Regards.
> > >
> > > On Wednesday, January 22, 2014 12:11 AM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > >
> > > *factorize-movielens-1M.sh:*
> > > RMSE is: 0.8519064098265133
> > >
> > > Sample recommendations:
> > >
> > > 2229 [2197:4.4961276,527:4.4309845,2972:4.4180074,318:4.379484,572:4.312696,3092:4.249903]
> > > 5848 [1900:4.6775646,3787:4.6623707,632:4.641377,2609:4.608225,3808:4.6058936,2998:4.6057487]
> > > 3728 [572:4.951382,47:4.748921,874:4.6945343,1563:4.679901,3314:4.6621537,50:4.655838]
> > > 1252 [53:5.0,3816:4.9664702,3077:4.9494777,213:4.94007,3808:4.9060082,978:4.8568053]
> > > 634 [572:5.0,3092:4.779557,1872:4.72024,2687:4.629712,2125:4.615142,3853:4.5261393]
> > > 5516 [572:5.0,2197:5.0,3092:5.0,318:4.908213,356:4.885,3844:4.8237453]
> > > 2276 [1204:5.0,572:5.0,912:5.0,1250:5.0,1272:4.999891,1262:4.989652]
> > > 4219 [53:4.8112006,598:4.775032,858:4.761604,572:4.7579737,1219:4.680987,1221:4.6604886]
> > > 91 [1198:5.0,2762:5.0,1207:5.0,1234:5.0,318:5.0,260:5.0]
> > > 502 [953:5.0,260:4.9800477,1234:4.869403,1198:4.8527064,1207:4.8497486,3469:4.847286]
> > >
> > > factorize-netflix.sh:
> > > References a no-longer-available data set that Netflix took down after the competition; should at least mention that the data set is no longer "online".
> > >
> > > On Tue, Jan 21, 2014 at 8:05 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > >
> > > > *clustering-syntheticcontrol.sh*
> > > >
> > > > *Canopy:*
> > > > [snip]
> > > > 1.0 : [distance-squared=1740.681000315628]: [35.486, 25.600,
> > > > 29.914, 30.200, 27.654, 28.647, 26.582, 32.941, 34.728, 29.047, 34.706,
> > > > 34.764, 28.816, 30.271, 25.784, 25.035, 35.436, 29.075, 24.267, 24.625,
> > > > 28.162, 26.218, 28.773, 28.966, 28.802, 34.292, 26.211, 33.363, 32.920,
> > > > 31.891, 34.504, 32.686, 24.327, 35.981, 31.390, 10.832, 20.238, 10.051,
> > > > 14.877, 10.570, 19.603, 14.544, 10.667, 16.470, 19.007, 10.352, 13.473,
> > > > 12.196, 10.684, 16.620, 20.434, 17.069, 18.744, 9.599, 11.195, 12.002,
> > > > 10.017, 17.149, 14.850, 10.890]
> > > > 1.0 : [distance-squared=1455.363773097357]: [31.022, 28.140,
> > > > 26.730, 26.570, 29.561, 26.966, 28.049, 25.673, 33.721, 26.275, 30.410,
> > > > 31.101, 24.019, 35.659, 25.253, 25.932, 28.618, 32.423, 33.666, 33.745,
> > > > 35.118, 29.164, 25.477, 31.947, 35.491, 30.730, 25.820, 24.651, 25.528,
> > > > 31.343, 29.005, 31.825, 26.891, 28.194, 31.429, 16.935, 8.070, 16.604,
> > > > 14.743, 10.342, 8.155, 10.395, 17.689, 16.791, 14.138, 15.761, 6.787,
> > > > 13.062, 16.660, 15.021, 9.891, 9.216, 11.550, 8.877, 18.220, 9.477, 10.342,
> > > > 16.430, 11.898, 15.366]
> > > > 1.0 : [distance-squared=1679.9304895378882]: [29.625, 25.503,
> > > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376,
> > > > 15.918, 22.231, 18.264,
> > > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > > 15.285, 22.528, 20.657, 24.129]
> > > > 1.0 : [distance-squared=2044.2887801683828]: [27.414, 25.397,
> > > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > > 20.229, 11.131, 9.980, 10.720]
> > > > 1.0 : [distance-squared=1385.3154063160764]: [35.899, 26.672,
> > > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > > 16.546, 15.927, 18.084, 17.475]
> > > > 1.0 : [distance-squared=1920.6376615603585]: [24.538, 24.280,
> > > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915,
> > > > 13.762,
> > > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > > 19.310, 12.999, 17.460]
> > > > 1.0 : [distance-squared=2192.939571172661]: [34.335, 30.938,
> > > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > > 11.743, 11.699, 10.152]
> > > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Wrote 6 clusters
> > > > Jan 22, 2014 3:50:29 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Program took 5827 ms (Minutes: 0.09711666666666667)
> > > >
> > > > *K-means:*
> > > > [snip]
> > > > 1.0 : [distance-squared=2873.881301031739]: [26.369, 37.791,
> > > > 41.839, 39.694, 36.728, 35.079, 30.668, 24.755, 20.610, 18.885, 15.459,
> > > > 14.074, 19.117, 34.230, 32.202, 40.715, 39.543, 37.193, 39.448, 30.829,
> > > > 25.273, 22.324, 19.801, 13.505, 17.462, 24.287, 27.398, 38.577, 42.108,
> > > > 41.947, 43.987, 41.331, 28.958, 23.664, 20.308, 22.244, 12.149, 15.768,
> > > > 18.964, 32.579, 33.202, 36.205, 42.364, 40.601, 35.415, 39.576, 33.145,
> > > > 19.936, 23.062, 19.053, 24.383, 19.611, 25.218, 38.860, 36.570, 38.964,
> > > > 36.946, 36.900, 32.593, 31.563]
> > > > 1.0 : [distance-squared=2525.8924241648783]: [35.389, 31.178,
> > > > 40.041, 43.034, 49.524, 40.942, 42.369, 30.153, 26.253, 23.178, 19.885,
> > > > 19.693, 21.837, 26.858, 23.533, 29.798, 43.401, 43.349, 48.238, 43.868,
> > > > 44.017, 28.056, 25.284, 22.737, 13.703, 14.023, 22.115, 23.720, 25.478,
> > > > 37.870, 37.868, 46.954, 43.135, 41.286, 37.269, 29.347, 24.312, 21.743,
> > > > 19.152, 21.668, 10.882, 16.147, 30.020, 28.472, 38.716, 44.620, 47.847,
> > > > 48.846, 39.361, 38.449, 33.105, 19.935, 14.961, 12.149, 12.630, 13.459,
> > > > 18.967, 35.473, 30.146, 45.527]
> > > > 1.0 : [distance-squared=2392.7171990886272]: [27.662, 37.199,
> > > > 39.158, 44.264, 46.473, 40.178, 38.728, 24.412, 25.155, 15.938, 13.125,
> > > > 16.867, 13.875, 29.969, 34.322, 40.870, 44.225, 47.594, 47.607, 44.751,
> > > > 39.813, 32.461, 16.646, 16.305, 21.256, 20.627, 19.267, 31.901, 34.995,
> > > > 43.643, 43.152, 47.125, 48.524, 33.131, 32.697, 20.300, 14.350, 18.330,
> > > > 12.155, 11.261, 19.701, 29.313, 30.457, 39.914, 49.543, 49.851, 46.445,
> > > > 39.561, 28.860, 22.486, 13.967, 10.006, 11.715, 15.945, 21.348, 25.439,
> > > > 41.024, 37.105, 45.623, 43.589]
> > > > 1.0 : [distance-squared=1419.8378244373016]: [25.784, 34.129,
> > > > 42.659, 37.176, 35.961, 34.307, 32.108, 29.749, 25.047, 17.455, 24.652,
> > > > 25.311, 22.995, 30.256, 25.955, 28.426, 34.556, 40.386, 39.642, 40.566,
> > > > 32.612, 34.091, 26.033, 25.668, 26.545, 17.338, 24.980, 20.134, 27.542,
> > > > 36.612, 31.855, 37.947, 39.736, 33.535, 36.607, 37.479, 32.612, 22.262,
> > > > 20.662, 16.124, 24.547, 27.686, 21.747, 27.198, 31.259, 40.569, 37.067,
> > > > 34.465, 34.730, 33.371, 23.060, 30.162, 22.022, 22.216, 14.812, 19.357,
> > > > 24.508, 34.432, 32.155, 34.839]
> > > > 1.0 : [distance-squared=4186.814512311335]: [25.870, 39.195,
> > > > 36.908, 47.052, 47.384, 40.741, 42.494, 30.282, 25.834, 17.650, 16.004,
> > > > 17.895, 13.321, 19.045, 27.440, 31.911, 39.208, 43.622, 41.567, 44.815,
> > > > 44.921, 35.422, 35.477, 23.190, 17.859, 14.684, 23.504, 23.141, 21.746,
> > > > 30.816, 31.361, 37.015, 38.094, 46.688, 47.681, 43.777, 39.652, 31.701,
> > > > 23.767, 22.265, 22.654, 22.327, 19.195, 21.163, 29.602, 27.563, 36.244,
> > > > 38.859, 44.234, 42.352, 42.160, 40.172, 30.094, 21.092, 25.193, 13.096,
> > > > 18.111,
> > > > 14.754, 27.386, 27.026]
> > > > 1.0 : [distance-squared=1544.4011543572997]: [28.075, 41.784,
> > > > 42.120, 38.735, 44.320, 34.316, 32.212, 31.868, 24.301, 14.547, 17.178,
> > > > 22.279, 24.357, 31.011, 31.444, 34.837, 46.550, 48.301, 38.859, 42.363,
> > > > 35.657, 31.499, 19.794, 12.124, 15.371, 12.436, 15.763, 24.679, 32.597,
> > > > 43.004, 36.616, 38.935, 42.954, 34.957, 36.183, 28.177, 16.326, 14.988,
> > > > 10.680, 22.728, 24.075, 24.058, 36.616, 43.982, 39.198, 40.118, 40.078,
> > > > 34.752, 34.018, 23.750, 18.374, 12.251, 15.539, 18.699, 28.973, 24.044,
> > > > 39.404, 38.034, 46.458, 44.432]
> > > > 1.0 : [distance-squared=825.9338725427806]: [33.670, 38.675,
> > > > 39.742, 41.989, 37.291, 43.975, 31.909, 25.878, 31.080, 15.858, 13.950,
> > > > 23.097, 19.983, 21.692, 31.579, 38.570, 33.376, 38.843, 41.936, 33.534,
> > > > 39.195, 32.897, 25.343, 18.523, 15.089, 17.771, 22.614, 25.313, 23.687,
> > > > 29.010, 41.995, 35.712, 40.872, 41.669, 32.156, 25.162, 24.980, 23.705,
> > > > 18.413, 20.975, 14.906, 26.171, 30.165, 27.818, 35.083, 39.514, 37.851,
> > > > 33.967, 32.338, 34.977, 26.589, 28.079, 19.597, 24.669, 23.098, 25.685,
> > > > 28.215, 34.940, 36.910, 39.749]
> > > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Wrote 6 clusters
> > > > Jan 22, 2014 4:01:31 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Program took 16902 ms (Minutes: 0.2817)
> > > >
> > > > *Fuzzy k-means:*
> > > > [snip]
> > > > 1.0 : [distance-squared=971.7369782121968]: [29.625, 25.503,
> > > > 31.598, 31.466, 33.549, 28.294, 28.924, 30.692, 25.330, 26.873, 31.817,
> > > > 24.267, 31.387, 31.654, 34.849, 29.251, 28.272, 35.781, 31.472, 32.322,
> > > > 28.508, 29.867, 31.474, 29.153, 24.125, 25.376, 15.918, 22.231, 18.264,
> > > > 24.582, 18.679, 26.370, 24.154, 25.902, 24.800, 17.273, 25.463, 22.296,
> > > > 26.876, 24.511, 25.702, 21.356, 25.968, 15.507, 24.281, 25.025, 21.750,
> > > > 16.837, 15.079, 17.333, 26.747, 18.880, 21.332, 23.692, 22.310, 19.136,
> > > > 15.285, 22.528, 20.657, 24.129]
> > > > 1.0 : [distance-squared=2054.618163154475]: [27.414, 25.397,
> > > > 26.460, 31.978, 26.125, 27.463, 30.489, 34.929, 27.558, 30.686, 27.511,
> > > > 32.269, 32.834, 27.129, 24.991, 32.610, 25.387, 32.674, 34.607, 33.519,
> > > > 29.012, 28.705, 32.116, 29.121, 26.424, 33.452, 33.623, 29.457, 35.025,
> > > > 26.607, 34.442, 34.847, 28.897, 34.439, 32.011, 34.816, 27.773, 11.549,
> > > > 20.219, 19.678, 14.715, 14.384, 15.556, 9.573, 10.636, 16.639, 17.236,
> > > > 19.643, 18.317, 15.323, 19.106, 11.455, 16.888, 18.269, 11.583, 14.118,
> > > > 20.229, 11.131, 9.980, 10.720]
> > > > 1.0 : [distance-squared=954.6503560728597]: [35.899, 26.672,
> > > > 34.191, 35.827, 25.101, 24.856, 25.814, 30.630, 34.212, 32.587, 31.032,
> > > > 34.304, 24.555, 35.870, 30.683, 29.058, 28.637, 29.855, 32.037, 32.979,
> > > > 26.118, 26.107, 25.096, 22.703, 17.698, 16.281, 18.186, 24.016, 24.553,
> > > > 21.452, 15.836, 21.311, 20.879, 22.559, 21.694, 25.856, 20.533, 21.542,
> > > > 25.766, 26.018, 20.820, 24.959, 18.959, 23.346, 16.068, 22.836, 21.939,
> > > > 25.722, 19.671, 26.299, 21.879, 16.002, 15.288, 16.946, 17.534, 16.846,
> > > > 16.546, 15.927, 18.084, 17.475]
> > > > 1.0 : [distance-squared=2817.9170498632957]: [24.538, 24.280,
> > > > 28.281, 27.132, 26.662, 32.110, 32.810, 30.483, 35.859, 25.387, 31.301,
> > > > 25.429, 26.866, 30.852, 24.478, 25.665, 25.296, 30.263, 29.657, 25.295,
> > > > 25.022, 35.264, 26.109, 9.600, 12.675, 16.575, 19.760, 13.349, 18.137,
> > > > 7.993, 16.751, 16.341, 15.349, 9.476, 9.943, 16.609, 12.331, 8.645, 19.457,
> > > > 10.836, 10.349, 9.726, 14.575, 18.959, 15.822, 17.364, 11.915, 13.762,
> > > > 12.402, 19.628, 19.644, 11.524, 15.419, 12.670, 13.116, 8.235, 12.042,
> > > > 19.310, 12.999, 17.460]
> > > > 1.0 : [distance-squared=3472.3684696871424]: [34.335, 30.938,
> > > > 31.953, 31.146, 24.519, 24.393, 27.696, 29.874, 26.767, 33.089, 31.371,
> > > > 26.233, 26.383, 35.661, 32.663, 27.685, 29.277, 31.761, 34.650, 24.940,
> > > > 33.434, 26.849, 28.714, 26.581, 34.825, 34.026, 8.823, 12.634, 12.694,
> > > > 6.279, 13.644, 16.651, 18.078, 7.975, 9.274, 9.208, 12.879, 12.729, 6.976,
> > > > 17.832, 13.330, 6.326, 12.131, 11.842, 16.716, 10.425, 9.445, 14.400,
> > > > 15.696, 11.028, 10.608, 15.190, 9.076, 17.909, 9.846, 15.013, 13.913,
> > > > 11.743, 11.699, 10.152]
> > > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Wrote 6 clusters
> > > > Jan 22, 2014 4:03:56 AM org.slf4j.impl.JCLLoggerAdapter info
> > > > INFO: Program took 106615 ms (Minutes: 1.7769166666666667)
> > > >
> > > > *Dirichlet and Meanshift:*
> > > > Already detailed in M-1400, deprecated jobs still referenced.
> > > >
> > > > On Tue, Jan 21, 2014 at 6:20 PM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > > >
> > > >> *cluster-reuters.sh*
> > > >> *k-means:*
> > > >>
> > > >> [snip]
> > > >> :VL-19482{n=913 c=[0.06:0.011, 0.1:0.007, 0.13:0.010, 0.25:0.016, 0.38:0.020, 0.4:0.007, 0.5:0.032, 0
> > > >> Top Terms:
> > > >> banks => 3.841823268955143
> > > >> bank => 3.80633066361209
> > > >> debt => 3.28065219870794
> > > >> said => 2.5965700942088583
> > > >> he => 2.335682813857497
> > > >> foreign => 2.2217853688201403
> > > >> billion => 2.1970193848291335
> > > >> would => 1.9932392063955617
> > > >> loans => 1.9309276792854233
> > > >> interest => 1.787324501938
> > > >> have => 1.762981951432578
> > > >> its => 1.7615109954971866
> > > >> which => 1.5822081148036862
> > > >> has => 1.5600708189041956
> > > >> dlrs => 1.5571038313005996
> > > >> finance => 1.5539758811252924
> > > >> new => 1.5176015811577555
> > > >> had => 1.5138723701401844
> > > >> brazil => 1.5083369853593172
> > > >> payments => 1.4539044255886517
> > > >> Weight : [props - optional]: Point:
> > > >>
> > > >> :VL-7320{n=2726 c=[0:0.003, 0.1:0.010, 0.2:0.007, 0.3:0.009, 0.4:0.007, 0.40:0.003, 0.5:0.009, 0.57:0
> > > >> Top Terms:
> > > >> vs => 6.126130791333171
> > > >> net => 4.012191567277523
> > > >> cts => 3.822006848832744
> > > >> shr => 3.6786004856764527
> > > >> mln => 2.9011643584038698
> > > >> loss => 2.788368861463607
> > > >> qtr => 2.714140225051522
> > > >> revs => 2.4739861236454717
> > > >> profit => 1.8146888090247015
> > > >> note => 1.7977163272138388
> > > >> dlrs => 1.6164390808155846
> > > >> avg => 1.3901765773336587
> > > >> shrs => 1.3856326531419314
> > > >> mths => 1.3168717272038506
> > > >> 4th => 1.2161158425617289
> > > >> oper => 1.182419473776814
> > > >> year => 1.178086061733047
> > > >> nine => 1.0670554836445316
> > > >> 3rd => 1.041334410056592
> > > >> inc => 1.0019361981554935
> > > >> Weight : [props - optional]: Point:
> > > >>
> > > >> Inter-Cluster Density: 0.45562152681859414
> > > >> Intra-Cluster Density: 0.6952712632167628
> > > >> CDbw Inter-Cluster Density: 0.0
> > > >> CDbw Intra-Cluster Density: 16.486930227598684
> > > >> CDbw Separation: 194.49005884464628
> > > >>
> > > >> *fuzzy k-means:*
> > > >> :SV-18539{n=1039 c=[0:0.026, 0.003:0.001, 0.006913:0.001, 0.007050:0.001, 0.01:0.005, 0.02:0.002, 0.0
> > > >> Top Terms:
> > > >> said => 1.8665592354713065
> > > >> its => 1.1335212213411592
> > > >> pct => 1.0862816801353348
> > > >> dlrs => 1.0854998884993752
> > > >> mln => 1.043163996400643
> > > >> from => 0.9684961110525736
> > > >> has => 0.912161511978058
> > > >> company => 0.8754186972808333
> > > >> mar => 0.8675333452422878
> > > >> inc => 0.7678617590362815
> > > >> would => 0.7610968883652675
> > > >> he => 0.7459988770503974
> > > >> which => 0.7435613119406804
> > > >> year => 0.7302840632748394
> > > >> u.s => 0.7281061062439116
> > > >> shares => 0.7260764102983083
> > > >> corp => 0.7179807367808658
> > > >> new => 0.7044203783157115
> > > >> stock => 0.6962010978721442
> > > >> have => 0.6464265467298506
> > > >> :SV-9431{n=1034 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001, 0.01:0.004, 0.02:0.002, 0.02
> > > >> Top Terms:
> > > >> said => 1.864911184196927
> > > >> dlrs => 1.199286689822081
> > > >> mln => 1.1802134783562215
> > > >> pct => 1.1529704214798124
> > > >> its => 1.1184398851519701
> > > >> from => 1.016647848050332
> > > >> company => 0.894703604722841
> > > >> mar => 0.879986159541356
> > > >> has => 0.8642799128491316
> > > >> year => 0.8271823503717782
> > > >> inc => 0.7871293745341424
> > > >> corp => 0.737705498468879
> > > >> which => 0.722975201852743
> > > >> would => 0.708000816484415
> > > >> u.s => 0.7073294276173905
> > > >> billion => 0.7055723996916351
> > > >> he => 0.7042684217823294
> > > >> new => 0.6834737905434939
> > > >> shares => 0.6753327384172428
> > > >> stock => 0.6576225144041699
> > > >> :SV-4785{n=1044 c=[0:0.023, 0.003:0.001, 0.006913:0.001, 0.007050:0.001, 0.01:0.006, 0.02:0.002, 0.02
> > > >> Top Terms:
> > > >> said => 1.8796076179735086
> > > >> its => 1.172025965452378
> > > >> dlrs => 1.130422792460914
> > > >> pct => 1.082038255241358
> > > >> mln => 1.0772146872767114
> > > >> company => 0.9662235879639138
> > > >> from => 0.9473172871605616
> > > >> has => 0.9224712965830099
> > > >> mar => 0.8769325856924421
> > > >> inc => 0.8360245257169788
> > > >> shares => 0.8334595641384324
> > > >> stock => 0.7704621839612175
> > > >> corp => 0.7682400250301806
> > > >> which => 0.7389988207856137
> > > >> would => 0.7339708917389389
> > > >> year => 0.7088414843731325
> > > >> new => 0.7038109468655172
> > > >> he => 0.6993994455501005
> > > >> u.s => 0.6772649147622415
> > > >> share => 0.6241804830055171
> > > >>
> > > >> *lda:*
> > > >>
> > > >> [snip]
> > > >> 21539 {0.02:0.7071698119320297,0.01:0.09185661419250732,0.055:0.05559716236030052,0:0.04416624482186779,0.046:0.04110903741109505,0.10:0.036664417789754995,0.073:0.009543544335363714,0.006913:0.0050293341354450535,0.007050:0.004031353935388081,0.003:0.0019163134919350053}
> > > >> 21540 {0.04:0.4289840457590362,0.006913:0.3764552842292379,0.03:0.14201700033942147,0.025:0.031492533562460345,0.01:0.0057428959027208,0.0625:0.005563615996288134,0.007050:0.004410121345864598,0.02:0.003315679992182833,0.06:0.0010680133665222197,0.057:2.6908116668663575E-4}
> > > >> 21541 {0:0.6323790020346274,0.02:0.282038438551802,0.06:0.05929917592183956,0.046:0.021922159161082488,0.025:0.003123559949176154,0.1:6.163861804777446E-4,0.057:1.1513607281652563E-4,0.077:7.546641269535658E-5,0.05:6.020069105221075E-5,0.04:5.727546417602751E-5}
> > > >> 21542 {0.06:0.7916202902993545,0.003:0.12474538653424426,0.03:0.0516187501990535,0.073:0.013078206873952192,0.077:0.006874558859394474,0.1:0.004726257215175674,0.07:0.0029954110394770084,0.02:0.0015535221634909381,0.0625:8.057802584933225E-4,0.006913:5.349608183182145E-4}
> > > >> 21543 {0.03:0.8557703918728101,0.007050:0.0884696514495358,0.01:0.02386217742025169,0.055:0.01794750983209314,0.046:0.007244240662980594,0.02:0.0025021610305199344,0.1:0.0011795344808501283,0.0625:9.555718731181631E-4,0.077:8.384080940465597E-4,0.003:2.4480111447900804E-4}
> > > >> 21544 {0.006913:0.6497351267772595,0.03:0.2586364130498913,0.003:0.0845308091084703,0.05:0.003488921831506818,0.0625:0.0011991059579690732,0.06:5.301659535652308E-4,0.055:3.152013670552699E-4,0.025:3.11949418681945E-4,0.02:2.2346509541410544E-4,0.057:1.727741085612565E-4}
> > > >> 21545 {0.07:0.24937001680778675,0.03:0.22854680579732564,0.1:0.22068070811382484,0.05:0.21891358916881656,0.06:0.04971205462818302,0.046:0.029384231252419384,0.025:0.0014118858333346275,0.0625:0.0012321257282372393,0.073:3.1606402052550454E-4,0.04:1.1662985389192366E-4}
> > > >> 21546 {0.006913:0.652435612545713,0.073:0.24282539253890825,0.003:0.055020663101050894,0.046:0.01978650831133704,0.04:0.011699600233998459,0.02:0.006822518565048025,0.05:0.0042733514779345234,0.025:0.0020768247329981497,0.03:0.0015466188524926666,0:0.0010433788491090747}
> > > >> 21547 {0.1:0.935072008706917,0.077:0.040616382672055494,0.046:0.023138054104883225,0.06:3.42252302356326E-4,0.03:2.672247473681925E-4,0.007050:9.004419280457053E-5,0.006913:7.844271097106069E-5,0.04:7.198835322717653E-5,0.073:5.8017696474244885E-5,0.02:4.474316852571109E-5}
> > > >> 21548 {0.006913:0.4510141547325999,0.03:0.21472666934984877,0.07:0.15060570743672352,0.046:0.08216181853028293,0.05:0.07498297963542139,0.077:0.01401215532342401,0.04:0.006135722806477439,0.073:0.0031124469556872442,0.02:0.001327252389650958,0.025:4.5167404646311124E-4}
21549 > > > >> > {0.077:0.5249260290096315,0:0.27520186965742544,0.073:0.08959570186504386= ,0.057:0.05063809804337512,0.02:0.045313417578343,0.03:0.00856024297303885,= 0.06:0.0034039136814002697,0.07:8.611323331226122E-4,0.05:3.938725523495860= 7E-4,0.04:3.454752498776842E-4} > > > >> 21550 > > > >> > {0.077:0.8319708524327014,0.007050:0.16001430652201168,0.0625:0.002488452= 8530674226,0.03:0.0024228964094551416,0.006913:0.0010048446080994332,0.025:= 5.974607168723063E-4,0.06:2.7741593377792194E-4,0.057:2.2975316663437597E-4= ,0.073:2.1769059479546598E-4,0.01:1.4945234676270913E-4} > > > >> 21551 > > > >> > {0:0.43298549949426596,0.07:0.30407459462158964,0.0625:0.1574407768512413= 6,0.01:0.06385977925647884,0.025:0.020205557109867888,0.04:0.01322612309168= 0062,0.055:0.006501562964287696,0.073:5.871897881404074E-4,0.003:4.55483664= 4848264E-4,0.05:2.0489216962221778E-4} > > > >> 21552 > > > >> > {0.06:0.7383744333942458,0.02:0.07321126682102753,0.07:0.0591037884128831= 1,0.10:0.056907223730939045,0:0.02739559786902668,0.055:0.02228913751272657= ,0.1:0.00943274247398869,0.073:0.007301445750018608,0.03:0.0027711985062277= 246,0.046:0.0022569760697531112} > > > >> 21553 > > > >> > {0.006913:0.9383779606743132,0.07:0.018356074893823605,0.1:0.017572502072= 403694,0.025:0.011031848455352145,0.02:0.007110054905474798,0.10:0.00323909= 33761145377,0.01:0.0022676909091530165,0.06:6.35712654913703E-4,0.003:3.037= 608224814005E-4,0.0625:2.569989005512836E-4} > > > >> 21554 > > > >> > {0.02:0.4740260172915081,0.03:0.3264871353578654,0.003:0.1238307119245032= 3,0:0.0387913487693229,0.046:0.013922363892934853,0.01:0.01115832101415319,= 0.055:0.004438529958216764,0.006913:0.0031112721201723413,0.025:0.002399946= 628810436,0.073:6.131643125523258E-4} > > > >> 21555 > > > >> > {0.01:0.42396888624053325,0.057:0.2876207419392007,0.055:0.13436747721404= 915,0.10:0.0923627978966975,0.05:0.030530372962547347,0.003:0.0084042341443= 
69286,0.006913:0.007605013189108045,0.007050:0.005108825532219537,0.0625:0.= 004651573366090424,0.025:0.002180181204608079} > > > >> 21556 > > > >> > {0.02:0.9634685201058687,0.077:0.02816791882994464,0.003:0.00736800130574= 7207,0.01:4.6299664334060455E-4,0.055:6.358811991095171E-5,0.06:5.524007658= 1247115E-5,0.03:5.231350505975146E-5,0.073:4.4773553407989E-5,0:3.681045788= 2911134E-5,0.006913:3.469265861451538E-5} > > > >> 21557 > > > >> > {0.06:0.9993947051352264,0.05:2.53296407549323E-4,0.03:9.647254237540585E= -5,0.04:4.3439270433017595E-5,0.006913:3.354681491698619E-5,0.046:2.8586727= 566416525E-5,0.007050:2.5584606074089293E-5,0.02:2.2477243875316502E-5,0:1.= 9194783598477564E-5,0.073:1.6900267481892075E-5} > > > >> 21558 > > > >> > {0.06:0.9985073788555696,0.03:8.033091100049725E-4,0.02:1.195376673961020= 2E-4,0.1:9.257610652171745E-5,0.006913:6.903783269939522E-5,0.04:5.79609672= 80609926E-5,0.07:5.5767050956214925E-5,0.046:4.857027337508731E-5,0.007050:= 3.5964741541364354E-5,0.057:3.538185773175377E-5} > > > >> 21559 > > > >> > {0.006913:0.5411158302162348,0.073:0.10595054605908563,0.04:0.10013413310= 674449,0.007050:0.08891690362990352,0.003:0.08028744789933502,0.03:0.077798= 66329563544,0.025:0.0024314950846438975,0.0625:0.0018111845757907532,0.077:= 5.019763638644379E-4,0.055:2.6705221435486376E-4} > > > >> 21560 > > > >> > {0.06:0.9978884964462115,0.03:9.139179655096763E-4,0.02:6.066525825847198= E-4,0.04:3.2209695809936266E-4,0.006913:7.970816960964983E-5,0.007050:5.589= 84012266038E-5,0.046:1.545206468939752E-5,0.077:1.5319891034946036E-5,0:1.5= 223193391279898E-5,0.073:1.1687759018690983E-5} > > > >> 21561 > > > >> > {0.06:0.8704787441462888,0.007050:0.09131090501970876,0.0625:0.0377856627= 3654969,0.046:8.111987924033587E-5,0.01:6.397000919080148E-5,0.1:4.97913887= 5277178E-5,0.073:2.7942853854174412E-5,0.03:2.762357950066146E-5,0.077:2.68= 20742114533983E-5,0.025:2.6208487691114472E-5} > > > >> 21562 > > > >> > 
{0.06:0.9625603310705717,0.02:0.01708933716171641,0.007050:0.011374975845= 817934,0.07:0.00482114999912725,0.057:0.003310531318631415,0.077:2.56845746= 2720504E-4,0.025:1.3439876933758153E-4,0.1:1.0512433283405881E-4,0:8.737443= 941062146E-5,0.046:5.9284232059258864E-5} > > > >> 21563 > > > >> > {0.06:0.9996809273982157,0.046:1.6325482120709976E-4,0.02:2.1084598024339= 765E-5,0.006913:1.7901242019979392E-5,0.04:1.3991162886383784E-5,0.03:1.368= 2157013015017E-5,0.077:1.3602492456590212E-5,0.007050:1.206712606141859E-5,= 0.1:1.0231842284246997E-5,0.057:7.704725064510759E-6} > > > >> 21564 > > > >> > {0.06:0.9997861963976675,0.03:2.7140899197995276E-5,0.077:2.4456158311423= 386E-5,0.04:1.7952818233846462E-5,0.02:1.6334455815684533E-5,0.006913:1.491= 0843270220926E-5,0.073:1.4893553937733922E-5,0.1:1.2545291899719683E-5,0.00= 7050:1.2337617858874285E-5,0.07:1.121302251254011E-5} > > > >> 21565 > > > >> > {0.006913:0.5047832315246878,0.007050:0.2502505818382197,0.04:0.099375339= 60784072,0.03:0.05332716291468396,0.0625:0.035738976624857435,0.05:0.023139= 962103851885,0.1:0.01510786357969295,0.025:0.01294816540331917,0.06:0.00358= 9347327961106,0.073:4.8607269023994543E-4} > > > >> 21566 > > > >> > {0.03:0.865579490292393,0.073:0.065013560785593,0.077:0.05662209410876746= 5,0.046:0.006811842330071251,0.057:0.0021561477140846267,0.01:0.00131493759= 57061502,0.04:7.456782721333958E-4,0.05:6.412995789267404E-4,0:2.1070218793= 25011E-4,0.007050:1.8121393989190674E-4} > > > >> 21567 > > > >> > {0.077:0.6528663315309344,0.03:0.2794089480653573,0.025:0.060214953606503= 134,0.003:0.0029851203917978303,0.01:0.002840784719750811,0.007050:5.508511= 345707982E-4,0.02:3.3395642786457786E-4,0.07:1.7452781529689483E-4,0.055:1.= 541344869853217E-4,0.046:1.0154945247629696E-4} > > > >> 21568 > > > >> > {0.057:0.7925855379581803,0.03:0.2036907633660934,0.06:0.0022839240106577= 22,0.046:0.0011125177332923534,0.10:6.706442563331911E-5,0.02:3.90702271315= 
96934E-5,0.07:3.266094677087569E-5,0.1:2.3436639268605713E-5,0.077:2.318067= 689954084E-5,0.006913:2.273173553155518E-5} > > > >> 21569 > > > >> > {0.06:0.7223525910216753,0.1:0.22280373045161775,0.04:0.04239924319412595= ,0.02:0.006529556700876843,0.007050:0.004661124794787862,0.10:2.01797403464= 8702E-4,0.05:2.017516118028694E-4,0.025:1.583677755896652E-4,0.006913:1.193= 9663934259253E-4,0:8.736457986006156E-5} > > > >> 21570 > > > >> > {0.073:0.5321098304788365,0.006913:0.3897385574581158,0.02:0.051125703897= 749404,0.077:0.012417023254098358,0.01:0.006491518762040415,0.03:0.00543461= 0908750246,0.055:9.352347087701305E-4,0.06:5.239363525283659E-4,0.10:4.4114= 444135088393E-4,0.003:2.2360302221231853E-4} > > > >> 21571 > > > >> > {0.06:0.9074255414695478,0.05:0.08971808812931319,0.02:0.0019124104766371= 694,0.1:2.581234320438502E-4,0.073:1.9041498001195312E-4,0.046:8.3483553821= 04383E-5,0.006913:5.748062878632201E-5,0.057:4.8979196235823963E-5,0.04:4.4= 005272512340306E-5,0.03:4.1089625609562384E-5} > > > >> 21572 > > > >> > {0.06:0.7216888928389846,0.04:0.1955684645266304,0.006913:0.0823545444706= 5854,0.1:9.82899034505965E-5,0.046:4.9184577303445956E-5,0.05:4.02196507016= 7039E-5,0.007050:3.081280652634891E-5,0.073:2.791145099471127E-5,0.02:1.956= 81856192452E-5,0:1.9037190007395713E-5} > > > >> 21573 > > > >> > {0.05:0.8568971411565196,0.046:0.12909436895238377,0.06:0.012704234652048= 044,0.04:2.472784666357729E-4,0.055:2.1886698996361582E-4,0.1:1.19744514447= 57112E-4,0.0625:9.082357988309755E-5,0.07:9.030017229129562E-5,0.03:7.26804= 5692763623E-5,0.073:6.607926928741721E-5} > > > >> 21574 > > > >> > {0.046:0.5619466458628039,0.006913:0.184782367089353,0.0625:0.09726566772= 972363,0.003:0.09534816862353344,0.02:0.015692927163565275,0.073:0.01549201= 7672231727,0.01:0.01411218625979968,0.007050:0.01037341031640615,0.055:0.00= 20124341216292752,0:0.0013828685922332715} > > > >> 21575 > > > >> > 
{0.05:0.3167681189235041,0.06:0.2879333280436204,0.046:0.2258462850652174= 5,0.003:0.15994241628395953,0.0625:0.006482994028630967,0.1:0.0016773104050= 919493,0.055:4.8725015996676173E-4,0.03:2.432893903382962E-4,0.025:1.601591= 4759364425E-4,0.006913:1.0021292427951807E-4} > > > >> 21576 > > > >> > {0.077:0.2407816981967022,0.003:0.18594668103110193,0.10:0.14951775492012= 523,0.0625:0.09241152906714677,0.007050:0.08997645163280943,0.057:0.0710286= 5286733068,0.055:0.048046579920457584,0.05:0.03776387140040494,0.073:0.0333= 55081179026046,0.006913:0.015980584385115525} > > > >> 21577 > > > >> > {0.06:0.9599074612361259,0.1:0.03694135499501186,0.04:0.00219413150482731= 86,0.057:4.7070027358666304E-4,0.02:1.8231606308803002E-4,0.0625:6.23494255= 7920162E-5,0.006913:5.159011604129561E-5,0.046:3.9832812943124216E-5,0.0070= 50:3.0380683715134534E-5,0.05:2.8920851352755496E-5} > > > >> > > > >> > > > >> *Streaming k-means:* > > > >> > > > >> [snip] > > > >> INFO: Number of Centroids: 0 > > > >> Jan 22, 2014 1:52:22 AM org.apache.hadoop.mapred.LocalJobRunner$Jo= b > run > > > >> WARNING: job_local23982482_0001 > > > >> java.lang.IllegalArgumentException: Must have nonzero number of > training > > > >> and test vectors. 
Asked for %.1f %% of %d vectors for test > > > >> [10.000000149011612, 0] > > > >> at > > > >> > com.google.common.base.Preconditions.checkArgument(Preconditions.java:120) > > > >> at > > > >> > org.apache.mahout.clustering.streaming.cluster.BallKMeans.splitTrainTest(BallKMeans.java:176) > > > >> at > > > >> > org.apache.mahout.clustering.streaming.cluster.BallKMeans.cluster(BallKMeans.java:192) > > > >> at > > > >> > org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.getBestCentroids(StreamingKMeansReducer.java:107) > > > >> at > > > >> > org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:73) > > > >> at > > > >> > org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:37) > > > >> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:177) > > > >> at > > > >> > org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649) > > > >> at > org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418) > > > >> at > > > >> > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398) > > > >> > > > >> [snip] > > > >> > > > >> WARNING: No qualcluster.props found on classpath, will use > command-line > > > >> arguments only > > > >> Num clusters: 0; maxDistance: 0.000000 > > > >> [Dunn Index] First: Infinity > > > >> [Davies-Bouldin Index] First: NaN > > > >> Jan 22, 2014 1:52:24 AM org.slf4j.impl.JCLLoggerAdapter info > > > >> INFO: Program took 535 ms (Minutes: 0.008916666666666666) > > > >> cluster,distance.mean,distance.sd > > > >> > ,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train > > > >> > > > >> > > > >> On Tue, Jan 21, 2014 at 1:47 PM, Andrew Musselman < > > > >> andrew.musselman@gmail.com> wrote: > > > >> > > > >>> *classify-20newsgroups.sh* > > > >>> > > > >>> *Complementary naive bayes:* > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Summary > > > >>> ------------------------------------------------------- > > > >>> Correctly Classified Instances : 11207 98.940= 6% > > > >>> Incorrectly Classified Instances : 120 1.059= 4% > > > >>> Total Classified Instances : 11327 > > > >>> > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Confusion Matrix > > > >>> ------------------------------------------------------- > > > >>> a b c d e f g h i > > > >>> j k l m n o p q r > s > > > >>> t <--Classified as > > > >>> 475 0 0 1 0 0 0 0 0 > > > >>> 0 0 0 0 0 1 0 1 0 > 0 > > > >>> 0 | 478 a =3D alt.atheism > > > >>> 0 597 1 1 0 1 1 0 0 > > > >>> 0 0 1 0 2 1 0 0 0 > 0 > > > >>> 0 | 605 b =3D comp.graphics > > > >>> 0 1 620 3 0 1 0 0 0 > > > >>> 0 0 1 0 0 1 0 0 0 > 0 > > > >>> 0 | 627 c =3D comp.os.ms-windows.misc > > > >>> 1 1 1 593 2 0 0 0 0 > > > >>> 0 0 0 0 0 0 1 0 0 > 0 > > > >>> 0 | 599 d =3D comp.sys.ibm.pc.hardware > > > >>> 0 1 1 0 568 0 1 0 0 > > > >>> 0 1 1 2 0 0 0 0 1 > 0 > > > >>> 0 | 576 e =3D comp.sys.mac.hardware > > > >>> 0 4 2 0 0 581 0 0 0 > > > >>> 0 0 0 0 0 0 0 0 0 > 0 > > > >>> 0 | 587 f =3D comp.windows.x > > > >>> 0 0 0 1 2 0 571 3 0 > > > >>> 0 1 1 4 1 0 0 0 0 > 0 > > > >>> 0 | 584 g =3D misc.forsale > > > >>> 0 0 0 1 0 0 0 589 1 > > > >>> 0 0 1 1 0 0 0 0 0 > 0 > > > >>> 0 | 593 h =3D rec.autos > > > >>> 0 0 0 0 0 0 0 1 5= 65 > > > >>> 0 0 0 0 0 1 0 0 0 > 0 > > > >>> 0 | 567 i =3D rec.motorcycles > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 600 2 0 0 0 1 0 0 0 > 0 > > > >>> 0 | 603 j =3D rec.sport.baseball > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 1 584 0 0 0 0 0 0 0 > 0 > > > >>> 0 | 585 k =3D rec.sport.hockey > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 0 0 579 0 0 0 0 0 1 > 0 > > > >>> 0 | 580 l =3D sci.crypt > > > >>> 0 0 0 1 3 0 2 0 0 > > > >>> 2 0 0 
567 1 2 1 0 0 > 0 > > > >>> 0 | 579 m =3D sci.electronics > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 0 0 0 1 605 0 0 0 0 > 0 > > > >>> 0 | 606 n =3D sci.med > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 0 0 0 0 0 602 0 0 0 > 0 > > > >>> 0 | 602 o =3D sci.space > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 0 0 0 0 1 0 602 0 0 > 1 > > > >>> 0 | 604 p =3D soc.religion.christian > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 0 0 0 0 0 0 0 556 0 > 0 > > > >>> 0 | 556 q =3D talk.politics.mideast > > > >>> 0 0 1 0 0 0 0 0 0 > > > >>> 0 0 1 0 0 1 0 0 > 568 0 > > > >>> 0 | 571 r =3D talk.politics.guns > > > >>> 11 0 0 0 0 0 0 0 0 > > > >>> 1 0 0 0 1 3 8 1 4 > 338 > > > >>> 2 | 369 s =3D talk.religion.misc > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 0 1 0 0 0 1 0 3 4 > 0 > > > >>> 447 | 456 t =3D talk.politics.misc > > > >>> > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Statistics > > > >>> ------------------------------------------------------- > > > >>> Kappa 0.9806 > > > >>> Accuracy 98.9406% > > > >>> Reliability 94.0932% > > > >>> Reliability (standard deviation) 0.2163 > > > >>> > > > >>> Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info > > > >>> INFO: Program took 15870 ms (Minutes: 0.2645) > > > >>> + echo 'Testing on holdout set' > > > >>> Testing on holdout set > > > >>> + ./bin/mahout testnb -i > /tmp/mahout-work-ec2-user/20news-test-vectors > > > >>> -m /tmp/mahout-work-ec2-user/model -l > /tmp/mahout-work-ec2-user/labelindex > > > >>> -ow -o /tmp/mahout-work-ec2-user/20news-testing -c > > > >>> > > > >>> [snip] > > > >>> > > > >>> INFO: Complementary Results: > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Summary > > > >>> ------------------------------------------------------- > > > >>> Correctly 
Classified Instances : 6715 89.307= 1% > > > >>> Incorrectly Classified Instances : 804 10.692= 9% > > > >>> Total Classified Instances : 7519 > > > >>> > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Confusion Matrix > > > >>> ------------------------------------------------------- > > > >>> a b c d e f g h i > > > >>> j k l m n o p q r > s > > > >>> t <--Classified as > > > >>> 298 0 0 0 0 0 0 0 0 > > > >>> 1 0 0 0 1 2 5 1 0 > 13 > > > >>> 0 | 321 a =3D alt.atheism > > > >>> 0 298 11 6 1 12 2 2 1 > > > >>> 1 3 8 3 4 2 4 1 4 > 4 > > > >>> 1 | 368 b =3D comp.graphics > > > >>> 1 17 286 16 4 9 6 3 2 > > > >>> 0 1 0 1 7 1 0 2 1 > 0 > > > >>> 1 | 358 c =3D comp.os.ms-windows.misc > > > >>> 2 6 11 309 9 5 14 8 1 > > > >>> 0 2 0 6 4 2 0 1 2 > 1 > > > >>> 0 | 383 d =3D comp.sys.ibm.pc.hardware > > > >>> 0 10 8 7 334 7 5 5 2 > > > >>> 0 3 0 2 1 1 0 1 1 > 0 > > > >>> 0 | 387 e =3D comp.sys.mac.hardware > > > >>> 1 13 7 8 2 355 2 0 2 > > > >>> 0 0 5 1 1 3 0 0 1 > 0 > > > >>> 0 | 401 f =3D comp.windows.x > > > >>> 0 7 11 29 12 9 268 16 8 > > > >>> 4 3 2 6 4 2 1 3 1 > 2 > > > >>> 3 | 391 g =3D misc.forsale > > > >>> 0 1 0 0 3 0 7 362 8 > > > >>> 2 2 1 2 0 2 0 1 2 > 0 > > > >>> 4 | 397 h =3D rec.autos > > > >>> 0 0 0 1 0 0 1 0 4= 23 > > > >>> 0 0 0 2 1 0 1 0 0 > 0 > > > >>> 0 | 429 i =3D rec.motorcycles > > > >>> 0 0 1 0 0 0 0 2 2 > > > >>> 371 8 0 2 3 0 2 0 0 > 0 > > > >>> 0 | 391 j =3D rec.sport.baseball > > > >>> 0 0 1 0 0 0 1 0 0 > > > >>> 2 409 0 0 0 0 0 0 0 > 0 > > > >>> 1 | 414 k =3D rec.sport.hockey > > > >>> 0 0 1 2 1 0 1 0 0 > > > >>> 0 0 404 0 0 0 0 0 1 > 0 > > > >>> 1 | 411 l =3D sci.crypt > > > >>> 0 5 4 11 1 3 7 9 2 > > > >>> 5 3 3 339 2 6 0 1 1 > 2 > > > >>> 1 | 405 m =3D sci.electronics > > > >>> 0 4 0 1 0 0 0 1 0 > > > >>> 1 1 0 3 367 3 1 2 0 > 0 > > > >>> 0 | 384 n =3D sci.med > > > >>> 0 1 2 0 1 0 2 0 0 > > > >>> 1 0 0 
1 1 375 0 1 0 > 0 > > > >>> 0 | 385 o =3D sci.space > > > >>> 4 2 1 1 0 0 1 1 2 > > > >>> 0 0 1 1 5 1 367 4 0 > 1 > > > >>> 1 | 393 p =3D soc.religion.christian > > > >>> 0 1 0 0 0 0 0 0 0 > > > >>> 2 0 0 0 0 0 2 378 0 > 1 > > > >>> 0 | 384 q =3D talk.politics.mideast > > > >>> 0 0 0 0 0 2 1 1 1 > > > >>> 1 0 3 0 3 0 0 2 > 319 2 > > > >>> 4 | 339 r =3D talk.politics.guns > > > >>> 32 0 0 1 0 0 0 0 0 > > > >>> 1 1 1 0 2 2 26 5 7 > 175 > > > >>> 6 | 259 s =3D talk.religion.misc > > > >>> 0 0 0 2 0 0 0 0 0 > > > >>> 1 2 2 0 1 2 1 10 1= 8 > 2 > > > >>> 278 | 319 t =3D talk.politics.misc > > > >>> > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Statistics > > > >>> ------------------------------------------------------- > > > >>> Kappa 0.8594 > > > >>> Accuracy 89.3071% > > > >>> Reliability 84.611% > > > >>> Reliability (standard deviation) 0.2148 > > > >>> > > > >>> Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info > > > >>> INFO: Program took 10840 ms (Minutes: 0.18066666666666667) > > > >>> > > > >>> > > > >>> *Naive bayes:* > > > >>> INFO: Standard NB Results: > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Summary > > > >>> ------------------------------------------------------- > > > >>> Correctly Classified Instances : 11286 99.086= 9% > > > >>> Incorrectly Classified Instances : 104 0.913= 1% > > > >>> Total Classified Instances : 11390 > > > >>> > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Confusion Matrix > > > >>> ------------------------------------------------------- > > > >>> a b c d e f g h i > > > >>> j k l m n o p q r > 
s > > > >>> t <--Classified as > > > >>> 474 0 0 0 0 0 0 0 0 > > > >>> 0 0 0 0 0 0 0 0 0 > 2 > > > >>> 1 | 477 a =3D alt.atheism > > > >>> 0 566 0 2 0 1 0 0 0 > > > >>> 0 0 0 0 0 0 0 0 0 > 0 > > > >>> 0 | 569 b =3D comp.graphics > > > >>> 0 10 590 29 2 4 1 0 0 > > > >>> 0 0 0 1 0 0 0 0 0 > 0 > > > >>> 1 | 638 c =3D comp.os.ms-windows.misc > > > >>> 0 0 0 596 0 0 0 0 0 > > > >>> 0 0 0 0 0 0 0 0 0 > 0 > > > >>> 0 | 596 d =3D comp.sys.ibm.pc.hardware > > > >>> 0 0 0 0 575 0 1 0 0 > > > >>> 0 0 0 1 0 0 0 0 0 > 0 > > > >>> 0 | 577 e =3D comp.sys.mac.hardware > > > >>> 0 2 2 2 0 593 1 0 0 > > > >>> 0 0 0 0 0 1 0 0 0 > 0 > > > >>> 0 | 601 f =3D comp.windows.x > > > >>> 0 0 0 1 0 0 589 1 0 > > > >>> 0 1 0 2 0 0 0 0 0 > 0 > > > >>> 0 | 594 g =3D misc.forsale > > > >>> 0 0 0 0 0 0 0 594 0 > > > >>> 0 0 0 0 0 0 0 0 0 > 0 > > > >>> 0 | 594 h =3D rec.autos > > > >>> 0 0 0 0 0 0 0 0 6= 11 > > > >>> 0 0 0 0 0 0 0 0 0 > 0 > > > >>> 0 | 611 i =3D rec.motorcycles > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 616 1 0 0 0 0 0 0 0 > 0 > > > >>> 0 | 617 j =3D rec.sport.baseball > > > >>> 0 0 0 0 0 0 1 0 0 > > > >>> 0 620 0 0 0 0 0 0 0 > 0 > > > >>> 0 | 621 k =3D rec.sport.hockey > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 0 0 580 0 0 0 0 0 1 > 0 > > > >>> 0 | 581 l =3D sci.crypt > > > >>> 0 0 0 3 1 0 0 0 0 > > > >>> 0 0 0 571 0 0 0 0 0 > 0 > > > >>> 0 | 575 m =3D sci.electronics > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 0 0 0 2 583 0 0 0 0 > 0 > > > >>> 0 | 585 n =3D sci.med > > > >>> 0 0 0 0 0 0 0 0 0 > > > >>> 0 0 0 0 1 599 0 0 0 > 0 > > > >>> 0 | 600 o =3D sci.space > > > >>> 0 1 0 0 0 0 0 0 0 > > > >>> 0 0 0 0 0 0 615 0 0 > 0 > > > >>> 0 | 616 p =3D soc.religion.christian > > > >>> 1 0 0 0 0 0 0 0 0 > > > >>> 0 0 0 0 0 0 1 560 0 > 0 > > > >>> 0 | 562 q =3D talk.politics.mideast > > > >>> 0 0 1 0 0 0 0 0 0 > > > >>> 0 0 1 0 0 0 0 0 > 548 0 > > > >>> 1 | 551 r =3D talk.politics.guns > > > >>> 10 0 0 0 0 0 0 0 0 > > > >>> 0 0 0 0 0 1 1 0 2 > 344 > > > >>> 1 | 359 s =3D talk.religion.misc > > > >>> 0 0 0 
0 0 0 0 0 0 > > > >>> 0 0 1 1 0 0 0 0 2 > 0 > > > >>> 462 | 466 t =3D talk.politics.misc > > > >>> > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Statistics > > > >>> ------------------------------------------------------- > > > >>> Kappa 0.9847 > > > >>> Accuracy 99.0869% > > > >>> Reliability 94.3334% > > > >>> Reliability (standard deviation) 0.2169 > > > >>> > > > >>> Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info > > > >>> INFO: Program took 14304 ms (Minutes: 0.2384) > > > >>> + echo 'Testing on holdout set' > > > >>> Testing on holdout set > > > >>> > > > >>> [snip] > > > >>> > > > >>> INFO: Standard NB Results: > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Summary > > > >>> ------------------------------------------------------- > > > >>> Correctly Classified Instances : 6718 90.101= 9% > > > >>> Incorrectly Classified Instances : 738 9.898= 1% > > > >>> Total Classified Instances : 7456 > > > >>> > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Confusion Matrix > > > >>> ------------------------------------------------------- > > > >>> a b c d e f g h i > > > >>> j k l m n o p q r > s > > > >>> t <--Classified as > > > >>> 294 0 0 0 0 0 0 0 0 > > > >>> 0 0 2 0 1 1 6 1 1 > 16 > > > >>> 0 | 322 a =3D alt.atheism > > > >>> 0 345 6 14 6 11 6 0 0 > > > >>> 0 0 5 7 1 3 0 0 0 > 0 > > > >>> 0 | 404 b =3D comp.graphics > > > >>> 2 29 177 78 22 19 9 1 0 > > > >>> 0 0 4 2 0 1 1 0 0 > 1 > > > >>> 1 | 347 c =3D comp.os.ms-windows.misc > > > >>> 1 9 2 335 18 2 10 0 0 > > > >>> 0 1 0 8 0 0 0 0 0 > 0 > > > >>> 0 | 386 d =3D 
comp.sys.ibm.pc.hardware > > > >>> 1 4 2 13 347 3 5 1 0 > > > >>> 0 1 0 7 1 0 0 0 1 > 0 > > > >>> 0 | 386 e =3D comp.sys.mac.hardware > > > >>> 0 20 0 4 0 352 4 0 0 > > > >>> 0 0 0 1 1 3 0 1 0 > 1 > > > >>> 0 | 387 f =3D comp.windows.x > > > >>> 0 2 0 21 5 1 323 7 2 > > > >>> 2 0 2 12 0 3 0 0 0 > 0 > > > >>> 1 | 381 g =3D misc.forsale > > > >>> 0 1 0 0 1 0 15 363 8 > > > >>> 1 0 0 4 1 0 0 0 1 > 0 > > > >>> 1 | 396 h =3D rec.autos > > > >>> 0 1 0 0 0 0 6 6 3= 70 > > > >>> 0 0 0 0 1 0 0 0 0 > 1 > > > >>> 0 | 385 i =3D rec.motorcycles > > > >>> 1 0 0 1 1 0 2 1 2 > > > >>> 362 5 0 2 0 0 0 0 0 > 0 > > > >>> 0 | 377 j =3D rec.sport.baseball > > > >>> 0 0 0 1 2 0 0 0 0 > > > >>> 3 371 0 0 0 0 0 0 0 > 0 > > > >>> 1 | 378 k =3D rec.sport.hockey > > > >>> 0 3 1 0 1 0 2 0 0 > > > >>> 0 0 396 0 1 0 0 1 1 > 1 > > > >>> 3 | 410 l =3D sci.crypt > > > >>> 0 7 0 7 7 2 6 4 0 > > > >>> 0 0 1 369 2 2 0 0 0 > 0 > > > >>> 2 | 409 m =3D sci.electronics > > > >>> 0 3 0 2 1 0 2 0 0 > > > >>> 0 0 1 4 383 4 0 0 1 > 0 > > > >>> 4 | 405 n =3D sci.med > > > >>> 0 5 0 0 1 0 3 0 0 > > > >>> 0 0 0 1 0 374 1 0 0 > 1 > > > >>> 1 | 387 o =3D sci.space > > > >>> 6 2 0 1 1 0 0 1 0 > > > >>> 1 0 0 1 5 0 352 2 1 > 7 > > > >>> 1 | 381 p =3D soc.religion.christian > > > >>> 1 1 0 0 0 0 0 0 0 > > > >>> 0 1 0 0 0 0 0 373 1 > 0 > > > >>> 1 | 378 q =3D talk.politics.mideast > > > >>> 0 0 0 0 0 0 1 0 1 > > > >>> 0 0 2 0 0 0 0 0 > 346 2 > > > >>> 7 | 359 r =3D talk.politics.guns > > > >>> 26 1 0 1 0 0 0 2 0 > > > >>> 1 1 0 0 1 1 20 2 6 > 200 > > > >>> 7 | 269 s =3D talk.religion.misc > > > >>> 1 0 0 0 0 0 0 2 0 > > > >>> 0 1 0 0 2 2 0 1 1= 4 > 0 > > > >>> 286 | 309 t =3D talk.politics.misc > > > >>> > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Statistics > > > >>> ------------------------------------------------------- > > > >>> Kappa 0.8726 > > > >>> Accuracy 90.1019% 
> > > >>> Reliability 85.4491% > > > >>> Reliability (standard deviation) 0.2222 > > > >>> > > > >>> Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info > > > >>> INFO: Program took 10878 ms (Minutes: 0.1813) > > > >>> > > > >>> *SGD:* > > > >>> 7532 test files > > > >>> > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Summary > > > >>> ------------------------------------------------------- > > > >>> Correctly Classified Instances : 5649 7= 5% > > > >>> Incorrectly Classified Instances : 1883 2= 5% > > > >>> Total Classified Instances : 7532 > > > >>> > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Confusion Matrix > > > >>> ------------------------------------------------------- > > > >>> a b c d e f g h i > > > >>> j k l m n o p q r > s > > > >>> t <--Classified as > > > >>> 186 6 3 10 5 0 33 4 1= 3 > > > >>> 15 7 1 24 15 3 15 5 = 5 > 29 > > > >>> 15 | 394 a =3D sci.space > > > >>> 5 309 0 3 2 5 0 0 0 > > > >>> 1 9 21 2 0 0 18 4 4 > 1 > > > >>> 1 | 385 b =3D comp.sys.mac.hardware > > > >>> 4 1 101 3 0 1 63 0 7 > > > >>> 0 1 1 5 16 3 0 3 7 > 1 > > > >>> 34 | 251 c =3D talk.religion.misc > > > >>> 11 12 1 265 1 10 3 0 0 > > > >>> 17 10 11 5 2 0 11 3 6 > 21 > > > >>> 0 | 389 d =3D comp.graphics > > > >>> 2 1 1 0 349 2 3 0 3 > > > >>> 2 6 1 5 1 0 2 15 2 > 1 > > > >>> 2 | 398 e =3D rec.motorcycles > > > >>> 7 20 3 19 2 254 6 0 2 > > > >>> 11 2 39 7 2 0 4 2 2 > 9 > > > >>> 3 | 394 f =3D comp.os.ms-windows.misc > > > >>> 2 1 13 0 0 0 247 0 1 > > > >>> 1 3 0 6 2 4 0 2 3 > 5 > > > >>> 29 | 319 g =3D alt.atheism > > > >>> 1 1 0 0 2 0 2 361 0 > > > >>> 1 2 0 2 0 0 1 3 2= 2 > 0 > > > >>> 1 | 399 h =3D rec.sport.hockey > > > >>> 3 0 3 1 0 0 5 0 1= 61 > > > >>> 0 1 2 12 102 0 0 1 2 > 11 > 
> > >>> 6 | 310 i =3D talk.politics.misc > > > >>> 2 8 0 19 0 19 0 0 1 > > > >>> 294 10 11 4 2 0 5 0 3 > 11 > > > >>> 6 | 395 j =3D comp.windows.x > > > >>> 2 10 0 1 1 0 0 0 0 > > > >>> 1 347 13 2 1 0 5 3 2 > 2 > > > >>> 0 | 390 k =3D misc.forsale > > > >>> 1 36 0 6 1 25 0 0 1 > > > >>> 6 10 257 2 1 0 34 6 0 > 6 > > > >>> 0 | 392 l =3D comp.sys.ibm.pc.hardware > > > >>> 2 2 2 2 1 0 12 0 0 > > > >>> 6 10 4 312 5 2 13 11 3 > 3 > > > >>> 6 | 396 m =3D sci.med > > > >>> 2 0 3 2 1 0 0 1 1= 3 > > > >>> 0 5 1 2 314 2 0 2 = 2 > 10 > > > >>> 4 | 364 n =3D talk.politics.guns > > > >>> 1 0 2 1 1 0 34 1 3= 3 > > > >>> 1 3 0 1 8 271 1 4 = 5 > 6 > > > >>> 3 | 376 o =3D talk.politics.mideast > > > >>> 3 14 0 8 2 8 3 1 1 > > > >>> 7 12 29 6 2 1 245 13 2 > 32 > > > >>> 4 | 393 p =3D sci.electronics > > > >>> 3 3 0 2 11 0 1 0 2 > > > >>> 1 11 6 4 2 0 11 330 4 > 4 > > > >>> 1 | 396 q =3D rec.autos > > > >>> 0 0 1 0 1 0 4 12 3 > > > >>> 1 3 0 0 0 0 5 6 > 359 1 > > > >>> 1 | 397 r =3D rec.sport.baseball > > > >>> 0 1 0 0 0 1 0 0 3 > > > >>> 3 0 0 3 2 1 6 1 6 > 366 > > > >>> 3 | 396 s =3D sci.crypt > > > >>> 0 2 11 1 1 0 40 0 1 > > > >>> 2 3 4 2 1 0 5 0 2 > 2 > > > >>> 321 | 398 t =3D soc.religion.christian > > > >>> > > > >>> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D > > > >>> Statistics > > > >>> ------------------------------------------------------- > > > >>> Kappa 0.7073 > > > >>> Accuracy 75% > > > >>> Reliability 70.6238% > > > >>> Reliability (standard deviation) 0.2187 > > > >>> Log-likelihood mean : -1.1182 > > > >>> 25%-ile : -1.6911 > > > >>> 75%-ile : -0.0803 > > > >>> > > > >>> Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info > > > >>> INFO: Program took 10783 ms (Minutes: 0.17971666666666666) > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi < > suneel_marthi@yahoo.com>wrote: > > > >>> > 
> > >>>> Thanks Andrew for reporting that. I rolled back the release to > fix this > > > >>>> and few other issues. > > > >>>> > > > >>>> We have removed asf-examples*.sh from trunk as the sample file at > the > > > >>>> url mentioned in ur email is not available. > > > >>>> This is something we need to fix and restore in 1.0. > > > >>>> > > > >>>> > > > >>>> > > > >>>> > > > >>>> > > > >>>> > > > >>>> > > > >>>> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo < > > > >>>> ap.dev@outlook.com> wrote: > > > >>>> > > > >>>> from the asf-email-examples.sh script: > > > >>>> > > > >>>> # You will need to download or otherwise obtain some or all of the > > > >>>> Amazon ASF Em > > > >>>> ail Public Dataset ( > http://aws.amazon.com/datasets/7791434387204566) > > > >>>> to use this > > > >>>> script. > > > >>>> # To obtain a full copy you will need to launch an EC2 instance > and > > > >>>> mount the da > > > >>>> taset to download it, otherwise you can get a sample of it at > > > >>>> # > > > >>>> > http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout > > > >>>> > > > >>>> It looks like the: > > > >>>> > > > >>>> > http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout > > > >>>> > > > >>>> link is down. > > > >>>> > > > >>>> Is there somewhere else that we can get a subset of the ASF > emails? > > > >>>> > > > >>>> > > > >>>> > > > >>>> Date: Tue, 21 Jan 2014 09:48:06 -0800 > > > >>>> > Subject: Re: MAHOUT 0.9 Release - New URL > > > >>>> > From: andrew.musselman@gmail.com > > > >>>> > To: dev@mahout.apache.org > > > >>>> > > > > >>>> > Sure thing; continuing to smoke test the other examples tonight > > > >>>> > > > > >>>> > > > > >>>> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi < > > > >>>> suneel_marthi@yahoo.com>wrote: > > > >>>> > > > > >>>> > > Thanks Andrew M., see that some of the example scripts need > to be > > > >>>> fixed as > > > >>>> > > they still refer to the deprecated algorithms.
> > > >>>> > > See that the Streaming KMeans has failed for you as well. > > > >>>> > > > > > >>>> > > I'll be rolling back the release today to fix these issues. > > > >>>> > > > > > >>>> > > > > > >>>> > > > > > >>>> > > > > > >>>> > > > > > >>>> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman < > > > >>>> > > andrew.musselman@gmail.com> wrote: > > > >>>> > > > > > >>>> > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's > default > > > >>>> 64-bit > > > >>>> > > Linux AMI from tarball. > > > >>>> > > > > > >>>> > > All tests pass. > > > >>>> > > > > > >>>> > > *Output of examples:* > > > >>>> > > *asf-email-examples.sh, run on mahout.apache.org > > > >>>> > > :* > > > >>>> > > *recommendations:* > > > >>>> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat > > > >>>> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 > | less > > > >>>> > > 1 > > > >>>> > > > > > >>>> > > > > > >>>> > [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0] > > > >>>> > > 4 > > > >>>> > > > > > >>>> > > > > > >>>> > [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0] > > > >>>> > > 6 > > > >>>> > > > > > >>>> > > > > > >>>> > [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0] > > > >>>> > > 8 > > > >>>> > > [12758:1.0,19409:1.0,11112:1.0] > > > >>>> > > 11 > > > >>>> > > > > > >>>> > > > > > >>>> > [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0] > > > >>>> > > 14 > > > >>>> > > > > > >>>> > > > > > >>>> > [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0] > > > >>>> > > 15 > > > >>>> > > > > > >>>> > > > > > >>>> > [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0] > > > >>>> > > 16 > > > >>>> > > > > > >>>> > > > > > >>>> >
[23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > > >>>> > > 18 [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > > >>>> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > > >>>> > > 20 [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > > >>>> > > [snip]
> > > >>>> > >
> > > >>>> > > *clustering; kmeans:*
> > > >>>> > > [snip]
> > > >>>> > > Weight : [props - optional]: Point:
> > > >>>> > > 1.0 : [distance-squared=1.0193102046188427]: /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus=
> > > >>>> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110, 7573:0.204,
> > > >>>> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> > > >>>> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > > >>>> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > > >>>> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > > >>>> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > > >>>> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > > >>>> > > 1.0 : [distance-squared=0.9823018320457279]: /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus=
> > > >>>> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104, 5336:0.106,
> > > >>>> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> > > >>>> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > > >>>> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > > >>>> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > > >>>> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > > >>>> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > > >>>> > > 1.0 : [distance-squared=0.9509142993214911]: /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor =
> > > >>>> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048, 4419:0.076,
> > > >>>> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> > > >>>> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> > > >>>> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118, 10225:0.081,
> > > >>>> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > > >>>> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > > >>>> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > > >>>> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > > >>>> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > > >>>> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > > >>>> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > > >>>> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > > >>>> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > > >>>> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > > >>>> > > 41280:0.065, 41696:0.072, 41947:0.118, 43685:0.086, 44077:0.308,
> > > >>>> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > > >>>> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > > >>>> > > [snip]
> > > >>>> > >
> > > >>>> > > *clustering; dirichlet:*
> > > >>>> > > Get this complaint:
> > > >>>> > > Running Dirichlet with K = 8
> > > >>>> > > Running on hadoop, using
/home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > > >>>> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on classpath, will use command-line arguments only
> > > >>>> > > Unknown program 'dirichlet' chosen.
> > > >>>> > >
> > > >>>> > > *clustering: minhash:*
> > > >>>> > > Running Minhash
> > > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > > >>>> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on classpath, will use command-line arguments only
> > > >>>> > > Unknown program 'minhash' chosen.
> > > >>>> > >
> > > >>>> > > *classification; standard:*
> > > >>>> > > =======================================================
> > > >>>> > > Summary
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Correctly Classified Instances   : 5384   87.7874%
> > > >>>> > > Incorrectly Classified Instances : 749    12.2126%
> > > >>>> > > Total Classified Instances       : 6133
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Confusion Matrix
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > a      b      c      d      <--Classified as
> > > >>>> > > 2949   7      531    25     | 3512   a = dev
> > > >>>> > > 0      0      0      0      | 0      b = general
> > > >>>> > > 99     8      1763   8      | 1878   c = user
> > > >>>> > > 41     1      29     672    | 743    d = commits
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Statistics
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Kappa                            0.7877
> > > >>>> > > Accuracy                         87.7874%
> > > >>>> > > Reliability                      53.658%
> > > >>>> > > Reliability (standard deviation) 0.4911
> > > >>>> > >
> > > >>>> > > *classification; complementary:*
> > > >>>> > > =======================================================
> > > >>>> > > Summary
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Correctly Classified Instances   : 5530   90.1679%
> > > >>>> > > Incorrectly Classified Instances : 603    9.8321%
> > > >>>> > > Total Classified Instances       : 6133
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Confusion Matrix
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > a      b      c      d      <--Classified as
> > > >>>> > > 3168   0      276    68     | 3512   a = dev
> > > >>>> > > 0      0      0      0      | 0      b = general
> > > >>>> > > 196    0      1652   30     | 1878   c = user
> > > >>>> > > 25     0      8      710    | 743    d = commits
> > > >>>> > >
> > > >>>> > > =======================================================
> > > >>>> > > Statistics
> > > >>>> > > -------------------------------------------------------
> > > >>>> > > Kappa                            0.8259
> > > >>>> > > Accuracy                         90.1679%
> > > >>>> > > Reliability                      54.7459%
> > > >>>> > > Reliability (standard deviation) 0.5005
> > > >>>> > >
> > > >>>> > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > > >>>> > >
> > > >>>> > > *classification; sgd, with three categories:*
> > > >>>> > > Running SGD Training
> > > >>>> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=
> > > >>>> > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > >>>> > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath, will use command-line arguments only
> > > >>>> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > > >>>> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > > >>>> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > > >>>> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > > >>>> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > > >>>> > > 24168 training files
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 2 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 12 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 40 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00
0.0000000 0.0000000 120 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 300 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700 0.000 0.00 none
> > > >>>> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 800 0.000 0.00 none
> > > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1000 -0.607 75.78 none
> > > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1200 -0.607 75.78 none
> > > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1400 -0.607 75.78 none
> > > >>>> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08 1.0019413e-08 1500 -0.607 75.78 none
> > > >>>> > > 0.24 43686.00 17924.00 329.50 1.0571799e-08 1.0032261e-08 2000 -0.487 82.65 none
> > > >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 2500 -0.439 83.90 none
> > > >>>> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08 1.0011902e-08 3000 -0.439 83.90 none
> > > >>>> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08 1.0000001e-08 4000 -0.351 88.14 none
> > > >>>> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08
1.0000000e-08 5000 -0.378 87.10 none
> > > >>>> > > 0.32 50635.00 36461.00 437.09 1.0556652e-08 1.0000001e-08 6000 -0.372 86.89 none
> > > >>>> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08 1.0000001e-08 7000 -0.334 89.26 none
> > > >>>> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08 1.0000000e-08 8000 -0.368 87.52 none
> > > >>>> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08 1.0000000e-08 10000 -0.374 87.39 none
> > > >>>> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08 1.0000000e-08 12000 -0.298 88.26 none
> > > >>>> > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > > >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > >>>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >>>> > >     at java.lang.reflect.Method.invoke(Method.java:622)
> > > >>>> > >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > > >>>> > >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > > >>>> > >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > > >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >>>> > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > >>>> > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >>>> > >     at java.lang.reflect.Method.invoke(Method.java:622)
> > > >>>> > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > > >>>> > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > > >>>> > >     at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > > >>>> > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > > >>>> > >     at
org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > > >>>> > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > > >>>> > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > > >>>> > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > >>>> > >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > >>>> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >>>> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >>>> > >     at java.lang.Thread.run(Thread.java:701)
> > > >>>> > >
> > > >>>> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <andrew.musselman@gmail.com> wrote:
> > > >>>> > >
> > > >>>> > > > Trying out the build today
> > > >>>> > > >
> > > >>>> > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <suneel_marthi@yahoo.com> wrote:
> > > >>>> > > >
> > > >>>> > > >> This is an issue (trivial one though) that needs to be fixed for 0.9
> > > >>>> > > >> Release, will be rerolling the release today (in the next few hrs) and
> > > >>>> > > >> putting out a new release candidate in staging.
> > > >>>> > > >>
> > > >>>> > > >> Thanks for reporting this Andrew P.
> > > >>>> > > >>
> > > >>>> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <ap.dev@outlook.com> wrote:
> > > >>>> > > >>
> > > >>>> > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). Had
> > > >>>> > > >> a bit of trouble getting the Hadoop natives to compile, and therefore may
> > > >>>> > > >> have run into some problems because of the Hadoop setup. Ran into some
> > > >>>> > > >> problems in the example scripts, particularly with
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> > > >>>> > > >> examples when I'm sure I've got Hadoop set up right.
> > > >>>> > > >>
> > > >>>> > > >> Apache Maven 3.1.2-SNAPSHOT
> > > >>>> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > > >>>> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > > >>>> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch: "amd64", family: "unix"
> > > >>>> > > >> $MAHOUT_LOCAL=true
> > > >>>> > > >> Hadoop 2.2.0
> > > >>>> > > >>
> > > >>>> > > >> a) Verify that you can unpack the release (tar or zip) ...passed (tar)
> > > >>>> > > >> [passed]
> > > >>>> > > >>
> > > >>>> > > >> b) Verify you are able to compile the distro
> > > >>>> > > >>
> > > >>>> > > >> mvn compile- [passed with warnings]
> > > >>>> > > >>
> > > >>>> > > >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
> > > >>>> > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > > >>>> > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > > >>>> > > >> [WARNING] Multiple versions of scala libraries detected!
> > > >>>> > > >>
> > > >>>> > > >> c) Run through the unit tests: mvn clean test
> > > >>>> > > >> mvn clean test [passed]
> > > >>>> > > >>
> > > >>>> > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > > >>>> > > >> Please run through all the different options in each script
> > > >>>> > > >>
> > > >>>> > > >> Running example scripts with $MAHOUT_LOCAL=true
> > > >>>> > > >>
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->1 [works]
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
> > > >>>> > > >>
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > > >>>> > > >> [...]
> > > >>>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > >>>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > >>>> > > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > >>>> > > >>     at java.security.AccessController.doPrivileged(Native Method)
> > > >>>> > > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > >>>> > > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > >>>> > > >>     at java.lang.Class.forName0(Native Method)
> > > >>>> > > >>     at java.lang.Class.forName(Class.java:171)
> > > >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > >>>> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > >>>> > > >>
> > > >>>> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > > >>>> > > >>
> > > >>>> > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > >>>> > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > >>>> > > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > >>>> > > >>     at java.security.AccessController.doPrivileged(Native Method)
> > > >>>> > > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > >>>> > > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > >>>> > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > >>>> > > >>     at java.lang.Class.forName0(Native Method)
> > > >>>> > > >>     at java.lang.Class.forName(Class.java:171)
> > > >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > >>>> > > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > >>>> > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > >>>> > > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on classpath, will use command-line arguments only
> > > >>>> > > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > > >>>> > > >>
> > > >>>> > > >> ./classify-20newsgroups.sh ->1 [works]
> > > >>>> > > >> ./classify-20newsgroups.sh ->2 [works]
> > > >>>> > > >>
> > > >>>> > > >> cluster-reuters.sh ->1 [works]
> > > >>>> > > >> cluster-reuters.sh ->2 [works]
> > > >>>> > > >> cluster-reuters.sh ->3 [works]
> > > >>>> > > >>
> > > >>>> > > >> Same error as noted previously in the thread:
> > > >>>> > > >>
> > > >>>> > > >> cluster-reuters.sh ->4 [0 clusters]
> > > >>>> > > >>
> > > >>>> > > >> [...]
> > > >>>> > > >>
> > > >>>> > > >> WARNING: No qualcluster.props found on classpath, will use command-line arguments only
> > > >>>> > > >> Num clusters: 0; maxDistance: 0.000000
> > > >>>> > > >> [Dunn Index] First: Infinity
> > > >>>> > > >> [Davies-Bouldin Index] First: NaN
> > > >>>> > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > > >>>> > > >> INFO: Program took 669 ms (Minutes: 0.01115)
> > > >>>> > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > > >>>> > > >>
> > > >>>> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > > >>>> > > >> > From: suneel_marthi@yahoo.com
> > > >>>> > > >> > Subject: MAHOUT 0.9 Release - New URL
> > > >>>> > > >> > To: user@mahout.apache.org; dev@mahout.apache.org
> > > >>>> > > >> >
> > > >>>> > > >> > Third time's a Charm!!!
> > > >>>> > > >> >
> > > >>>> > > >> > Here's the new URL for Mahout 0.9 Release:
> > > >>>> > > >> >
> > > >>>> > > >> > https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > > >>>> > > >> >
> > > >>>> > > >> > For those volunteering to test this, some of the things to be verified:
> > > >>>> > > >> >
> > > >>>> > > >> > a) Verify that you can unpack the release (tar or zip)
> > > >>>> > > >> > b) Verify you are able to compile the distro
> > > >>>> > > >> > c) Run through the unit tests: mvn clean test
> > > >>>> > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run through all the different options in each script.
> > > >>>> > > >> >
> > > >>>> > > >> > Committers and PMC members:
> > > >>>> > > >> > ---------------------------------------
> > > >>>> > > >> >
> > > >>>> > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > > >>>> > > >> >
> > > >>>> > > >> > Thanks and Regards.

--001a113aaa3e3e895c04f096c2b4--