mahout-dev mailing list archives

From "Reinis Vicups (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1486) Streaming KMeans NPE
Date Mon, 24 Mar 2014 14:23:48 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945154#comment-13945154 ]

Reinis Vicups commented on MAHOUT-1486:
---------------------------------------

The following command gives an NPE in combination with the -rskm option (on Mahout 0.8; Mr. Marthi
pointed me to Mahout 0.9, so this is likely no longer relevant there):

{code}
mahout streamingkmeans -i /output/tfidf-vectors -o /ticket-text-clusters/output -k 230 -km 900 -rskm -ow --distanceMeasure org.apache.mahout.common.distance.ChebyshevDistanceMeasure
{code}

About the number of points: I am not sure how to determine this. I ran seqdumper with -c on
tfidf-vectors and got this:

{code}
Count: 328485
{code}
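
For reference, the point count feeds into the choice of -km. Mahout's option help for streamingkmeans suggests setting the estimated number of map clusters to around k * log(n). The sketch below computes that estimate from the -k value and the count above; the class and method names are hypothetical, and using the natural logarithm is an assumption, since the base is not stated.

```java
// Hypothetical helper for illustration; not part of Mahout.
public class MapClusterEstimate {

    /**
     * Returns k * ln(n), rounded up.
     * Assumption: "log" means the natural logarithm.
     */
    static int estimate(int k, long n) {
        return (int) Math.ceil(k * Math.log(n));
    }

    public static void main(String[] args) {
        int k = 230;       // value passed as -k in the command above
        long n = 328485L;  // Count reported by seqdumper -c
        System.out.println("Suggested -km: " + estimate(k, n));
    }
}
```

Under these assumptions the estimate comes out at roughly 2,900, compared with the 900 passed as -km above.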

> Streaming KMeans NPE
> --------------------
>
>                 Key: MAHOUT-1486
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1486
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.8
>            Reporter: Reinis Vicups
>            Assignee: Suneel Marthi
>             Fix For: 1.0
>
>
> I am assuming that this occurs because the --reduceStreamingKMeans (-rskm) option is set. I will test it without the reduce step and report whether the NPE goes away.
> Error: java.lang.NullPointerException
>         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187)
>         at org.apache.mahout.math.random.WeightedThing.<init>(WeightedThing.java:31)
>         at org.apache.mahout.math.neighborhood.BruteSearch.searchFirst(BruteSearch.java:127)
>         at org.apache.mahout.clustering.ClusteringUtils.estimateDistanceCutoff(ClusteringUtils.java:116)
>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansThread.call(StreamingKMeansThread.java:63)
>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:55)
>         at org.apache.mahout.clustering.streaming.mapreduce.StreamingKMeansReducer.reduce(StreamingKMeansReducer.java:35)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:165)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:160)
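
The trace shows the null check firing in the WeightedThing constructor, reached from BruteSearch.searchFirst via ClusteringUtils.estimateDistanceCutoff in the reducer. One way such a trace can arise, sketched below with hypothetical stand-in classes (not Mahout's code), is a brute-force search that finds no candidate and then wraps the null result. Whether this is what actually happens here is an assumption, not something confirmed in this issue.

```java
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-ins for illustration only; these are NOT Mahout's classes.
public class SearchFirstSketch {

    /** Pairs a value with a weight and rejects a null value, like a checkNotNull guard. */
    static final class Weighted<T> {
        final T value;
        final double weight;

        Weighted(T value, double weight) {
            this.value = Objects.requireNonNull(value); // throws NullPointerException on null
            this.weight = weight;
        }
    }

    /** Chebyshev distance: the largest absolute per-coordinate difference. */
    static double chebyshev(double[] a, double[] b) {
        double max = 0.0;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    /**
     * Brute-force search for the point closest to the query.
     * When skipQuery is true, the query object itself is not a candidate.
     */
    static Weighted<double[]> searchFirst(List<double[]> points, double[] query, boolean skipQuery) {
        double[] best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (double[] p : points) {
            if (skipQuery && p == query) {
                continue;
            }
            double d = chebyshev(p, query);
            if (d < bestDistance) {
                bestDistance = d;
                best = p;
            }
        }
        // If no candidate was left, best is still null and the constructor throws.
        return new Weighted<>(best, bestDistance);
    }

    public static void main(String[] args) {
        double[] only = {1.0, 2.0};
        try {
            // A single-point set with the query excluded leaves nothing to return.
            searchFirst(Collections.singletonList(only), only, true);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException: no other point to compare against");
        }
    }
}
```

In this sketch the exception appears only when the search has no candidates left, for example when the set holds nothing but the query point.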



--
This message was sent by Atlassian JIRA
(v6.2#6252)
