Mean Shift accumulates the pointIds of every point assigned to a cluster,
so I would expect n= to be correct in the cluster dumper output. Most
likely the postprocessor is misbehaving. Please create a JIRA and attach
your dataset, and we will take a look at it. It would also be useful to
include the exact CLI commands you used, so that we can reproduce the
problem.

On 1/25/12 2:41 AM, gaurav redkar wrote:
> Hello,
>
> I was able to rectify the aforementioned problem after I implemented a
> custom partitioner instead of using the default hash partitioner. I have
> another issue, though. After running the post processor, the number of
> points that each cluster contains does not match the number of points
> each cluster should contain as stated by clusterdumper.
>
> MSV-287{ n=90 c=[0.05195, 0.05675, 0.07151, 0.05713, 0.06946,...}
> MSV-145{ n=90 c=[0.93685, 0.93071, 0.93641, 0.94629, 0.94409,..}
>
> The n mentioned in clusters-n-final against each cluster is different
> from the number of points actually contained in the directory for each
> cluster. Any idea why this is happening?
>
> PS: The dataset on which I tested the algorithm has 1000 records with
> 200 attributes per record. I can share the dataset that I used if needed.
>
> Thanks,
> Gaurav
>
> On Fri, Jan 6, 2012 at 6:12 PM, Paritosh Ranjan wrote:
>
>> ClusterOutputProcessorDriver has options to run either sequentially or
>> in a mapreduce way.
>>
>> If the clustering was done sequentially, then ClusterOutputProcessor
>> should be run sequentially, and if the clustering was done in a
>> mapreduce way, then run the ClusterOutputPostProcessor with option
>> mapreduce=true.
>>
>> If you have already tried this, and it's still not working, then filing
>> a bug (as Lance mentioned) would be appropriate.
>>
>> On 06-01-2012 17:18, gaurav redkar wrote:
>>
>>> Hello,
>>>
>>> When I ran the ClusterOutputPostProcessor on synthetic_control_data in
>>> mapreduce mode, I observed that one directory contained points
>>> belonging to 2 other clusters. The directories for those 2 clusters
>>> were not created, because their "part-*" files were empty and the
>>> function "movePartFilesToRespectiveDirectories()" was not able to
>>> create the directories to put them into. I have converted the sequence
>>> file containing the points belonging to those 3 clusters into a text
>>> file (by changing the output format to TextOutputFormat). Kindly find
>>> the attached part-file, which can be viewed.
>>>
>>> Any suggestions as to why this might be happening?
>>>
>>> Note: The program runs fine in sequential mode.
>>>
>>> Thanks.
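
For readers following the thread: the custom-partitioner fix mentioned in
the later message is not shown anywhere, so here is a minimal sketch of one
plausible reading of the problem, not the poster's actual code. It assumes
the symptom (one directory holding points of three clusters, two empty
part files) came from several cluster ids hashing to the same reducer. The
hashPartition method mirrors the arithmetic of Hadoop's default
HashPartitioner; the class name, dedicatedPartitions, and the third cluster
id are hypothetical and used for illustration only. The sketch is plain
Java with no Hadoop dependency.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClusterPartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner: mask off the sign
    // bit of the key's hash code, then take it modulo the reducer count.
    // Two different cluster ids can land on the same partition this way.
    static int hashPartition(String clusterId, int numReduceTasks) {
        return (clusterId.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical alternative: assign each distinct cluster id the next
    // free index, so no two clusters ever share a partition.
    static Map<String, Integer> dedicatedPartitions(List<String> clusterIds) {
        Map<String, Integer> partitionOf = new LinkedHashMap<>();
        for (String id : clusterIds) {
            partitionOf.putIfAbsent(id, partitionOf.size());
        }
        return partitionOf;
    }

    public static void main(String[] args) {
        // The first two ids appear in the thread; the third is made up.
        List<String> clusterIds = List.of("MSV-287", "MSV-145", "MSV-21");
        int numReduceTasks = clusterIds.size();
        Map<String, Integer> dedicated = dedicatedPartitions(clusterIds);
        for (String id : clusterIds) {
            System.out.println(id
                    + " hash=" + hashPartition(id, numReduceTasks)
                    + " dedicated=" + dedicated.get(id));
        }
    }
}
```

In a real job, the dedicated mapping would have to be available to a
Partitioner implementation at runtime (for example through the job
configuration), and the number of reducers would have to equal the number
of clusters. Both points are assumptions about how such a fix could be
wired up, not a description of what was done in this thread.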