I am trying to cluster documents stored in a Lucene index using the command line tools. How can I obtain the original document IDs from the clustering output?

Here is the sequence of commands I am using:

    ./mahout lucene.vector --dir $index_path --output /tmp/mahout/vector --field content --dictOut /tmp/mahout/dict --idField _uid -md 2 -w TFIDF -x 70

    ./mahout canopy -i /tmp/mahout/vector -o /tmp/mahout_canopy -dm org.apache.mahout.common.distance.CosineDistanceMeasure --t1 10 --t2 5

    ./mahout kmeans -i /tmp/mahout/vector -c /tmp/mahout_canopy/clusters-0-final/part-r-00000 -o /tmp/mahout_kmeans -dm org.apache.mahout.common.distance.CosineDistanceMeasure -k 20 -x 20 -cd 0.1

    ./mahout clusterdump -dt text -d /tmp/mahout/dict -s /tmp/mahout_kmeans/clusters-1-final/ -b 20 -n 20

A similar question was asked on this thread [1], but I did not see a resolution.

Thanks in advance for your help!

- Ben

[1] http://mail-archives.apache.org/mod_mbox/mahout-user/201204.mbox/%3CCA+y9ocWgS2se7dOqQrsE3p+QE5GVXCt8XUTucFdZvGkJkPOaew@mail.gmail.com%3E