Return-Path:
X-Original-To: apmail-mahout-commits-archive@www.apache.org
Delivered-To: apmail-mahout-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CD71417E87 for ; Fri, 29 May 2015 19:07:25 +0000 (UTC)
Received: (qmail 45164 invoked by uid 500); 29 May 2015 19:07:23 -0000
Delivered-To: apmail-mahout-commits-archive@mahout.apache.org
Received: (qmail 45041 invoked by uid 500); 29 May 2015 19:07:23 -0000
Mailing-List: contact commits-help@mahout.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@mahout.apache.org
Delivered-To: mailing list commits@mahout.apache.org
Received: (qmail 44387 invoked by uid 99); 29 May 2015 19:07:23 -0000
Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 29 May 2015 19:07:23 +0000
Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 3654DE03C7; Fri, 29 May 2015 19:07:23 +0000 (UTC)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: dlyubimov@apache.org
To: commits@mahout.apache.org
Date: Fri, 29 May 2015 19:07:36 -0000
Message-Id: <4c638792488c46c1aa0d8eed8fe47675@git.apache.org>
In-Reply-To: References:
X-Mailer: ASF-Git Admin Mailer
Subject: [15/26] mahout git commit: mahout-spark docs.
http://git-wip-us.apache.org/repos/asf/mahout/blob/21e5ddb7/docs/mahout-spark/org/apache/mahout/drivers/RowSimilarityDriver$.html
----------------------------------------------------------------------
diff --git a/docs/mahout-spark/org/apache/mahout/drivers/RowSimilarityDriver$.html b/docs/mahout-spark/org/apache/mahout/drivers/RowSimilarityDriver$.html
new file mode 100644
index 0000000..1f3db87
--- /dev/null
+++ b/docs/mahout-spark/org/apache/mahout/drivers/RowSimilarityDriver$.html
@@ -0,0 +1,563 @@
RowSimilarityDriver - Mahout Spark bindings 0.10.0 API - org.apache.mahout.drivers.RowSimilarityDriver
+ +

org.apache.mahout.drivers

+

RowSimilarityDriver

+
+ +

+ + + object + + + RowSimilarityDriver extends MahoutSparkDriver + +

+ +

Command line interface for row-similarity calculation. +Reads a text-delimited file containing rows of an org.apache.mahout.math.indexeddataset.IndexedDataset +with domain-specific IDs of the form (row id, column id: strength, ...). The IDs are preserved in the +output. The rows define a matrix, and row-wise similarity is calculated using the log-likelihood ratio (LLR). The options allow control of the input +schema, file discovery, output schema, and algorithm parameters.

To get help run

mahout spark-rowsimilarity

for a full explanation of options. By default, the reader parses lines of the form (rowID<tab>columnID1:strength1<space>columnID2:strength2....) +and the writer emits (rowID<tab>rowID1:strength1<space>rowID2:strength2....). +Each output line contains a row ID and its most similar rows, sorted by LLR strength in descending order.

Note

To use with a Spark cluster, see the --master option; if you run out of heap space, check + the --sparkExecutorMemory option. +
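Putting the format and options together, an invocation might look like the sketch below. This is illustrative only: apart from --master and --sparkExecutorMemory, which this page names, the flag names and paths are assumptions — run the help command above for the authoritative option list.

```shell
# Hypothetical invocation — flag names other than --master and
# --sparkExecutorMemory are assumptions; verify with `mahout spark-rowsimilarity`.
mahout spark-rowsimilarity \
    --input hdfs://namenode:8020/data/rows.tsv \
    --output hdfs://namenode:8020/data/row-similarity/ \
    --master spark://spark-master:7077 \
    --sparkExecutorMemory 4g
```

Here each line of rows.tsv would follow the default schema described above, e.g. `u1<tab>item1:5.0 item2:1.0`.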

+ Linear Supertypes +
MahoutSparkDriver, MahoutDriver, AnyRef, Any
+
+ + +
+
+
+ Ordering +
    + +
  1. Alphabetic
  2. +
  3. By inheritance
  4. +
+
+
+ Inherited
+
+
    +
  1. RowSimilarityDriver
  2. MahoutSparkDriver
  3. MahoutDriver
  4. AnyRef
  5. Any
  6. +
+
+ +
    +
  1. Hide All
  2. +
  3. Show all
  4. +
+ Learn more about member selection +
+
+ Visibility +
  1. Public
  2. All
+
+
+ +
+
+ + + + + + +
+

Value Members

+
  1. + + +

    + + final + def + + + !=(arg0: AnyRef): Boolean + +

    +
    Definition Classes
    AnyRef
    +
  2. + + +

    + + final + def + + + !=(arg0: Any): Boolean + +

    +
    Definition Classes
    Any
    +
  3. + + +

    + + final + def + + + ##(): Int + +

    +
    Definition Classes
    AnyRef → Any
    +
  4. + + +

    + + final + def + + + ==(arg0: AnyRef): Boolean + +

    +
    Definition Classes
    AnyRef
    +
  5. + + +

    + + final + def + + + ==(arg0: Any): Boolean + +

    +
    Definition Classes
    Any
    +
  6. + + +

    + + + var + + + _useExistingContext: Boolean + +

    +
    Definition Classes
    MahoutDriver
    +
  7. + + +

    + + final + def + + + asInstanceOf[T0]: T0 + +

    +
    Definition Classes
    Any
    +
  8. + + +

    + + + def + + + clone(): AnyRef + +

    +
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    + @throws( + + ... + ) + +
    +
  9. + + +

    + + final + def + + + eq(arg0: AnyRef): Boolean + +

    +
    Definition Classes
    AnyRef
    +
  10. + + +

    + + + def + + + equals(arg0: Any): Boolean + +

    +
    Definition Classes
    AnyRef → Any
    +
  11. + + +

    + + + def + + + finalize(): Unit + +

    +
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    + @throws( + + classOf[java.lang.Throwable] + ) + +
    +
  12. + + +

    + + final + def + + + getClass(): Class[_] + +

    +
    Definition Classes
    AnyRef → Any
    +
  13. + + +

    + + + def + + + hashCode(): Int + +

    +
    Definition Classes
    AnyRef → Any
    +
  14. + + +

    + + final + def + + + isInstanceOf[T0]: Boolean + +

    +
    Definition Classes
    Any
    +
  15. + + +

    + + + def + + + main(args: Array[String]): Unit + +

    +

    Entry point, not using the Scala App trait.

    args

    Command line args; if empty, a help message is printed. +

    Definition Classes
    RowSimilarityDriver → MahoutDriver
    +
  16. + + +

    + + implicit + var + + + mc: DistributedContext + +

    +
    Attributes
    protected
    Definition Classes
    MahoutDriver
    +
  17. + + +

    + + final + def + + + ne(arg0: AnyRef): Boolean + +

    +
    Definition Classes
    AnyRef
    +
  18. + + +

    + + final + def + + + notify(): Unit + +

    +
    Definition Classes
    AnyRef
    +
  19. + + +

    + + final + def + + + notifyAll(): Unit + +

    +
    Definition Classes
    AnyRef
    +
  20. + + +

    + + implicit + var + + + parser: MahoutOptionParser + +

    +
    Attributes
    protected
    Definition Classes
    MahoutDriver
    +
  21. + + +

    + + + def + + + process(): Unit + +

    +
    Definition Classes
    RowSimilarityDriver → MahoutDriver
    +
  22. + + +

    + + implicit + var + + + sparkConf: SparkConf + +

    +
    Definition Classes
    MahoutSparkDriver
    +
  23. + + +

    + + + def + + + start(): Unit + +

    +

    Creates a Spark context to run the job inside. +Override to set SparkConf values specific to the job; +these must be set before the context is created. +

    Attributes
    protected
    Definition Classes
    RowSimilarityDriver → MahoutSparkDriver → MahoutDriver
    +
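The override pattern described for start() can be sketched as a hypothetical driver. This sketch assumes only the MahoutSparkDriver members documented on this page; the SparkConf key and value are purely illustrative.

```scala
// Sketch only: assumes the MahoutSparkDriver API documented on this page.
object MyDriver extends MahoutSparkDriver {
  // Override start() to set job-specific SparkConf values; they must be
  // set before the Spark context is created by super.start().
  override protected def start(): Unit = {
    sparkConf.set("spark.executor.memory", "2g") // illustrative setting
    super.start()
  }
  override def process(): Unit = {
    // job logic runs against the implicit DistributedContext `mc`
  }
  override def main(args: Array[String]): Unit = {
    start()
    process()
    stop()
  }
}
```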
  24. + + +

    + + + def + + + stop(): Unit + +

    +
    Attributes
    protected
    Definition Classes
    MahoutDriver
    +
  25. + + +

    + + final + def + + + synchronized[T0](arg0: ⇒ T0): T0 + +

    +
    Definition Classes
    AnyRef
    +
  26. + + +

    + + + def + + + toString(): String + +

    +
    Definition Classes
    AnyRef → Any
    +
  27. + + +

    + + + def + + + useContext(context: DistributedContext): Unit + +

    +

    Call this before start to use an existing context, as when running multiple drivers from a ScalaTest suite.

    context

    An already set up context to run against +

    Definition Classes
    MahoutSparkDriver
    +
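In a ScalaTest suite the call order reads roughly as follows. This is a sketch: how the shared context is obtained, and the driver flag names, are assumptions not taken from this page.

```scala
// Sketch: reuse one DistributedContext across several driver runs in tests.
val sharedContext: DistributedContext = createSharedTestContext() // hypothetical helper

RowSimilarityDriver.useContext(sharedContext) // must be called before start
RowSimilarityDriver.main(Array("--input", "in.tsv", "--output", "out")) // flag names are assumptions
```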
  28. + + +

    + + final + def + + + wait(): Unit + +

    +
    Definition Classes
    AnyRef
    Annotations
    + @throws( + + ... + ) + +
    +
  29. + + +

    + + final + def + + + wait(arg0: Long, arg1: Int): Unit + +

    +
    Definition Classes
    AnyRef
    Annotations
    + @throws( + + ... + ) + +
    +
  30. + + +

    + + final + def + + + wait(arg0: Long): Unit + +

    +
    Definition Classes
    AnyRef
    Annotations
    + @throws( + + ... + ) + +
    +
+
+ + + + +
+ +
+
+

Inherited from MahoutSparkDriver

+
+

Inherited from MahoutDriver

+
+

Inherited from AnyRef

+
+

Inherited from Any

+
+ +
+ +
+
+

Ungrouped

+ +
+
+ +
+ +
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/mahout/blob/21e5ddb7/docs/mahout-spark/org/apache/mahout/drivers/TDIndexedDatasetReader.html
----------------------------------------------------------------------
diff --git a/docs/mahout-spark/org/apache/mahout/drivers/TDIndexedDatasetReader.html b/docs/mahout-spark/org/apache/mahout/drivers/TDIndexedDatasetReader.html
new file mode 100644
index 0000000..cf4d375
--- /dev/null
+++ b/docs/mahout-spark/org/apache/mahout/drivers/TDIndexedDatasetReader.html
@@ -0,0 +1,519 @@
TDIndexedDatasetReader - Mahout Spark bindings 0.10.0 API - org.apache.mahout.drivers.TDIndexedDatasetReader
+ +

org.apache.mahout.drivers

+

TDIndexedDatasetReader

+
+ +

+ + + trait + + + TDIndexedDatasetReader extends Reader[IndexedDatasetSpark] + +

+ +

Extends the Reader trait to supply org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark as +the type read, and element and row reader functions for reading text-delimited files as described in +org.apache.mahout.math.indexeddataset.Schema. +

+ Linear Supertypes +
Reader[IndexedDatasetSpark], AnyRef, Any
+
+ + +
+
+
+ Ordering +
    + +
  1. Alphabetic
  2. +
  3. By inheritance
  4. +
+
+
+ Inherited
+
+
    +
  1. TDIndexedDatasetReader
  2. Reader
  3. AnyRef
  4. Any
  5. +
+
+ +
    +
  1. Hide All
  2. +
  3. Show all
  4. +
+ Learn more about member selection +
+
+ Visibility +
  1. Public
  2. All
+
+
+ +
+
+ + + + +
+

Abstract Value Members

+
  1. + + +

    + + abstract + val + + + mc: DistributedContext + +

    +
    Definition Classes
    Reader
    +
  2. + + +

    + + abstract + val + + + readSchema: Schema + +

    +
    Definition Classes
    Reader
    +
+
+ +
+

Concrete Value Members

+
  1. + + +

    + + final + def + + + !=(arg0: AnyRef): Boolean + +

    +
    Definition Classes
    AnyRef
    +
  2. + + +

    + + final + def + + + !=(arg0: Any): Boolean + +

    +
    Definition Classes
    Any
    +
  3. + + +

    + + final + def + + + ##(): Int + +

    +
    Definition Classes
    AnyRef → Any
    +
  4. + + +

    + + final + def + + + ==(arg0: AnyRef): Boolean + +

    +
    Definition Classes
    AnyRef
    +
  5. + + +

    + + final + def + + + ==(arg0: Any): Boolean + +

    +
    Definition Classes
    Any
    +
  6. + + +

    + + final + def + + + asInstanceOf[T0]: T0 + +

    +
    Definition Classes
    Any
    +
  7. + + +

    + + + def + + + clone(): AnyRef + +

    +
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    + @throws( + + ... + ) + +
    +
  8. + + +

    + + + def + + + elementReader(mc: DistributedContext, readSchema: Schema, source: String, existingRowIDs: BiMap[String, Int] = HashBiMap.create()): IndexedDatasetSpark + +

    +

    Read in text-delimited elements from all URIs in the comma-delimited source String and return +the DRM of all elements, updating the row and column ID dictionaries. If an element has +no strength value, its presence is assumed to mean a strength of 1.

    mc

    context for the Spark job

    readSchema

    describes the delimiters and positions of values in the text delimited file.

    source

    comma delimited URIs of text files to be read from

    returns

    a new org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark +

    Attributes
    protected
    Definition Classes
    TDIndexedDatasetReader → Reader
    +
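For example, given element-format input, the public entry point readElementsFrom might be used as below. This is a sketch: `reader` is a hypothetical object mixing in TDIndexedDatasetReader with mc and readSchema supplied, and the URIs are illustrative.

```scala
// Input lines look like: rowID<tab>colID1:3.0 colID2:1.0
// (an element with no strength value is treated as strength 1).
// `reader` is hypothetical; it mixes in TDIndexedDatasetReader.
val dataset: IndexedDatasetSpark = reader.readElementsFrom(
  "hdfs://data/part-0.tsv,hdfs://data/part-1.tsv", // comma-delimited URIs
  HashBiMap.create[String, Int]())                 // empty row-ID dictionary
```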
  9. + + +

    + + final + def + + + eq(arg0: AnyRef): Boolean + +

    +
    Definition Classes
    AnyRef
    +
  10. + + +

    + + + def + + + equals(arg0: Any): Boolean + +

    +
    Definition Classes
    AnyRef → Any
    +
  11. + + +

    + + + def + + + finalize(): Unit + +

    +
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    + @throws( + + classOf[java.lang.Throwable] + ) + +
    +
  12. + + +

    + + final + def + + + getClass(): Class[_] + +

    +
    Definition Classes
    AnyRef → Any
    +
  13. + + +

    + + + def + + + hashCode(): Int + +

    +
    Definition Classes
    AnyRef → Any
    +
  14. + + +

    + + final + def + + + isInstanceOf[T0]: Boolean + +

    +
    Definition Classes
    Any
    +
  15. + + +

    + + final + def + + + ne(arg0: AnyRef): Boolean + +

    +
    Definition Classes
    AnyRef
    +
  16. + + +

    + + final + def + + + notify(): Unit + +

    +
    Definition Classes
    AnyRef
    +
  17. + + +

    + + final + def + + + notifyAll(): Unit + +

    +
    Definition Classes
    AnyRef
    +
  18. + + +

    + + + def + + + readElementsFrom(source: String, existingRowIDs: BiMap[String, Int]): IndexedDatasetSpark + +

    +
    Definition Classes
    Reader
    +
  19. + + +

    + + + def + + + readRowsFrom(source: String, existingRowIDs: BiMap[String, Int]): IndexedDatasetSpark + +

    +
    Definition Classes
    Reader
    +
  20. + + +

    + + + def + + + rowReader(mc: DistributedContext, readSchema: Schema, source: String, existingRowIDs: BiMap[String, Int] = HashBiMap.create()): IndexedDatasetSpark + +

    +

    Read in text-delimited rows from all URIs in this comma-delimited source String and return +the DRM of all elements, updating the row and column ID dictionaries. If an element has +no strength value, its presence is assumed to mean a strength of 1.

    mc

    context for the Spark job

    readSchema

    describes the delimiters and positions of values in the text delimited file.

    source

    comma delimited URIs of text files to be read into the IndexedDatasetSpark

    returns

    a new org.apache.mahout.sparkbindings.indexeddataset.IndexedDatasetSpark +

    Attributes
    protected
    Definition Classes
    TDIndexedDatasetReader → Reader
    +
  21. + + +

    + + final + def + + + synchronized[T0](arg0: ⇒ T0): T0 + +

    +
    Definition Classes
    AnyRef
    +
  22. + + +

    + + + def + + + toString(): String + +

    +
    Definition Classes
    AnyRef → Any
    +
  23. + + +

    + + final + def + + + wait(): Unit + +

    +
    Definition Classes
    AnyRef
    Annotations
    + @throws( + + ... + ) + +
    +
  24. + + +

    + + final + def + + + wait(arg0: Long, arg1: Int): Unit + +

    +
    Definition Classes
    AnyRef
    Annotations
    + @throws( + + ... + ) + +
    +
  25. + + +

    + + final + def + + + wait(arg0: Long): Unit + +

    +
    Definition Classes
    AnyRef
    Annotations
    + @throws( + + ... + ) + +
    +
+
+ + + + +
+ +
+
+

Inherited from Reader[IndexedDatasetSpark]

+
+

Inherited from AnyRef

+
+

Inherited from Any

+
+ +
+ +
+
+

Ungrouped

+ +
+
+ +
+ +
+ + + + + \ No newline at end of file