storm-commits mailing list archives

From kabh...@apache.org
Subject svn commit: r1746371 - in /storm/site/releases/1.0.1: Clojure-DSL.md Common-patterns.md Distributed-RPC.md SECURITY.md Setting-up-a-Storm-cluster.md State-checkpointing.md Transactional-topologies.md Trident-state.md Tutorial.md index.md
Date Wed, 01 Jun 2016 03:41:13 GMT
Author: kabhwan
Date: Wed Jun  1 03:41:12 2016
New Revision: 1746371

URL: http://svn.apache.org/viewvc?rev=1746371&view=rev
Log:
Updated 1.0.1 to include latest code

Modified:
    storm/site/releases/1.0.1/Clojure-DSL.md
    storm/site/releases/1.0.1/Common-patterns.md
    storm/site/releases/1.0.1/Distributed-RPC.md
    storm/site/releases/1.0.1/SECURITY.md
    storm/site/releases/1.0.1/Setting-up-a-Storm-cluster.md
    storm/site/releases/1.0.1/State-checkpointing.md
    storm/site/releases/1.0.1/Transactional-topologies.md
    storm/site/releases/1.0.1/Trident-state.md
    storm/site/releases/1.0.1/Tutorial.md
    storm/site/releases/1.0.1/index.md

Modified: storm/site/releases/1.0.1/Clojure-DSL.md
URL: http://svn.apache.org/viewvc/storm/site/releases/1.0.1/Clojure-DSL.md?rev=1746371&r1=1746370&r2=1746371&view=diff
==============================================================================
--- storm/site/releases/1.0.1/Clojure-DSL.md (original)
+++ storm/site/releases/1.0.1/Clojure-DSL.md Wed Jun  1 03:41:12 2016
@@ -17,7 +17,7 @@ This page outlines all the pieces of the
 
 To define a topology, use the `topology` function. `topology` takes in two arguments: a map
of "spout specs" and a map of "bolt specs". Each spout and bolt spec wires the code for the
component into the topology by specifying things like inputs and parallelism.
 
-Let's take a look at an example topology definition [from the storm-starter project]({{page.git-blob-base}}/examples/storm-starter/src/clj/storm/starter/clj/word_count.clj):
+Let's take a look at an example topology definition [from the storm-starter project]({{page.git-blob-base}}/examples/storm-starter/src/clj/org/apache/storm/starter/clj/word_count.clj):
 
 ```clojure
 (topology
@@ -203,7 +203,7 @@ The signature for `defspout` looks like
 
 If you leave out the option map, it defaults to `{:prepare true}`. The output declaration for
`defspout` has the same syntax as `defbolt`.
 
-Here's an example `defspout` implementation from [storm-starter]({{page.git-blob-base}}/examples/storm-starter/src/clj/storm/starter/clj/word_count.clj):
+Here's an example `defspout` implementation from [storm-starter]({{page.git-blob-base}}/examples/storm-starter/src/clj/org/apache/storm/starter/clj/word_count.clj):
 
 ```clojure
 (defspout sentence-spout ["sentence"]

Modified: storm/site/releases/1.0.1/Common-patterns.md
URL: http://svn.apache.org/viewvc/storm/site/releases/1.0.1/Common-patterns.md?rev=1746371&r1=1746370&r2=1746371&view=diff
==============================================================================
--- storm/site/releases/1.0.1/Common-patterns.md (original)
+++ storm/site/releases/1.0.1/Common-patterns.md Wed Jun  1 03:41:12 2016
@@ -70,7 +70,7 @@ builder.setBolt("merge", new MergeObject
   .globalGrouping("rank");
 ```
 
-This pattern works because of the fields grouping done by the first bolt which gives the
partitioning you need for this to be semantically correct. You can see an example of this
pattern in storm-starter [here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/RollingTopWords.java).
+This pattern works because of the fields grouping done by the first bolt which gives the
partitioning you need for this to be semantically correct. You can see an example of this
pattern in storm-starter [here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/RollingTopWords.java).
 
 If, however, you have a known skew in the data being processed, it can be advantageous to use
partialKeyGrouping instead of fieldsGrouping. This will distribute the load for each key
between two downstream bolts instead of a single one.
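The partial key grouping described above rests on the "power of two choices" idea: each key hashes to two candidate downstream tasks, and each tuple goes to whichever candidate is currently less loaded. A minimal Python sketch of that idea (the toy hash functions and task counts here are illustrative, not Storm's actual implementation):

```python
def candidates(key, num_tasks):
    """Two candidate tasks for a key (toy hash functions, for illustration)."""
    s = sum(key.encode())
    return s % num_tasks, (s * 31) % num_tasks

def choose_task(key, load, num_tasks):
    """Send the tuple to the less-loaded of the key's two candidates."""
    a, b = candidates(key, num_tasks)
    target = a if load[a] <= load[b] else b
    load[target] += 1
    return target

# A skewed stream: one hot key dominates.
load = [0] * 4
for word in ["the"] * 6 + ["storm", "bolt"]:
    choose_task(word, load, 4)

# The hot key's tuples were split across its two candidate tasks,
# so no single task absorbed all six of them.
assert max(load) < 6 and sum(load) == 8
```

A plain fields grouping would send all six "the" tuples to one task; splitting them across two tasks is exactly why the extra downstream aggregation layer is needed to merge the partial counts.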
 
@@ -83,7 +83,7 @@ builder.setBolt("merge", new MergeRanksO
   .globalGrouping("rank");
 ``` 
 
-The topology needs an extra layer of processing to aggregate the partial counts from the
upstream bolts but this only processes aggregated values now so the bolt it is not subject
to the load caused by the skewed data. You can see an example of this pattern in storm-starter
[here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/SkewedRollingTopWords.java).
+The topology needs an extra layer of processing to aggregate the partial counts from the
upstream bolts, but this only processes aggregated values now, so the bolt is not subject
to the load caused by the skewed data. You can see an example of this pattern in storm-starter
[here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/SkewedRollingTopWords.java).
 
 ### TimeCacheMap for efficiently keeping a cache of things that have been recently updated
 

Modified: storm/site/releases/1.0.1/Distributed-RPC.md
URL: http://svn.apache.org/viewvc/storm/site/releases/1.0.1/Distributed-RPC.md?rev=1746371&r1=1746370&r2=1746371&view=diff
==============================================================================
--- storm/site/releases/1.0.1/Distributed-RPC.md (original)
+++ storm/site/releases/1.0.1/Distributed-RPC.md Wed Jun  1 03:41:12 2016
@@ -118,7 +118,7 @@ The reach of a URL is the number of uniq
 
 A single reach computation can involve thousands of database calls and tens of millions of
follower records during the computation. It's a really, really intense computation. As you're
about to see, implementing this function on top of Storm is dead simple. On a single machine,
reach can take minutes to compute; on a Storm cluster, you can compute reach for even the
hardest URLs in a couple seconds.
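Reach, as defined above, is just a distinct count over a union of follower sets. A plain-Python rendering of the computation (with hypothetical in-memory stand-ins for the tweeters and followers databases) shows the semantics that the topology parallelizes:

```python
# Hypothetical stand-ins for the tweeters and followers databases.
TWEETERS = {"http://example.com": ["alice", "bob"]}
FOLLOWERS = {
    "alice": ["carol", "dave", "erin"],
    "bob": ["dave", "frank"],
}

def reach(url):
    """Count distinct followers across everyone who tweeted the URL.

    The Storm topology computes the same union, but shards the follower
    lookups and the distinct count across many bolts.
    """
    followers = set()
    for tweeter in TWEETERS.get(url, []):
        followers.update(FOLLOWERS.get(tweeter, []))
    return len(followers)

# dave follows both tweeters and is counted once, so reach is 4, not 5.
assert reach("http://example.com") == 4
```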
 
-A sample reach topology is defined in storm-starter [here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/ReachTopology.java).
Here's how you define the reach topology:
+A sample reach topology is defined in storm-starter [here]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/ReachTopology.java).
Here's how you define the reach topology:
 
 ```java
 LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("reach");

Modified: storm/site/releases/1.0.1/SECURITY.md
URL: http://svn.apache.org/viewvc/storm/site/releases/1.0.1/SECURITY.md?rev=1746371&r1=1746370&r2=1746371&view=diff
==============================================================================
--- storm/site/releases/1.0.1/SECURITY.md (original)
+++ storm/site/releases/1.0.1/SECURITY.md Wed Jun  1 03:41:12 2016
@@ -391,7 +391,7 @@ A storm client may submit requests on be
 it can do so by leveraging the impersonation feature. In order to submit a topology as some
other user, you can use the `StormSubmitter.submitTopologyAs` API. Alternatively you can use
`NimbusClient.getConfiguredClientAs`
 to get a nimbus client as some other user and perform any nimbus action (i.e. kill/rebalance/activate/deactivate)
using this client.
 
-To ensure only authorized users can perform impersonation you should start nimbus with `nimbus.impersonation.authorizer`
set to `org.apache.storm.security.auth.authorizer.ImpersonationAuthorizer`. 
+Impersonation authorization is disabled by default, which means any user can perform impersonation.
To ensure only authorized users can perform impersonation, you should start nimbus with `nimbus.impersonation.authorizer`
set to `org.apache.storm.security.auth.authorizer.ImpersonationAuthorizer`.
 The `ImpersonationAuthorizer` uses `nimbus.impersonation.acl` as the acl to authorize users.
Following is a sample nimbus config for supporting impersonation:
 
 ```yaml
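# The sample config is truncated in this diff; as a hedged sketch (the
# user, host, and group names below are hypothetical), an impersonation
# acl following the nimbus.impersonation.acl format takes this shape:
nimbus.impersonation.authorizer: "org.apache.storm.security.auth.authorizer.ImpersonationAuthorizer"
nimbus.impersonation.acl:
    impersonating_user:
        hosts: ["gateway.example.com"]
        groups: ["analytics"]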

Modified: storm/site/releases/1.0.1/Setting-up-a-Storm-cluster.md
URL: http://svn.apache.org/viewvc/storm/site/releases/1.0.1/Setting-up-a-Storm-cluster.md?rev=1746371&r1=1746370&r2=1746371&view=diff
==============================================================================
--- storm/site/releases/1.0.1/Setting-up-a-Storm-cluster.md (original)
+++ storm/site/releases/1.0.1/Setting-up-a-Storm-cluster.md Wed Jun  1 03:41:12 2016
@@ -28,7 +28,7 @@ A few notes about Zookeeper deployment:
 
 Next you need to install Storm's dependencies on Nimbus and the worker machines. These are:
 
-1. Java 6
+1. Java 7
 2. Python 2.6.6
 
 These are the versions of the dependencies that have been tested with Storm. Storm may or
may not work with different versions of Java and/or Python.

Modified: storm/site/releases/1.0.1/State-checkpointing.md
URL: http://svn.apache.org/viewvc/storm/site/releases/1.0.1/State-checkpointing.md?rev=1746371&r1=1746370&r2=1746371&view=diff
==============================================================================
--- storm/site/releases/1.0.1/State-checkpointing.md (original)
+++ storm/site/releases/1.0.1/State-checkpointing.md Wed Jun  1 03:41:12 2016
@@ -94,6 +94,9 @@ is saved and then the checkpoint tuple i
 streams before it saves its state so that the state represents a consistent state across
the topology. Once the checkpoint spout receives
 ACK from all the bolts, the state commit is complete and the transaction is recorded as committed
by the checkpoint spout.
 
+State checkpointing does not currently checkpoint the state of the spout. However, once the
state of all bolts is checkpointed, and once the checkpoint tuples are acked, the tuples
emitted by the spout are also acked.
+This also implies that `topology.state.checkpoint.interval.ms` should be lower than `topology.message.timeout.secs`.

+
 The state commit works like a three phase commit protocol with a prepare and commit phase
so that the state across the topology is saved
 in a consistent and atomic manner.
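Given the interval/timeout relationship noted above, the two settings end up paired in topology configuration; a hedged sketch with illustrative values:

```yaml
# Checkpoint every second; tuples time out after 30 seconds, so a full
# checkpoint round (save + ack across all bolts) can finish well before
# the spout's pending tuples expire and get replayed.
topology.state.checkpoint.interval.ms: 1000
topology.message.timeout.secs: 30
```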
 

Modified: storm/site/releases/1.0.1/Transactional-topologies.md
URL: http://svn.apache.org/viewvc/storm/site/releases/1.0.1/Transactional-topologies.md?rev=1746371&r1=1746370&r2=1746371&view=diff
==============================================================================
--- storm/site/releases/1.0.1/Transactional-topologies.md (original)
+++ storm/site/releases/1.0.1/Transactional-topologies.md Wed Jun  1 03:41:12 2016
@@ -81,7 +81,7 @@ Finally, another thing to note is that t
 
 ## The basics through example
 
-You build transactional topologies by using [TransactionalTopologyBuilder](javadocs/org/apache/storm/transactional/TransactionalTopologyBuilder.html).
Here's the transactional topology definition for a topology that computes the global count
of tuples from the input stream. This code comes from [TransactionalGlobalCount]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/TransactionalGlobalCount.java)
in storm-starter.
+You build transactional topologies by using [TransactionalTopologyBuilder](javadocs/org/apache/storm/transactional/TransactionalTopologyBuilder.html).
Here's the transactional topology definition for a topology that computes the global count
of tuples from the input stream. This code comes from [TransactionalGlobalCount]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/TransactionalGlobalCount.java)
in storm-starter.
 
 ```java
 MemoryTransactionalSpout spout = new MemoryTransactionalSpout(DATA, new Fields("word"), PARTITION_TAKE_PER_BATCH);
@@ -201,7 +201,7 @@ First, notice that this bolt implements
 
 The code for `finishBatch` in `UpdateGlobalCount` gets the current value from the database
and compares its transaction id to the transaction id for this batch. If they are the same,
it does nothing. Otherwise, it increments the value in the database by the partial count for
this batch.
 
-A more involved transactional topology example that updates multiple databases idempotently
can be found in storm-starter in the [TransactionalWords]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/TransactionalWords.java)
class.
+A more involved transactional topology example that updates multiple databases idempotently
can be found in storm-starter in the [TransactionalWords]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/TransactionalWords.java)
class.
 
 ## Transactional Topology API
 
@@ -255,7 +255,7 @@ The details of implementing a `Transacti
 
 #### Partitioned Transactional Spout
 
-A common kind of transactional spout is one that reads the batches from a set of partitions
across many queue brokers. For example, this is how [TransactionalKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/storm/kafka/TransactionalKafkaSpout.java)
works. An `IPartitionedTransactionalSpout` automates the bookkeeping work of managing the
state for each partition to ensure idempotent replayability. See [the Javadoc](javadocs/org/apache/storm/transactional/partitioned/IPartitionedTransactionalSpout.html)
for more details.
+A common kind of transactional spout is one that reads the batches from a set of partitions
across many queue brokers. For example, this is how [TransactionalKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/org/apache/storm/kafka/TransactionalKafkaSpout.java)
works. An `IPartitionedTransactionalSpout` automates the bookkeeping work of managing the
state for each partition to ensure idempotent replayability. See [the Javadoc](javadocs/org/apache/storm/transactional/partitioned/IPartitionedTransactionalSpout.html)
for more details.
 
 ### Configuration
 
@@ -325,7 +325,7 @@ In this scenario, tuples 41-50 are skipp
 
 By failing all subsequent transactions on failure, no tuples are skipped. This also shows
that a requirement of transactional spouts is that they always emit where the last transaction
left off.
 
-A non-idempotent transactional spout is more concisely referred to as an "OpaqueTransactionalSpout"
(opaque is the opposite of idempotent). [IOpaquePartitionedTransactionalSpout](javadocs/org/apache/storm/transactional/partitioned/IOpaquePartitionedTransactionalSpout.html)
is an interface for implementing opaque partitioned transactional spouts, of which [OpaqueTransactionalKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/storm/kafka/OpaqueTransactionalKafkaSpout.java)
is an example. `OpaqueTransactionalKafkaSpout` can withstand losing individual Kafka nodes
without sacrificing accuracy as long as you use the update strategy as explained in this section.
+A non-idempotent transactional spout is more concisely referred to as an "OpaqueTransactionalSpout"
(opaque is the opposite of idempotent). [IOpaquePartitionedTransactionalSpout](javadocs/org/apache/storm/transactional/partitioned/IOpaquePartitionedTransactionalSpout.html)
is an interface for implementing opaque partitioned transactional spouts, of which [OpaqueTransactionalKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/org/apache/storm/kafka/OpaqueTransactionalKafkaSpout.java)
is an example. `OpaqueTransactionalKafkaSpout` can withstand losing individual Kafka nodes
without sacrificing accuracy as long as you use the update strategy as explained in this section.
 
 ## Implementation
 

Modified: storm/site/releases/1.0.1/Trident-state.md
URL: http://svn.apache.org/viewvc/storm/site/releases/1.0.1/Trident-state.md?rev=1746371&r1=1746370&r2=1746371&view=diff
==============================================================================
--- storm/site/releases/1.0.1/Trident-state.md (original)
+++ storm/site/releases/1.0.1/Trident-state.md Wed Jun  1 03:41:12 2016
@@ -28,7 +28,7 @@ Remember, Trident processes tuples as sm
 2. There's no overlap between batches of tuples (tuples are in one batch or another, never
multiple).
 3. Every tuple is in a batch (no tuples are skipped)
 
-This is a pretty easy type of spout to understand, the stream is divided into fixed batches
that never change. storm-contrib has [an implementation of a transactional spout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/storm/kafka/trident/TransactionalTridentKafkaSpout.java)
for Kafka.
+This is a pretty easy type of spout to understand, the stream is divided into fixed batches
that never change. storm-contrib has [an implementation of a transactional spout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/org/apache/storm/kafka/trident/TransactionalTridentKafkaSpout.java)
for Kafka.
 
 You might be wondering – why wouldn't you just always use a transactional spout? They're
simple and easy to understand. One reason you might not use one is that they're not necessarily
very fault-tolerant. For example, the way TransactionalTridentKafkaSpout works is the batch
for a txid will contain tuples from all the Kafka partitions for a topic. Once a batch has
been emitted, any time that batch is re-emitted in the future the exact same set of tuples
must be emitted to meet the semantics of transactional spouts. Now suppose a batch is emitted
from TransactionalTridentKafkaSpout, the batch fails to process, and at the same time one
of the Kafka nodes goes down. You're now incapable of replaying the same batch as you did
before (since the node is down and some partitions for the topic are unavailable), and
processing will halt.
 
@@ -72,7 +72,7 @@ As described before, an opaque transacti
 
 1. Every tuple is *successfully* processed in exactly one batch. However, it's possible for
a tuple to fail to process in one batch and then succeed to process in a later batch.
 
-[OpaqueTridentKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/storm/kafka/trident/OpaqueTridentKafkaSpout.java)
is a spout that has this property and is fault-tolerant to losing Kafka nodes. Whenever it's
time for OpaqueTridentKafkaSpout to emit a batch, it emits tuples starting from where the
last batch finished emitting. This ensures that no tuple is ever skipped or successfully processed
by multiple batches.
+[OpaqueTridentKafkaSpout]({{page.git-tree-base}}/external/storm-kafka/src/jvm/org/apache/storm/kafka/trident/OpaqueTridentKafkaSpout.java)
is a spout that has this property and is fault-tolerant to losing Kafka nodes. Whenever it's
time for OpaqueTridentKafkaSpout to emit a batch, it emits tuples starting from where the
last batch finished emitting. This ensures that no tuple is ever skipped or successfully processed
by multiple batches.
 
 With opaque transactional spouts, it's no longer possible to use the trick of skipping state
updates if the transaction id in the database is the same as the transaction id for the current
batch. This is because the batch may have changed between state updates.
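The transaction-id trick that opaque spouts invalidate can be sketched in a few lines of Python (a toy in-memory dict stands in for the database; this is illustrative, not Trident's State API):

```python
# Toy store mapping key -> (value, txid).
db = {}

def apply_partial_count(key, partial_count, txid):
    """Idempotent update valid only for *transactional* spouts: if this
    txid was already applied, a replay contains the exact same tuples,
    so the update can be skipped safely."""
    value, last_txid = db.get(key, (0, None))
    if last_txid == txid:
        return value  # replayed batch: already counted, do nothing
    db[key] = (value + partial_count, txid)
    return db[key][0]

assert apply_partial_count("word", 5, txid=1) == 5
assert apply_partial_count("word", 5, txid=1) == 5  # replay of txid 1 skipped
assert apply_partial_count("word", 3, txid=2) == 8

# With an *opaque* spout, a replay of txid 1 might contain different
# tuples, so skipping on txid equality would silently drop data.
```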
 

Modified: storm/site/releases/1.0.1/Tutorial.md
URL: http://svn.apache.org/viewvc/storm/site/releases/1.0.1/Tutorial.md?rev=1746371&r1=1746370&r2=1746371&view=diff
==============================================================================
--- storm/site/releases/1.0.1/Tutorial.md (original)
+++ storm/site/releases/1.0.1/Tutorial.md Wed Jun  1 03:41:12 2016
@@ -245,7 +245,7 @@ A stream grouping tells a topology how t
 
 When a task for Bolt A emits a tuple to Bolt B, which task should it send the tuple to?
 
-A "stream grouping" answers this question by telling Storm how to send tuples between sets
of tasks. Before we dig into the different kinds of stream groupings, let's take a look at
another topology from [storm-starter](http://github.com/apache/storm/blob/{{page.version}}/examples/storm-starter).
This [WordCountTopology]({{page.git-blob-base}}/examples/storm-starter/src/jvm/storm/starter/WordCountTopology.java)
reads sentences off of a spout and streams out of `WordCountBolt` the total number of times
it has seen that word before:
+A "stream grouping" answers this question by telling Storm how to send tuples between sets
of tasks. Before we dig into the different kinds of stream groupings, let's take a look at
another topology from [storm-starter](http://github.com/apache/storm/blob/{{page.version}}/examples/storm-starter).
This [WordCountTopology]({{page.git-blob-base}}/examples/storm-starter/src/jvm/org/apache/storm/starter/WordCountTopology.java)
reads sentences off of a spout and streams out of `WordCountBolt` the total number of times
it has seen that word before:
 
 ```java
 TopologyBuilder builder = new TopologyBuilder();

Modified: storm/site/releases/1.0.1/index.md
URL: http://svn.apache.org/viewvc/storm/site/releases/1.0.1/index.md?rev=1746371&r1=1746370&r2=1746371&view=diff
==============================================================================
--- storm/site/releases/1.0.1/index.md (original)
+++ storm/site/releases/1.0.1/index.md Wed Jun  1 03:41:12 2016
@@ -3,6 +3,19 @@ title: Documentation
 layout: documentation
 documentation: true
 ---
+
+
+> #### NOTE
+
+> In the latest version, the class packages have been changed from "backtype.storm" to
"org.apache.storm", so topology code compiled with an older version won't run on Storm
1.0.0 as-is. Backward compatibility is available through the following configuration:

+
+> `client.jartransformer.class: "org.apache.storm.hack.StormShadeTransformer"`
+
+> You need to add the above config to your Storm installation if you want to run code compiled
with older versions of Storm. The config should be added on the machine you use to submit
your topologies.
+
+> Refer to https://issues.apache.org/jira/browse/STORM-1202 for more details. 
+
+
 ### Basics of Storm
 
 * [Javadoc](javadocs/index.html)


