incubator-blur-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Blur Wiki] Update of "GettingStarted" by AaronMcCurry
Date Sun, 30 Sep 2012 23:40:38 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Blur Wiki" for change notification.

The "GettingStarted" page has been changed by AaronMcCurry:
http://wiki.apache.org/blur/GettingStarted

New page:
== Getting Started ==

=== Clone ===

First, clone the project and compile it using Maven.  Once this is complete, the Blur
libraries and dependencies will be copied into the lib directory.

=== Zookeeper Setup ===

Set up [Zookeeper][Zookeeper].  It is recommended that all production setups use a clustered
Zookeeper environment, following best [practices][replicated_zk].

=== Hadoop Setup ===

Blur requires Hadoop to be installed because of library dependencies, but running the Hadoop
daemons on the servers is optional.

=== HDFS Notes ===

Set up Hadoop's HDFS filesystem, which is required for a clustered setup.  Though possible,
running the Map/Reduce system on the same machines that are running the Blur daemons is not
recommended.  Follow the Hadoop [cluster setup][cluster_setup] guide.

If you are running Blur on a single machine, HDFS is not necessary, but a [single node][single_node]
setup of Hadoop is still required for its libraries.

=== HDFS Options ===

HDFS is not required to be installed and running on the same servers as Blur.  However, if
the source HDFS is being used for heavy Map/Reduce or any other heavy I/O operations, performance
could be affected.  The storage location for each table is set up independently via a URI
location (e.g. `hdfs://<namenode>:<port>/blur/tables/table/path`).  So there may be several
tables online in a Blur cluster, and each one could reference a different HDFS instance.  This
assumes that all the HDFS instances are compatible with one another.

=== blur-env.sh Configuration ===

Next you will need to configure the `config/blur-env.sh` file.  Two exports are required:

    export JAVA_HOME=/usr/lib/j2sdk1.6-sun
    export HADOOP_HOME=/var/hadoop-0.20.2
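
A quick way to catch a mistyped path is to check both variables before starting any daemons.  The following is a minimal sketch; `check_blur_env` is a hypothetical helper, not a script shipped with Blur, and it only tests that each variable is set and names an existing directory.

```shell
# Hypothetical helper (not part of Blur): verify that the two required
# variables are set and point at existing directories.
check_blur_env() {
    status=0
    for name in JAVA_HOME HADOOP_HOME; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value" >&2
            status=1
        fi
    done
    return $status
}
```

One possible use is to source the file and run the check in the same shell: `. config/blur-env.sh && check_blur_env`.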

=== blur-site.properties Configuration ===

Then you will need to set up the `config/blur-site.properties` file.  The default site configuration:

    blur.zookeeper.connection=localhost
    blur.cluster.name=default

Many other options can be set; see `config/blur-default.properties` for the full list.
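
When several machines share a configuration, it can help to read a value back from the file rather than trust memory.  The sketch below assumes the file uses plain `key=value` lines as shown above; `get_blur_property` is a hypothetical helper, not part of Blur.

```shell
# Hypothetical helper (not part of Blur): print the value of one key from a
# key=value properties file.  Comment lines never match, and if the key is
# defined more than once the last definition is printed.
get_blur_property() {
    key=$1
    file=$2
    # Escape dots so the key is matched literally rather than as a wildcard.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${pattern}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}
```

For example, `get_blur_property blur.cluster.name config/blur-site.properties` prints the cluster name that the file sets, or nothing if the key is absent.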


=== shards ===

Then, in the `config/shards` file, list the servers that should run as Blur shard servers.  By
default shard servers run on port `40020` and bind to the `0.0.0.0` address.

    shard1
    shard2
    shard3

=== controllers ===

As with the shards file, list in `config/controllers` the servers that will run as Blur controller
servers.  By default controller servers run on port `40010` and bind to the `0.0.0.0` address.

    controller1
    controller2

NOTE: If you are going to run a single shard server, running controllers is not required.
A single shard server is fully functional on its own.  Controllers and shard servers
share the same Thrift API, so your code won't have to be modified later to run against a cluster.

=== $BLUR_HOME ===

It is a good idea to add `export BLUR_HOME=/var/blur` to your `.bash_profile`.

=== Setup Nodes ===

Copy the Blur directory to the same location on all servers in the cluster.
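
The copy can be scripted from the host lists already written to `config/shards` and `config/controllers`.  The following is a sketch only: both functions are hypothetical helpers rather than Blur scripts, and it assumes that `BLUR_HOME` is set, that `rsync` and passwordless SSH are available on every machine, and that blank lines and `#` comments in the host files can be skipped.

```shell
# Hypothetical helper (not part of Blur): print the unique host names listed
# in the given files, skipping blank lines and '#' comments.
list_blur_hosts() {
    cat "$@" | sed -e 's/#.*//' -e 's/[[:space:]]//g' -e '/^$/d' | sort -u
}

# Hypothetical helper: copy $BLUR_HOME to the same path on every listed host.
# Assumes passwordless SSH and rsync on every machine.
sync_blur_home() {
    for host in $(list_blur_hosts "$BLUR_HOME/config/shards" "$BLUR_HOME/config/controllers"); do
        rsync -a "$BLUR_HOME/" "$host:$BLUR_HOME/"
    done
}
```

Running `list_blur_hosts` on its own first shows exactly which machines `sync_blur_home` would contact.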
