hama-dev mailing list archives

From "Martin Illecker (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HAMA-815) Hama Pipes uses C++ templates
Date Sat, 16 Nov 2013 13:57:20 GMT
Martin Illecker created HAMA-815:
------------------------------------

             Summary: Hama Pipes uses C++ templates
                 Key: HAMA-815
                 URL: https://issues.apache.org/jira/browse/HAMA-815
             Project: Hama
          Issue Type: New Feature
          Components: bsp core, pipes
    Affects Versions: 0.6.3
            Reporter: Martin Illecker
            Assignee: Martin Illecker
             Fix For: 0.7.0


*Extending Hama Pipes to use C++ templates*

Currently, all messages are converted to *strings* before they are transferred over the socket
connection between C++ and Java, in both directions.

To take advantage of the binary socket communication, we will serialize and deserialize basic
types such as *int, long, float, double* directly, without converting them to strings. This will
reduce the risk of type conversion errors. All other types are still transferred
as strings.
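
A minimal sketch of why this matters, assuming nothing about the actual Hama Pipes wire format: the helper names below ({{roundTripViaString}}, {{encodeBinary}}, {{decodeBinary}}) are hypothetical and only contrast a text round trip with a raw-byte round trip of a basic type.

```cpp
#include <cassert>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helpers for illustration; not part of the Hama Pipes API.

// String path: format the value as text, then parse it back.
// The default stream precision (6 significant digits) can drop information.
inline double roundTripViaString(double value) {
  std::ostringstream out;
  out << value;
  std::istringstream in(out.str());
  double parsed = 0.0;
  in >> parsed;
  return parsed;
}

// Binary path: copy the raw bytes of a basic type into a buffer.
template <class T>
std::vector<unsigned char> encodeBinary(const T& value) {
  std::vector<unsigned char> buffer(sizeof(T));
  std::memcpy(buffer.data(), &value, sizeof(T));
  return buffer;
}

// Receiving side: copy the bytes back into a value of the same type.
template <class T>
T decodeBinary(const std::vector<unsigned char>& buffer) {
  assert(buffer.size() == sizeof(T));  // the buffer must hold exactly one T
  T value;
  std::memcpy(&value, buffer.data(), sizeof(T));
  return value;
}
```

With these helpers, {{decodeBinary<double>(encodeBinary(x))}} returns {{x}} unchanged, while {{roundTripViaString(x)}} may not. The sketch ignores byte order; a real exchange with Java has to agree on one, since {{java.io.DataOutput}} writes multi-byte values high byte first.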

It is also possible to create custom *Writables* (e.g., *PipesVectorWritable* and *PipesKeyValueWritable*)
that serialize and deserialize the object to and from a string by overriding the following methods:
{code}
@Override public void readFields(DataInput in) throws IOException
@Override public void write(DataOutput out) throws IOException
{code}

Hama Streaming, which depends on Hama Pipes, is still using strings.

The following string-based methods will change

{{virtual void sendMessage(const string& peerName, const string& msg)}}
{{virtual const string& getCurrentMessage()}}
{{virtual void write(const string& key, const string& value)}}
{{virtual bool readNext(string& key, string& value)}}

to support C++ templates:

{{virtual void sendMessage(const string& peer_name, *const M& msg*)}}
{{virtual *M* getCurrentMessage()}}
{{virtual void write(*const K2& key, const V2& value*)}}
{{virtual bool readNext(*K1& key, V1& value*)}}
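
To illustrate the effect on user code, here is a minimal sketch of a templated message interface. {{InMemoryPeer}} and {{hasMessage}} are hypothetical names invented for this example; only the shapes of {{sendMessage}} and {{getCurrentMessage}} follow the signatures listed above, and messages are simply looped back in memory instead of being sent over a socket.

```cpp
#include <cassert>
#include <queue>
#include <string>

// Hypothetical in-memory stand-in for a templated peer; not the Hama Pipes class.
template <class M>
class InMemoryPeer {
 public:
  // Accept a typed message. The peer name is ignored here because the
  // message is looped back into the local inbox for demonstration.
  void sendMessage(const std::string& peer_name, const M& msg) {
    (void)peer_name;
    inbox_.push(msg);
  }

  // Hypothetical helper: true while unread messages remain.
  bool hasMessage() const { return !inbox_.empty(); }

  // Return the next message by value, already typed as M.
  M getCurrentMessage() {
    assert(!inbox_.empty());  // callers should check hasMessage() first
    M msg = inbox_.front();
    inbox_.pop();
    return msg;
  }

 private:
  std::queue<M> inbox_;
};
```

A program that exchanges doubles would instantiate {{InMemoryPeer<double>}} and read values back without any string parsing on the C++ side.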

The *SequenceFile* functions also use templates:

{{bool sequenceFileReadNext(int32_t file_id, *K& key, V& value*)}}
{{bool sequenceFileAppend(int32_t file_id, *const K& key, const V& value*)}}

The native *Partitioner* supports templates as well:

{code}
  template<class K1, class V1, class K2, class V2, class M>
  class Partitioner {
  public:
    virtual int partition(const K1& key, const V1& value, int32_t num_tasks) = 0;
    virtual ~Partitioner() {}
  };
{code}
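
As a sketch of how a user-defined partitioner could derive from this interface, the example below repeats the template from the description and adds a hypothetical {{ModuloPartitioner}}. The chosen type arguments and the modulo strategy are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Interface as given in the description above.
template<class K1, class V1, class K2, class V2, class M>
class Partitioner {
public:
  virtual int partition(const K1& key, const V1& value, int32_t num_tasks) = 0;
  virtual ~Partitioner() {}
};

// Hypothetical example: assign integer keys to tasks by modulo.
class ModuloPartitioner
    : public Partitioner<int32_t, std::string, int32_t, std::string, double> {
public:
  virtual int partition(const int32_t& key, const std::string& value,
                        int32_t num_tasks) {
    assert(num_tasks > 0);  // a job needs at least one task
    (void)value;            // the value does not influence the partition here
    // The remainder of a negative key is negative or zero in C++,
    // so fold it back into the range [0, num_tasks).
    int32_t remainder = key % num_tasks;
    return remainder < 0 ? remainder + num_tasks : remainder;
  }
};
```

Because the key arrives as an {{int32_t}} rather than a string, the partitioner can do arithmetic on it directly.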

This will reduce type conversion errors, but it also changes the compilation procedure. Because
of the nature of C++ templates, a precompiled static library is no longer possible: the compiler
instantiates all templates at compile time.

The compile command will look like:

{code}
g++ -m64 -Ic++/src/main/native/utils/api \
      -Ic++/src/main/native/pipes/api \
      -Lc++/target/native \
      -lhadooputils -lpthread \
      PROGRAM.cc \
      -o PROGRAM \
      -g -Wall -O2
{code}

Finally, the job configuration supports the following properties:

{code}
  <property>
    <name>bsp.input.format.class</name>
    <value>org.apache.hama.bsp.KeyValueTextInputFormat</value>
  </property>
  <property>
    <name>bsp.input.key.class</name>
    <value>org.apache.hadoop.io.Text</value>
  </property>
  <property>
    <name>bsp.input.value.class</name>
    <value>org.apache.hadoop.io.Text</value>
  </property>
  <property>
    <name>bsp.output.format.class</name>
    <value>org.apache.hama.bsp.SequenceFileOutputFormat</value>
  </property>
  <property>
    <name>bsp.output.key.class</name>
    <value>org.apache.hadoop.io.Text</value>
  </property>
  <property>
    <name>bsp.output.value.class</name>
    <value>org.apache.hadoop.io.DoubleWritable</value>
  </property>
  <property>
    <name>bsp.message.class</name>
    <value>org.apache.hadoop.io.DoubleWritable</value>
  </property>
{code}




