tajo-dev mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Feedback for tajo-0.10.0
Date Mon, 16 Mar 2015 05:49:31 GMT
There was a typo in my previous email. Here is the corrected version:

for example:

  <property>
    <name>tajo.master.umbilical-rpc.address</name>
    <value>1-1-1-1:26001</value>
  </property>

which works under tajo-0.9.0, but under tajo-0.10.0 it complains that
"1-1-1-1:26001" is not a valid network address.

I have to change to:
  <property>
    <name>tajo.master.umbilical-rpc.address</name>
    <value>1.1.1.1:26001</value>
  </property>
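
A more general check than a pattern tied to one naming scheme might look
roughly like the sketch below. It is only an illustration, not the code in
Tajo's NetworkAddressValidator: the class name HostPortCheck and the method
isValid are hypothetical, and it assumes RFC 1123 style hostnames (letters,
digits and hyphens in dot-separated labels) followed by a numeric port. It
does not handle IPv6 literals, and it treats a dotted IPv4 address as just
another dot-separated name without checking octet ranges.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of Tajo. */
final class HostPortCheck {
    // One DNS label: letters, digits, hyphens; no leading or trailing hyphen; at most 63 chars.
    private static final String LABEL = "[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?";
    // A host is one or more labels separated by dots.
    private static final Pattern HOST = Pattern.compile(LABEL + "(?:\\." + LABEL + ")*");

    /** Returns true if the address looks like host:port with a port of at most 65535. */
    static boolean isValid(String address) {
        int colon = address.lastIndexOf(':');
        if (colon <= 0 || colon == address.length() - 1) {
            return false; // missing host or missing port
        }
        String host = address.substring(0, colon);
        String portText = address.substring(colon + 1);
        if (!HOST.matcher(host).matches() || !portText.matches("\\d{1,5}")) {
            return false;
        }
        return Integer.parseInt(portText) <= 65535;
    }

    public static void main(String[] args) {
        System.out.println(isValid("1-1-1-1:26001")); // true: hyphenated hostname
        System.out.println(isValid("1.1.1.1:26001")); // true: dotted address
        System.out.println(isValid("bad_host:26001")); // false: underscore not allowed
    }
}
```

With a check along these lines, both the hyphenated hostname and the dotted
IP form of the address above would be accepted.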


On Mon, Mar 16, 2015 at 1:44 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> Hi,
> I compiled the tajo-0.10 source against hadoop-2.6.0; here is some
> feedback.
>
> My cluster:
> 1 tajo-master, 9 tajo-workers
> 24 logical CPUs, 64GB memory, 12 x 4TB HDD
>
> Feedback:
> 1) Task progress estimation now works correctly on partitioned tables; it
> was sometimes incorrect in tajo-0.9.0.
> 2) Tajo configuration doesn't support hostnames in tajo-site.xml.
> for example:
>
>   <property>
>     <name>tajo.master.umbilical-rpc.address</name>
>     <value>1-1-1-1:26001</value>
>   </property>
>
> which works under tajo-0.9.0, but it complains that "1-1-1-1:26001" is not
> a valid network address.
>
> I have to change to:
>   <property>
>     <name>tajo.master.umbilical-rpc.address</name>
>     <value>1.1.1.1:26001</value>
>   </property>
>
> But we don't use IPs in our cluster, only hostnames, so I made a small
> change in the code:
> org.apache.tajo.validation.NetworkAddressValidator.java:
> hostnamePattern = Pattern.compile("\\d*-\\d*-\\d*-\\d");
> Then it works.
>
> 3) I ran some tests on Parquet, RCFile (Snappy compressed), and RCFile
> (GZIP compressed).
>
> They hold the same data and differ only in file format.
> The table has six partitions and 20 RCFiles; each Parquet file is 1GB.
>
> RCFile with Snappy performs similarly to RCFile with GZIP, but both
> perform two to three times better than Parquet.
>
> 4) I compared tajo-0.10 with Impala-2.1.2.
> Impala provides very good support for Parquet, much better than Tajo.
>
> But Impala is *slower* than Tajo with other formats.
> For example (I don't use WHERE because I want to query all six partitions
> together):
>
> *Impala*:
>  > select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as
> bigint)),sum(cast(movie_pt as bigint)) from par;
>
> +-------------------------------+-------------------------------+-------------------------------+
> | sum(cast(movie_vv as bigint)) | sum(cast(movie_cv as bigint)) | sum(cast(movie_pt as bigint)) |
> +-------------------------------+-------------------------------+-------------------------------+
> | 22557920                      | 19648838                      | 2005366694576                 |
> +-------------------------------+-------------------------------+-------------------------------+
> Fetched 1 row(s) in 6.02s
>
> *Tajo:*
>
> *default*> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as
> bigint)),sum(cast(movie_pt as bigint)) from snappy;
> Progress: 0%, response time: 1.598 sec
> Progress: 0%, response time: 1.6 sec
> Progress: 0%, response time: 2.003 sec
> Progress: 0%, response time: 2.806 sec
> Progress: 37%, response time: 3.808 sec
> Progress: 100%, response time: 4.792 sec
> ?sum_3,  ?sum_4,  ?sum_5
> -------------------------------
> 22557920,  19648838,  2005366694576
> (1 rows, 4.792 sec, 32 B selected)
