tajo-dev mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Re: Feedback for tajo-0.10.0
Date Mon, 16 Mar 2015 05:58:48 GMT
Another correction:
My comparison of Impala-2.1.2 and Tajo-0.10.0 was unfair, because I used
Parquet with Impala and Snappy-compressed RCFile with Tajo. I should have
used the same file format for both.

I have already concluded that Impala works better on Parquet than Tajo
does, so this time I used RCFile as the test data for both.

*Tajo*:
default> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as
bigint)),sum(cast(movie_pt as bigint)) from snappy;
Progress: 0%, response time: 1.598 sec
Progress: 0%, response time: 1.6 sec
Progress: 0%, response time: 2.003 sec
Progress: 0%, response time: 2.806 sec
Progress: 37%, response time: 3.808 sec
Progress: 100%, response time: 4.792 sec
?sum_3,  ?sum_4,  ?sum_5
-------------------------------
22557920,  19648838,  2005366694576
(1 rows, 4.792 sec, 32 B selected)

*Impala*:
 > select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as
bigint)),sum(cast(movie_pt as bigint)) from snappy;
+-------------------------------+-------------------------------+-------------------------------+
| sum(cast(movie_vv as bigint)) | sum(cast(movie_cv as bigint)) | sum(cast(movie_pt as bigint)) |
+-------------------------------+-------------------------------+-------------------------------+
| 22557920                      | 19648838                      | 2005366694576                 |
+-------------------------------+-------------------------------+-------------------------------+
Fetched 1 row(s) in 11.12s



On Mon, Mar 16, 2015 at 1:49 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:

> There was a typo in my email. Here is the corrected version:
>
> for example:
>
>   <property>
>     <name>tajo.master.umbilical-rpc.address</name>
>     <value>1-1-1-1:26001</value>
>   </property>
>
> This works under tajo-0.9.0, but tajo-0.10.0 complains that "1-1-1-1:26001"
> is not a valid network address.
>
> I have to change it to:
>   <property>
>     <name>tajo.master.umbilical-rpc.address</name>
>     <value>1.1.1.1:26001</value>
>   </property>
>
>
> On Mon, Mar 16, 2015 at 1:44 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
>
>> Hi,
>> I compiled the tajo-0.10 source against hadoop-2.6.0, and here is some
>> feedback.
>>
>> My cluster:
>> 1 tajo-master, 9 tajo-workers
>> 24 logical CPUs, 64 GB memory, 12 x 4 TB HDDs
>>
>> Feedback:
>> 1) Task progress estimation now works correctly on partitioned tables; it
>> was sometimes incorrect in tajo-0.9.0.
>> 2) Tajo configuration doesn't accept a hostname in tajo-site.xml.
>> For example:
>>
>>   <property>
>>     <name>tajo.master.umbilical-rpc.address</name>
>>     <value>1-1-1-1:26001</value>
>>   </property>
>>
>> This works under tajo-0.9.0, but tajo-0.10.0 complains that "1-1-1-1:26001"
>> is not a valid network address.
>>
>> I have to change it to:
>>   <property>
>>     <name>tajo.master.umbilical-rpc.address</name>
>>     <value>1.1.1.1:26001</value>
>>   </property>
>>
>> But we don't use IP addresses in our cluster, only hostnames, so I made a
>> small change in the code:
>> org.apache.tajo.validation.NetworkAddressValidator.java:
>> hostnamePattern = Pattern.compile("\\d*-\\d*-\\d*-\\d");
>> After that, it works.
>>
>> 3) I ran some tests on Parquet, RCFile (Snappy compressed), and
>> RCFile (GZIP compressed).
>>
>> They hold the same data and differ only in file format.
>> The table has six partitions and 20 RCFiles; each Parquet file is 1 GB.
>>
>> RCFile with Snappy performs similarly to RCFile with GZIP,
>> but both perform two to three times better than Parquet.
>>
>> 4) I compared tajo-0.10 with Impala-2.1.2.
>> Impala supports Parquet very well, much better than Tajo does,
>>
>> but Impala is *slower* than Tajo with other formats.
>> For example (I don't use WHERE because I want to query all six partitions
>> together):
>>
>> *Impala*:
>>  > select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as
>> bigint)),sum(cast(movie_pt as bigint)) from par;
>>
>> +-------------------------------+-------------------------------+-------------------------------+
>> | sum(cast(movie_vv as bigint)) | sum(cast(movie_cv as bigint)) | sum(cast(movie_pt as bigint)) |
>> +-------------------------------+-------------------------------+-------------------------------+
>> | 22557920                      | 19648838                      | 2005366694576                 |
>> +-------------------------------+-------------------------------+-------------------------------+
>> Fetched 1 row(s) in 6.02s
>>
>> *Tajo:*
>>
>> default> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as
>> bigint)),sum(cast(movie_pt as bigint)) from snappy;
>> Progress: 0%, response time: 1.598 sec
>> Progress: 0%, response time: 1.6 sec
>> Progress: 0%, response time: 2.003 sec
>> Progress: 0%, response time: 2.806 sec
>> Progress: 37%, response time: 3.808 sec
>> Progress: 100%, response time: 4.792 sec
>> ?sum_3,  ?sum_4,  ?sum_5
>> -------------------------------
>> 22557920,  19648838,  2005366694576
>> (1 rows, 4.792 sec, 32 B selected)
>>
>
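
On point 2 of the quoted message (hostnames such as "1-1-1-1" being rejected): the following is only a minimal sketch of a host:port check that accepts hyphenated hostnames as well as dotted addresses. It is not Tajo's NetworkAddressValidator; the class name HostPortCheck, the method isValid, and the patterns are all assumptions made for illustration, loosely following the RFC 1123 rules for hostname labels.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; NOT Tajo's NetworkAddressValidator.
final class HostPortCheck {
    // One hostname label: letters, digits, and inner hyphens, 1 to 63 characters.
    private static final String LABEL =
        "[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?";

    // One or more dot-separated labels, a colon, then a 1 to 5 digit port.
    private static final Pattern HOST_PORT =
        Pattern.compile("^(" + LABEL + "(?:\\." + LABEL + ")*):(\\d{1,5})$");

    // Returns true if the address looks like host:port with a port in 1..65535.
    // This is a syntax check only: it does not resolve the name, does not range-check
    // IPv4 octets, and does not enforce the 253-character total hostname limit.
    static boolean isValid(String address) {
        Matcher m = HOST_PORT.matcher(address);
        if (!m.matches()) {
            return false;
        }
        int port = Integer.parseInt(m.group(2));
        return port >= 1 && port <= 65535;
    }

    public static void main(String[] args) {
        String[] samples = {"1-1-1-1:26001", "1.1.1.1:26001", "-bad-:26001", "host:70000"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Under these assumptions, both "1-1-1-1:26001" and "1.1.1.1:26001" pass, while a label that starts with a hyphen or a port above 65535 is rejected. Unlike the digits-only pattern in the quoted workaround, this sketch also accepts alphabetic hostnames.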
