asterixdb-notifications mailing list archives

From "Wenhai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ASTERIXDB-1776) Data loss in many multi-partitions
Date Thu, 02 Feb 2017 08:24:51 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15849643#comment-15849643 ]

Wenhai commented on ASTERIXDB-1776:
-----------------------------------

{noformat}
lwh@node071:~/asterixdb> while true; do managix describe; sleep 3; done
INFO: Name:fuzzy
Created:Sun Jan 29 05:00:15 CST 2017
Web-Url:http://10.10.10.71:19009
State:INACTIVE (Thu Feb 02 16:19:28 CST 2017)
Last modified timestamp:Sun Jan 29 05:01:13 CST 2017

WARNING!:Following process still running 
NC at client72 [ 31377 ] 
NC at client74 [ 33462 ] 
NC at client75 [ 9142 ] 
NC at client76 [ 2292 ] 
NC at client77 [ 1883 ] 
NC at client78 [ 39088 ] 
NC at client79 [ 28538 ] 
NC at client80 [ 32834 ] 
{noformat}
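
To see when the count actually drops relative to the NC state above, the same loop can be extended to poll the record count through the HTTP API. This is only a sketch, not part of the original report: it assumes the legacy AQL /query endpoint on the default API port 19002, while this cluster clearly uses non-default ports, so CC_HOST and CC_PORT would need adjusting.

{noformat}
#!/bin/sh
# Hedged sketch: interleave managix describe with a LineItem count so a drop in
# the count can be lined up with the NC state shown above.
CC_HOST=10.10.10.71   # CC host taken from the Web-Url above
CC_PORT=19002         # assumed default API port; adjust to this cluster's config
AQL='use dataverse tpch; let $s := count(for $d in dataset LineItem return $d) return $s;'
while true; do
  date
  managix describe
  curl -s --get "http://${CC_HOST}:${CC_PORT}/query" --data-urlencode "query=${AQL}"
  echo
  sleep 3
done
{noformat}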

> Data loss in many multi-partitions
> ----------------------------------
>
>                 Key: ASTERIXDB-1776
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1776
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Hyracks Core
>         Environment: MAC/Linux
>            Reporter: Wenhai
>            Assignee: Ian Maxon
>            Priority: Critical
>         Attachments: cc.log, demo.xml, execute.log, tpch_node1.log, tpch_node2.log
>
>
> Total description: If we configure more than 24 partitions in each NC, we always lose almost half of the partitions, without any error information or logs.
> Schema:
> {noformat}
> drop dataverse tpch if exists;
> create dataverse tpch;
> use dataverse tpch;
> create type LineItemType as closed {
>   l_orderkey: int32,
>   l_partkey: int32,
>   l_suppkey: int32,
>   l_linenumber: int32,
>   l_quantity: int32,
>   l_extendedprice: double,
>   l_discount: double,
>   l_tax: double,
>   l_returnflag: string,
>   l_linestatus: string,
>   l_shipdate: string,
>   l_commitdate: string,
>   l_receiptdate: string,
>   l_shipinstruct: string,
>   l_shipmode: string,
>   l_comment: string
> }
> create dataset LineItem(LineItemType)
>   primary key l_orderkey, l_linenumber;
> load dataset LineItem 
> using localfs
> (("path"="127.0.0.1:///path-to-tpch-data/tpch0.001/lineitem.tbl"),("format"="delimited-text"),("delimiter"="|"));
> {noformat}
> Query:
> {noformat}
> use dataverse tpch;
> let $s := count(
> for $d in dataset LineItem
> return $d
> )
> return $s
> {noformat}
> Return:
> {noformat}
> 6005
> {noformat}
> Command:
> {noformat}
> managix stop -n tpch
> managix start -n tpch
> {noformat}
> Query:
> {noformat}
> use dataverse tpch;
> let $s := count(
> for $d in dataset LineItem
> return $d
> )
> return $s
> {noformat}
> Return:
> {noformat}
> 4521
> {noformat}
> We lose 1/3 of the records in this tiny test. When we scale TPC-H up to 200 GB across 196 partitions distributed as 8 X 24, we should get 1.2 billion records, but only 0.45 billion were returned!
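
Since the loss only shows up across a managix stop/start cycle, a small wrapper around that cycle could make the before/after comparison mechanical. The sketch below makes the same assumptions as the polling loop earlier in this message (legacy /query endpoint, host and port adjusted to the actual cluster) and is not part of the original report.

{noformat}
#!/bin/sh
# Hedged sketch: log the LineItem count before and after one stop/start cycle of
# the "tpch" instance, so every run leaves a before/after pair in counts.log.
count_lineitem() {
  curl -s --get "http://10.10.10.71:19002/query" \
       --data-urlencode "query=use dataverse tpch; let \$s := count(for \$d in dataset LineItem return \$d) return \$s;"
}
echo "before: $(count_lineitem)" >> counts.log
managix stop -n tpch
managix start -n tpch
sleep 60   # assumed settle time; better: wait until managix describe reports ACTIVE
echo "after:  $(count_lineitem)" >> counts.log
{noformat}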



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
