asterixdb-dev mailing list archives

From Jianfeng Jia <jianfeng....@gmail.com>
Subject Need Feed experts' help with a hanging issue
Date Tue, 01 Dec 2015 00:52:52 GMT
Dear devs,

I hit a weird issue that is reproducible, but only when the data contains duplicates and
is large enough. Let me explain it step by step:

1. The dataset is very simple and has only two fields.
DDL AQL:
—————————————
drop dataverse test if exists;
create dataverse test;
use dataverse test;

create type t_test as closed {
  fa: int64,
  fb: int64
}

create dataset ds_test(t_test) primary key fa;

create feed fd_test using socket_adapter
(
    ("sockets"="nc1:10001"),
    ("address-type"="nc"),
    ("type-name"="t_test"),
    ("format"="adm"),
    ("duration"="1200")
);

set wait-for-completion-feed "false";
connect feed fd_test to dataset ds_test using policy AdvancedFT_Discard;

——————————————————————————————

The AdvancedFT_Discard policy ignores exceptions raised during insertion and keeps ingesting.
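For illustration (the values are made up), inserting a duplicate key directly fails the statement with a duplicate-key exception; under this policy the feed instead just drops the offending record and moves on:

--------------------
use dataverse test;

insert into dataset ds_test({"fa": 1, "fb": 1});
insert into dataset ds_test({"fa": 1, "fb": 2});
--------------------

The second insert fails because both records have fa = 1, the primary key.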


2. The data is ingested by a very simple socket adapter client that reads records one by one from
an ADM file (a simplified sketch of it is included below). The source is here: https://github.com/JavierJia/twitter-tracker/blob/master/src/main/java/edu/uci/ics/twitter/asterix/feed/FileFeedSocketAdapterClient.java
The data and the app package are provided here: https://drive.google.com/folderview?id=0B423M7wGZj9dYVQ1TkpBNzcwSlE&usp=sharing
To feed the data, you can run:

./bin/feedFile -u 172.17.0.2 -p 10001 -c 5000000 ~/data/twitter/test.adm 

-u for the server URL
-p for the server port
-c for the number of lines to ingest
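Roughly, the client does the following (a simplified, illustrative sketch; see the linked source for the real version):

--------------------
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class MinimalFeedClient {
    public static void main(String[] args) throws Exception {
        String host = "172.17.0.2"; // -u
        int port = 10001;           // -p
        int count = 5000000;        // -c
        try (Socket socket = new Socket(host, port);
             BufferedReader reader = new BufferedReader(new FileReader("test.adm"))) {
            OutputStream out = socket.getOutputStream();
            String line;
            int sent = 0;
            // one ADM record per line, written to the feed socket as-is
            while (sent < count && (line = reader.readLine()) != null) {
                out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
                sent++;
            }
            out.flush();
        }
    }
}
--------------------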

3. After ingestion, every request against ds_test hangs. There is no exception
and no response for hours. However, the system still answers queries on other datasets,
such as the Metadata datasets.
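For example, even a trivial count query of this kind never returns:

--------------------
use dataverse test;

count(for $d in dataset ds_test return $d);
--------------------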

The data contains some duplicated records, which should trigger the insert exception. If I
lower the count from 5000000 to, say, 3000000, there is no problem, even though that
smaller subset also contains duplicates.
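By duplicates I mean records that share the same primary key, for example (illustrative values):

--------------------
{ "fa": 42, "fb": 1 }
{ "fa": 42, "fb": 2 }
--------------------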

Do any feed experts have a hint on which part could be wrong? The CC and NC logs are attached. Thank
you!



Best,

Jianfeng Jia
PhD Candidate of Computer Science
University of California, Irvine

