asterixdb-dev mailing list archives

From abdullah alamoudi <bamou...@gmail.com>
Subject Re: Need Feed experts' help with a hanging issue
Date Tue, 01 Dec 2015 01:24:28 GMT
I know exactly what is going on here. The problem you pointed out is
caused by the duplicate keys. If I remember correctly, the main issue is
that the locks placed on the primary keys are not released.
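
To illustrate the suspected pattern, here is a minimal self-contained
sketch (all names here are hypothetical; this is not the actual AsterixDB
code path):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

public class LockLeakSketch {
    // one lock per primary key, and a toy "primary index"
    static final ConcurrentHashMap<Long, ReentrantLock> pkLocks = new ConcurrentHashMap<>();
    static final Set<Long> primaryIndex = ConcurrentHashMap.newKeySet();

    static void insert(long pk) {
        ReentrantLock lock = pkLocks.computeIfAbsent(pk, k -> new ReentrantLock());
        lock.lock();                      // lock the primary key
        if (!primaryIndex.add(pk)) {      // duplicate key detected
            // the exception propagates before unlock() is reached,
            // so the lock on this key is leaked
            throw new IllegalStateException("duplicate key: " + pk);
        }
        lock.unlock();                    // never runs on the error path
    }

    public static void main(String[] args) {
        for (long pk : new long[] {1L, 2L, 1L}) {  // third record repeats key 1
            try {
                insert(pk);
            } catch (IllegalStateException e) {
                // what a discard-style policy does: swallow and keep ingesting
            }
        }
        // In the multi-threaded engine, any later transaction touching key 1
        // now waits forever on the leaked lock, which matches the hang.
        System.out.println("key 1 still locked: " + pkLocks.get(1L).isLocked());
    }
}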

I will start fixing this issue tonight.
Cheers,
Abdullah.

Amoudi, Abdullah.

On Mon, Nov 30, 2015 at 4:52 PM, Jianfeng Jia <jianfeng.jia@gmail.com>
wrote:

> Dear devs,
>
> I hit a weird issue that is reproducible, but only if the data has
> duplicates and is also large enough. Let me explain it step by step:
>
> 1. The dataset is very simple and has only two fields.
> DDL AQL:
> —————————————
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
>
> create type t_test as closed {
>   fa: int64,
>   fb: int64
> }
>
> create dataset ds_test(t_test) primary key fa;
>
> create feed fd_test using socket_adapter
> (
>     ("sockets"="nc1:10001"),
>     ("address-type"="nc"),
>     ("type-name"="t_test"),
>     ("format"="adm"),
>     ("duration"="1200")
> );
>
> set wait-for-completion-feed "false";
> connect feed fd_test to dataset ds_test using policy AdvancedFT_Discard;
>
> ——————————————————————————————
>
> The AdvancedFT_Discard policy ignores exceptions from the insertion and
> keeps ingesting.
>
> 2. Ingest the data with a very simple socket adapter client which reads
> the records one by one from an ADM file; a simplified sketch of its core
> follows the option list below. The source is here:
> https://github.com/JavierJia/twitter-tracker/blob/master/src/main/java/edu/uci/ics/twitter/asterix/feed/FileFeedSocketAdapterClient.java
> The data and the app package is provided here:
> https://drive.google.com/folderview?id=0B423M7wGZj9dYVQ1TkpBNzcwSlE&usp=sharing
> To feed the data you can run:
>
> ./bin/feedFile -u 172.17.0.2 -p 10001 -c 5000000 ~/data/twitter/test.adm
>
> -u for the server URL
> -p for the server port
> -c for the number of lines you want to ingest
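>
> For reference, here is a rough sketch of what such a client does
> (simplified, with argument parsing trimmed; see the repo above for the
> actual code):
>
> import java.io.BufferedReader;
> import java.io.FileReader;
> import java.io.OutputStream;
> import java.net.Socket;
> import java.nio.charset.StandardCharsets;
>
> public class FileFeedSketch {
>     public static void main(String[] args) throws Exception {
>         String host = args[0];                 // server URL (-u)
>         int port = Integer.parseInt(args[1]);  // server port (-p)
>         int count = Integer.parseInt(args[2]); // line count (-c)
>         String admFile = args[3];              // path to the .adm file
>
>         try (Socket socket = new Socket(host, port);
>              OutputStream out = socket.getOutputStream();
>              BufferedReader in = new BufferedReader(new FileReader(admFile))) {
>             String line;
>             int sent = 0;
>             // write one ADM record per line to the feed socket
>             while (sent < count && (line = in.readLine()) != null) {
>                 out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
>                 sent++;
>             }
>             out.flush();
>         }
>     }
> }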
>
> 3. After ingestion, all requests touching ds_test hang. There is no
> exception and no response for hours. However, the system still answers
> queries on other datasets, such as on Metadata.
>
> The data contains some duplicated records, which should trigger the insert
> exception. If I lower the count from 5000000 to, say, 3000000, there is no
> problem, although that portion of the data contains duplicates as well.
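>
> For example, two records like this hypothetical pair collide on the
> primary key fa and trigger the duplicate-key exception:
>
> { "fa": 42, "fb": 1 }
> { "fa": 42, "fb": 2 }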
>
> Do any feed experts have a hint on which part could be wrong? The CC and
> NC logs are attached. Thank you!
>
>
> Best,
>
> Jianfeng Jia
> PhD Candidate of Computer Science
> University of California, Irvine
>
