asterixdb-notifications mailing list archives

From "Jianfeng Jia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ASTERIXDB-1264) Feed didn't release lock if the ingesting hit some exceptions
Date Thu, 09 Jun 2016 23:39:21 GMT

    [ https://issues.apache.org/jira/browse/ASTERIXDB-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323604#comment-15323604 ]

Jianfeng Jia commented on ASTERIXDB-1264:
-----------------------------------------

This is the DDL: 
{code}
drop dataverse twitter if exists;
create dataverse twitter if not exists;
use dataverse twitter;

create type typeUser if not exists as open {
    id: int64,
    name: string,
    screen_name : string,
    lang : string,
    location: string,
    create_at: date,
    description: string,
    followers_count: int32,
    friends_count: int32,
    statues_count: int64
};

create type typePlace if not exists as open{
    country : string,
    country_code : string,
    full_name : string,
    id : string,
    name : string,
    place_type : string,
    bounding_box : rectangle
};

create type typeGeoTag if not exists as open {
    stateID: int32,
    stateName: string,
    countyID: int32,
    countyName: string,
    cityID: int32?,
    cityName: string?
};

create type typeTweet if not exists as open{
    create_at : datetime,
    id: int64,
    "text": string,
    in_reply_to_status : int64,
    in_reply_to_user : int64,
    favorite_count : int64,
    coordinate: point?,
    retweet_count : int64,
    lang : string,
    is_retweet: boolean,
    hashtags : {{ string }} ?,
    user_mentions : {{ int64 }} ? ,
    user : typeUser,
    place : typePlace?,
    geo_tag: typeGeoTag
};

create dataset ds_tweet(typeTweet) if not exists primary key id with filter on create_at;
//"using" "compaction" "policy" CompactionPolicy ( Configuration )? )?
create index text_idx if not exists on ds_tweet("text") type keyword;
create index location_idx if not exists on ds_tweet(coordinate) type rtree;
// create index time_idx if not exists on ds_tweet(create_at) type btree;
create index state_idx if not exists on ds_tweet(geo_tag.stateID) type btree;
create index county_idx if not exists on ds_tweet(geo_tag.countyID) type btree;
create index city_idx if not exists on ds_tweet(geo_tag.cityID) type btree;

create feed MessageFeed using localfs(
("path"="128.195.52.77:///home/jianfeng/data/head20m.adm"),
("format"="adm"),
("type-name"="typeTweet"));

set wait-for-completion-feed "true";
connect feed MessageFeed to dataset ds_tweet;
{code}

This file feed worked OK. After the system finished reading the adm file, I used another socket_adapter feed to keep ingesting real-time data, and that is when the freeze scenario happened.
The original data is too big, so I uploaded a small sample [here|https://drive.google.com/open?id=0B423M7wGZj9daWpCczRvalNZRkk]. Hopefully it can reproduce the problem; let me know if it can't, and I will upload the bigger one.
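
For completeness, the real-time ingestion side used a socket_adapter feed in the same style as the one shown in the issue description below. A minimal sketch (the feed name, socket address, and port here are illustrative, not the exact values from my setup):
{code}
use dataverse twitter;

// Illustrative socket feed for the real-time ingestion step;
// the feed name and socket address are placeholders.
create feed TweetSocketFeed using socket_adapter
(
    ("sockets"="nc1:10001"),
    ("address-type"="nc"),
    ("type-name"="typeTweet"),
    ("format"="adm")
);

set wait-for-completion-feed "false";
connect feed TweetSocketFeed to dataset ds_tweet;
{code}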



> Feed didn't release lock if the ingesting hit some exceptions
> -------------------------------------------------------------
>
>                 Key: ASTERIXDB-1264
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1264
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Feeds
>            Reporter: Jianfeng Jia
>            Assignee: Abdullah Alamoudi
>
> This issue was discussed on the mailing list; I am copying it here to make it easier to track and share.
> I hit a weird issue that is reproducible, but only if the data has duplicates and is also large enough. Let me explain it step by step:
> 1. The dataset is very simple and has only two fields.
> The DDL (AQL):
> {code}
> drop dataverse test if exists;
> create dataverse test;
> use dataverse test;
> create type t_test as closed{
>   fa: int64,
>   fb : int64
> };
> create dataset ds_test(t_test) primary key fa;
> create feed fd_test using socket_adapter
> (
>     ("sockets"="nc1:10001"),
>     ("address-type"="nc"),
>     ("type-name"="t_test"),
>     ("format"="adm"),
>     ("duration"="1200")
> );
> set wait-for-completion-feed "false";
> connect feed fd_test to dataset ds_test using policy AdvancedFT_Discard;
> {code}
> ——————————————————————————————
> The AdvancedFT_Discard policy ignores exceptions from insertion and keeps ingesting.
> 2. Ingest the data with a very simple socket adapter client that reads records one by one from an adm file. The source is here: https://github.com/JavierJia/twitter-tracker/blob/master/src/main/java/edu/uci/ics/twitter/asterix/feed/FileFeedSocketAdapterClient.java
> The data and the application package are provided here: https://drive.google.com/folderview?id=0B423M7wGZj9dYVQ1TkpBNzcwSlE&usp=sharing
> To feed the data you can run:
> ./bin/feedFile -u 172.17.0.2 -p 10001 -c 5000000 ~/data/twitter/test.adm
> -u for the server URL
> -p for the server port
> -c for the number of lines you want to ingest
> 3. After ingestion, all requests against ds_test hang (see the example query below). There is no exception and no response for hours. However, the system still answers queries on other datasets, such as Metadata.
> The data contains some duplicate records, which should trigger the insert exception. If I lower the count from 5000000 to, say, 3000000, there is no problem, even though that subset contains duplicates as well.
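> For example, even a trivial probe query never comes back (this particular query is illustrative; any request that touches ds_test behaves the same way):
> {code}
> use dataverse test;
> 
> // Illustrative probe: hangs indefinitely once the feed has hit the duplicate-key exceptions.
> for $t in dataset ds_test
> limit 1
> return $t
> {code}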
> Answer from [~amoudi]:
> I know exactly what is going on here. The problem you pointed out is
> caused by the duplicate keys. If I remember correctly, the main issue is
> that the locks that are placed on the primary keys are not released.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
