asterixdb-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Magnus Kongshem <kongs...@stud.ntnu.no>
Subject Re: Indexes not performing well
Date Fri, 27 May 2016 09:08:57 GMT
My DDL is the following:

drop dataverse bigd if exists;
    create dataverse bigd;
    use dataverse bigd;

    create type table as open {
uid: uuid,
        campus: string,
building: string,
floor: string,
        timestamp: int32,
dayOfWeek: int32,
hourOfDay: int32,
        latitude: double,
        salt_timestamp: int32,
        longitude: double,
        id: string,
accuracy: double
    }
create dataset posdata(table)
    primary key uid autogenerated;
create index stamp on posdata(timestamp);
create index hour on posdata(hourOfDay);
create index day on posdata(dayOfWeek);
create index id on posdata(id);
create index acc on posdata(accuracy);
create index building on posdata(building);
create index floor on posdata(floor);
create index stamp_id on posdata(timestamp, id);
create index stamp_hour on posdata(timestamp, hourOfDay);
create index stamp_day on posdata(timestamp, dayOfWeek);
create index stamp_acc on posdata(timestamp, accuracy);
create index stamp_building on posdata(timestamp, building);
create index stamp_hour_day on posdata(timestamp, hourOfDay, dayOfWeek);
create index stamp_hour_building on posdata(timestamp, hourOfDay, building);

The following is the syntax for my eight queries. I performed three
versions of each query where the timestamp was different.
The "startTime" was always 1412121600.
The "endTime" was 1412208000 which equals 1 day of data.
The "endTime" was 1412726400 which equals 7 days of data.
The "endTime" was 1414800000 which equals 30 days of data.

Note that query number six was only run when querying 1 day of data because
of the "id" field in the data set. The id field is a unique identifier for
a device on a specific day, and not unique for the entire data set.

count(from $obj in dataset posdata
    where $obj.timestamp >= 1412121600 and $obj.timestamp <= 1414800000
    return $obj);

count(from $obj in dataset posdata
    where $obj.timestamp >= 1412121600 and $obj.timestamp <= 1414800000 and
$obj.hourOfDay > 9 and $obj.hourOfDay < 15
    return $obj);

count(from $obj in dataset posdata
    where $obj.timestamp >= 1412121600 and $obj.timestamp <= 1414800000 and
$obj.hourOfDay > 9 and $obj.hourOfDay < 15 and ($obj.dayOfWeek = 1 or
$obj.dayOfWeek = 3)
    return $obj);

count(from $obj in dataset posdata
    where $obj.timestamp >= 1412121600 and $obj.timestamp <= 1414800000 and
$obj.hourOfDay > 9 and $obj.hourOfDay < 15 and $obj.building = "IT-Vest"
    return $obj);

count(from $obj in dataset posdata
    where $obj.timestamp >= 1412121600 and $obj.timestamp <= 1414800000 and
$obj.accuracy < 10
    return $obj);

count(from $obj in dataset posdata
    where $obj.timestamp >= 1412121600 and $obj.timestamp <= 1414800000 and
$obj.id = "10"
    return $obj);

count(from $obj in dataset posdata
    where $obj.timestamp >= 1412121600 and $obj.timestamp <= 1414800000 and
$obj.hourOfDay > 9 and $obj.hourOfDay < 15 and ($obj.dayOfWeek = 1 or
$obj.dayOfWeek = 3) and $obj.building = "Realfagbygget" and $obj.floor =
"4. etasje"
    return $obj);

count(from $obj in dataset posdata
    where $obj.timestamp >= 1412121600 and $obj.timestamp <= 1414800000 and
$obj.building = "Realfagbygget"
    return $obj);

If you need anything else, please tell me.

BG,

Magnus

On Thu, May 26, 2016 at 6:20 PM, Pouria Pirzadeh <pouria.pirzadeh@gmail.com>
wrote:

> Hi Magnus,
>
> Thanks for your email and sharing the information.
> If it is Ok with you, Would you please share with us the exact DDL
> (including type definitions, dataset and index definitions) and exact AQL
> queries that you ran against AsterixDB ?
> I am just interested in checking the query plans and see what ended up
> being run as jobs.
>
> Thanks.
> Pouria
>
> On Thu, May 26, 2016 at 4:59 AM, Magnus Kongshem <kongshem@online.ntnu.no>
> wrote:
>
>> Hi,
>>
>> There has been a lot of questions from me regarding AsterixDB and I thank
>> all of you who have answered me. So it is time for me to contribute with
>> some obeservations. I am writing my master thesis where I test multiple
>> databases on a large data set. I should also mention that I have installed
>> AsterixDB on a single machine.
>>
>> What I have observed is that asterixDB has a "poorer" read performance
>> when I specify indexes on the data set compared to not implementing any
>> indexes. See the attachment for details, its an excerpt of my thesis
>> explaining and describing the queries, the indexes and the test results.
>> Any thoughts on these test results?
>>
>> I also cannot help to notice that the read performance for a query
>> querying a small portion, medium portion and large portion of the data set
>> is very similar. The largest query finds 75 million records and the
>> smallest query finds 3.5 million records, but almost have the same read
>> performance. How can this be?
>>
>> Perhaps you can use these test results in the future development of
>> asterixDB.
>>
>> I you would like, I can send you my final thesis when it's done.
>>
>> --
>>
>> Mvh
>>
>> Magnus Alderslyst Kongshem
>> +47 415 65 906
>>
>
>


-- 

Mvh

Magnus Alderslyst Kongshem
Seniorkomiteen
Online, linjeforeningen for informatikk
+47 415 65 906

Mime
View raw message