asterixdb-users mailing list archives

From Magnus Kongshem <kongs...@stud.ntnu.no>
Subject Re: Indexes not performing well
Date Tue, 31 May 2016 18:42:07 GMT
Ok, so I took your advice and created a new data set with a filter on
timestamp and only a fraction of the indexes I had previously created. I
only kept the indexes based on the fields building, floor and id.
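
For readers unfamiliar with filters, here is a rough Java sketch of the idea as I understand it from the filters documentation: each on-disk component records the minimum and maximum value of the filter field, and a range query can skip any component whose range does not overlap the query range. The class and method names are made up for illustration; this is a conceptual model, not AsterixDB code.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual model only: names are hypothetical, this is not AsterixDB code.
class FilterPruningSketch {

    // One on-disk component, tagged with the min/max timestamp it contains.
    static final class Component {
        final long minTs;
        final long maxTs;

        Component(long minTs, long maxTs) {
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    // Keeps only the components whose [minTs, maxTs] overlaps [lo, hi].
    static List<Component> toScan(List<Component> all, long lo, long hi) {
        List<Component> hit = new ArrayList<Component>();
        for (Component c : all) {
            if (c.maxTs >= lo && c.minTs <= hi) {
                hit.add(c);
            }
        }
        return hit;
    }

    public static void main(String[] args) {
        List<Component> all = new ArrayList<Component>();
        all.add(new Component(0L, 99L));
        all.add(new Component(100L, 199L));
        all.add(new Component(200L, 299L));
        // A narrow range query only needs the components it overlaps.
        System.out.println(toScan(all, 150L, 160L).size() + " of " + all.size());
    }
}
```

In this model, pruning only pays off when each component covers a narrow timestamp range; a component whose range spans the whole query window is never skipped.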

Attached you will find:

- The DDL for the data set.
- The optimized logical plan for each query (excluding query number six). I
used a timestamp range of 30 days; I'll test 7 days and 1 day tomorrow.
- The optimized logical plan for a full table scan on the data set.

The results were disappointing, unfortunately. The attached diagram shows the
query results compared to the other two tests I have performed. To be
clear, the gray bars represent the data set with the DDL attached to this
e-mail. I tried to decipher the logical plan for each query, but I could not
make sense of it.

It's worth mentioning that AsterixDB currently does not support creating
filters on a data set with an auto-generated UID field, unless there is
some magic that I am not aware of. This means I had to create a UUID for
each record in my data set before loading it into AsterixDB. I did this
in Java 7 with UUID.randomUUID().
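
A minimal sketch of that preprocessing step, compatible with Java 7. The class name, the field name "uid" and the assumption that each record is a flat JSON-like object on one line are all illustrative, not taken from my actual loader:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical helper: prepends a client-generated UUID to each record so the
// data set can declare an explicit (non-autogenerated) primary key.
class UuidKeyExample {

    // Returns a random (version 4) UUID in its canonical 36-character form.
    static String newKey() {
        return UUID.randomUUID().toString();
    }

    // Assumes the record is a single object starting with '{'.
    // The field name "uid" is an assumption made for this example.
    static String withKey(String record) {
        String body = record.trim();
        if (!body.startsWith("{")) {
            throw new IllegalArgumentException("expected an object: " + record);
        }
        String rest = body.substring(1).trim();
        // Avoid a dangling comma when the original object is empty.
        String separator = rest.equals("}") ? "" : ", ";
        return "{\"uid\": \"" + newKey() + "\"" + separator + rest;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<String>();
        records.add("{\"building\": 1, \"floor\": 2}");
        records.add("{\"building\": 1, \"floor\": 3}");
        for (String r : records) {
            System.out.println(withKey(r));
        }
    }
}
```

Each output line carries its own key, so the load file no longer depends on the database generating one.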

Any thoughts guys? Am I missing something?

BG,
Magnus

On Sun, May 29, 2016 at 8:32 AM, Sattam Alsubaiee <salsubaiee@gmail.com>
wrote:

> Creating indexes on fields with high selectivities (such as hourOfDay
> and dayOfWeek) is not encouraged at all. Each secondary index lookup will
> have to probe the primary index to fetch other fields in the record. It
> would be much more efficient if you just perform scans instead of
> accessing secondary indexes when querying such fields.
>
> I would recommend that you drop at least the following indexes:
> drop index posdata.hour;
> drop index posdata.day;
>
> Also, I would highly recommend that you utilize AsterixDB filters, which are
> a very good optimization (could save up to 99% of query time) when you deal
> with time-correlated fields such as timestamps:
> https://asterixdb.apache.org/docs/0.8.8-incubating/aql/filters.html
> http://dl.acm.org/citation.cfm?id=2786007
>
> Cheers,
> Sattam
>
> On Sun, May 29, 2016 at 8:58 AM, Michael Carey <mjcarey@ics.uci.edu>
> wrote:
>
>> @Pouria: Please share your findings here when you check this out - this
>> is quite strange, since none of the other performance results that have
>> been obtained on the system have looked anything like this.  (I will try to
>> look at this too at some point, but will unfortunately be MIA from June
>> 1-15 first.)  Weird....
>>
>> On 5/26/16 9:20 AM, Pouria Pirzadeh wrote:
>>
>> Hi Magnus,
>>
>> Thanks for your email and sharing the information.
>> If it is OK with you, would you please share with us the exact DDL
>> (including type definitions, dataset and index definitions) and exact AQL
>> queries that you ran against AsterixDB?
>> I am just interested in checking the query plans and seeing what ended up
>> being run as jobs.
>>
>> Thanks.
>> Pouria
>>
>> On Thu, May 26, 2016 at 4:59 AM, Magnus Kongshem <kongshem@online.ntnu.no>
>> wrote:
>>
>>> Hi,
>>>
>>> There have been a lot of questions from me regarding AsterixDB and I
>>> thank all of you who have answered me. So it is time for me to contribute
>>> some observations. I am writing my master's thesis, where I test
>>> multiple databases on a large data set. I should also mention that I have
>>> installed AsterixDB on a single machine.
>>>
>>> What I have observed is that AsterixDB has a "poorer" read performance
>>> when I specify indexes on the data set compared to not implementing any
>>> indexes. See the attachment for details; it's an excerpt of my thesis
>>> explaining and describing the queries, the indexes and the test results.
>>> Any thoughts on these test results?
>>>
>>> I also cannot help noticing that the read performance for a query
>>> querying a small portion, medium portion and large portion of the data set
>>> is very similar. The largest query finds 75 million records and the
>>> smallest query finds 3.5 million records, but they have almost the same
>>> read performance. How can this be?
>>>
>>> Perhaps you can use these test results in the future development of
>>> AsterixDB.
>>>
>>> If you would like, I can send you my final thesis when it's done.
>>>
>>> --
>>>
>>> Mvh
>>>
>>> Magnus Alderslyst Kongshem
>>> +47 415 65 906
>>>
>>
>>
>>
