asterixdb-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Carey <mjca...@ics.uci.edu>
Subject Re: Indexes not performing well
Date Tue, 31 May 2016 05:03:31 GMT
Guys,

Reminder:  There's also an option to give a hint in the query to ignore 
a given index (you associate the hint with the predicate), which might 
be simpler for query re-execution sans indices.  (Can someone on the dev 
list add a pointer to the place where this is documented, albeit poorly 
in all likelihood?  I couldn't find it in a quick search just now, but I 
know it's hiding there...)

Cheers,
Mike

On 5/30/16 9:25 PM, Pouria Pirzadeh wrote:
> Hi Magnus,
>
> Thanks for spending time to gather useful information and sharing them 
> with us.
> I guess the first potential performance pitfall that you may be 
> falling into is the large selectivity of the filters you are using 
> against indexed fields, as Sattam also pointed out.
>
> 1) With respect to the attached plans to the previous email, did you 
> add the plans before removing indices or after that ? Because in the 
> plan for Q2, I can still see that the hour index (which you mentioned 
> you dropped) is being used at the bottom.
>
> 2) If it is not too painful for you, can you also try two sets of 
> additional queries (their repose time could be helpful):
>
>   * Running a simple count query (without any filter) on the dataset
>     (I just wanna figure out what is the total time for a pure full scan).
>   * Running those queries that do not benefit from secondary indices
>     (those with similar times across different versions) without *any*
>     index. This basically can be done by simply creating a new dataset
>     without any secondary index, loading your data into it (you can
>     either use bulk load to load from raw data file(s), OR an "insert
>     into" query to load from your exiting dataset to the new one) and
>     then running those queries against the new dataset. I just wanna
>     find out if ignoring all the indices makes a difference or not.
>
>
> Thanks.
> Pouria
>
>
> On Mon, May 30, 2016 at 6:08 AM, Magnus Kongshem 
> <kongshem@stud.ntnu.no <mailto:kongshem@stud.ntnu.no>> wrote:
>
>     Attached you will find:
>
>     - My DLL
>     - My queries and their optimized logical plan
>
>     From what I can interpret, the queries does not use any of the
>     combined indexes. And query number two, three and four uses the
>     "hour" index which is based on the hourOfDay field. As Sattam
>     states, indexes based on this field may not be optimal. I deleted
>     that index as well as the "day" index and re-ran the queries. The
>     result was that the queries used the "stamp" index instead with
>     the same poor execution time.
>
>     I'm having some trouble creating a dataset with a filter. I keep
>     getting:
>
>     type mismatch: missing a required closed field uid:uuid [IllegalStateException]
>
>     use dataverse bigd;
>
>         create type table2 as closed {
>     uid: uuid,
>             campus: string,
>     building: string,
>     floor: string,
>             timestamp: int32,
>     dayOfWeek: int32,
>     hourOfDay: int32,
>             latitude: double,
>             salt_timestamp: int32,
>             longitude: double,
>             id: string,
>     accuracy: double
>         }
>     create dataset posdata3(table2)
>     primary key uid autogenerated with filter on timestamp;
>
>     load dataset posdata3 using localfs
>     (("path"="path/to/file/all.adm"),("format"="adm"));
>
>     BG,
>     Magnus
>
>     On Mon, May 30, 2016 at 10:01 AM, Magnus Kongshem
>     <kongshem@stud.ntnu.no <mailto:kongshem@stud.ntnu.no>> wrote:
>
>         Thank you for your replies. I will test your thoughts by
>         dropping the hourOfDay and dayOfWeek indexes, send you the
>         optimized query plans and utilize the AsterixDB filters.
>
>         I'll be back.
>
>         BG,
>         Magnus
>
>         On Sun, May 29, 2016 at 8:32 AM, Sattam Alsubaiee
>         <salsubaiee@gmail.com <mailto:salsubaiee@gmail.com>> wrote:
>
>             Creating indexes on fields with high selectivities (such
>             as hourOfDay and dayOfWeek) are not encouraged at all.
>             Each secondary index lookup will have to probe the primary
>             index to fetch other fields in the record. It would be
>             much more efficient if you just perform scans as opposed
>             of accessing secondary indexes when querying such fields.
>
>             I would recommend that you drop at least the following
>             indexes:
>             drop index posdata.hour;
>             drop index posdata.day;
>
>             Also I would highly recommend that you utilize AsterixDB
>             filters, which is very good optimization (could save up to
>             99% of query time) when you deal with time-correlated
>             fields such as timestamps:
>             https://asterixdb.apache.org/docs/0.8.8-incubating/aql/filters.html
>             http://dl.acm.org/citation.cfm?id=2786007
>
>             Cheers,
>             Sattam
>
>             On Sun, May 29, 2016 at 8:58 AM, Michael Carey
>             <mjcarey@ics.uci.edu <mailto:mjcarey@ics.uci.edu>> wrote:
>
>                 @Pouria: Please share your findings here when you
>                 check this out - this is quite strange, since none of
>                 the other performance results that have been obtained
>                 on the system have looked anything like this.  (I will
>                 try to look at this too at some point, but will
>                 unfortunately be MIA from June 1-15 first.) Weird....
>
>
>                 On 5/26/16 9:20 AM, Pouria Pirzadeh wrote:
>>                 Hi Magnus,
>>
>>                 Thanks for your email and sharing the information.
>>                 If it is Ok with you, Would you please share with us
>>                 the exact DDL (including type definitions, dataset
>>                 and index definitions) and exact AQL queries that you
>>                 ran against AsterixDB ?
>>                 I am just interested in checking the query plans and
>>                 see what ended up being run as jobs.
>>
>>                 Thanks.
>>                 Pouria
>>
>>                 On Thu, May 26, 2016 at 4:59 AM, Magnus Kongshem
>>                 <kongshem@online.ntnu.no
>>                 <mailto:kongshem@online.ntnu.no>> wrote:
>>
>>                     Hi,
>>
>>                     There has been a lot of questions from me
>>                     regarding AsterixDB and I thank all of you who
>>                     have answered me. So it is time for me to
>>                     contribute with some obeservations. I am writing
>>                     my master thesis where I test multiple databases
>>                     on a large data set. I should also mention that I
>>                     have installed AsterixDB on a single machine.
>>
>>                     What I have observed is that asterixDB has a
>>                     "poorer" read performance when I specify indexes
>>                     on the data set compared to not implementing any
>>                     indexes. See the attachment for details, its an
>>                     excerpt of my thesis explaining and describing
>>                     the queries, the indexes and the test results.
>>                     Any thoughts on these test results?
>>
>>                     I also cannot help to notice that the read
>>                     performance for a query querying a small portion,
>>                     medium portion and large portion of the data set
>>                     is very similar. The largest query finds 75
>>                     million records and the smallest query finds 3.5
>>                     million records, but almost have the same read
>>                     performance. How can this be?
>>
>>                     Perhaps you can use these test results in the
>>                     future development of asterixDB.
>>
>>                     I you would like, I can send you my final thesis
>>                     when it's done.
>>
>>                     -- 
>>
>>                     Mvh
>>
>>                     Magnus Alderslyst Kongshem
>>                     +47 415 65 906 <tel:%2B47%20415%2065%20906>
>>
>>
>
>
>
>
>
>         -- 
>
>         Mvh
>
>         Magnus Alderslyst Kongshem
>         Seniorkomiteen
>         Online, linjeforeningen for informatikk
>         +47 415 65 906 <tel:%2B47%20415%2065%20906>
>
>
>
>
>     -- 
>
>     Mvh
>
>     Magnus Alderslyst Kongshem
>     Seniorkomiteen
>     Online, linjeforeningen for informatikk
>     +47 415 65 906 <tel:%2B47%20415%2065%20906>
>
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message