db-derby-dev mailing list archives

From "Harshvardhan Gupta (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DERBY-6937) Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby
Date Sat, 03 Jun 2017 18:06:04 GMT

     [ https://issues.apache.org/jira/browse/DERBY-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harshvardhan Gupta updated DERBY-6937:
--------------------------------------
    Attachment: schema_derby.sql

I have completed this work by ingesting the same data used in the original paper into my
local Derby environment. I am also attaching the schema, adapted for Derby, that will be
used to run the queries.

IMDB Data Snapshot used in the original VLDB paper -
http://homepages.cwi.nl/~boncz/job/imdb.tgz

The license and links to the current version of the IMDB data set can be found at
http://www.imdb.com/interfaces

While using SYSCS_IMPORT_TABLE to ingest the data, I ran into a problem with the handling of
NULL values. Derby expects null values to be blank in the CSV file. When the keyword "NULL"
is present instead, Derby reads the character sequence as is for character data types and
throws an error for non-character data types such as INTEGER and NUMERIC. I got around the
problem using a quick hack.
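
The message does not say what the hack was. One possible workaround, shown here only as a
minimal sketch, is to preprocess each CSV file so that fields holding the literal NULL become
blank before the file is handed to SYSCS_IMPORT_TABLE. The function name blank_out_nulls and
the null_token parameter are hypothetical, and the sketch assumes standard comma-separated
files with double-quote quoting. Because the csv module drops the quoting when it parses, a
text column that legitimately contains the string NULL would be blanked too.

```python
import csv
import io


def blank_out_nulls(src, dst, null_token="NULL"):
    """Copy CSV rows from src to dst, blanking fields equal to null_token.

    Hypothetical helper, not part of Derby. Returns the number of
    fields that were replaced.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    replaced = 0
    for row in reader:
        cleaned = []
        for field in row:
            if field == null_token:
                # A blank field is what Derby's import reads as SQL NULL.
                cleaned.append("")
                replaced += 1
            else:
                cleaned.append(field)
        writer.writerow(cleaned)
    return replaced


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = io.StringIO('1,NULL,"Toy Story"\n2,1995,NULL\n')
    out = io.StringIO()
    count = blank_out_nulls(sample, out)
    print(count)
    print(out.getvalue(), end="")
```

The cleaned file can then be imported as usual. For files too large to hold in memory, the
same function works on open file handles, since it processes one row at a time.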

Should we create a new variant of SYSCS_IMPORT_TABLE that takes an additional argument
specifying the string used to represent NULL in the CSV dump? Or should we instead work on
making the existing procedures accept varargs rather than adding a new procedure?

I tried making the existing procedure varargs but ran into the same problem as discussed in
this JIRA -
https://issues.apache.org/jira/browse/DERBY-4555




> Load the IMDB data set in Derby, obtain and adapt Join order Benchmark queries for use in derby
> ------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-6937
>                 URL: https://issues.apache.org/jira/browse/DERBY-6937
>             Project: Derby
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Harshvardhan Gupta
>            Assignee: Harshvardhan Gupta
>            Priority: Minor
>         Attachments: schema_derby.sql
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
