db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sai Pullabhotla" <sai.pullabho...@jmethods.com>
Subject Re: Need Advice on Indexes
Date Thu, 25 Dec 2008 17:05:40 GMT

We went through all of these during the design phase and decided that
we do not have to do the kind of normalization you have mentioned.
Below are a couple of reasons:

As stated before, my jobs table is nothing more than a plain LOG file
which is stored in a database table instead of a regular file on the
file system. The only reason why we needed to store it in the database
is for easy search/filtering of records. If I were to use a plain log
file, I don't think I would want to print a user ID (a number) to a
log file which does not make much sense to the users reading it. I
guess the same holds true regardless of the underlying data store
(plain file or database table or something else).

About not really deleting a user record and having a "status" field in
the user table, we did go through this and here are the issues with
this approach:
Let us say my table definition is -
    user_id integer primary key,
    user_name char(20),
    full_name varchar(50));

We do want to have a unique constraint on the user_name field. If we
do not purge the user record when a delete is requested, then I cannot
have a new user with the same name as a user that was deleted 2 years
ago. Same is true even if you create a composite unique key with
user_name and status fields. Correct me if I'm wrong.

You said that -

> Based on this query, your index would never be used.

Why not and how do I make Derby to use my index in various scenarios
I've mentioned.

Sai Pullabhotla
Phone: (402) 408-5753
Fax: (402) 408-6861

On Wed, Dec 24, 2008 at 4:10 PM,  <derby@segel.com> wrote:
> Sai, please see my comments in yours...
>> -----Original Message-----
>> From: Sai Pullabhotla [mailto:sai.pullabhotla@jmethods.com]
>> Sent: Wednesday, December 24, 2008 1:54 PM
>> To: Derby Discussion; msegel@segel.com
>> Subject: Re: Need Advice on Indexes
>> Rick and Mike,
>> Thanks for taking time to reply to my question.
>> As far as normalizing the table that Mike suggested -
>> this jobs table is nothing more than a history table. So, a project or
>> user exists today but may not exist tomorrow. However, we want to
>> maintain the history of every project that was executed (this is
>> considered a job and hence the name). Since, a user may not exist in a
>> future time, I can not maintain a foreign key to my user table even
>> though I've a user table and an ID column in it. Same with projects. I
>> hope it makes sense now why this table is a flat table. However, I
>> think I can move the remarks column into its own table and use a
>> longvarchar or clob field.
> Ok,
> That's not a good design. Your user information should be stored separately
> and have an id associated with it. You don't want to purge this information;
> however you can associate a status field as part of this table. So you
> retain and don't reuse the ids and you can still keep historical data
> intact.
> One of the problems you may have is how you populate the field with the
> employee's name. What happens if you have two employees with the same name?
> (Yes it happens. ;-)
> The other problem is what happens when the person entering the name
> misspells the name? (Segel != Segal,Siegel,Siegal,Seagull) :-)
> In addition, your table isn't normalized when you clearly have a 1 to many
> relationship of data. Conceptually there are multiple jobs to a project. So
> when you store jobs and projects in the same data, its not normalized.
> Note: There are times when you break from using 3rd normal form, however it
> requires a lot of practical experience to justify it.
>> Now, as far as the queries go, Most of the times, it is -
>> select  job_id, proj_id, proj_name, proj_folder, submit_user,
>> run_user, queue_priority, run_priority, submit_time, start_time,
>> end_time, time_taken, job_log, status from dpa_job where start_time>=?
>> and start_time<? and status in(?, ?, ?, ?) order by start_time desc
> Ok,
> Not to nitpick, but I'm a bit old school and when writing a query I tend to
> want to see it formatted to be easier to read. Taking your query...
> SELECT  job_id, proj_id, proj_name, proj_folder,
>        submit_user, run_user, queue_priority,
>        run_priority, submit_time, start_time, end_time,
>        time_taken, job_log, status
> FROM    dpa_job
> WHERE   start_time>=?
> AND     start_time<?
> AND     status IN(?, ?, ?, ?)
> ORDER BY start_time desc
> It's a bit easier to read. The key words are capitalized and you can see
> your queries.
> Now looking at your common query example, the only thing you need to index
> on is the start time column. You can add status to the query, however its
> not going to really impact the performance of the query... unless you really
> have a lot of rows. (Like more than 10,000 rows in the result set if you
> ignore the status filter.)
> Does the IN clause effect the performance? There was an issue earlier in
> Derby, but I think it has been fixed for a couple of versions. In your
> example query with only 4 values, it is not that big of an issue.
> Based on this query, your index would never be used.
> You've indicated that this is a production database with customers using it.
> I'm going to strongly suggest that your company hire a competent DBA. You've
> said a couple of things that are major red flags.  When I say DBA, I mean
> someone who could be considered a logical DBA and has both data modeling and
> app development expertise. If not a FTE, get a consultant in to do a health
> check.
>> Sai Pullabhotla
>> Phone: (402) 408-5753
>> Fax: (402) 408-6861
>> www.jMethods.com

View raw message