db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Francois Orsini" <francois.ors...@gmail.com>
Subject Re: keeping the table ordered
Date Wed, 07 Feb 2007 01:27:50 GMT
On 2/6/07, Nurullah Akkaya <nurullah_akkaya@yahoo.com> wrote:
>
> for who ever searches these archives creating a composite index that
> includes all the columns i query solved the problem. i can read 2.7million records in
11 seconds another query that selects around  800k
> records dropped to about 4 secs.
>

This is also called 'Index Covering' - Dramatic performance increase can be
seen when all the columns needed by the query are included in the index and
you just observed that ;-)

On Feb 6, 2007, at 12:40 PM, Craig L Russell wrote:
>
> It sounds like the issue is the structure of the rows on the data pages.
> IIUC, once a row is stored on a data page, it's going to be there for a long
> long time. So as you add new rows, they are added to data pages where there
> is free space. The only way to move a row is to delete it and insert it so
> it ends up in another place.
> It might be that for this purpose, you might have to restructure the
> database so that it is organized the way you want it. For example, perform
> an index scan based on the desired order and insert rows into a new table,
> which will have the effect of ordering the rows on data pages according to
> the insert order.
>
> If availability is an issue, you might have to partition the data so that
> while the old table is being reorganized into the new table, you keep track
> of insert and delete activity so you can apply the changes to the new table
> before putting the new table into service.
>
> Craig
>
> On Feb 6, 2007, at 10:08 AM, Mamta Satoor wrote:
>
> To answer your question on compound index. It just means to define an
> index which includes more than one column. eg
> **
> CREATE TABLE t1(c1 INT, c2 INT, c3 INT, c4 INT)
> CREATE INDEX i1_2_4 ON t1(c1, c4, c2)
> **
> HTH,
> Mamta
>
>
> On 2/6/07, Nurullah Akkaya <nurullah_akkaya@yahoo.com> wrote:
> >
> >
> > On Feb 6, 2007, at 11:25 AM, Michael Segel wrote:
> >
> > > Sorry to top post, on my crackberry...
> > >
> > > I think you missed my point.
> > > Select the count of your documents that use the word 'the'.
> > >
> > > Ok so let's say that you want to search for all of the documents
> > > that use the word 'the'.
> > > You first lookup the integer representation of the word. Let's say
> > > that its = 100.
> > >
> > > How many times is the value 100 going to be in your index?
> > that varies with the document set with 2 million documents i have
> > around 2.5 million 'the' entries.
> >
> > >
> > > Ok?
> > >
> > > But to your other point... You see that your data is not
> > > contiguous. Hmmm ok,so assuming that your primary index is wordID,
> > > how do you handle documents that have multiple words? So if you
> > > search on 'the' you'll get one set of data and if you then search
> > > on the wordID for 'is', you'll have data that isn't in sort order
> > > on the disk.
> > assume the following ids.
> > the -> 100
> > is -> 150
> > 101 -> linux
> >
> > i want my tables to be sorted like the following. not just the word
> > the but all id's are sorted
> > 100
> > 100
> > 100
> > 100
> > 100
> > 100
> > 101
> > 101
> > 101
> > 101
> > 150
> > 150
> > 150
> >
> > from my knowledge of databases they sorted in random order thus we
> > have indexes pointing where the data is. from the upper example i am
> > going to read one big chunk of data from the disk but in the bottom
> > example i will read 100 then jump a buch of records and read next.
> >
> > 100
> > 101
> > 150
> > 101
> > 101
> > 100
> > 150
> >
> > where can i learn more about the compound index. create index
> > statement in ref manual doesn't mention it?
> >
> > >
> > > Now here's something that may help,
> > > Drop all of your indexes and create a single compound index where
> > > the first field is wordID.
> > >
> > > That may help you out...
> > >
> > >
> > > Sent via BlackBerry.
> > >
> > > -Mike Segel
> > > Principal
> > > MSCC
> > > 312 952 8175
> > >
> > >
> > > -----Original Message-----
> > > From: Nurullah Akkaya < nurullah_akkaya@yahoo.com>
> > > Date: Tue, 6 Feb 2007 11:14:02
> > > To:Derby Discussion <derby-user@db.apache.org>
> > > Subject: Re: keeping the table ordered
> > >
> > > It is not quite clear to me what you are trying to achieve.  Why do
> > > you want a sequential read?  Scanning the entire table of 100
> > > million records should take longer time than looking up a record
> > > using a index on wordid.  Have you retrieved the query plan and
> > > made sure the index on wordid is used?  Or are you talking about
> > > doing a lookup of many different wordids in sorted order?
> > >
> > >
> > >
> > > i did not meant sequential scanning of the whole table i meant disk
> > > i/o( bottom paragraph explains it )
> > > yes i checked the query plan and derby uses index to lookup records
> > > and index look up checks only two index pages. so i came to the
> > > conclusion that most of the time is lost making random i/o request
> > > for the data thats why i am trying to keep the table sorted. since
> > > sequential hard disk access is much faster than random i/o .
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Feb 6, 2007, at 8:09 AM, Michael Segel wrote:
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > What exactly are you trying to do?
> > > Based on the little snippet, it looks like this is an exercise to
> > > create a
> > > "google like" search on a series of documents.
> > >
> > >
> > > The problem is that your wordID, while an integer, is not going to
> > > be unique
> > > enough.
> > >
> > >
> > >
> > > wordId isn't unique at all each word in a document gets a
> > > corresponding posting entry i look up wordId for the word the then
> > > select all docId's containg the wordId. that posting list is
> > > basicly a big inverted list. what i am trying to do is keep the
> > > table sorted by wordId so insted of keeping values randomly on disk
> > > they are being written sequentialy to the file so that instead of
> > > doing random i/o i just do a sequential read from the hard drive. i
> > > don't want  sequential scanning of the whole table.
> > >
> > >
> > >
> > >
> > >
> > > For example, search your documents where the wordID is the integer
> > > look up for
> > > the word "the".
> > >
> > >
> > > Do you see the problem?
> > >
> > >
> > > --
> > > --
> > > Michael Segel
> > > Principal
> > > Michael Segel Consulting Corp.
> > > derby -=-@segel.com: <mailto:-@segel.com>
> > > (312) 952-8175 [mobile]
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> >
> >
>
> Craig Russell
> Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
> 408 276-5638 mailto:Craig.Russell@sun.com <Craig.Russell@sun.com>
> P.S. A good JDO? O, Gasp!
>
>
>

Mime
View raw message