db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mamta Satoor" <msat...@gmail.com>
Subject Re: keeping the table ordered
Date Tue, 06 Feb 2007 18:08:05 GMT
To answer your question on compound index. It just means to define an index
which includes more than one column. eg
**
CREATE TABLE t1(c1 INT, c2 INT, c3 INT, c4 INT)
CREATE INDEX i1_2_4 ON t1(c1, c4, c2)
**
HTH,
Mamta


On 2/6/07, Nurullah Akkaya <nurullah_akkaya@yahoo.com> wrote:
>
>
> On Feb 6, 2007, at 11:25 AM, Michael Segel wrote:
>
> > Sorry to top post, on my crackberry...
> >
> > I think you missed my point.
> > Select the count of your documents that use the word 'the'.
> >
> > Ok so let's say that you want to search for all of the documents
> > that use the word 'the'.
> > You first lookup the integer representation of the word. Let's say
> > that its = 100.
> >
> > How many times is the value 100 going to be in your index?
> that varies with the document set with 2 million documents i have
> around 2.5 million 'the' entries.
>
> >
> > Ok?
> >
> > But to your other point... You see that your data is not
> > contiguous. Hmmm ok,so assuming that your primary index is wordID,
> > how do you handle documents that have multiple words? So if you
> > search on 'the' you'll get one set of data and if you then search
> > on the wordID for 'is', you'll have data that isn't in sort order
> > on the disk.
> assume the following ids.
> the -> 100
> is -> 150
> 101 -> linux
>
> i want my tables to be sorted like the following. not just the word
> the but all id's are sorted
> 100
> 100
> 100
> 100
> 100
> 100
> 101
> 101
> 101
> 101
> 150
> 150
> 150
>
> from my knowledge of databases they sorted in random order thus we
> have indexes pointing where the data is. from the upper example i am
> going to read one big chunk of data from the disk but in the bottom
> example i will read 100 then jump a buch of records and read next.
>
> 100
> 101
> 150
> 101
> 101
> 100
> 150
>
> where can i learn more about the compound index. create index
> statement in ref manual doesn't mention it?
>
> >
> > Now here's something that may help,
> > Drop all of your indexes and create a single compound index where
> > the first field is wordID.
> >
> > That may help you out...
> >
> >
> > Sent via BlackBerry.
> >
> > -Mike Segel
> > Principal
> > MSCC
> > 312 952 8175
> >
> >
> > -----Original Message-----
> > From: Nurullah Akkaya <nurullah_akkaya@yahoo.com>
> > Date: Tue, 6 Feb 2007 11:14:02
> > To:Derby Discussion <derby-user@db.apache.org>
> > Subject: Re: keeping the table ordered
> >
> > It is not quite clear to me what you are trying to achieve.  Why do
> > you want a sequential read?  Scanning the entire table of 100
> > million records should take longer time than looking up a record
> > using a index on wordid.  Have you retrieved the query plan and
> > made sure the index on wordid is used?  Or are you talking about
> > doing a lookup of many different wordids in sorted order?
> >
> >
> >
> > i did not meant sequential scanning of the whole table i meant disk
> > i/o( bottom paragraph explains it )
> > yes i checked the query plan and derby uses index to lookup records
> > and index look up checks only two index pages. so i came to the
> > conclusion that most of the time is lost making random i/o request
> > for the data thats why i am trying to keep the table sorted. since
> > sequential hard disk access is much faster than random i/o .
> >
> >
> >
> >
> >
> >
> > On Feb 6, 2007, at 8:09 AM, Michael Segel wrote:
> >
> >
> >
> >
> >
> >
> >
> >
> > What exactly are you trying to do?
> > Based on the little snippet, it looks like this is an exercise to
> > create a
> > "google like" search on a series of documents.
> >
> >
> > The problem is that your wordID, while an integer, is not going to
> > be unique
> > enough.
> >
> >
> >
> > wordId isn't unique at all each word in a document gets a
> > corresponding posting entry i look up wordId for the word the then
> > select all docId's containg the wordId. that posting list is
> > basicly a big inverted list. what i am trying to do is keep the
> > table sorted by wordId so insted of keeping values randomly on disk
> > they are being written sequentialy to the file so that instead of
> > doing random i/o i just do a sequential read from the hard drive. i
> > don't want  sequential scanning of the whole table.
> >
> >
> >
> >
> >
> > For example, search your documents where the wordID is the integer
> > look up for
> > the word "the".
> >
> >
> > Do you see the problem?
> >
> >
> > --
> > --
> > Michael Segel
> > Principal
> > Michael Segel Consulting Corp.
> > derby -=-@segel.com: <mailto:-@segel.com>
> > (312) 952-8175 [mobile]
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>

Mime
View raw message