From: Nurullah Akkaya
To: Derby Discussion
Subject: Re: keeping the table ordered
Date: Tue, 6 Feb 2007 11:14:02 -0600

> It is not quite clear to me what you are trying to achieve. Why do
> you want a sequential read? Scanning the entire table of 100
> million records should take longer time than looking up a record
> using a index on wordid. Have you retrieved the query plan and
> made sure the index on wordid is used? Or are you talking about
> doing a lookup of many different wordids in sorted order?

I did not mean a sequential scan of the whole table; I meant sequential
disk I/O (the bottom paragraph explains it). Yes, I checked the query
plan: Derby uses the index to look up records, and the index lookup
checks only two index pages. So I came to the conclusion that most of
the time is lost making random I/O requests for the data. That is why I
am trying to keep the table sorted, since sequential hard disk access is
much faster than random I/O.

On Feb 6, 2007, at 8:09 AM, Michael Segel wrote:

> What exactly are you trying to do?
> Based on the little snippet, it looks like this is an exercise to
> create a "google like" search on a series of documents.
>
> The problem is that your wordID, while an integer, is not going to
> be unique enough.

wordId isn't unique at all. Each word in a document gets a corresponding
posting entry. I look up the wordId for the word, then select all the
docIds containing that wordId. The posting list is basically a big
inverted list. What I am trying to do is keep the table sorted by
wordId, so that the values are written sequentially to the file instead
of being scattered randomly on disk. Then, instead of doing random I/O,
I just do a sequential read from the hard drive. I don't want a
sequential scan of the whole table.

> For example, search your documents where the wordID is the integer
> look up for the word "the".
>
> Do you see the problem?
>
> --
> Michael Segel
> Principal
> Michael Segel Consulting Corp.
> derby -=-@segel.com
> (312) 952-8175 [mobile]
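To make the access pattern described above concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the idea (postings kept sorted by wordId, so all entries for one word sit next to each other); it says nothing about how Derby actually lays rows out on disk. The sample data and the names `postings`, `build_sorted_postings`, and `docs_for_word` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical postings as (word_id, doc_id) pairs, in arrival order.
postings = [(7, 1), (3, 1), (7, 2), (1, 2), (3, 3), (7, 3)]


def build_sorted_postings(pairs):
    """Return the pairs ordered by word_id, then doc_id."""
    return sorted(pairs)


def docs_for_word(sorted_pairs, word_id):
    """Return the doc ids for word_id by reading one contiguous slice.

    A 1-tuple (word_id,) compares lower than any (word_id, doc_id),
    so the two binary searches bracket exactly the entries for word_id.
    """
    lo = bisect_left(sorted_pairs, (word_id,))
    hi = bisect_left(sorted_pairs, (word_id + 1,))
    return [doc_id for _, doc_id in sorted_pairs[lo:hi]]


sorted_postings = build_sorted_postings(postings)
print(docs_for_word(sorted_postings, 7))
```

In the sorted layout, a lookup is two binary searches followed by one contiguous read, which is the in-memory analogue of trading many random reads for one sequential read. The slice for a very common word (the "the" case raised in the quoted message) is still long, so sorting changes the access pattern, not the amount of data returned.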