From: Alex vB
To: java-user@lucene.apache.org
Date: Fri, 3 Sep 2010 09:05:04 -0700 (PDT)
Message-ID: <1283529904697-1413062.post@n3.nabble.com>
Subject: Detailed file handling on hard disk

Hello everybody,

I read the paper "Performance of Compressed Inverted List Caching in Search Engines" (http://www2008.org/papers/pdf/p387-zhangA.pdf), and now I am unsure how Lucene lays out its index structures on the hard disk.

I am using Windows as my OS, and therefore I implemented FSDirectory based on java.io.RandomAccessFile. How is skipping in the .tis file realized? Do I also use metadata at the beginning of each block, as in the paper mentioned above on page 388 (there, the metadata stores information about how many inverted lists are in the block and where they start)?

http://lucene.472066.n3.nabble.com/file/n1413062/Block_assignment.jpg

I ask because I read in another article that I can seek to the correct position on the hard drive by byte address using java.io.RandomAccessFile (an address I can read from the .tii file in "IndexDelta"?). How do I find the correct position/location for my PostingList/Document? Do I need information/metadata about the blocks from the underlying file system? Or where can I find further information about this? :)

Best regards
Alex
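
To make the seeking question concrete, here is a minimal sketch of jumping to a byte address with java.io.RandomAccessFile. It is a toy illustration, not the real Lucene .tis/.tii layout: the class name SeekSketch, the helper names, and the sample file contents are all hypothetical, and the sketch simply assumes that each stored delta is the difference between the byte positions of two consecutive entries, so that summing the deltas yields absolute offsets.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Toy illustration only: this is NOT the real Lucene .tis/.tii file format.
class SeekSketch {

    // Turn delta-encoded positions into absolute byte offsets by summing them.
    // Assumption: each delta is the distance from the previous entry's position.
    static long[] toAbsoluteOffsets(long[] deltas) {
        long[] offsets = new long[deltas.length];
        long position = 0;
        for (int i = 0; i < deltas.length; i++) {
            position += deltas[i];
            offsets[i] = position;
        }
        return offsets;
    }

    // Jump straight to a byte offset in the file and read `length` bytes from there.
    static byte[] readAt(File file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            byte[] buffer = new byte[length];
            raf.readFully(buffer);
            return buffer;
        }
    }

    public static void main(String[] args) throws IOException {
        // Write three made-up entries back to back into a temporary file.
        File file = File.createTempFile("seek-sketch", ".bin");
        file.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.write("alphabravocharlie".getBytes(StandardCharsets.US_ASCII));
        }

        // Hypothetical deltas for the three entries: they start at bytes 0, 5 and 10.
        long[] offsets = toAbsoluteOffsets(new long[] {0, 5, 5});

        // Seek directly to the third entry without reading the first two.
        byte[] entry = readAt(file, offsets[2], 7);
        System.out.println(new String(entry, StandardCharsets.US_ASCII));
    }
}
```

In this sketch no knowledge of file system blocks is needed: the offset is a logical position within the file, and the operating system maps it to the physical disk location.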