From: JT Kimbell
To: java-user@lucene.apache.org
Subject: Re: Help with jump from 1.4.3 to 2.0.0
Date: Wed, 20 Dec 2006 07:29:27 -0800 (PST)
Message-ID: <7991979.post@talk.nabble.com>
In-Reply-To: <733777220612200539l5bf69df9udbd503d899201c0b@mail.gmail.com>

I've sent the code your way. I'm downloading Eclipse right now so I can step
through with its debugger once I get it all set up.

However, I don't think I am using the same index for each of them, as this is
all actually on 3 different machines. Machine A has 1.4.3, and I wrote that
code on that machine. Machine B has 2.0.0; I copied 1.4.3's code over and then
'fixed' it. Machine C has access to the necessary text files, and I just FTP
them to the other machines when necessary, so the indexes are completely
independent of each other.

I just seem to get a null pointer exception when it reaches the August 2005
folder. I can catch the exception and continue on, but then none of those
files get indexed, so that's ~20 fewer than we should have indexed.

I can't send anyone the actual files, but I could list the names of the files;
perhaps something in them is throwing the indexer off? Are there any special
characters that can do that?

Also, I leave for a week-long vacation tomorrow, so I probably won't be able
to reply or test things for a few days.

Thanks so much,

JT


Gopikrishnan Subramani wrote:
>
> All I could suspect is perhaps you are trying to add documents to an index
> that was originally created using Lucene 1.4.3.
>
> If trying to create a fresh index doesn't work, you could send me your
> indexer code so I can take a look.
>
> -Gopi
>
>
> On 12/19/06, JT Kimbell wrote:
>>
>> Hi,
>>
>> I'm working on learning Lucene for my job, and the book one of my
>> professors purchased for herself and me is Lucene In Action, which is a
>> good book, but it is based on version 1.4.3 (I believe). I am beginning to
>> grasp a lot of the basic concepts behind Lucene and have a basic searching
>> and indexing program written on the said professor's server (which is
>> running 1.4.3). However, on my server for work I am using 2.0.0, and it was
>> agreed that it would be best that I use the newer version.
>> My program ran fine using 1.4.3, but once I made a few changes to make it
>> compatible with 2.0.0 it now throws a NullPointerException about 80% of
>> the way through.
>>
>> For some background on the files, they are all .txt files stored in a
>> directory that has folders representing different years (e.g. 2005);
>> within each of those there are month folders (e.g. August 2005), and those
>> folders contain all the documents. When I catch the exception and print
>> which File f my program is currently on, it says it is that August 2005
>> folder. My program is exactly the same except for updating Field to be
>> compatible with 2.0.0, and the data is an exact copy of the other data.
>>
>> So I suppose I have two questions:
>>
>> 1) The relevant methods from the two programs are below. Does anyone have
>> any ideas why this isn't working? Am I doing something wrong or assuming
>> something I shouldn't? (If you need to see the full code with all comments
>> for either program, let me know.)
>>
>> 2) Is there a good tutorial or something online for version 2.0.0 just to
>> help me understand it better? Do you have any tips?
>>
>> Version 1.4.3 Code
>>
>> //This method recursively calls itself when it finds a directory
>> public void indexDirectory(IndexWriter writer, File dir) throws IOException{
>>     File[] files = dir.listFiles();
>>
>>     for(int i = 0; i < files.length; i++){
>>         File f = files[i];
>>         if (f.isDirectory()){
>>             indexDirectory(writer, f);
>>         }else if (f.getName().endsWith(".txt")){
>>             indexFile(writer, f);
>>         }
>>     }
>> }
>>
>> //This method indexes each individual file
>> public void indexFile(IndexWriter writer, File f) throws IOException{
>>
>>     if(f.isHidden() || !f.exists() || !f.canRead()){
>>         return;
>>     }
>>
>>     Document doc = new Document();
>>     doc.add(Field.Text("contents", new FileReader(f)));
>>     doc.add(Field.Keyword("filename", f.getCanonicalPath()));
>>     writer.addDocument(doc);
>> }
>>
>> Version 2.0.0 Code
>>
>> //This method recursively calls itself when it finds a directory
>> public void indexDirectory(IndexWriter writer, File dir) throws IOException{
>>     File[] files = dir.listFiles();
>>
>>     for(int i = 0; i < files.length; i++){
>>         File f = files[i];
>>         try{
>>             if (f.isDirectory()){
>>                 indexDirectory(writer, f);
>>             }else if (f.getName().endsWith(".txt")){ //Seems this is where it is first thrown...
>>                 indexFile(writer, f);
>>                 System.out.println(f);
>>             }
>>         }catch(NullPointerException npe){
>>             npe.printStackTrace(System.out);
>>             System.out.println("File is: " + f);
>>         }
>>     }
>> }
>>
>> //This method indexes each individual file
>> public void indexFile(IndexWriter writer, File f) throws IOException{
>>
>>     if(f.isHidden() || !f.exists() || !f.canRead()){
>>         return;
>>     }
>>
>>     Document doc = new Document();
>>     doc.add(new Field("contents", new FileReader(f)));
>>     doc.add(new Field("filename", f.getCanonicalPath(),
>>             Field.Store.YES, Field.Index.UN_TOKENIZED));
>>     writer.addDocument(doc);
>> }
>>
>> Thanks so much for any help you can give me.
>> It seems strange to me that when I print File f, it prints out a directory
>> name (August 2005), but got past the isDirectory statement and is now
>> checking to see if it has a .txt extension.
>>
>> Thanks,
>>
>> JT
>> --
>> View this message in context:
>> http://www.nabble.com/Help-with-jump-from-1.4.3-to-2.0.0-tf2846591.html#a7949145
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>

--
View this message in context:
http://www.nabble.com/Help-with-jump-from-1.4.3-to-2.0.0-tf2846591.html#a7991979
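
Note on the exception described in this thread: one possible cause, offered as
a sketch and not a confirmed diagnosis, is that dir.listFiles() returned null
for the August 2005 folder. java.io.File.listFiles() is documented to return
null when the path does not denote a directory or when an I/O error occurs
(for example, a directory that cannot be read after being copied over). In
that case files.length throws inside the recursive call, and the exception is
caught one level up, where f is still the directory, which would match the
printed output. The class and method names below are hypothetical and are not
part of Lucene or of the posted code; the sketch uses current Java syntax and
only collects files, leaving the Lucene indexing call out.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Lucene or the posted code.
final class SafeDirectoryWalk {

    // Recursively collects readable, non-hidden .txt files under dir.
    // Directories whose contents cannot be listed are recorded in unlistable
    // instead of triggering a NullPointerException.
    static void collectTextFiles(File dir, List<File> found, List<File> unlistable) {
        File[] files = dir.listFiles();
        if (files == null) {
            // listFiles() returns null if dir is not a directory or an I/O error occurs
            unlistable.add(dir);
            return;
        }
        for (File f : files) {
            if (f.isDirectory()) {
                collectTextFiles(f, found, unlistable);
            } else if (f.getName().endsWith(".txt") && !f.isHidden() && f.canRead()) {
                found.add(f);
            }
        }
    }

    public static void main(String[] args) {
        // Root directory is taken from the first argument, defaulting to the current directory
        File root = new File(args.length > 0 ? args[0] : ".");
        List<File> found = new ArrayList<>();
        List<File> unlistable = new ArrayList<>();
        collectTextFiles(root, found, unlistable);
        System.out.println("Text files found: " + found.size());
        for (File d : unlistable) {
            System.out.println("Could not list: " + d + " (canRead=" + d.canRead() + ")");
        }
    }
}
```

If a folder shows up in the "Could not list" output, checking its permissions
on the machine running the indexer would be a reasonable next step. If nothing
shows up, the exception is coming from somewhere else and the stack trace is
the better guide.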