From: Erik Hatcher
Subject: Re: How to create Index ?
Date: Thu, 22 Sep 2005 14:56:08 -0400
To: general@lucene.apache.org

Arpit - as was said below, the code is available from the Lucene in
Action website (URL also below).

    Erik

On Sep 22, 2005, at 2:47 PM, Arpit Sharma wrote:

> Hi Erik and others,
>
> Can you provide me the full code for the Indexer program?
> I will really appreciate it.
>
> Thanks a lot.
>
> --- Erik Hatcher wrote:
>
>> Arpit,
>>
>> It looks like you've omitted the import statements from
>> Indexer.java. The book omits import statements to conserve space,
>> but they are important. The code is provided in its entirety at
>> http://www.lucenebook.com
>>
>> In fact, you could build an index by running the code directly (read
>> the README file and follow the instructions first) by typing "ant
>> Indexer" and following the prompts. One of the prompts asks you
>> where to put the index itself, and the next prompt asks for the
>> directory of text files to index.
>>
>>     Erik
>>
>> On Sep 19, 2005, at 10:34 PM, Arpit Sharma wrote:
>>
>>> I have put the .jar file in C:\lucene. I have also unzipped it and
>>> put all the directories (like analysis, index, store) in C:\lucene.
>>>
>>> Now how do I create an index?
>>> All the text files are in the C:\text directory. I have the
>>> "Lucene in Action" book, and with its help I made an Indexer.java
>>> program in C:\lucene. When I tried to compile it, it gave lots of
>>> errors.
>>> The code is fine (it is copied and pasted from the book).
>>>
>>> I am sure that there is some path problem. What should I do?
>>>
>>> Thanks
>>>
>>> Here is the code of Indexer.java:
>>> ----------------
>>>
>>> /**
>>>  * This code was originally written for
>>>  * Erik's Lucene intro java.net article
>>>  */
>>> public class Indexer {
>>>
>>>   public static void main(String[] args) throws Exception {
>>>     if (args.length != 2) {
>>>       throw new Exception("Usage: java " + Indexer.class.getName()
>>>         + " ");
>>>     }
>>>
>>>     File indexDir = new File(args[0]);
>>>     File dataDir = new File(args[1]);
>>>
>>>     long start = new Date().getTime();
>>>     int numIndexed = index(indexDir, dataDir);
>>>     long end = new Date().getTime();
>>>
>>>     System.out.println("Indexing " + numIndexed + " files took "
>>>       + (end - start) + " milliseconds");
>>>   }
>>>
>>>   // open an index and start file directory traversal
>>>   public static int index(File indexDir, File dataDir)
>>>     throws IOException {
>>>     if (!dataDir.exists() || !dataDir.isDirectory()) {
>>>       throw new IOException(dataDir
>>>         + " does not exist or is not a directory");
>>>     }
>>>
>>>     IndexWriter writer = new IndexWriter(indexDir,
>>>       new StandardAnalyzer(), true);
>>>     writer.setUseCompoundFile(false);
>>>
>>>     indexDirectory(writer, dataDir);
>>>
>>>     int numIndexed = writer.docCount();
>>>
>>>     writer.optimize();
>>>     writer.close();
>>>
>>>     return numIndexed;
>>>   }
>>>
>>>   // recursive method that calls itself when it finds a directory
>>>   private static void indexDirectory(IndexWriter writer, File dir)
>>>     throws IOException {
>>>
>>>     File[] files = dir.listFiles();
>>>     for (int i = 0; i < files.length; i++) {
>>>       File f = files[i];
>>>       if (f.isDirectory()) {
>>>         indexDirectory(writer, f);
>>>       } else if (f.getName().endsWith(".txt")) {
>>>         indexFile(writer, f);
>>>       }
>>>     }
>>>   }
>>>
>>>   // method to actually index a file using Lucene
>>>   private static void indexFile(IndexWriter writer, File f)
>>>     throws IOException {
>>>
>>>     if (f.isHidden() || !f.exists() || !f.canRead()) {
>>>       return;
>>>     }
>>>
>>>     System.out.println("Indexing " + f.getCanonicalPath());
>>>
>>>     Document doc = new Document();
>>>     doc.add(Field.Text("contents", new FileReader(f)));
>>>
>>>     doc.add(Field.Keyword("filename", f.getCanonicalPath()));
>>>     writer.addDocument(doc);
>>>   }
>>> }
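
To illustrate the point about missing imports: the quoted Indexer.java refers to classes from java.io, java.util, and Lucene, and none of them resolve at compile time without import statements. The sketch below is not the book's code. It isolates the directory traversal in a class that needs only the JDK, so it compiles without the Lucene jar, and it lists in comments the Lucene imports the quoted code would likely need, assuming the Lucene 1.4-era package layout. The names `TxtFileLister` and `collect` are hypothetical and chosen only for this example.

```java
// Lucene imports the quoted Indexer.java would likely need (assumption:
// Lucene 1.4-era package layout; kept as comments so this sketch compiles
// without the Lucene jar on the classpath):
//   import org.apache.lucene.analysis.standard.StandardAnalyzer;
//   import org.apache.lucene.document.Document;
//   import org.apache.lucene.document.Field;
//   import org.apache.lucene.index.IndexWriter;
// The JDK imports it needs are java.io.File, java.io.FileReader,
// java.io.IOException, and java.util.Date.

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: mirrors the traversal in the quoted code, but only
// collects the .txt files instead of handing them to an IndexWriter.
class TxtFileLister {

  // Returns every readable, non-hidden .txt file under dir, recursively.
  static List<File> collect(File dir) throws IOException {
    if (!dir.exists() || !dir.isDirectory()) {
      throw new IOException(dir + " does not exist or is not a directory");
    }
    List<File> found = new ArrayList<>();
    walk(dir, found);
    return found;
  }

  // Recursive step: descend into subdirectories, keep matching files.
  private static void walk(File dir, List<File> found) {
    File[] files = dir.listFiles();
    if (files == null) {
      return; // listFiles() returns null on an I/O error
    }
    for (File f : files) {
      if (f.isDirectory()) {
        walk(f, found);
      } else if (f.getName().endsWith(".txt") && !f.isHidden() && f.canRead()) {
        found.add(f);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("Usage: java TxtFileLister <data dir>");
      return;
    }
    for (File f : collect(new File(args[0]))) {
      System.out.println("Would index " + f.getCanonicalPath());
    }
  }
}
```

Compiling the real Indexer.java additionally requires the Lucene jar to be on the compiler's classpath (the `-classpath` option of `javac`); placing the jar in the same directory as the source file is not enough on its own.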