From: Michael McCandless
Date: Mon, 2 Apr 2012 15:48:52 -0400
Subject: Re: TVD, TVX and TVF files
To: java-user@lucene.apache.org

As far as I can see, you are not indexing term vectors in the code below?
Your Fields don't have TermVector.*...

Can you boil this down to a small test case showing the missing term
vector files...?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Apr 2, 2012 at 1:28 PM, Luis Paiva wrote:
> Thank you for your help.
> I still haven't found a solution. I'm copying all my code below.
>
> BTW, I'm working with Lucene version 3.5.0.
>
> @Mike: Yes, I do close it :) The files that do get created are: .fdt, .fdx,
> .fnm, .frq, .nrm, .prx, .tii, .tis.
>
> I don't know why the .tv* files are not created.
>
> @Uwe: I don't think I'm getting any compound files, only the ones above.
>
> Does anyone have the same issue?
>
>
> CODE --------------------------- xx -------------------------------
>
>
> package lucene;
>
> import java.io.*;
> import java.util.ArrayList;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.store.FSDirectory;
> import org.apache.lucene.util.Version;
>
> /**
>  * This terminal application creates an Apache Lucene index in a folder and
>  * adds files into this index based on the input of the user.
>  */
> public class TextFileIndexer {
>
>   private IndexWriter writer;
>   private ArrayList<File> queue = new ArrayList<File>();
>
>   public static void main(String[] args) throws IOException {
>     System.out.println("Enter the path where the index will be created: ");
>
>     BufferedReader br = new BufferedReader(
>         new InputStreamReader(System.in));
>     String s = br.readLine();
>
>     TextFileIndexer indexer = null;
>     try {
>       indexer = new TextFileIndexer(s);
>     } catch (Exception ex) {
>       System.out.println("Cannot create index..." + ex.getMessage());
>       System.exit(-1);
>     }
>
>     //==================================================
>     //read input from user until he enters q for quit
>     //==================================================
>     while (!s.equalsIgnoreCase("q")) {
>       try {
>         System.out.println("Enter the file or folder name to add into the index (q=quit):");
>         System.out.println("[Acceptable file types: .xml, .htm, .html, .txt]");
>         s = br.readLine();
>         if (s.equalsIgnoreCase("q")) {
>           break;
>         }
>
>         //try to add file into the index
>         indexer.indexFileOrDirectory(s);
>       } catch (Exception e) {
>         System.out.println("Error indexing " + s + " : " + e.getMessage());
>       }
>     }
>
>     //==================================================
>     //after adding, we always have to call the
>     //closeIndex, otherwise the index is not created
>     //==================================================
>     indexer.closeIndex();
>   }
>
>   /**
>    * Constructor
>    * @param indexDir the name of the folder in which the index should be created
>    * @throws java.io.IOException
>    */
>   TextFileIndexer(String indexDir) throws IOException {
>     // IndexWriterConfig's default OpenMode is CREATE_OR_APPEND: an existing
>     // index in this folder is appended to, otherwise a new one is created.
>     FSDirectory dir = FSDirectory.open(new File(indexDir));
>
>     StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
>
>     IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34,
>         analyzer);
>
>     writer = new IndexWriter(dir, config);
>   }
>
>   /**
>    * Indexes a file or directory
>    * @param fileName the name of a text file or a folder we wish to add to the index
>    * @throws java.io.IOException
>    */
>   public void indexFileOrDirectory(String fileName) throws IOException {
>     //==================================================
>     //gets the list of files in a folder (if user has submitted
>     //the name of a folder) or gets a single file name (if user
>     //has submitted only the file name)
>     //==================================================
>     addFiles(new File(fileName));
>
>     int originalNumDocs = writer.numDocs();
>     for (File f : queue) {
>       FileReader fr = null;
>       try {
>         Document doc = new Document();
>
>         //==================================================
>         // add contents of file
>         //==================================================
>         fr = new FileReader(f);
>         doc.add(new Field("contents", fr));
>
>         //==================================================
>         //adding second field which contains the path of the file
>         //==================================================
>         doc.add(new Field("path", f.getPath(),
>             Field.Store.YES,
>             Field.Index.NOT_ANALYZED));
>
>         writer.addDocument(doc);
>         System.out.println("Added: " + f);
>       } catch (Exception e) {
>         System.out.println("Could not add: " + f);
>       } finally {
>         // fr is still null if the FileReader constructor threw
>         if (fr != null) {
>           fr.close();
>         }
>       }
>     }
>
>     int newNumDocs = writer.numDocs();
>     System.out.println("");
>     System.out.println("************************");
>     System.out.println((newNumDocs - originalNumDocs) + " documents added.");
>     System.out.println("************************");
>
>     queue.clear();
>   }
>
>   private void addFiles(File file) {
>
>     if (!file.exists()) {
>       System.out.println(file + " does not exist.");
>       return;
>     }
>     if (file.isDirectory()) {
>       for (File f : file.listFiles()) {
>         addFiles(f);
>       }
>     } else {
>       String filename = file.getName().toLowerCase();
>       //==================================================
>       // Only index text files
>       //==================================================
>       if (filename.endsWith(".htm") || filename.endsWith(".html") ||
>           filename.endsWith(".xml") || filename.endsWith(".txt")) {
>         queue.add(file);
>       } else {
>         System.out.println("Skipped " + filename);
>       }
>     }
>   }
>
>   /**
>    * Close the index.
>    * @throws java.io.IOException
>    */
>   public void closeIndex() throws IOException {
>     writer.close();
>   }
> }
>
> END OF CODE --------------------------- xx -------------------------------
>
>
> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Tuesday, March 27, 2012 19:19
> To: java-user@lucene.apache.org
> Subject: RE: TVD, TVX and TVF files
>
> Maybe you only see CFS files? If this is the case, your index is in compound
> file format. In that case (the default), to get the raw files, disable
> compound files in the merge policy!
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>> -----Original Message-----
>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
>> Sent: Tuesday, March 27, 2012 8:13 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: TVD, TVX and TVF files
>>
>> The code seems OK at a quick glance...
>>
>> Are you closing the writer?
>>
>> Are you hitting any exceptions?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Mar 27, 2012 at 12:19 PM, Luis Paiva wrote:
>> > Hey all,
>> >
>> > I'm taking my first steps with Lucene.
>> > I was trying to index some .txt files, and my program doesn't create
>> > the term vector files. I need these files (.tvd, .tvx, .tvf).
>> >
>> > I'm attaching my code so that anyone can help me.
>> > Thank you all in advance!
>> >
>> > Sorry if I'm repeating the question, but I couldn't find the answer to it.
>> >
>> >
>> > public void indexFileOrDirectory(String fileName) throws IOException {
>> >
>> >   addFiles(new File(fileName));
>> >
>> >   int originalNumDocs = writer.numDocs();
>> >   for (File f : queue) {
>> >     FileReader fr = null;
>> >     try {
>> >       Document doc = new Document();
>> >
>> >       fr = new FileReader(f);
>> >       doc.add(new Field("contents", fr));
>> >
>> >       doc.add(new Field("path", f.getPath(), Field.Store.YES,
>> >           Field.Index.NOT_ANALYZED));
>> >
>> >       String xpto = "xpto1 xpto2 xpto3";
>> >       doc.add(new Field("contents2", xpto, Field.Store.YES,
>> >           Field.Index.ANALYZED, Field.TermVector.YES));
>> >
>> >       writer.addDocument(doc);
>> >       System.out.println("Added: " + f);
>> >     } catch (Exception e) {
>> >       System.out.println("Could not add: " + f);
>> >     } finally {
>> >       // fr is still null if the FileReader constructor threw
>> >       if (fr != null) {
>> >         fr.close();
>> >       }
>> >     }
>> >   }
>> > }
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
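A minimal sketch of the kind of small test case Mike asks for, assuming Lucene 3.5.0 (the version Luis reports) is on the classpath. The class name TermVectorTestCase, its helpers buildIndex and readTerms, the folder name "tv-test-index" and the sample text are hypothetical, chosen only for illustration. The sketch passes Field.TermVector.YES so term vectors are actually indexed and, following Uwe's suggestion, calls setUseCompoundFile(false) on a TieredMergePolicy (assumed here to control compound files for newly flushed segments in 3.5) so the .tvd, .tvx and .tvf files stay visible on disk. It then reads the vector back with IndexReader.getTermFreqVector, which shows whether vectors were stored independently of the on-disk file layout.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/** Hypothetical minimal test case; assumes the Lucene 3.5.0 jar on the classpath. */
public class TermVectorTestCase {

  /** Indexes one document whose field stores term vectors; returns the index file names. */
  static String[] buildIndex(File path) {
    try {
      path.mkdirs();
      Directory dir = FSDirectory.open(path);
      IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_35,
          new StandardAnalyzer(Version.LUCENE_35));
      config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);

      // Assumption (per Uwe's suggestion): with compound files disabled, the
      // .tv* files are written as separate files instead of inside a .cfs.
      TieredMergePolicy mergePolicy = new TieredMergePolicy();
      mergePolicy.setUseCompoundFile(false);
      config.setMergePolicy(mergePolicy);

      IndexWriter writer = new IndexWriter(dir, config);
      Document doc = new Document();
      // The TermVector argument is what asks Lucene to write .tvd/.tvx/.tvf.
      doc.add(new Field("contents", "xpto1 xpto2 xpto3", Field.Store.NO,
          Field.Index.ANALYZED, Field.TermVector.YES));
      writer.addDocument(doc);
      writer.close(); // flushes the segment, term vector files included

      String[] files = dir.listAll();
      dir.close();
      return files;
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  /** Reads back the "contents" term vector of document 0, or null if none was stored. */
  static String[] readTerms(File path) {
    try {
      Directory dir = FSDirectory.open(path);
      IndexReader reader = IndexReader.open(dir);
      TermFreqVector vector = reader.getTermFreqVector(0, "contents");
      String[] terms = (vector == null) ? null : vector.getTerms();
      reader.close();
      dir.close();
      return terms;
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    File path = new File(args.length > 0 ? args[0] : "tv-test-index");
    for (String name : buildIndex(path)) {
      System.out.println(name);
    }
    System.out.println(Arrays.toString(readTerms(path)));
  }
}
```

Run with a scratch folder as the only argument, it prints the index file names followed by the stored terms. Dropping the Field.TermVector.YES argument should make readTerms return null and the .tv* files disappear, which mirrors what Luis describes with his "contents" field built from a Reader.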