Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
From: Michael McCandless <lucene@mikemccandless.com>
Date: Mon, 2 Apr 2012 15:48:52 -0400
Subject: Re: TVD, TVX and TVF files
To: java-user@lucene.apache.org

As far as I can see, you are not indexing term vectors in the code
below?  Your Fields don't have TermVector.*...

Can you boil this down to a small test case showing the missing term
vector files...?
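
For what it's worth, term vector files are only written for fields created with a Field.TermVector option other than NO, so the directory listing is the quickest thing to check in such a test case. Below is a minimal sketch of that check, using only the JDK. The class and method names (TermVectorFileCheck, hasExtension, missingExtensions) are hypothetical helpers made up for illustration, not part of Lucene, and the check looks at file extensions only, so it cannot see inside a compound (.cfs) file.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper (not part of Lucene): reports which term vector
 * files are absent from an index folder, judging by file extension only.
 */
final class TermVectorFileCheck {

  // Term vector file extensions named in this thread.
  static final List<String> TV_EXTENSIONS = Arrays.asList(".tvx", ".tvd", ".tvf");

  /** Returns true if any file name in the listing ends with the given extension. */
  static boolean hasExtension(String[] fileNames, String ext) {
    for (String name : fileNames) {
      if (name.toLowerCase(Locale.ROOT).endsWith(ext)) {
        return true;
      }
    }
    return false;
  }

  /** Returns the term vector extensions that no file in the listing carries. */
  static List<String> missingExtensions(String[] fileNames) {
    List<String> missing = new ArrayList<String>();
    for (String ext : TV_EXTENSIONS) {
      if (!hasExtension(fileNames, ext)) {
        missing.add(ext);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    if (args.length != 1) {
      System.err.println("Usage: java TermVectorFileCheck <indexDir>");
      System.exit(1);
    }
    String[] names = new File(args[0]).list();
    if (names == null) {
      System.err.println(args[0] + " is not a readable directory.");
      System.exit(1);
    }
    if (hasExtension(names, ".cfs")) {
      // In compound file format the per-segment files are packed into .cfs,
      // so missing .tv* files on disk would not prove anything.
      System.out.println("Compound (.cfs) files present; result is inconclusive.");
    }
    System.out.println("Missing term vector files: " + missingExtensions(names));
  }
}
```

Run against a folder holding only the extensions listed in the quoted message (.fdt, .fdx, .fnm, .frq, .nrm, .prx, .tii, .tis), this sketch would report all three term vector extensions as missing.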

Mike McCandless

http://blog.mikemccandless.com

On Mon, Apr 2, 2012 at 1:28 PM, Luis Paiva <luismpaiva@mail.telepac.pt> wrote:
> Thank you for your help.
> I still haven't found a solution. I'm copying all my code below.
>
> BTW, I'm working with Lucene version 3.5.0.
>
> @Mike: Yes, I do close it :) The files that get created are: .fdt, .fdx,
> .fnm, .frq, .nrm, .prx, .tii, .tis.
>
> I don't know why the T* files are not created.
>
> @Uwe: I think I'm not getting any compound files. Only those above.
>
> Does anyone have the same issue?
>
>
>
> CODE --------------------------- xx -------------------------------
>
>
> package lucene;
>
> import java.io.*;
> import java.util.ArrayList;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.IndexWriterConfig;
> import org.apache.lucene.store.FSDirectory;
> import org.apache.lucene.util.Version;
>
> /**
>  * This terminal application creates an Apache Lucene index in a folder and
>  * adds files into this index based on the input of the user.
>  */
> public class TextFileIndexer {
>
>  private IndexWriter writer;
>  private ArrayList<File> queue = new ArrayList<File>();
>
>  public static void main(String[] args) throws IOException {
>    System.out.println("Enter the path where the index will be created: ");
>
>    BufferedReader br = new BufferedReader(
>            new InputStreamReader(System.in));
>    String s = br.readLine();
>
>    TextFileIndexer indexer = null;
>    try {
>      indexer = new TextFileIndexer(s);
>    } catch (Exception ex) {
>      System.out.println("Cannot create index..." + ex.getMessage());
>      System.exit(-1);
>    }
>
>    //===================================================
>    //read input from user until he enters q for quit
>    //===================================================
>    while (!s.equalsIgnoreCase("q")) {
>      try {
>        System.out.println("Enter the file or folder name to add into the index (q=quit):");
>        System.out.println("[Acceptable file types: .xml, .htm, .html, .txt]");
>        s = br.readLine();
>        if (s.equalsIgnoreCase("q")) {
>          break;
>        }
>
>        //try to add file into the index
>        indexer.indexFileOrDirectory(s);
>      } catch (Exception e) {
>        System.out.println("Error indexing " + s + " : " + e.getMessage());
>      }
>    }
>
>    //===================================================
>    //after adding, we always have to call the
>    //closeIndex, otherwise the index is not created
>    //===================================================
>    indexer.closeIndex();
>  }
>
>  /**
>   * Constructor
>   * @param indexDir the name of the folder in which the index should be created
>   * @throws java.io.IOException
>   */
>  TextFileIndexer(String indexDir) throws IOException {
>    // No open mode is set here, so IndexWriterConfig uses its default
>    // (CREATE_OR_APPEND): an existing index in this folder is appended to.
>    FSDirectory dir = FSDirectory.open(new File(indexDir));
>
>    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_34);
>
>    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_34, analyzer);
>
>    writer = new IndexWriter(dir, config);
>  }
>
>  /**
>   * Indexes a file or directory
>   * @param fileName the name of a text file or a folder we wish to add to the index
>   * @throws java.io.IOException
>   */
>  public void indexFileOrDirectory(String fileName) throws IOException {
>    //===================================================
>    //gets the list of files in a folder (if user has submitted
>    //the name of a folder) or gets a single file name (if user
>    //has submitted only the file name)
>    //===================================================
>    addFiles(new File(fileName));
>
>    int originalNumDocs = writer.numDocs();
>    for (File f : queue) {
>      FileReader fr = null;
>      try {
>        Document doc = new Document();
>
>        //===================================================
>        // add contents of file
>        //===================================================
>        fr = new FileReader(f);
>        doc.add(new Field("contents", fr));
>
>        //===================================================
>        //adding second field which contains the path of the file
>        //===================================================
>        doc.add(new Field("path", fileName,
>                Field.Store.YES,
>                Field.Index.NOT_ANALYZED));
>
>        writer.addDocument(doc);
>        System.out.println("Added: " + f);
>      } catch (Exception e) {
>        System.out.println("Could not add: " + f);
>      } finally {
>        fr.close();
>      }
>    }
>
>    int newNumDocs = writer.numDocs();
>    System.out.println("");
>    System.out.println("************************");
>    System.out.println((newNumDocs - originalNumDocs) + " documents added.");
>    System.out.println("************************");
>
>    queue.clear();
>  }
>
>  private void addFiles(File file) {
>
>    if (!file.exists()) {
>      System.out.println(file + " does not exist.");
>    }
>    if (file.isDirectory()) {
>      for (File f : file.listFiles()) {
>        addFiles(f);
>      }
>    } else {
>      String filename = file.getName().toLowerCase();
>      //===================================================
>      // Only index text files
>      //===================================================
>      if (filename.endsWith(".htm") || filename.endsWith(".html") ||
>              filename.endsWith(".xml") || filename.endsWith(".txt")) {
>        queue.add(file);
>      } else {
>        System.out.println("Skipped " + filename);
>      }
>    }
>  }
>
>  /**
>   * Close the index.
>   * @throws java.io.IOException
>   */
>  public void closeIndex() throws IOException {
>    writer.close();
>  }
> }
>
> END OF CODE --------------------------- xx -------------------------------
>
>
> -----Original Message-----
> From: Uwe Schindler [mailto:uwe@thetaphi.de]
> Sent: Tuesday, March 27, 2012 19:19
> To: java-user@lucene.apache.org
> Subject: RE: TVD, TVX and TVF files
>
> Maybe you only see CFS files? If this is the case, your index is in compound
> file format. In that case (the default), to get the raw files, disable
> compound files in the merge policy!
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>> -----Original Message-----
>> From: Michael McCandless [mailto:lucene@mikemccandless.com]
>> Sent: Tuesday, March 27, 2012 8:13 PM
>> To: java-user@lucene.apache.org
>> Subject: Re: TVD, TVX and TVF files
>>
>> The code seems OK on quick glance...
>>
>> Are you closing the writer?
>>
>> Are you hitting any exceptions?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Tue, Mar 27, 2012 at 12:19 PM, Luis Paiva <luismpaiva@mail.telepac.pt>
>> wrote:
>> > Hey all,
>> >
>> > I'm taking my first steps in Lucene.
>> > I was trying to index some txt files, and my program doesn't construct
>> > the term vector files. I would need these files. (.tvd, .tvx, .tvf)
>> >
>> > I'm attaching my code so anyone can help me.
>> > Thank you all in advance!
>> >
>> > Sorry if I'm repeating the question, but I couldn't find the answer to it.
>> >
>> >
>> > public void indexFileOrDirectory(String fileName) throws IOException {
>> >
>> >    addFiles(new File(fileName));
>> >
>> >    int originalNumDocs = writer.numDocs();
>> >    for (File f : queue) {
>> >      FileReader fr = null;
>> >      try {
>> >        Document doc = new Document();
>> >
>> >        fr = new FileReader(f);
>> >        doc.add(new Field("contents", fr));
>> >
>> >        doc.add(new Field("path", fileName, Field.Store.YES,
>> >            Field.Index.NOT_ANALYZED));
>> >
>> >        String xpto = "xpto1 xpto2 xpto3";
>> >        doc.add(new Field("contents2", xpto, Field.Store.YES,
>> >            Field.Index.ANALYZED, Field.TermVector.YES));
>> >
>> >        writer.addDocument(doc);
>> >        System.out.println("Added: " + f);
>> >      } catch (Exception e) {
>> >        System.out.println("Could not add: " + f);
>> >      } finally {
>> >        fr.close();
>> >      }
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >