Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (athena.apache.org: domain of kasunp@opensource.lk
 designates 209.85.160.48 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAEY5pxWb=gp_f+_LdwCHa8oBBm+Fes0ryDhHT1So2ehW6omghA@mail.gmail.com>
References: 
 <CALGBXaMwj5H3zNCVgBSaW==58=bSv=M_WeHLZQxqFJWRpG24wA@mail.gmail.com>
 <CAEY5pxWb=gp_f+_LdwCHa8oBBm+Fes0ryDhHT1So2ehW6omghA@mail.gmail.com>
From: Kasun Perera <kasunp@opensource.lk>
Date: Fri, 11 May 2012 17:05:46 +0530
Message-ID: 
 <CALGBXaOgmcGTsyw4qGdpkqE_3HphCc8GPTpbwZig-+4LXo7EAQ@mail.gmail.com>
Subject: Re: Getting the frequencies by corresponding order of documents were
 indexed
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=e89a8ffba1d76d2c4004bfc12420

--e89a8ffba1d76d2c4004bfc12420
Content-Type: text/plain; charset=ISO-8859-1

On Fri, May 11, 2012 at 4:52 PM, Ian Lea <ian.lea@gmail.com> wrote:

> Can't spot anything obviously wrong in your code and what you are
> trying to do should work.  Are you positive that what you think is the
> second doc is really being added second?  You only show one doc being
> added.  Are there already 7 docs in the index before you start?
>
>
>
Hi Ian

Yes, I'm sure the 2nd doc is added second; I stepped through it in the
debugger several times to confirm. If I index 10 documents, I get 10
term-frequency vectors, but their positions are changed. I gave doc #2 as
an example. The 5th term-frequency vector likewise corresponds to a
different doc, and so on.

I found a way to work around this, but it may not be efficient. I stored
another field at indexing time, and based on the content of that new field
I'm able to map each doc to its term-frequency vector. Is there a more
efficient way? Could this be a bug in Lucene?
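
To make the workaround concrete, here is a rough sketch of the mapping step
only. It is plain Java and does not call Lucene: the Slot class and the
inIndexingOrder method are hypothetical names made up for illustration. In
the real code I assume the key would be read from the extra stored field
(for example re.document(i).get("docname"), with "docname" being a field
name chosen here as an assumption) and the counts from
re.getTermFreqVector(i, "doccontent").

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermVectorOrderSketch {

    /** Hypothetical stand-in for one index slot: the stored key plus its term counts. */
    static final class Slot {
        final String docName;              // value of the extra stored field
        final Map<String, Integer> freqs;  // term -> frequency for that document

        Slot(String docName, Map<String, Integer> freqs) {
            this.docName = docName;
            this.freqs = freqs;
        }
    }

    /**
     * Returns the term counts in the order given by docNames, whatever
     * order the slots arrive in. A name with no matching slot yields null.
     */
    static List<Map<String, Integer>> inIndexingOrder(String[] docNames, List<Slot> slots) {
        // Key every slot by its stored name instead of by its position.
        Map<String, Map<String, Integer>> byName = new HashMap<String, Map<String, Integer>>();
        for (Slot slot : slots) {
            byName.put(slot.docName, slot.freqs);
        }
        // Walk the names in the order the documents were submitted.
        List<Map<String, Integer>> ordered = new ArrayList<Map<String, Integer>>();
        for (String name : docNames) {
            ordered.add(byName.get(name));
        }
        return ordered;
    }

    public static void main(String[] args) {
        String[] docNames = {"doc1.txt", "doc2.txt", "doc3.txt"};

        // Slots deliberately listed in a different order than docNames.
        List<Slot> slots = new ArrayList<Slot>();
        slots.add(new Slot("doc3.txt", Collections.singletonMap("vector", 3)));
        slots.add(new Slot("doc1.txt", Collections.singletonMap("index", 1)));
        slots.add(new Slot("doc2.txt", Collections.singletonMap("term", 2)));

        List<Map<String, Integer>> ordered = inIndexingOrder(docNames, slots);
        for (int i = 0; i < docNames.length; i++) {
            System.out.println(docNames[i] + " -> " + ordered.get(i));
        }
    }
}
```

The cost is one extra stored field per document plus one map lookup per
document when reading, which is why I am asking whether there is a cheaper
way.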

Thanks

> --
> Ian.
>
>
> On Fri, May 11, 2012 at 8:58 AM, Kasun Perera <kasunp@opensource.lk>
> wrote:
> > I have a collection of documents (say 10 documents) and I'm indexing them
> > this way, storing the term vector:
> >
> > StringReader strRdElt = new StringReader(content);
> >
> > Document doc = new Document();
> > String docname = docNames[docNo];
> > doc.add(new Field("doccontent", strRdElt, Field.TermVector.YES));
> >
> > IndexWriter iW;
> > try {
> >     NIOFSDirectory dir = new NIOFSDirectory(new File(pathToIndex));
> >     iW = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_35,
> >             new StandardAnalyzer(Version.LUCENE_35)));
> >     iW.addDocument(doc);
> >     iW.close();
> > }
> >
> > After indexing all the documents, I get the term frequencies of each
> > document this way:
> >
> >
> > IndexReader re = IndexReader.open(FSDirectory.open(new File(pathToIndex)), true);
> > // Allocate the array before filling it; one slot per indexed document.
> > TermFreqVector[] termsFreq = new TermFreqVector[noOfDocs];
> > for (int i = 0; i < noOfDocs; i++) {
> >     termsFreq[i] = re.getTermFreqVector(i, "doccontent");
> > }
> >
> > My problem is that I'm not getting the term-frequency vectors in the
> > corresponding order. Say, for the 2nd document that I indexed, I'm getting
> > its terms and term frequencies at "termsFreq[9]".
> >
> > What is the reason for that? How can I get the term frequencies in the
> > order in which I indexed the documents?
> >
> >
> > --
> > Regards
> >
> > Kasun Perera
>
>
>


-- 
Regards

Kasun Perera

--e89a8ffba1d76d2c4004bfc12420--