lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Getting the frequencies by corresponding order of documents were indexed
Date Fri, 11 May 2012 12:50:21 GMT
What version of Lucene are you using?  If not the latest, try that.
If you really think there is a Lucene bug, post a small self-contained
test case that demonstrates the problem.


--
Ian.


On Fri, May 11, 2012 at 12:35 PM, Kasun Perera <kasunp@opensource.lk> wrote:
> On Fri, May 11, 2012 at 4:52 PM, Ian Lea <ian.lea@gmail.com> wrote:
>
>> Can't spot anything obviously wrong in your code and what you are
>> trying to do should work.  Are you positive that what you think is the
>> second doc is really being added second?  You only show one doc being
>> added.  Are there already 7 docs in the index before you start?
>>
>>
>>
> Hi Ian
>
> Yes, I'm sure the 2nd doc is added second; I have stepped through with the
> debugger several times to confirm it. If I index 10 documents, I get 10
> term frequency vectors, but their positions are changed. I gave doc #2 as
> an example. The 5th term frequency vector likewise corresponds to some
> other doc, and so on.
>
> I figured out a way to overcome this, but it may not be efficient. I stored
> another field at indexing time; based on the content of that new field I'm
> able to map each doc to its term frequency vector. Is there a more
> efficient way? Could this be a bug in Lucene?
>
> Thanks
>
>> --
>> Ian.
>>
>>
>> On Fri, May 11, 2012 at 8:58 AM, Kasun Perera <kasunp@opensource.lk>
>> wrote:
>> > I have a collection of documents (say 10 documents) and I'm indexing
>> > them this way, storing the term vector:
>> >
>> > StringReader strRdElt = new StringReader(content);
>> >
>> >    Document doc = new Document();
>> >    String docname = docNames[docNo];
>> >    doc.add(new Field("doccontent", strRdElt, Field.TermVector.YES));
>> >
>> >    try {
>> >        NIOFSDirectory dir = new NIOFSDirectory(new File(pathToIndex));
>> >        // A new IndexWriter is opened and closed for every document.
>> >        IndexWriter iW = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_35,
>> >                new StandardAnalyzer(Version.LUCENE_35)));
>> >        iW.addDocument(doc);
>> >        iW.close();
>> >    } catch (IOException e) {
>> >        e.printStackTrace();
>> >    }
>> >
>> > After indexing all the documents, I'm getting the term frequencies of
>> > each document this way:
>> >
>> >
>> > IndexReader re = IndexReader.open(FSDirectory.open(new File(pathToIndex)), true);
>> > // Allocate the array before filling it, one slot per document number.
>> > TermFreqVector[] termsFreq = new TermFreqVector[noOfDocs];
>> > for (int i = 0; i < noOfDocs; i++) {
>> >     termsFreq[i] = re.getTermFreqVector(i, "doccontent");
>> > }
>> >
>> > My problem is that I'm not getting the term frequency vectors in the
>> > corresponding order. For example, for the 2nd document that I indexed,
>> > I'm getting its terms and term frequencies at "termsFreq[9]".
>> >
>> > What is the reason for that? How can I get the term frequencies in the
>> > order in which I indexed the documents?
>> >
>> >
>> > --
>> > Regards
>> >
>> > Kasun Perera
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>
> --
> Regards
>
> Kasun Perera
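
The workaround described in the quoted message (store an extra field at indexing time, then use it to map each doc to its term frequency vector) can be sketched roughly as below. This is plain Java and deliberately makes no Lucene calls: IndexedDoc, byDocName and inIndexingOrder are hypothetical names introduced only for illustration, with IndexedDoc standing in for whatever is read back for one document number. The assumption behind it is that Lucene document numbers are internal and need not follow the order in which documents were added, so the stored name, not the position, is used as the key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DocKeyMapping {

    /** Hypothetical stand-in for one document as read back from the index. */
    static final class IndexedDoc {
        final String docName;                  // extra field stored at indexing time
        final Map<String, Integer> termFreqs;  // term -> frequency for "doccontent"

        IndexedDoc(String docName, Map<String, Integer> termFreqs) {
            this.docName = docName;
            this.termFreqs = termFreqs;
        }
    }

    /** Keys each term frequency map by the stored name instead of by position. */
    static Map<String, Map<String, Integer>> byDocName(List<IndexedDoc> docsInReaderOrder) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (IndexedDoc d : docsInReaderOrder) {
            result.put(d.docName, d.termFreqs);
        }
        return result;
    }

    /** Returns the term frequency maps in the order given by docNames. */
    static List<Map<String, Integer>> inIndexingOrder(String[] docNames,
                                                      List<IndexedDoc> docsInReaderOrder) {
        Map<String, Map<String, Integer>> byName = byDocName(docsInReaderOrder);
        List<Map<String, Integer>> ordered = new ArrayList<>();
        for (String name : docNames) {
            ordered.add(byName.get(name)); // null if that name was never read back
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Order in which the documents were indexed.
        String[] docNames = {"a.txt", "b.txt", "c.txt"};

        // Order in which they happen to come back (made-up data).
        List<IndexedDoc> readerOrder = new ArrayList<>();
        readerOrder.add(new IndexedDoc("c.txt", Collections.singletonMap("cherry", 3)));
        readerOrder.add(new IndexedDoc("a.txt", Collections.singletonMap("apple", 1)));
        readerOrder.add(new IndexedDoc("b.txt", Collections.singletonMap("banana", 2)));

        List<Map<String, Integer>> ordered = inIndexingOrder(docNames, readerOrder);
        for (int i = 0; i < docNames.length; i++) {
            System.out.println(i + " " + docNames[i] + " " + ordered.get(i));
        }
    }
}
```

With Lucene itself, the stored name would be read from each document and the term frequency vector fetched for the same document number; the lookup by name then works the same way.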


