From: Kasun Perera
Date: Fri, 11 May 2012 17:05:46 +0530
Subject: Re: Getting the frequencies by corresponding order of documents were indexed
To: java-user@lucene.apache.org

On Fri, May 11, 2012 at 4:52 PM, Ian Lea wrote:

> Can't spot anything obviously wrong in your code and what you are
> trying to do should work. Are you positive that what you think is the
> second doc is really being added second? You only show one doc being
> added. Are there already 7 docs in the index before you start?

Hi Ian,

Yes, I'm sure the 2nd doc is added second; I have stepped through it in the
debugger several times to confirm. If I index 10 documents, I get 10
term-frequency vectors, but their positions are changed. I gave doc #2 as an
example; the 5th term-frequency vector likewise corresponds to a different
doc, and so on.

I have found a way around this, but it may not be efficient: I store another
field at indexing time, and based on the content of that field I am able to
map each doc to its term-frequency vector.

Is there a more efficient way? Could this be a bug in Lucene?

Thanks

> --
> Ian.
>
> On Fri, May 11, 2012 at 8:58 AM, Kasun Perera wrote:
> > I have a collection of documents (say 10 documents) and I'm indexing them
> > this way, storing the term vector:
> >
> > StringReader strRdElt = new StringReader(content);
> > Document doc = new Document();
> > String docname = docNames[docNo];
> > doc.add(new Field("doccontent", strRdElt, Field.TermVector.YES));
> >
> > IndexWriter iW;
> > try {
> >     NIOFSDirectory dir = new NIOFSDirectory(new File(pathToIndex));
> >     iW = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_35,
> >             new StandardAnalyzer(Version.LUCENE_35)));
> >     iW.addDocument(doc);
> >     iW.close();
> > }
> >
> > After indexing all the documents, I'm getting the term frequencies of
> > each document this way:
> >
> > IndexReader re = IndexReader.open(FSDirectory.open(new File(pathToIndex)), true);
> > TermFreqVector termsFreq[];
> > for (int i = 0; i < ...; i++) {
> >     termsFreq[i] = re.getTermFreqVector(i, "doccontent");
> > }
> >
> > My problem is that I'm not getting the term-frequency vectors in the
> > corresponding order. Say, for the 2nd document that I indexed, I'm
> > getting its terms and term frequencies at "termsFreq[9]".
> >
> > What is the reason for that? How can I get the term frequencies in the
> > order in which I indexed the documents?
> >
> > --
> > Regards
> >
> > Kasun Perera

--
Regards
Kasun Perera
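
The workaround mentioned in the reply above (store an identifying field at
indexing time and use it to match each document to its term-frequency vector)
can be sketched without Lucene as follows. This is only an illustration under
stated assumptions: StoredDoc, inIndexingOrder and the file names are
hypothetical stand-ins, and in a real Lucene 3.5 index the name would
presumably come from a stored field read back through
IndexReader.document(i) rather than from a hand-built list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Lucene-free sketch: look vectors up by a stored document name instead of
// trusting the position at which the reader happens to return them.
class VectorsByName {

    // Hypothetical stand-in (not a Lucene class) for what is read back at one
    // internal position: the stored name plus that document's term frequencies.
    static final class StoredDoc {
        final String docName;
        final Map<String, Integer> termFreqs;

        StoredDoc(String docName, Map<String, Integer> termFreqs) {
            this.docName = docName;
            this.termFreqs = termFreqs;
        }
    }

    // Returns the term-frequency maps arranged in the order of docNames,
    // whatever order the entries in fromReader arrive in. A name that is
    // missing from fromReader yields a null entry.
    static List<Map<String, Integer>> inIndexingOrder(String[] docNames,
                                                      List<StoredDoc> fromReader) {
        Map<String, Map<String, Integer>> byName = new HashMap<>();
        for (StoredDoc d : fromReader) {
            byName.put(d.docName, d.termFreqs);
        }
        List<Map<String, Integer>> ordered = new ArrayList<>();
        for (String name : docNames) {
            ordered.add(byName.get(name));
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Order in which the documents were indexed (made-up names).
        String[] docNames = {"doc1.txt", "doc2.txt", "doc3.txt"};

        // Simulate a reader handing the documents back in a different order.
        Map<String, Integer> f1 = new HashMap<>();
        f1.put("index", 1);
        Map<String, Integer> f2 = new HashMap<>();
        f2.put("lucene", 2);
        Map<String, Integer> f3 = new HashMap<>();
        f3.put("vector", 3);
        List<StoredDoc> fromReader = new ArrayList<>();
        fromReader.add(new StoredDoc("doc3.txt", f3));
        fromReader.add(new StoredDoc("doc1.txt", f1));
        fromReader.add(new StoredDoc("doc2.txt", f2));

        List<Map<String, Integer>> ordered = inIndexingOrder(docNames, fromReader);
        for (int i = 0; i < docNames.length; i++) {
            System.out.println(docNames[i] + " -> " + ordered.get(i));
        }
    }
}
```

The lookup costs one pass over the documents to build the map, so the extra
stored field should add little overhead for a collection of this size.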