Date: Thu, 28 May 2009 16:38:16 +0530
From: Anshum
To: java-user@lucene.apache.org
Subject: Re: Help Needed...

Indexing and storing are at the developer's discretion. You may choose whether
or not to store a field, as your requirements dictate.

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............

On Thu, May 28, 2009 at 4:22 PM, Alexander Aristov <alexander.aristov@gmail.com> wrote:

> You will need to develop a parser and an indexer.
>
> But remember that in the current implementation, content is not stored in
> the Lucene index: it is indexed, but not stored.
>
> Best Regards
> Alexander Aristov
>
>
> 2009/5/28 Gaurav Kumar
>
> > Hi everyone,
> >
> > I am doing a project using Lucene where I need to index HTML files. I am
> > using Tika to parse HTML files. But I need to index files according to
> > their tags, which means that the text present in each HTML tag (like
> >
> > ) should be stored in a different field. Can I do that? If yes, how?
> > Also, can I assign different weights to the tokens present in different
> > fields? If yes, how?
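The thread itself contains no code. As a rough, library-free illustration of what the question asks for, the sketch below collects the text inside each HTML tag into its own named field and then weights term matches per field. It is not Lucene's or Tika's API: `extractFields`, `score` and `FIELD_WEIGHTS` are hypothetical names, the chosen tags and weights are assumptions, and the naive regex only stands in for a real HTML parser such as Tika's (it handles simple, non-nested markup only).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Library-free sketch: one "field" per HTML tag, plus per-field weights.
// extractFields, score and FIELD_WEIGHTS are hypothetical names, not Lucene or Tika API.
final class TagFieldSketch {

    // Hypothetical weights: a match in <title> counts three times as much as one in <p>.
    static final Map<String, Double> FIELD_WEIGHTS = Map.of(
            "title", 3.0,
            "h1", 2.0,
            "p", 1.0);

    // Naive matcher for <title>, <h1> and <p> elements; assumes the same tag is never nested.
    private static final Pattern TAG = Pattern.compile(
            "<(title|h1|p)\\b[^>]*>(.*?)</\\1>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Collects the text inside each matched tag into a field named after that tag. */
    static Map<String, String> extractFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String field = m.group(1).toLowerCase(Locale.ROOT);
            String text = m.group(2)
                    .replaceAll("<[^>]+>", " ")   // drop inner markup such as <b>
                    .replaceAll("\\s+", " ")      // collapse runs of whitespace
                    .trim();
            if (text.isEmpty()) {
                continue;
            }
            // Several <p> elements are concatenated into the same "p" field.
            fields.merge(field, text, (a, b) -> a + " " + b);
        }
        return fields;
    }

    /** Toy scoring: each occurrence of the term adds the weight of the field it appears in. */
    static double score(Map<String, String> fields, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        double total = 0.0;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            double weight = FIELD_WEIGHTS.getOrDefault(e.getKey(), 1.0);
            for (String token : e.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(needle)) {
                    total += weight;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Lucene in Action</title></head>"
                + "<body><h1>Indexing HTML</h1><p>Lucene indexes text.</p>"
                + "<p>Tika parses <b>HTML</b> files.</p></body></html>";
        Map<String, String> fields = extractFields(html);
        // {title=Lucene in Action, h1=Indexing HTML, p=Lucene indexes text. Tika parses HTML files.}
        System.out.println(fields);
        System.out.println(score(fields, "lucene")); // 4.0 (title 3.0 + p 1.0)
        System.out.println(score(fields, "html"));   // 3.0 (h1 2.0 + p 1.0)
    }
}
```

In an actual Lucene index, each entry of the map would become a separate field on a single `Document`, with storing decided per field as Anshum notes, and the per-field weighting would come from field or query boosts rather than this toy term count.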