From: Alexander Aristov
Date: Thu, 28 May 2009 14:52:55 +0400
Message-ID: <11975c90905280352n3fec80b0we5c7b96e853024be@mail.gmail.com>
In-Reply-To: <894870ec0905280322r11a49321i52495a61e530e7d4@mail.gmail.com>
Subject: Re: Help Needed...
To: java-user@lucene.apache.org

You will need to develop a parser and an indexer. But remember that in the
current implementation the content is not stored in the Lucene index: it is
indexed, but not stored.

Best Regards
Alexander Aristov

2009/5/28 Gaurav Kumar

> Hi everyone,
>
> I am doing a project using Lucene where I need to index HTML files. I am
> using Tika to parse the HTML files. But I need to index the files according
> to their tags, which means that the text inside each HTML tag (like

> ) should be stored in a different field. Can I do that? If yes, how? Also,
> can I assign different weights to the tokens present in different fields?
> If yes, how?
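
The "parser" half of the advice above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Lucene or Tika code: the class name TagFieldExtractor, the tag names (title, h1, p) and the weights are all hypothetical, and the regex matching is a stand-in for a real HTML parser. It collects the text found inside each tag of interest under a field named after that tag, and keeps a per-field weight next to it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagFieldExtractor {

    // Hypothetical tag-to-weight table; both the tags and the values are
    // illustrative assumptions, not taken from the original question.
    static final Map<String, Float> FIELD_WEIGHTS = new LinkedHashMap<>();
    static {
        FIELD_WEIGHTS.put("title", 3.0f);
        FIELD_WEIGHTS.put("h1", 2.0f);
        FIELD_WEIGHTS.put("p", 1.0f);
    }

    // Returns the text of every <tag ...>...</tag> element in the input.
    // Regex matching is a simplification; it does not handle malformed
    // HTML or an element nested inside another element of the same name.
    static List<String> textOf(String html, String tag) {
        Pattern p = Pattern.compile(
                "<" + tag + "(\\s[^>]*)?>(.*?)</" + tag + ">",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(html);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // Drop nested markup and collapse runs of whitespace.
            String text = m.group(2)
                    .replaceAll("<[^>]+>", " ")
                    .replaceAll("\\s+", " ")
                    .trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
        }
        return out;
    }

    // Builds a field-name -> texts map, one entry per tag of interest.
    static Map<String, List<String>> extract(String html) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String tag : FIELD_WEIGHTS.keySet()) {
            fields.put(tag, textOf(html, tag));
        }
        return fields;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Lucene Tips</title></head>"
                + "<body><h1>Indexing</h1><p>One field per tag.</p></body></html>";
        for (Map.Entry<String, List<String>> e : extract(html).entrySet()) {
            System.out.println(e.getKey() + " (weight "
                    + FIELD_WEIGHTS.get(e.getKey()) + "): " + e.getValue());
        }
    }
}
```

In the "indexer" half, each map entry would become one field of the Lucene document, and the weight could be applied as a boost on that field (Field.setBoost in the Lucene 2.x API, as far as I recall) or at query time with the field:term^boost syntax. Whether the text is also stored, and not just indexed, is a per-field choice made when the field is created.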