lucene-general mailing list archives

From "Patrick Turcotte" <pat...@gmail.com>
Subject Re: How to create fields from a txt file for Lucene indexing?
Date Sat, 28 Oct 2006 00:44:34 GMT
Hi Eder,

If you are using Java 5, take a look at java.util.Scanner to read your lines,
then use String[] split(String regex)
<http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html>
to split each line on the colon,
and read the first element of the array to decide which field you have.
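
Here is a minimal sketch of that approach, assuming every line has the form "Name: value" as in the example files from the original question. The class and method names (FieldFileParser, parseFields) are made up for illustration, and nothing here calls Lucene itself; it only collects the name/value pairs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper, not part of Lucene or the JDK.
class FieldFileParser {

    // Reads "Name: value" lines and returns them as an ordered name -> value map.
    static Map<String, String> parseFields(Scanner in) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        while (in.hasNextLine()) {
            String line = in.nextLine();
            // A limit of 2 keeps any later colons inside the value.
            String[] parts = line.split(":", 2);
            if (parts.length < 2) {
                continue; // blank line or line without a colon: skip it
            }
            String name = parts[0].trim();   // first element decides the field
            String value = parts[1].trim();
            if (name.length() > 0) {
                fields.put(name, value);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Sample content taken from File1.txt in the question, inlined as a string.
        String sample = "Author: Eder\n"
                + "Description: Indexing txt files in Lucene Tutorial\n"
                + "Category: Software Development\n";
        Map<String, String> fields = parseFields(new Scanner(sample));
        for (Map.Entry<String, String> e : fields.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

To read a real file, build the Scanner from new File("File1.txt") instead of a string. Each map entry can then be turned into a Lucene Field and added to your Document; the exact Field constructor arguments depend on the Lucene version you are using.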

Hope this helps.

Patrick


On 10/27/06, Eder <ers.c@bol.com.br> wrote:
>
>
> Hi, Grant
>
> Sorry for writing to you... I'm a newbie at using Lucene. Could you give
> me a practical example of parsing a file? I tried to understand the
> luceneweb demo, but it's very complicated.
>
> Thanks a lot!
>
> Eder
>
>
> ----- Original Message -----
> From: "Grant Ingersoll" <gsingers@apache.org>
> To: <general@lucene.apache.org>
> Sent: Friday, October 27, 2006 10:43 AM
> Subject: Re: How to create fields from a txt file for Lucene indexing?
>
>
> You need to read in the file and parse it according to your business
> rules (just like you would read in any file in your system) and then
> create the appropriate Fields.
>
> -Grant
> On Oct 26, 2006, at 11:56 PM, Eder wrote:
>
> > Hi all
> >
> > I'd like to create fields based on a .txt file, like the following
> > example:
> >
> > File1.txt
> > Author: Eder
> > Description: Indexing txt files in Lucene Tutorial
> > Category: Software Development
> >
> > File2.txt
> > Author: Cecilia
> > Title: Preventioning Fever
> > Category: Health y Wellness
> >
> > So, I'd like to create the fields "Author", "Description", "Title" and
> > "Category" by reading the files. If I had the text, I would do something
> > like:
> >
> > Document doc = new Document( );
> > doc.add(new Field("Author", "Eder"));
> >
> > But this info is in txt files, so how can I read the file and get the
> > data?
> >
> >
> > Big hug,
> >
> > Eder Rebouças dos Santos
> > Salvador / BA - Brasil
>
> --------------------------
> Grant Ingersoll
> Sr. Software Engineer
> Center for Natural Language Processing
> Syracuse University
> 335 Hinds Hall
> Syracuse, NY 13244
> http://www.cnlp.org
>
>
>
>
