lucy-user mailing list archives

From Peter Karman <pe...@peknet.com>
Subject Re: [lucy-user] Input format to Lucy
Date Thu, 21 Feb 2013 22:55:32 GMT
Anil Pachuri wrote on 2/21/13 3:22 PM:
> Hi,
>
> Does Lucy have a utility to accept raw XML files as input? I have 50 XML files and I
> need to index selected fields in them using Lucy.

If you install SWISH::Prog::Lucy from CPAN, you get the swish3 tool, which
indexes XML (and HTML, among other formats) for Lucy. You can specify which XML
elements you want treated as Lucy fields with a configuration file. For example:

# a document like
<doc>
  <foo>bar</foo>
</doc>

# a config file like
MetaNames foo
PropertyNames foo

# and then index the file like:

% swish3 -F lucy -c configfile -i doc.xml

# and search like:

% swish3 -q foo:bar

The configuration docs are at:

http://swish-e.org/docs/swish-config.html

You might also want to look at Dezi, which does the same thing with a
server/client setup. http://dezi.org/



> Also, is there any general perl utility to merge multiple XML files or convert these
> into tabular format?

CPAN has many XML handling tools. I'm sure there's something there that will do
most or all of what you want.
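
If it helps to see the shape of the "convert to tabular format" step, here is a
minimal sketch. It is in Python rather than Perl, purely to illustrate the
approach, and it assumes every file looks like the <doc> example above, with the
wanted fields as direct children of the root element. The function names and the
field list are hypothetical, not part of Lucy or swish3.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_rows(xml_docs, fields):
    """Turn each XML document (str or bytes) into one dict of selected fields."""
    rows = []
    for doc in xml_docs:
        root = ET.fromstring(doc)
        # findtext() returns None when the element is missing; use "" instead.
        rows.append({name: (root.findtext(name) or "").strip() for name in fields})
    return rows


def rows_to_csv(rows, fields):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    docs = ["<doc><foo>bar</foo></doc>", "<doc><foo>baz</foo></doc>"]
    print(rows_to_csv(xml_to_rows(docs, ["foo"]), ["foo"]), end="")
```

To run it over real files, you could pass the contents of each file instead of
the inline strings, for example by reading every *.xml file in a directory as
bytes. Nested elements or attributes would need a different lookup than
findtext() on the root.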


-- 
Peter Karman  .  http://peknet.com/  .  peter@peknet.com
