lucene-java-user mailing list archives

From "Kelvin Tan" <kel...@relevanz.com>
Subject Re: Re:_HTML_parser
Date Wed, 24 Apr 2002 10:26:17 GMT
Otis, what's the final conclusion you've arrived at regarding the HTML
filter/parsing?

I have pretty much the same requirements as you do right now (extract text,
and obtain the title).

Kelvin

----- Original Message -----
From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, April 22, 2002 12:27 AM
Subject: Re:_HTML_parser


> Laura,
>
> http://marc.theaimsgroup.com/?l=lucene-user&w=2&r=1&s=Spindle&q=b
>
> Oops, it's JoBo, not MoJo :)
> http://www.matuschek.net/software/jobo/
>
> Otis
>
> --- "lucene@libero.it" <lucene@libero.it> wrote:
> > Hi Otis,
> >
> > thanks for your reply. I have been looking for Spindle and MoJo for 2
> > hours but I haven't found anything.
> >
> > Can you help me? Where can I find something?
> >
> > Thanks for your help and time
> >
> >
> > Laura
> >
> >
> >
> >
> > > Laura,
> > >
> > > Search the lucene-user and lucene-dev archives for things like:
> > > crawler
> > > spider
> > > spindle
> > > lucene sandbox
> > >
> > > Spindle is something you may want to look at, as is MoJo (not mentioned
> > > on lucene lists, use Google).
> > >
> > > Otis
> > >
> > > > Did someone solve the problem of recursively spidering web pages?
> > >
> > > > > > >While trying to research the same thing, I found the
> > > > > > >following...here's a good example of link extraction.....
> > > > > >
> > > > > > Try http://www.quiotix.com/opensource/html-parser
> > > > > >
> > > > > > It's easy to write a Visitor which extracts the links; should take
> > > > > > about ten lines of code.
> > >
> > >
> > > __________________________________________________
> > > Do You Yahoo!?
> > > Yahoo! Games - play chess, backgammon, pool and more
> > > http://games.yahoo.com/
> > >
> > > --
> > > To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > > For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
> > >
> > >
>
>



