lucene-java-user mailing list archives

From "Ranjan K. Baisak" <ranjanbai...@yahoo.com>
Subject RE: Hi Experts
Date Wed, 29 Mar 2006 07:02:15 GMT
I am using HTMLParser to parse all the HTML pages and to
extract the required information from them.
Let me describe my crawler.
I have to search all the pages of a group website (e.g.
www.group.com, which contains links to
news.groups.com, forum.news.com, etc.). So I collect
all the links of the group website using the parser and
download all the HTML pages.
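
A minimal sketch of the link-collection step described above, using only the Java standard library. It is not the HTMLParser API: a regular expression stands in for the real parser, and the names SiteLinkCollector, collect and domainSuffix are hypothetical, chosen for illustration only.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects links that stay inside one group of sites.
class SiteLinkCollector {
    // Rough stand-in for an HTML parser: captures the href value of <a> tags.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns absolute http(s) links whose host is domainSuffix or a subdomain of it.
    static Set<String> collect(String pageUrl, String html, String domainSuffix) {
        URI base = URI.create(pageUrl);
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            int hash = href.indexOf('#');
            if (hash >= 0) {
                href = href.substring(0, hash); // drop in-page fragments
            }
            if (href.isEmpty()) {
                continue;
            }
            URI target;
            try {
                target = base.resolve(href); // make relative links absolute
            } catch (IllegalArgumentException e) {
                continue; // skip malformed href values
            }
            String host = target.getHost();
            if (host == null) {
                continue; // mailto:, javascript: and similar have no host
            }
            host = host.toLowerCase(Locale.ROOT);
            String scheme = target.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            if (web && (host.equals(domainSuffix) || host.endsWith("." + domainSuffix))) {
                links.add(target.toString());
            }
        }
        return links;
    }
}
```

Each returned URL would then be downloaded and passed through the same method until no new links turn up. Keeping a set of already-visited URLs avoids fetching a page twice.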
Once the download is complete, I use the HTML parser to
parse the pages and extract the required information. What
gets extracted depends on the business requirements.
Then I create a Lucene index from that information.
For search I use the Lucene API together with the Lucene
highlighter utility to highlight the search terms.
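
To illustrate the index, search and highlight flow without depending on a particular Lucene version, here is a toy stand-in written with the Java standard library only. It is not the Lucene API. The class TinyIndex and its methods are hypothetical, and the bold tags used for highlighting are an assumption about the desired output.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Toy inverted index: maps each lowercased token to the documents containing it.
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();
    private final Map<String, String> texts = new LinkedHashMap<>();

    // Stores the text and records every token under the document id.
    void add(String docId, String text) {
        texts.put(docId, text);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the ids of documents containing the term (case-insensitive).
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Wraps whole-word occurrences of the term in bold tags; null if the id is unknown.
    String highlight(String docId, String term) {
        String text = texts.get(docId);
        if (text == null) {
            return null;
        }
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
            Pattern.CASE_INSENSITIVE);
        return p.matcher(text).replaceAll("<B>$0</B>");
    }
}
```

In the real setup Lucene takes care of analysis, scoring and the on-disk index, and the highlighter picks the best fragments; the sketch only shows how the pieces relate.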

I refresh my index every day. This is driven by a
cron job in Spring.
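
A rough sketch of a daily refresh using only java.util.concurrent, for cases where Spring's scheduling support is not available. The names DailyRefresh, delayUntilNext and schedule are hypothetical, and the run time is an assumed parameter, not something stated above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical scheduler for a once-a-day index rebuild.
class DailyRefresh {
    // Time from 'now' until the next occurrence of 'runAt' (tomorrow if already passed).
    static Duration delayUntilNext(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    // Runs the task at 'runAt' and then every 24 hours.
    // A fixed 24-hour period ignores daylight-saving shifts.
    static ScheduledExecutorService schedule(Runnable rebuildIndex, LocalTime runAt) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        long initialMillis = delayUntilNext(LocalDateTime.now(), runAt).toMillis();
        pool.scheduleAtFixedRate(rebuildIndex, initialMillis,
            TimeUnit.DAYS.toMillis(1), TimeUnit.MILLISECONDS);
        return pool;
    }
}
```

The task passed in would re-crawl the site and rebuild the index; the caller should shut the returned executor down when the application stops.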

I hope this helps with your requirement.
In case you need any help, please IM me at
ranjanbaisak@yahoo.com

-R

--- "Babu, KameshNarayana (GE, Research, consultant)"
<kameshnarayana.babu@ge.com> wrote:

> 
> "
> -----Original Message-----
> From: Ranjan K. Baisak
> [mailto:ranjanbaisak@yahoo.com]
> Sent: Wednesday, March 29, 2006 12:06 PM
> To: java-user@lucene.apache.org
> Subject: Re: Hi Experts
> 
> 
> For internet searching, Nutch is the best tool.
> However, as you don't want to use Cygwin, you need
> to use Lucene in the following way.
> You need to download each whole page and create an
> index from it. Then use Lucene to search the offline
> content rather than the online content.
> I have used Lucene in this way and have got good
> results searching my intranet.
> I am not sure whether this would meet your
> requirement.
> 
> BTW, have you tried the Google API?
> 
> - R"
> 
> 
> Dear Ranjan, thanks for your mail. That's really a
> good idea. I can use Lucene after downloading the
> web pages to the machine. Which open source tool do
> you suggest for downloading web pages to the local
> machine? Or which open source tool are you using
> to do that?
> 
> Regards
> Kamesh
> 
> --- "Babu, KameshNarayana (GE, Research, consultant)"
> <kameshnarayana.babu@ge.com> wrote:
> 
> > Hi Experts,
> > 
> > I am a newbie. I am supposed to select a search
> > engine for my project, which should search from
> > the given URL and display the results.
> > 
> > I should use the search engine on Windows only. I
> > should not use any other external tool like Cygwin,
> > which is used by Nutch.
> > 
> > Any expert guidance would be much appreciated.
> > 
> > Thanks
> > 
> > Kamesh
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail:
> > java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail:
> > java-user-help@lucene.apache.org
> > 
> > 



