lucene-java-user mailing list archives

From "Chris Lu" <chris...@gmail.com>
Subject Re: Meta- search descriptions
Date Tue, 23 Oct 2007 18:22:49 GMT
Since you are only trying to index your clients' pages, I think it should be
doable to use regular expressions or something similar to extract the meta
info. Or you can ask your clients to expose some XML or RSS that you can
process more easily.
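
To make the regular-expression idea concrete, here is a rough sketch in plain Java. It assumes the pages carry <meta name="..." content="..."> tags with name written before content; the class name MetaTagExtractor and the sample HTML are made up for illustration and are not from any client site. It will miss tags written in a different attribute order, so treat it as a starting point rather than a robust parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls name/content pairs out of <meta> tags.
public class MetaTagExtractor {

    // Assumes "name" comes before "content". The backreferences \1 and \3
    // require each value to close with the same quote character it opened with.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*([\"'])(.*?)\\1\\s+content\\s*=\\s*([\"'])(.*?)\\3[^>]*>",
        Pattern.CASE_INSENSITIVE);

    // Returns meta names (lower-cased) mapped to their content, in page order.
    public static Map<String, String> extract(String html) {
        Map<String, String> meta = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            meta.put(m.group(2).toLowerCase(), m.group(4));
        }
        return meta;
    }

    public static void main(String[] args) {
        // Sample page fragment, invented for this example.
        String html = "<html><head>"
            + "<meta name=\"description\" content=\"Blue widget, 10 in stock\">"
            + "<meta name='keywords' content='widget,blue'/>"
            + "</head></html>";
        System.out.println(extract(html));
    }
}
```

Each extracted pair could then be added as a field on the document you index. If the clients change their page layout often, a per-site pattern kept in configuration would probably be easier to maintain than one hard-coded expression.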

But still, accessing the database directly will save you a lot of time that
you would otherwise spend parsing out the data.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

On 10/23/07, Cool Coder <techcool.kumar@yahoo.com> wrote:
>
> >Why not index their database directly?
> I should have mentioned this in my first mail. The clients are willing to
> allow indexing of their DB, but it holds confidential data as well as
> information about their own clients, and all the data is so tightly
> coupled that it is difficult for them to let any third-party tool index
> their DB. So of course this is the last option, in case I am not able
> to develop a robust indexing mechanism.
>   Now, with all these difficulties, is it possible to develop a robust
> indexer? I would appreciate your input/suggestions. It does not matter how
> relevant it is; I would appreciate any opinion you can give me on this.
>
>   - BR
>   Chris Lu <chris.lu@gmail.com> wrote:
>     Why not index their database directly?
>
> On 10/23/07, Cool Coder wrote:
> >
> > I was just looking into a couple of search engines like indeed.com and
> > bixee.com, and I was really surprised by the accuracy of the information
> > they have built into their indexes and provide in their search results.
> > I have the same sort of requirement: to build indexes for all my clients'
> > sites and provide search capability. While indexing a page, the parser
> > should know the format/structure of the page; only then is it possible to
> > index the page accurately. If a site changes its content structure
> > quickly, then the crawler/indexer also has to change its meta-info, i.e.
> > the format description of the page.
> >
> > I am basically developing a way of indexing my clients' pages to provide
> > search capability with accurate information (for example, there are a
> > number of products on a client's page and I need to get all the product
> > data and index it accordingly). Hence I need some sort of indexing that
> > depends on meta search information (basically a description of the
> > content of the pages), as described above, and the indexer will work
> > based on that meta search information.
> >
> > Can anybody tell me whether this is possible or not?
> >
> > regards,
> > BR
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Tired of spam? Yahoo! Mail has the best spam protection around
> > http://mail.yahoo.com
