Message-ID: <6e3ae6310710231122k3daa5c25gd38cd7fb6c7cf4a4@mail.gmail.com>
Date: Tue, 23 Oct 2007 11:22:49 -0700
From: "Chris Lu"
To: java-user@lucene.apache.org
Subject: Re: Meta- search descriptions
In-Reply-To: <57669.58876.qm@web59204.mail.re1.yahoo.com>
References: <6e3ae6310710230925q23199796md9c4bf1423dcd71b@mail.gmail.com> <57669.58876.qm@web59204.mail.re1.yahoo.com>

Since you are only indexing your clients' pages, I think it should be
feasible to extract the meta information with regular expressions or
something similar. Alternatively, you can ask your clients to expose an
XML or RSS feed that you can process more easily.

Still, accessing the database directly would save you a lot of the time
you would otherwise spend parsing the data out of the pages.

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

On 10/23/07, Cool Coder wrote:
>
> > Why not index their database directly?
>
> I should have mentioned this in my first mail. The clients are willing,
> in principle, to allow indexing of their DB, but it holds confidential
> data as well as information about their own clients, and all of it is so
> tightly coupled that it is difficult for them to let any third-party
> tool index the DB. So this is the last resort, in case I am not able to
> develop a robust indexing mechanism.
>
> Now, with all these difficulties, is it possible to develop a robust
> indexer? I would appreciate your input and suggestions. It does not
> matter how closely it applies; I would appreciate your opinion on this.
>
> - BR
>
> Chris Lu wrote:
> Why not index their database directly?
>
> On 10/23/07, Cool Coder wrote:
> >
> > I was just looking into a couple of search engines like indeed.com and
> > bixee.com, and I was really surprised by the accuracy of the
> > information they have built into their indexes and return in their
> > search results.
> >
> > I have a similar requirement: to build indexes for all my clients'
> > sites and provide search capability. While indexing a page, the parser
> > has to know the format/structure of the page; only then is it possible
> > to index the page accurately. If a site changes its content structure,
> > the crawler/indexer also has to change its meta-info, i.e. its
> > description of the page format.
> >
> > I am basically developing a way of indexing my clients' pages to
> > provide search capability with accurate information (for example, a
> > client's page lists a number of products, and I need to extract all
> > the product data and index it accordingly). Hence I need some sort of
> > indexing driven by meta search information (basically a description of
> > the content of the pages), as described above, with the indexer
> > working from that meta search information.
> >
> > Can anybody tell me whether this is possible?
> >
> > regards,
> > BR
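
A minimal sketch of the regular-expression approach suggested in the reply,
assuming the client pages carry their descriptive data in
<meta name="..." content="..."> tags. The class and method names
(MetaTagExtractor, extractMeta) and the sample page are made up for
illustration. The pattern only handles the name-then-content attribute
order and values without embedded quotes, so a real HTML parser would be
more robust if the pages vary.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaTagExtractor {

    // Hypothetical pattern: matches <meta name="x" content="y"> with the
    // attributes in exactly this order, quoted with ' or ".
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*[\"']([^\"']+)[\"']"
            + "\\s+content\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>",
        Pattern.CASE_INSENSITIVE);

    /** Returns meta name -> content pairs in the order they appear in the page. */
    public static Map<String, String> extractMeta(String html) {
        Map<String, String> meta = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            // Normalize the name so "Description" and "description" collide.
            meta.put(m.group(1).toLowerCase(Locale.ROOT), m.group(2));
        }
        return meta;
    }

    public static void main(String[] args) {
        // Made-up sample page standing in for a client's product page.
        String html = "<html><head>"
            + "<meta name=\"description\" content=\"Blue widget, 10 in stock\">"
            + "<meta name='keywords' content='widget,blue'/>"
            + "</head><body>...</body></html>";
        extractMeta(html).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Each extracted name/value pair could then be added as a field of a Lucene
Document, so that a change in a site's layout only means adjusting the
pattern for that site rather than the indexer itself.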