Date: Fri, 3 Oct 2008 09:49:57 -0700 (PDT)
From: Otis Gospodnetic
Subject: Re: Extracting Dates
To: java-user@lucene.apache.org
Message-ID: <999046.36339.qm@web50303.mail.re2.yahoo.com>

David, this is not really a Lucene issue. Here is some Perl code that you could either use or rewrite in Java if you need it in Java:

http://search.cpan.org/dist/Date-Extract/

Tika won't help with this, and I believe UIMA itself will not help either, although there may be components for date extraction that plug into UIMA.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: David Lee
> To: java-user@lucene.apache.org
> Sent: Thursday, October 2, 2008 7:18:22 PM
> Subject: Extracting Dates
>
> What should I use if I want to try to extract events (dates/times) out of an
> HTML page? I looked at Tika since it's a parsing project. Am I on the right
> track or is there something better to use? It also seems like Apache UIMA is
> kind of doing that, but I'm not sure. I thought since a lot of these
> projects are associated to lucene, someone might know.
>
> David Lee
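
For anyone who wants a starting point in Java rather than Perl, here is a rough sketch of the general idea: strip the markup, then scan the remaining text with a regular expression. It is not a port of Date::Extract and does not reflect how that module works internally. The class name DateSketch, the method extractDates, and the three date shapes it recognizes (month name plus day and year, ISO yyyy-MM-dd, and d/M/yyyy or M/d/yyyy) are all assumptions chosen for illustration. The tag stripping is deliberately crude; a real HTML parser would be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of Lucene, Tika, or UIMA).
final class DateSketch {

    // English month names and common abbreviations (assumed input language).
    private static final String MONTH =
        "(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?"
        + "|Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?"
        + "|Nov(?:ember)?|Dec(?:ember)?)";

    // Three assumed date shapes: "October 3, 2008", "2008-10-03", "10/3/2008".
    private static final Pattern DATE = Pattern.compile(
        "\\b(?:" + MONTH + "\\.?\\s+\\d{1,2},?\\s+\\d{4}"
        + "|\\d{4}-\\d{2}-\\d{2}"
        + "|\\d{1,2}/\\d{1,2}/\\d{4})\\b",
        Pattern.CASE_INSENSITIVE);

    // Crude tag remover; it does not handle scripts, comments, or entities.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns every substring that looks like a date, in document order.
    static List<String> extractDates(String html) {
        String text = TAG.matcher(html).replaceAll(" ");
        List<String> found = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String html =
            "<p>Meeting on <b>October 3, 2008</b>, follow-up 2008-10-17.</p>";
        for (String d : extractDates(html)) {
            System.out.println(d);
        }
    }
}
```

The matches come back as raw strings. Turning them into actual date values, or handling relative expressions such as "next Tuesday", would need considerably more work than this sketch attempts.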