lucene-java-user mailing list archives

From Andi Vajda <a...@osafoundation.org>
Subject Re: PHP-Lucene Integration
Date Tue, 05 Apr 2005 17:58:46 GMT

As an alternative, you could also take the approach taken for PyLucene: 
compile the Java code with GCJ and generate bindings for Python with SWIG.
SWIG supports a number of languages in addition to Python such as Ruby, PHP, 
Perl, and a bunch more.

For more information, see:
   http://pylucene.osafoundation.org
   http://www.python.org/pycon/2005/papers/27/paper.txt
   http://www.swig.org
   http://gcc.gnu.org/java

As a matter of fact, a team of people is working on such a construction for 
Ruby at the moment.

Andi..


On Tue, 5 Apr 2005, Giovanni Novelli wrote:

> As Lucene's native language is Java, it would be more natural to access its
> functionality through JSP; still, the idea of accessing Lucene from PHP
> seems interesting, as PHP is perhaps the most widely deployed
> server-side scripting language.
> I think the way to provide access to the Lucene API in PHP development
> should be as general and clean as possible, so in my opinion the natural
> way would be a single layer that interoperates with the Lucene API: an
> Apache module. A PHP API would then be needed to call that module from PHP.
>
> Having an Apache module for Lucene as a component of the Lucene project should
> allow Lucene to spread, and not only in the PHP development arena.
>
> On Mar 27, 2005 5:49 AM, Owen Densmore <owen@backspaces.net> wrote:
>>
>> Thanks all for the interesting responses. Sorry for being a bit late
>> in responding!
>>
>> -- Owen
>>
>> Owen Densmore - http://backspaces.net - http://redfish.com -
>> owen@backspaces.net
>>
>> Begin forwarded message:
>>
>>> From: "Philippe Ombredanne" <pombredanne@nexb.com>
>>> Subject: RE: PHP-Lucene Integration
>>>
>>> Owen,
>>> very interesting!
>>> Anything (code) you can share?
>>
>> Hi Philippe. We will definitely make our code available. I suspect,
>> however, it is not terribly interesting. But if it is simply useful as a
>> "case study", that would still be good.
>>
>>> From: Dawid Weiss <dawid.weiss@cs.put.poznan.pl>
>>> Subject: Re: PHP-Lucene Integration
>>>
>>> Your implementation and ideas sound very interesting, Owen. Can we see
>>> the system anywhere in public (and play with it?)
>>
>> We'll send a link to the site fairly soon. We're having our final
>> review tomorrow, and should have a good idea when we can let folks look
>> at it.
>>
>>>> We are hoping the institute can afford to have us work on true
>>>> clustering techniques such as Carrot2 uses. (Thanks to Dawid and all
>>>> the Poznan University folks whose papers were so stimulating!)
>>>
>>> You are very welcome. We are also academic, so in the feeling of
>>> brotherhood we might help you set up a demo on-line clustering server
>>> free of charge. There really is no better clustering technique than
>>> the one devised for a particular problem and it seems like you found
>>> that niche. Although it's always worth experimenting with other stuff
>>> just for the sake of comparison. Just let me know if you're interested
>>> (if we can access the 'feed' of those plain search results I can set
>>> up the clustering demo in a few minutes, really).
>>
>> This would be really great! Indeed, we'd like to help SFI to be a bit
>> more involved with exploring their collection with innovative, research
>> oriented methods.
>>
>> Some of the staff at SFI are excited by DSpace, for example, and we'd
>> be interested in helping them explore its use in the lucene/clustering
>> context. That, and their use of Dublin Core for cataloging their
>> future work might be of general interest here in the mail list too.
>>
>>>> We did do a
>>>> quick LSA SVD on a random set of the papers to see what the
>>>> performance (both CPU and good clustering) would be like. Our
>>>> results are encouraging, and I think the frequent phrases approach
>>>> would be best for this collection.
>>>
>>> It is always going to be challenging if you attempt to cluster the
>>> entire collection, you know. I'm (or rather: I will be) working on
>>> extensions to the algorithm to deal with full text documents.
>>
>> We're mainly using Abstracts and other meta data (Title, Authors, Key
>> phrases, Abstracts, Dates, and so on). These are reasonably small:
>> Abstracts are 150 words on the average over the current 1122 document
>> collection. If we include the title and key phrases, we get 172
>> words/doc.
>>
>> I suspect we could safely limit the abstracts to the first few
>> sentences too, getting us to a much smaller number. Indeed, if we
>> tossed the abstracts altogether, and used just titles and key phrases,
>> we're down to less than 20 words/doc! I bet simply using reasonable
>> preprocessing we could get small enough "snippets" as to be workable.
>>
>>> Dawid
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
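
The preprocessing idea in the quoted thread (use titles and key phrases, and optionally only the first few sentences of each abstract) can be sketched roughly as follows. This is only an illustration, not the code used for the SFI collection: the function names, the naive sentence splitter, and the sample record are all assumptions made up for this example.

```python
import re


def first_sentences(text, max_sentences=2):
    """Return the first few sentences of text.

    Assumption: sentences end with '.', '!' or '?' followed by whitespace.
    This naive rule will mis-split abbreviations such as "e.g. this".
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


def make_snippet(title, key_phrases, abstract="", max_sentences=2):
    """Join title, key phrases and a truncated abstract into one snippet."""
    parts = [title.strip(), ", ".join(key_phrases)]
    if abstract:
        parts.append(first_sentences(abstract, max_sentences))
    # Skip empty parts so missing fields do not leave stray spaces.
    return " ".join(p for p in parts if p)


def word_count(text):
    """Count whitespace-separated words."""
    return len(text.split())


if __name__ == "__main__":
    # Hypothetical record, not taken from the actual collection.
    doc = {
        "title": "Clustering search results",
        "key_phrases": ["frequent phrases", "clustering"],
        "abstract": "We cluster results. We use phrases. We compare methods.",
    }
    short = make_snippet(doc["title"], doc["key_phrases"])
    longer = make_snippet(doc["title"], doc["key_phrases"], doc["abstract"])
    print(word_count(short), short)
    print(word_count(longer), longer)
```

For scale, at the averages quoted above, 1122 documents at 172 words/doc come to roughly 193,000 words in total, while fewer than 20 words/doc keeps the whole collection under about 22,500 words.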


