lucene-dev mailing list archives

From Moray McConnachie <mmcco...@oxford-analytica.com>
Subject RE: Token declared final ?
Date Thu, 25 Mar 2004 16:35:47 GMT
http://web.media.mit.edu/~hugo/montytagger/ adapts the Brill tagger for Java
- unsure what other changes are there - any good to you?

Yours,
Moray

------------------------------------
Moray McConnachie, IT Manager
Oxford Analytica http://www.oxan.com 

> -----Original Message-----
> From: Thimal Jayasooriya [mailto:thimal@cs.york.ac.uk]
> Sent: 23 March 2004 18:03
> To: Lucene Developers List
> Subject: Re: Token declared final ?
> 
> 
> Hi Doug,
> 
> That's brilliant :) I didn't want to use an existing field because I 
> wasn't sure if there was anything that relied explicitly on type 
> returning the default "word". There might be a few cases where I would 
> have liked to store multiple tags (for words with slightly ambiguous 
> meanings), but I can sort that out. Thanks for the pointer and also for 
> taking the time to explain.
> 
> As a general matter, would anyone else be interested in having POS 
> information for Tokens? I use one library which isn't open sourced for 
> tagging (QTag), but I'd be happy to contribute the interface code if 
> anyone feels they could use it.
> 
> More info on the tools I use can be found here: 
> http://www-users.cs.york.ac.uk/~thimal/tools.php
> If you have or know of an open source tagger, I'd be keen on making my 
> code play nicely with it too :)
> 
> Regards,
> Thimal
> 
> Doug Cutting wrote:
> 
> > The 'type' field of Token would be a good place for Part-of-Speech. 
> > Does that work for you?  If not, perhaps we should make Token non-final.
> >
> > As has been discussed before, Lucene uses final for two reasons.  The 
> > first is historical: long ago it used to make things faster by 
> > permitting javac to inline things.  The second is that some classes 
> > are not designed to be subclassed, e.g., subclassing Field or Document 
> > will generally cause more confusion than it will simplify an 
> > application. The problem is sometimes determining which case is which.
> >
> > Doug
> >
> > Thimal Jayasooriya wrote:
> 
> <snipped parts of the original mail>
> 
> >> When I looked at the source for Token 
> >> (org.apache.lucene.analysis.Token), however, I found that it has been 
> >> declared final. I had intended to subclass Token to also keep a POS 
> >> marker and use it later within the Analyzer. Could someone please 
> >> give me some information on why Token was declared as final? I am 
> >> sure I've missed something, but I can't see what it is. Alternatively, 
> >> does it make more sense to store the POS information elsewhere? I 
> >> would probably need it at index time only.
> >>
> 
> -- 
> Thimal Jayasooriya,
> Department of Computer Science,
> The University of York
> http://www.cs.york.ac.uk/~thimal/
> 
> 
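A rough sketch of the approach discussed above: carrying the POS tag in a token's 'type' field rather than subclassing a final class. SketchToken is a hypothetical stand-in, not Lucene's actual Token, so the constructor and accessor names should be checked against the Lucene version in use. The PosTokens helper and the "|" separator for multiple tags are likewise assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a final token class with a free-form type field.
// This is NOT Lucene's Token; it only mimics the idea of a default type "word".
final class SketchToken {
    private final String termText;
    private final int startOffset;
    private final int endOffset;
    private final String type;

    SketchToken(String termText, int startOffset, int endOffset) {
        this(termText, startOffset, endOffset, "word"); // default type
    }

    SketchToken(String termText, int startOffset, int endOffset, String type) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.type = type;
    }

    String termText() { return termText; }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }
    String type() { return type; }
}

// Helper that uses composition instead of subclassing: POS tags go into the
// type field, with several tags joined by an assumed "|" separator.
final class PosTokens {
    static final String SEPARATOR = "|"; // assumption, not a Lucene convention

    static SketchToken tag(String text, int start, int end, String... posTags) {
        if (posTags.length == 0) {
            return new SketchToken(text, start, end); // keeps the default type
        }
        return new SketchToken(text, start, end, String.join(SEPARATOR, posTags));
    }

    // Splits the type field back into tags; an untagged token yields ["word"].
    static List<String> tagsOf(SketchToken token) {
        return Arrays.asList(token.type().split(Pattern.quote(SEPARATOR)));
    }

    public static void main(String[] args) {
        SketchToken ambiguous = tag("run", 0, 3, "NN", "VB");
        System.out.println(ambiguous.termText() + " -> " + tagsOf(ambiguous));
    }
}
```

One caveat with this design: any code that relies on type returning the default "word" would see the POS string instead, which is the concern raised earlier in the thread.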
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 


