From: Grant Ingersoll
Subject: Re: Documentation Brainstorming
Date: Wed, 30 May 2007 21:33:33 -0400
To: java-dev@lucene.apache.org

Been meaning to get back on this, as there are some good
ideas/points in here.

On May 25, 2007, at 6:14 PM, Bob Carpenter wrote:

>
>> So, this is an open call for ideas on how we can improve our
>> docs. Here are some areas I think need improving:
>
> Before I start suggesting improvements, let
> me qualify them all by saying I'm only
> taking the time to do this because I love
> Lucene and use it all the time.
>

No need to explain your motives, we're all working towards making
Lucene better.

>
> Web Site Redesign
> ------------------
> I'd like to add a request for a top-level site
> redesign. I find it very difficult to find
> anything on the site. This isn't just a Lucene
> problem, it's partly an Apache problem. I believe
> what most people want is a top-level intro to the
> projects and then a pointer to where to download
> and/or read hello-world getting-started docs.
> (This is, for instance, how Tomcat and MySQL set
> up their home pages and sites.)
>
> I just went to the Lucene site and still
> can't figure out where to download the latest
> Lucene. I start at http://lucene.apache.org/
> and get a nav choice of "who we are"
> and "buy stuff" and "subprojects".
> So I click on subprojects,
> which opens up a menu and then I click on
> "java" (because I know that there are more
> versions of Lucene than the Java version and
> there's nothing else labeled just Lucene).
> I then get a choice of Features, Who We Are,
> Powered by Lucene, Documentation, Resources,
> Site Versions, and Related Projects.
> I guess the right answer is "Resources"
> then "releases", then I leave the nav for the
> page itself and click "downloads and releases"
> but hey, I'm already there, so I have to go
> into the text and click on "Apache Mirrors".
> I then select a mirror and it gives me a huge
> list to select from. The README gives me no
> hint as to what's the latest stable version,
> and each version has (old) written next
> to its description.

So, would you prefer the menu items be expanded by default?
Also, what about the content of the actual pages outside of the
menus? For instance, on the Top Level site, there are brief blurbs
about what each of the projects is, and on the Lucene Java site, the
top level entry points to a "free download" and the news items
generally say what release is the latest. You are right, however,
there are no clear links to getting started, etc. By the same token,
though, the download does take a bit of reading to find; there is no
clear "download latest" button like on MySQL or other sites like
that. I also think we should remove older news items, maybe put in a
sunset policy of 1 year or something.

>
> Ask a coworker who doesn't use Lucene to
> try to find the javadocs, a hello world
> tutorial, and the download on the Lucene
> site. (Yes, I'm suggesting a usability test.)

Usability, good! :-)

>
> Altogether, the design should waste less
> whitespace. Compare an Apache page to
> something like a MySQL page to see the
> difference.
>

I tend to like more whitespace; I find MySQL pretty cluttered
visually, although it is much more efficient.

>
> Class, Method, Construction, Member Doc
> ---------------------------------------
>
> The biggest issue in the doc for me is that
> most methods, packages, classes, etc. are
> hardly documented at all. For instance, the
> very first class in the 2.1 alphabetical list:
>
> http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/gdata/servlet/handler/AbstractAccountHandler.html
>
> has 7 methods, 6 of which are undocumented
> and 1 of which has inherited redundant doc.
> There's an uncommented field, an uncommented
> constructor, and there's no class doc.
>

ugh.

> It's also out of date. Someone finally fixed the
> infinite-loop design of Analyzer, but the class doc
> has a big warning that you must implement one
> of the methods.
> But now there's only the
> abstract tokenStream() method which must be implemented
> and a getPositionIncrementGap() method (which is
> a useful addition, by the way).

Can you enter a bug for this? And maybe a patch?

>
> It also doesn't help that there are classes
> with non-descriptive names like Among, which
> have no doc at all.
>
> I'd rather see each jar get its own javadoc,
> or at the very least, indicate which jar each
> class is defined in for the ones that aren't
> part of the core.
>

Yeah, I don't like that all the contribs are built in together. What
do others think? I would vote for separating them out.

>
> Reader Schmeader
> ----------------
>
> This is actually an API, not a doc issue, though the
> doc around this needs work as is, too.
>
> I don't understand why Readers are used in analyzers.
> Using them presents several problems. First, since
> Analyzer.tokenStream() doesn't throw an IOException,
> all exceptions must be caught somewhere inside. Second,
> it's not clear who closes the reader or how long the
> analyzer will hold it open. Every time I've used Lucene,
> I wind up having strings or char sequences or char array
> slices that I need to embed in a Reader. That's because
> I invariably have to parse out the bits of documents
> I want to index in various fields. Finally, wrapping a
> char sequence or char array slice in a reader is a rather
> inefficient way to implement a sequence of chars. Can we
> at least introduce a method that takes a CharSequence or
> even just a String and deprecate the one with Reader?
> Or at least provide an alternative for the usual case
> of not having a reader. Maybe I'm just missing something
> here, but I don't think it's scaling to streaming input
> that'd overflow memory.
>

This, I believe, is due to the fact that some Fields can be
constructed with Readers.
The relevant code in DocumentWriter (around line 195) is:

    // the field does not have a TokenStream,
    // so we have to obtain one from the analyzer
    if (stream == null) {
      Reader reader;  // find or make Reader
      if (field.readerValue() != null)
        reader = field.readerValue();
      else if (field.stringValue() != null)
        reader = new StringReader(field.stringValue());
      else
        throw new IllegalArgumentException
          ("field must have either String or Reader value");

      // Tokenize field and add to postingTable
      stream = analyzer.tokenStream(fieldName, reader);
    }

However, you do present interesting use cases. Also remember that
some of these APIs have been around for a while and may very well
benefit from some updating.
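
For reference, here is a rough sketch of the wrapping Bob describes,
using only JDK classes. ReaderAdapters and its method names are made
up for illustration and are not part of Lucene; the Reader each one
returns is what you would hand to analyzer.tokenStream(fieldName,
reader). readAll() just stands in for whatever a tokenizer would
consume.

```java
import java.io.CharArrayReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper (not a Lucene class): adapts in-memory text to a Reader. */
final class ReaderAdapters {
    private ReaderAdapters() {}

    /** Wraps a String, as DocumentWriter does for string-valued fields. */
    static Reader fromString(String text) {
        return new StringReader(text);
    }

    /** Wraps a slice of a char array; CharArrayReader reads it in place. */
    static Reader fromSlice(char[] chars, int offset, int length) {
        return new CharArrayReader(chars, offset, length);
    }

    /** Wraps any CharSequence; toString() may copy, which is the overhead in question. */
    static Reader fromCharSequence(CharSequence text) {
        return new StringReader(text.toString());
    }

    /** Drains and closes the Reader, standing in for a tokenizer consuming it. */
    static String readAll(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            try {
                int n;
                while ((n = reader.read(buffer)) != -1) {
                    out.append(buffer, 0, n);
                }
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            // Not expected for in-memory readers that are still open.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Note that the caller still has to decide who closes the Reader, which
is the ownership question raised above; in this sketch readAll()
closes it once it has been drained.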