lucene-java-user mailing list archives

From Don Gilbert <>
Subject ANN: LuceGene bioinformatics application updated
Date Mon, 28 Feb 2005 17:24:33 GMT
LuceGene release 1.4 is available now at

LuceGene is an open-source document/object search and retrieval system
specially tuned for bioinformatics text databases and documents.  It is
similar in concept to the commercial SRS package (Sequence Retrieval
System). LuceGene is written in Java and built on the open-source Lucene
package [].

This release includes an easy-to-use demonstration. Pop it into a Tomcat
web server and run it.

LuceGene adds these bioinformatics methods to Lucene:

 * Indexing adaptors for formats such as XML, PDF Documents,
 Biosequences, Spreadsheets, HTML, and others, with fine tuning by data

 * Configurations for bio-data include UniProt/Swiss-Prot, FASTA and
 GenBank sequences, BIND protein interactions, BLAST outputs,
 Medline and others.

 * Support for batch-list look-ups and searches by ID, gene names, etc.

 * Web interface with paged results, batch downloads, search
 refinement and search-linking among data libraries.

 * Web Services support with a SOAP interface.

 * Output support for data-field selection and formats such as
 Spreadsheet, XML, HTML, and others.
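As a rough illustration of the batch-list look-up and data-field selection features listed above, here is a minimal, stdlib-only Java sketch. It assumes records are held in an in-memory map keyed by ID, standing in for a Lucene index; the class, method, and field names are hypothetical and are not LuceGene's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: batch look-up of records by ID, then output of only
 * the selected fields as tab-separated "spreadsheet" rows. The in-memory map
 * stands in for a real index; none of these names come from LuceGene.
 */
public class BatchLookupSketch {

    /** Look up each ID in the batch list, in order; IDs that are not found are skipped. */
    static List<Map<String, String>> batchLookup(Map<String, Map<String, String>> store,
                                                 List<String> ids) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (String id : ids) {
            Map<String, String> rec = store.get(id);
            if (rec != null) {
                hits.add(rec);
            }
        }
        return hits;
    }

    /** Render the chosen fields as a header line plus one tab-separated line per record. */
    static String toTsv(List<Map<String, String>> records, List<String> fields) {
        StringBuilder sb = new StringBuilder(String.join("\t", fields)).append('\n');
        for (Map<String, String> rec : records) {
            List<String> row = new ArrayList<>();
            for (String f : fields) {
                row.add(rec.getOrDefault(f, ""));  // a missing field becomes an empty cell
            }
            sb.append(String.join("\t", row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Dummy records; the IDs and gene names are placeholders, not real entries.
        Map<String, Map<String, String>> store = new LinkedHashMap<>();
        store.put("ID1", Map.of("id", "ID1", "gene", "geneA", "organism", "org1"));
        store.put("ID2", Map.of("id", "ID2", "gene", "geneB"));

        List<Map<String, String>> hits = batchLookup(store, List.of("ID2", "ID9", "ID1"));
        System.out.print(toTsv(hits, List.of("id", "gene", "organism")));
    }
}
```

Results keep the order of the submitted ID list, and unknown IDs are silently dropped; a real service might instead report them back to the user.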

It can take as little as a few hours of engineering time to add parsing
for a new databank, making LuceGene a cost-effective way to use many
bioinformatics data sets.
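To give a feel for what adding databank parsing involves, the following is a minimal sketch of a FASTA adaptor in plain Java. The class name, method name, and field names (`id`, `description`, `sequence`) are assumptions made for illustration; this is not LuceGene's adaptor interface, and it stops at producing per-record field maps rather than calling Lucene.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a databank "indexing adaptor": it splits FASTA text
 * into records and each record into named fields, the kind of field map one
 * would then hand to an indexer. Not LuceGene's actual API.
 */
public class FastaAdaptorSketch {

    static List<Map<String, String>> parse(String fasta) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = null;
        StringBuilder seq = new StringBuilder();

        for (String raw : fasta.split("\\R")) {      // split on any line break
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;                            // ignore blank lines
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if any.
                if (current != null) {
                    current.put("sequence", seq.toString());
                    records.add(current);
                }
                current = new LinkedHashMap<>();
                seq.setLength(0);
                // Header: first word is the ID, the rest is the description.
                String[] parts = line.substring(1).trim().split("\\s+", 2);
                current.put("id", parts[0]);
                current.put("description", parts.length > 1 ? parts[1] : "");
            } else if (current != null) {
                seq.append(line);                    // sequence lines before any header are dropped
            }
        }
        if (current != null) {                       // close the final record
            current.put("sequence", seq.toString());
            records.add(current);
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = ">seq1 first test record\nACGT\nTTGA\n\n>seq2\nMKV\n";
        for (Map<String, String> rec : parse(sample)) {
            System.out.println(rec);
        }
    }
}
```

Most of the per-databank work in an adaptor like this is deciding which pieces of each entry become separately searchable fields; the surrounding indexing and search machinery stays the same.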

LuceGene is speedy with big data sets: indexing and searching the
UniProt library of 1.7 million sequences with LuceGene is comparable to
using SRS. Gene Annotation object search and retrieval with LuceGene is
10x to 20x faster than using a Postgres Chado database.

-- Don Gilbert
Genome Informatics Lab
Indiana University, Bloomington IN
