lucene-general mailing list archives

From Jérôme Charron <jerome.char...@gmail.com>
Subject Re: Indexing documents without an extension..
Date Mon, 05 Sep 2005 08:43:12 GMT
> Is there any means within Lucene to index a particular document which has
> no extension specified?
> i.e. I need to index a document named something like "DOC" (not "DOC.doc")

You can take a look at the Nutch MimeType resolver (
http://lucene.apache.org/nutch/apidocs/org/apache/nutch/util/mime/package-summary.html
)
It resolves MIME types using a file extension repository, and it can also use
magic numbers for some MIME types in order to determine a document's MIME type
from its content (without the extension).
This Nutch utility has no dependency on other Nutch code, so it could be a good
idea to move it into the Lucene code (or perhaps into a utility library common
to both Lucene and Nutch...)
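
To illustrate the magic-number idea, here is a minimal Java sketch. It is not the Nutch resolver and does not use its API: the class name MagicSniffer, its methods, and the small signature table are all made up for this example. It only shows how the first few bytes of a file can suggest a MIME type when there is no extension to go by.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, NOT part of Lucene or Nutch: guesses a MIME type
// from the leading "magic" bytes of a document.
final class MagicSniffer {
    // A few well-known signatures, written as unsigned byte values.
    private static final int[] PDF  = {0x25, 0x50, 0x44, 0x46};             // "%PDF"
    private static final int[] ZIP  = {0x50, 0x4B, 0x03, 0x04};             // "PK\3\4"
    private static final int[] PNG  = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    private MagicSniffer() {}

    // Returns a best-guess MIME type for the given leading bytes.
    static String detect(byte[] header) {
        if (startsWith(header, PDF))  return "application/pdf";
        if (startsWith(header, PNG))  return "image/png";
        if (startsWith(header, ZIP))  return "application/zip";
        // OLE2 compound file: legacy Word .doc uses it, but so do .xls and
        // .ppt, so "application/msword" is only a guess here.
        if (startsWith(header, OLE2)) return "application/msword";
        return "application/octet-stream"; // unknown content
    }

    // Reads up to 8 bytes from the file and sniffs them (needs Java 9+).
    static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8];
            int n = in.readNBytes(buf, 0, buf.length);
            return detect(Arrays.copyOf(buf, n));
        }
    }

    // True if data begins with every byte of magic, compared as unsigned.
    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }
}
```

With something like this, a file named just "DOC" could be passed to MagicSniffer.detect(Path), and the returned type used to pick a suitable parser before handing the extracted text to Lucene. A real resolver would carry a much larger signature table and combine it with the extension lookup when an extension is present.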

Regards

Jérôme

-- 
http://motrech.free.fr/
http://www.frutch.org/
