lucene-general mailing list archives

From Jérôme Charron
Subject Re: Indexing documents without an extension..
Date Mon, 05 Sep 2005 08:43:12 GMT
> Is there any way, within Lucene, to index a particular document which has
> no extension specified?
> i.e. I need to index a document named something like "DOC" (not "DOC.doc")

You can take a look at the Nutch MimeType resolver (
It resolves MIME types using a file-extension repository, and can use magic
numbers for some MIME types in order to determine the document's MIME type from
its content (without the extension).
This Nutch utility has no dependency on Nutch code, and it could be a good
idea to move it to the Lucene code base (or perhaps to a utility library
common to both Lucene and Nutch...)
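
To illustrate the magic-number idea, here is a minimal sketch in plain Java. It is not the Nutch resolver and does not use its API: the class name MagicSniffer, its methods, and the small signature table are hypothetical, chosen only for this example. It reads the first bytes of a file and compares them with a few well-known signatures. Note that the OLE2 signature identifies the container format used by legacy MS Office files in general, so mapping it to application/msword is a simplifying assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Lucene or Nutch.
public class MagicSniffer {

    // Leading bytes ("magic numbers") of a few common formats.
    private static final byte[][] MAGIC = {
        {'%', 'P', 'D', 'F', '-'},                              // PDF
        {'P', 'K', 0x03, 0x04},                                 // ZIP container
        {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
         (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1},          // OLE2 container
        {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A},   // PNG
    };

    // MIME type reported for the signature at the same index in MAGIC.
    // Assumption: an OLE2 container is treated as a Word document.
    private static final String[] TYPES = {
        "application/pdf",
        "application/zip",
        "application/msword",
        "image/png",
    };

    // Returns the MIME type whose signature prefixes 'head', or null if none matches.
    public static String detect(byte[] head) {
        for (int i = 0; i < MAGIC.length; i++) {
            byte[] magic = MAGIC[i];
            if (head.length >= magic.length
                    && Arrays.equals(Arrays.copyOf(head, magic.length), magic)) {
                return TYPES[i];
            }
        }
        return null;
    }

    // Reads up to 8 leading bytes of the file and sniffs them (needs Java 11+).
    public static String detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8));
        }
    }
}
```

With something like this, a file named "DOC" could be routed to the right parser before indexing by calling MagicSniffer.detect(Paths.get("DOC")) and falling back to an extension lookup (or a default handler) when it returns null.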
