lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Using Lucene to store document
Date Tue, 09 Nov 2004 18:58:54 GMT
It is difficult to give a general answer.  You can certainly store the
whole XML in the Lucene index, just don't tokenize it.  The HEAD
version of Lucene even has some compression that you may find handy. 
On the other hand, storing XML in the FS would allow you to store XML
files wherever you wanted, even on separate disk(s).  If these are lots
of parallel searches/reads, this can be handy.  If you want to be able
to see XML files without going through the index, this can also be
handy.  So, it depends on how you like it, but both approaches are
doable.

Otis


--- Nhan Nguyen Dang <ndnhan2003@yahoo.com> wrote:

> Hi all,
> I'm using Lucene to index XML document/ file (may be millions of
> documents in future, each about 5-10KB)
> Beside the index for searching, I want to use Lucene to store whole
> document content with UnIndexed fields -content field(instead of
> store each document in a XML file). All the document content will be
> stored on a separate index. Each time I want to get access to a
> document, I will let Lucene retrieve it.
>  
> I am consider this issue with another one "Use file system to store
> document content in separate XML document" means, 400K document ill
> be stored in 400K XML file in file system.
>  
> Purpose of this is that I can access each document rapidly. Can any
> body who has experience with this problem before give me advise which
> method is suitable ? Is this better to collect all documents to an
> Lucene index or store them separately in file system ?
>  
> Thanks,
> Dang Nhan
> 
> 
> 
> 
> 			
> ---------------------------------
> Do you Yahoo!?
>  Check out the new Yahoo! Front Page. www.yahoo.com


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message