lucene-java-user mailing list archives

From "Karsten F." <karsten-luc...@fiz-technik.de>
Subject Re: Lucene vs. Database
Date Wed, 01 Oct 2008 09:46:54 GMT

Hi agatone,

I agree with markharw00 that highlighting is the main reason to store fields
in Lucene.
I want to remind Sascha Fahl that stored fields in Lucene are not part of
the inverted index structure.
The implementation of stored fields is very simple:
a (.fdt) file holding the "field-name"/"field-value" pairs in document
order, plus a map "documentID" --> "first pair in file".
("Stored fields" in
http://lucene.apache.org/java/2_3_2/fileformats.html#Fields )
You can search with no stored fields at all.
I agree with chrislusf that you should store as little data in Lucene as
possible.
If you store large byte arrays for a "full view", you will possibly get a lot
more IO even for hit lists which do not use these byte arrays. (You would
have to use FieldSelector, but even with FieldSelector a hard drive does not
like skipping this field data (= seeking).)
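
To make the stored-fields layout described above concrete, here is a minimal
sketch in plain Java. It is a toy model under my own assumptions, not Lucene's
real file format: the class StoredFieldsSketch, its method names and the
length-prefixed encoding are all made up for illustration. It only shows the
idea of "pairs in document order" plus a map from document id to the offset of
the first pair.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT Lucene's real on-disk format. All names are hypothetical.
class StoredFieldsSketch {
    // Stands in for the data file: name/value pairs, one document after another.
    private final ByteArrayOutputStream data = new ByteArrayOutputStream();
    // Stands in for the map "documentID" --> "first pair in file".
    private final List<Integer> firstPairOffset = new ArrayList<>();

    // Appends one document and returns its document id (its position in the file).
    int addDocument(Map<String, String> fields) {
        firstPairOffset.add(data.size());
        writeInt(fields.size());
        for (Map.Entry<String, String> e : fields.entrySet()) {
            writeString(e.getKey());
            writeString(e.getValue());
        }
        return firstPairOffset.size() - 1;
    }

    // Loads one document: one lookup in the offset map, then a sequential read.
    Map<String, String> document(int docId) {
        ByteBuffer in = ByteBuffer.wrap(data.toByteArray());
        in.position(firstPairOffset.get(docId));
        int pairCount = in.getInt();
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < pairCount; i++) {
            String name = readString(in);
            String value = readString(in);
            fields.put(name, value);
        }
        return fields;
    }

    private void writeInt(int v) {
        data.write(ByteBuffer.allocate(4).putInt(v).array(), 0, 4);
    }

    private void writeString(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeInt(bytes.length);
        data.write(bytes, 0, bytes.length);
    }

    private static String readString(ByteBuffer in) {
        byte[] bytes = new byte[in.getInt()];
        in.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

In this toy layout a large value sits in the middle of a document's pairs, so
even a reader that does not want it has to jump over its bytes to reach the
next pair. On a real disk that jump is the seek mentioned above.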

So if you have no highlighting at all, you could keep a map "lucene
document id" (int) --> "database id" (hopefully also of type int) in main
memory, and convert each Lucene search result list into a small select
statement.
This is completely OK.
Lucene is very good at searching, not at storing data.
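
A minimal sketch of that conversion, in plain Java and without any Lucene or
JDBC calls. The class name HitListToSelect, the table name "documents" and the
column name "id" are assumptions for illustration only. The array
docIdToDbId is assumed to have been filled once when the index was opened.
Since Lucene document ids can change when segments are merged, it would have
to be rebuilt whenever the index is reopened.

```java
import java.util.StringJoiner;

// Hypothetical helper: turns a Lucene hit list into one small SELECT.
class HitListToSelect {
    // docIdToDbId[luceneDocId] holds the database id of that document.
    // hitDocIds are the Lucene document ids of the current result page.
    static String selectFor(int[] docIdToDbId, int[] hitDocIds) {
        if (hitDocIds.length == 0) {
            // An empty IN () list is not valid SQL; callers should skip the query.
            throw new IllegalArgumentException("empty hit list");
        }
        // Table and column names are assumptions for this sketch.
        StringJoiner sql =
            new StringJoiner(", ", "SELECT * FROM documents WHERE id IN (", ")");
        for (int docId : hitDocIds) {
            // Only ints are concatenated, so no quoting or escaping is needed.
            sql.add(Integer.toString(docIdToDbId[docId]));
        }
        return sql.toString();
    }
}
```

The database will not return the rows in Lucene's ranking order, so the caller
would have to put them back into hit-list order after the select.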

Take a look at the thread
http://www.nabble.com/Using-lucene-as-a-database...-good-idea-or-bad-idea--to18703473.html

In my company we decided to use Lucene as storage. But we now have two
index directories: one for searching and showing hit lists, the other as
storage with only two fields: "key" & "data".
Performance tests show that reading the storage is between 2 and 5 times
slower than a solution with a database (this was OK for our use case).

Best regards
  Karsten


agatone wrote:
> 
> Hi, 
> I already asked this question on the "lucene-general" list but was
> advised to ask here too.
> 
> I'm working on a project that has a big database in the background (some
> tables have about 1500000 rows). We decided to use Lucene for "faster"
> search. Our search works like most searches: you enter a search string and
> get a list of hits with a detail link. But there is a dilemma whether we
> should store more data in the index than is needed.
> 
> One side of the development team insists that we should use the Lucene
> index as some kind of storage for data, so when you get a hit and go to the
> details, you again use Lucene to find the document that matches the
> selected ID and take the data from the Lucene index. So in the end you end
> up copying complete database tables into the Lucene index.
> 
> The other side insists on storing in the index only the data that is
> displayed directly to the user in the search results list or needed for
> search criteria. When you go to the details, you have the matching ID, so
> you can pick up that row from the database by that ID rather than search
> for it inside the Lucene index.
> 
> Can someone please describe the drawbacks and advantages of both approaches?
> Actually, can someone write down what the actual benefit of Lucene itself
> is, where and when, in a real production environment?
> 
> It would be great if anyone could share their experience with
> indexing and searching large amounts of data.
> 
> 
> Thank you
> 

-- 
View this message in context: http://www.nabble.com/Lucene-vs.-Database-tp19755932p19757274.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

