lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Catalin Mititelu <catalinmitit...@yahoo.com>
Subject Re: Big Ducument Indexing Limit?
Date Fri, 15 Sep 2006 11:52:17 GMT
One more hint for 2) and 3): use SimpleAnalyzer on your xml (give up at XmlContentExtractor).
In this manner you can index all "words" from xml file at lower case (tag name, attribute
name, attribute value and content). 
Of course, you should use the same analyzer for searching.

Simon Willnauer <simon.willnauer@googlemail.com> wrote: On 9/15/06, aslam bari  wrote:
> Dear Mititelu,
>   Thanks for reply. Can you help me on some samll issue related to it.
>   1) I am new to Lucene. Can you tell me where is this DEFAULT_MAX_FIELD_LENGTH variable
available and how to set it and for my purpose like 6-10MB file, how much i should set.
DEFAULT_MAX_FIELD_LENGTH is a constant of IndexWriter.
to set your own value use IndexWriter.html#setMaxFieldLength(int)

JavaDoc:
http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#setMaxFieldLength(int)

>   2) how can i index all the words of XML file as Case Insensitive means either in lowercase
or in Uppercase. So that i can search case insensitive.
How your text is processed depends on the analyzer you use for
indexing a certain field. A lot of the available analyzers do
lowercase processing using a lowercasefilter like the
(whitespaceanalyzer)
see (Analysis) at:
http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html


You should have a look at
>   3) I am using XmlContentExtractor. Actually it extract only Content of the xml tag.
If i want every thing (including XmlTag or contents or properties - attributes ) of xml file
to be indexed, where should i do change the code.

I'm not 100% sure but isn't XmlContentExtractor a part or jakarta
slide? If so please contact the slide mailing list. But there are many
xml apis available to extract every content of your document. If you
really want to index the whole xml file just read the file using java
io anyway I would not suggest doing that at all.

best regards simon
>
>   Thanks...
> Catalin Mititelu  wrote:
>   Yes. The default max limit for indexing tokens is 10,000.
> Look here http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH
>
> aslam bari wrote: Dear all,
> I am trying to index a Xml file which has 6MB size. Does lucene support the big document
size. What is the limit of lucene Max file size to index.
> Because when i check and trying to search in the indexed file. I am not able to get all
the results. It gives me some results but not others. I think Lucene has done partially indexed
and left the remaining part which it could not processed. IS IT RIGHT?. If not , then what
will be the problem.
> Thanks...
>
>
> ---------------------------------
> Find out what India is talking about on - Yahoo! Answers India
> Send FREE SMS to your friend's mobile from Yahoo! Messenger Version 8. Get it NOW
>
>
> ---------------------------------
> Stay in the know. Pulse on the new Yahoo.com. Check it out.
>
>
> ---------------------------------
>  Find out what India is talking about on  - Yahoo! Answers India
>  Send FREE SMS to your friend's mobile from Yahoo! Messenger Version 8. Get it NOW
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



 		
---------------------------------
How low will we go? Check out Yahoo! Messenger’s low  PC-to-Phone call rates.
Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message