lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Natarajan.T" <nataraj...@crimsonlogic.co.in>
Subject RE: Search Help in word doc
Date Tue, 19 Oct 2004 10:09:50 GMT
Ok, Thanks a lot...

-----Original Message-----
From: Cocula Remi [mailto:rcocula@sopragroup.com] 
Sent: Tuesday, October 19, 2004 3:14 PM
To: Lucene Users List
Subject: RE: Search Help in word doc

In my case, search.
But probably that the best is to do it at indexing time.


-----Message d'origine-----
De : Natarajan.T [mailto:natarajant@crimsonlogic.co.in]
Envoyé : mardi 19 octobre 2004 11:41
À : 'Lucene Users List'
Objet : RE: Search Help in word doc


Are you doing this functionality under indexing part or search part

-----Original Message-----
From: Cocula Remi [mailto:rcocula@sopragroup.com] 
Sent: Tuesday, October 19, 2004 2:37 PM
To: Lucene Users List
Subject: RE: Search Help in word doc

This sample code changes undesired characters into underscores.


Document doc = ....

char[] cs = doc.get("content").toCharArray();
StringBuffer sb = new StringBuffer();
for (int j=0;j< Array.getLength(cs);j++)
{
	if (!Character.isISOControl(cs[j]))
	{
		sb.append(cs[j]);
	}
	else
	{
		sb.append(" _ ");
	}
}

System.out.println(sb.toString());

-----Message d'origine-----
De : Natarajan.T [mailto:natarajant@crimsonlogic.co.in]
Envoyé : mardi 19 octobre 2004 11:06
À : 'Lucene Users List'
Objet : RE: Search Help in word doc


Hi Remi,

	Thanks for your response...
	Pls send me the jar name with sample code.....

Thanks,
Natarajan.



-----Original Message-----
From: Cocula Remi [mailto:rcocula@sopragroup.com] 
Sent: Tuesday, October 19, 2004 2:26 PM
To: Lucene Users List
Subject: RE: Search Help in word doc


Seen that.
I use the Character.isISOControl() function to identify and remove these
characters.


-----Message d'origine-----
De : Natarajan.T [mailto:natarajant@crimsonlogic.co.in]
Envoyé : mardi 19 octobre 2004 10:37
À : lucene-user@jakarta.apache.org
Objet : Search Help in word doc


Hi FFI,

 

I am indexing multiple documents like (word,excel,html,ppt,pdf) at the
time of indexing there is no problem.....

 

My search results contents(description) comes with small Boxes(this is
happening only word documents)

 

I think this is happening because of some special characters
like(bullets and symbols....)

 

How can I rectify this problem???

 

Regards,

Natarajan.

 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message