Return-Path:
Delivered-To: apmail-jakarta-oro-dev-archive@apache.org
Received: (qmail 50765 invoked from network); 24 Jan 2002 20:32:12 -0000
Received: from unknown (HELO nagoya.betaversion.org) (192.18.49.131)
  by daedalus.apache.org with SMTP; 24 Jan 2002 20:32:12 -0000
Received: (qmail 18460 invoked by uid 97); 24 Jan 2002 20:32:03 -0000
Delivered-To: qmlist-jakarta-archive-oro-dev@jakarta.apache.org
Received: (qmail 18411 invoked by uid 97); 24 Jan 2002 20:32:02 -0000
Mailing-List: contact oro-dev-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
List-Unsubscribe:
List-Subscribe:
List-Help:
List-Post:
List-Id: "ORO Developers List"
Reply-To: "ORO Developers List"
Delivered-To: mailing list oro-dev@jakarta.apache.org
Received: (qmail 18339 invoked from network); 24 Jan 2002 20:32:02 -0000
Message-Id: <200201242034.g0OKY6g21868@gandalf.savarese.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: "ORO Developers List"
Subject: Re: Qusetion
In-reply-to: Your message of "Wed, 23 Jan 2002 07:23:46 MST."
Mime-Version: 1.0
Content-Type: multipart/mixed ;
  boundary="==_Exmh_-1730570960"
Date: Thu, 24 Jan 2002 15:34:06 -0500
From: "Daniel F. Savarese"
X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N
X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N

--==_Exmh_-1730570960
Content-Type: text/plain; charset=us-ascii

In message , Dave writes:
>
>Hardeep Singh wrote:
>> Setting to ISO-LATIN-1 throws the UnsupportedEncodingException
>
>Maybe try "8859_1" in place of "ISO-LATIN-1"

I finally got around to running several tests today.  I wrote a test
program and searched various binary files with several different
regular expressions.  The default encoding worked, US-ASCII resulted
in the described index out of bounds exceptions, but ISO-8859-1 was
fine.  US-ASCII is a 7-bit character set, so there's probably some
issue there.  But ISO-8859-1 is 8-bit.  I had no problems running
matches against jar files, executables, mp3's, or jpg's.
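
The 7-bit versus 8-bit difference mentioned above can be illustrated with a
minimal sketch. It is not part of the attached test program; the class name
EncodingRoundTrip and the method preservesAllBytes are hypothetical, and it
assumes a Java 7 or later runtime for StandardCharsets. It checks whether a
charset decodes each of the 256 possible byte values to the character with
the same numeric value. ISO-8859-1 does. US-ASCII does not, because the
String constructor substitutes a replacement character for bytes at or above
0x80. Whether that substitution is what triggers the index out of bounds
exceptions is not confirmed by this message.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of the attached strings.java.
public final class EncodingRoundTrip {

  /**
   * Returns true if every byte value 0..255 decodes to the char with
   * the same numeric value under the given charset.
   */
  public static boolean preservesAllBytes(Charset charset) {
    byte[] bytes = new byte[256];
    for (int i = 0; i < bytes.length; i++) {
      bytes[i] = (byte) i;
    }

    // new String(byte[], Charset) replaces undecodable bytes with the
    // charset's default replacement string instead of throwing.
    String decoded = new String(bytes, charset);

    if (decoded.length() != bytes.length) {
      return false;
    }
    for (int i = 0; i < bytes.length; i++) {
      if (decoded.charAt(i) != i) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println("ISO-8859-1 preserves all bytes: "
        + preservesAllBytes(StandardCharsets.ISO_8859_1));
    System.out.println("US-ASCII preserves all bytes:   "
        + preservesAllBytes(StandardCharsets.US_ASCII));
  }
}
```

For binary input, a charset that maps every byte to a distinct character in
the range 0 to 255 keeps the data intact for the matcher, which is consistent
with ISO-8859-1 working in the tests described above.
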
So if you're still having problems, please post some sample code and
input where I or someone else can download it and try it out.  I've
attached my test program.  Should I add it to the awk examples in CVS?

daniel

--==_Exmh_-1730570960
Content-Type: text/plain ; name="strings.java"; charset=us-ascii
Content-Description: strings.java
Content-Disposition: attachment; filename="strings.java"

import java.io.*;

import org.apache.oro.text.regex.*;
import org.apache.oro.text.awk.*;

public final class strings {

  public static final class StringFinder {
    /**
     * Default string expression.  Looks for at least 4 contiguous
     * printable characters.  Differs slightly from GNU strings command
     * in that any printable character may start a string.
     */
    public static final String DEFAULT_PATTERN =
      "[\\x20-\\x7E]{3}[\\x20-\\x7E]+";

    Pattern pattern;
    AwkMatcher matcher;

    public StringFinder(String regex) throws MalformedPatternException {
      AwkCompiler compiler = new AwkCompiler();
      pattern = compiler.compile(regex, AwkCompiler.CASE_INSENSITIVE_MASK);
      matcher = new AwkMatcher();
    }

    public StringFinder() throws MalformedPatternException {
      this(DEFAULT_PATTERN);
    }

    public void search(Reader input, PrintWriter output)
      throws IOException
    {
      MatchResult result;
      AwkStreamInput in = new AwkStreamInput(input);

      while(matcher.contains(in, pattern)) {
        result = matcher.getMatch();
        output.println(result);
      }
      output.flush();
    }
  }

  public static final void main(String args[]) {
    String regex = StringFinder.DEFAULT_PATTERN;
    String filename, encoding = "ISO-8859-1";
    StringFinder finder;
    Reader file = null;

    if(args.length < 1) {
      System.err.println("usage: strings file [pattern] [encoding]");
      return;
    }

    filename = args[0];

    if(args.length > 1)
      regex = args[1];

    if(args.length > 2)
      encoding = args[2];

    try {
      finder = new StringFinder(regex);
      file = new InputStreamReader(new FileInputStream(filename), encoding);
      finder.search(file, new PrintWriter(new OutputStreamWriter(System.out)));
    } catch(Exception e) {
      e.printStackTrace();
      return;
    }
  }
}

--==_Exmh_-1730570960
Content-Type: text/plain; charset=us-ascii

--
To unsubscribe, e-mail:
For additional commands, e-mail:

--==_Exmh_-1730570960--