hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dick King (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1758) processing escapes in a jute record is quadratic
Date Wed, 22 Aug 2007 18:04:30 GMT
processing escapes in a jute record is quadratic

                 Key: HADOOP-1758
                 URL: https://issues.apache.org/jira/browse/HADOOP-1758
             Project: Hadoop
          Issue Type: Bug
          Components: record
    Affects Versions: 0.13.0
            Reporter: Dick King
            Priority: Blocker

The following code appears in hadoop/src/c++/librecordio/csvarchive.cc :

static void replaceAll(std::string s, const char *src, char c)
  std::string::size_type pos = 0;
  while (pos != std::string::npos) {
    pos = s.find(src);
    if (pos != std::string::npos) {
      s.replace(pos, strlen(src), 1, c);

This is used in the context of replacing jute escapes in the code:

void hadoop::ICsvArchive::deserialize(std::string& t, const char* tag)
  t = readUptoTerminator(stream);
  if (t[0] != '\'') {
    throw new IOException("Errror deserializing string.");
  t.erase(0, 1); /// erase first character
  replaceAll(t, "%0D", 0x0D);
  replaceAll(t, "%0A", 0x0A);
  replaceAll(t, "%7D", 0x7D);
  replaceAll(t, "%00", 0x00);
  replaceAll(t, "%2C", 0x2C);
  replaceAll(t, "%25", 0x25);


Since this replaces the entire string for each instance of the escape sequence, practically
anything would be better.  I would propose that within deserialize we allocate a char * [since
each replacement is smaller than the original], scan for each %, and either do a general hex
conversion in place or look for one of the six patterns, and after each replacement move down
the unmodified text and scan for the % fom that starting point.


This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message