A Post About Analyzing PHP

PHP is a server-side scripting language designed for web development but also used as a general-purpose programming language. As of January 2013, PHP was installed on more than 240 million websites (39% of those sampled) and 2.1 million web servers. Originally created by Rasmus Lerdorf in 1994, the reference implementation of PHP (powered by the Zend Engine) is now produced by The PHP Group. While PHP originally stood for Personal Home Page, it now stands for PHP: Hypertext Preprocessor, which is a recursive acronym.

Compilers and interpreters are held to especially strict quality and reliability requirements, both in their source code and in the way they are tested. Even so, the PHP interpreter's source code still contains some suspicious fragments.

In this article, we discuss the results of checking the PHP interpreter with PVS-Studio 5.18.

Identical conditional expressions

V501 There are identical sub-expressions '!memcmp("auto", charset_hint, 4)' to the left and to the right of the '||' operator. html.c 396

static enum
entity_charset determine_charset(char *charset_hint TSRMLS_DC)
{
  ....
  if ((len == 4) /* sizeof (none|auto|pass) */ && //<==
    (!memcmp("pass", charset_hint, 4) ||
     !memcmp("auto", charset_hint, 4) ||          //<==
     !memcmp("auto", charset_hint, 4)))           //<==
  {
    charset_hint = NULL;
    len = 0;
  }
  ....
}

The conditional expression calls the 'memcmp' function twice with identical arguments. The comment /* sizeof (none|auto|pass) */ suggests that one of these calls was meant to compare against the "none" value.
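A minimal sketch of what the check might look like if the comment is taken at face value. The helper name is_default_charset_hint is hypothetical and does not exist in PHP; the assumption that the third keyword should be "none" comes only from the comment in the original code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PHP): returns 1 when the hint is one
 * of the three four-character keywords listed in the original comment.
 * Assumption: the duplicated "auto" comparison was meant to be "none". */
static int is_default_charset_hint(const char *charset_hint, size_t len)
{
    return len == 4 &&
           (memcmp("pass", charset_hint, 4) == 0 ||
            memcmp("auto", charset_hint, 4) == 0 ||
            memcmp("none", charset_hint, 4) == 0);
}
```

With the duplicated comparison, a hint of "none" would not be treated as a default; in this sketch all three keywords are.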

Always false condition

V605 Consider verifying the expression: shell_wrote > - 1. An unsigned value is compared to the number -1. php_cli.c 266

PHP_CLI_API size_t sapi_cli_single_write(....)
{
  ....
  size_t shell_wrote;
  shell_wrote = cli_shell_callbacks.cli_shell_write(....);
  if (shell_wrote > -1) {  //<==
    return shell_wrote;
  }
  ....
}

This comparison is clearly an error. Before the comparison is made, '-1' is converted to the largest value of the 'size_t' type, so the condition is always false and the check is meaningless. Perhaps the 'shell_wrote' variable used to be signed, and after refactoring the programmer overlooked how comparisons with unsigned types work.
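The effect can be reproduced in isolation. In the sketch below, both function names are hypothetical; the second variant only illustrates the guess that the value was once signed, and it assumes a callback that reports failure with a negative return value, which the original code does not confirm.

```c
#include <stddef.h>

/* Mirrors the flagged check. The explicit cast spells out the implicit
 * conversion the compiler performs: -1 becomes the largest size_t value,
 * and no size_t value can be greater than that. */
static int flagged_check(size_t shell_wrote)
{
    return shell_wrote > (size_t)-1;
}

/* Assumption: a signed result where a negative value signals failure.
 * Here the comparison with -1 behaves the way the code seems to intend. */
static int signed_check(long shell_wrote)
{
    return shell_wrote > -1;
}
```

No argument makes flagged_check return 1, while signed_check accepts zero and positive values and rejects -1.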

Incorrect condition

V547 Expression 'tmp_len >= 0' is always true. Unsigned type value is always >= 0. ftp_fopen_wrapper.c 639

static size_t php_ftp_dirstream_read(....)
{
  size_t tmp_len;
  ....
  /* Trim off trailing whitespace characters */
  tmp_len--;
  while (tmp_len >= 0 &&                  //<==
    (ent->d_name[tmp_len] == '\n' ||
     ent->d_name[tmp_len] == '\r' ||
     ent->d_name[tmp_len] == '\t' ||
     ent->d_name[tmp_len] == ' ')) {
       ent->d_name[tmp_len--] = '\0';
  }
  ....
}

Because 'size_t' is unsigned, the (tmp_len >= 0) check is always true and cannot stop the loop. If 'tmp_len' reaches 0 and the character at that index is whitespace, the decrement wraps around to the largest 'size_t' value, and the next iteration reads memory outside the array's boundaries. The code most likely works only because the other conditions end the loop first on typical input; the danger of an array overrun, or even of a runaway loop, remains.
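One common way to write such a loop with an unsigned counter is to keep the length rather than the last index, and to test it against zero before indexing. This is a sketch only; the function trim_trailing_whitespace is hypothetical and works on a plain C string instead of the 'd_name' field from the original.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: strips trailing '\n', '\r', '\t' and ' ' in place
 * and returns the new length. The test len > 0 runs before name[len - 1]
 * is read, so the unsigned counter never wraps around. */
static size_t trim_trailing_whitespace(char *name)
{
    size_t len = strlen(name);

    while (len > 0 &&
           (name[len - 1] == '\n' || name[len - 1] == '\r' ||
            name[len - 1] == '\t' || name[len - 1] == ' ')) {
        name[--len] = '\0';
    }
    return len;
}
```

A string made up entirely of whitespace is reduced to an empty string, and the loop stops at length zero instead of indexing before the start of the buffer.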

Difference of unsigned numbers

V555 The expression 'out_buf_size - ocnt > 0' will work as 'out_buf_size != ocnt'. filters.c 1702

static int strfilter_convert_append_bucket(....)
{
  size_t out_buf_size;
  ....
  size_t ocnt, icnt, tcnt;
  ....
  if (out_buf_size - ocnt > 0) { //<==
    ....
    php_stream_bucket_append(buckets_out, new_bucket TSRMLS_CC);
  } else {
    pefree(out_buf, persistent);
  }
  ....
}

With unsigned operands, the difference is greater than zero whenever the operands are not equal, even if 'ocnt' is larger than 'out_buf_size' and the subtraction wraps around. As a result, the 'else' branch may execute more rarely than intended. If the intent is to check whether anything was written to the buffer, a direct comparison such as 'out_buf_size > ocnt' states it more clearly.
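The difference between the two forms shows up only when the second operand is larger. The sketch below uses hypothetical function names and bare 'size_t' arguments; whether 'ocnt' can ever exceed 'out_buf_size' in the real filter code is not established here.

```c
#include <stddef.h>

/* Mirrors the flagged expression: the subtraction is done in unsigned
 * arithmetic, so the result is nonzero whenever the operands differ. */
static int flagged_nonempty(size_t out_buf_size, size_t ocnt)
{
    return out_buf_size - ocnt > 0;
}

/* Direct comparison: true only when out_buf_size is really larger. */
static int direct_nonempty(size_t out_buf_size, size_t ocnt)
{
    return out_buf_size > ocnt;
}
```

For the pair (4, 10) the flagged form reports true because 4 - 10 wraps to a huge positive value, while the direct comparison reports false.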

Pointer dereferencing

V595 The 'function_name' pointer was utilized before it was verified against nullptr. Check lines: 4859, 4860. basic_functions.c 4859

static int user_shutdown_function_call(zval *zv TSRMLS_DC)
{
  ....
  php_error(E_WARNING, "....", function_name->val);  //<==
  if (function_name) {                               //<==
    STR_RELEASE(function_name);
  }
  ....
}

A null check placed after the pointer has already been dereferenced is always a warning sign. If the pointer can actually be null, the program may crash before the check is ever reached.

Another similar issue:

  • V595 The 'callback_name' pointer was utilized before it was verified against nullptr. Check lines: 5007, 5021. basic_functions.c 5007
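The usual remedy is to move the check ahead of the first use. The sketch below is an illustration only: demo_string is a hypothetical stand-in for the interpreter's string type, and the "(unknown)" fallback text is an assumption, not something the PHP code does.

```c
#include <stddef.h>

/* Hypothetical stand-in for the interpreter's string type. */
typedef struct {
    const char *val;
} demo_string;

/* Returns the text to print in a warning. The null check comes first,
 * so the pointer is never dereferenced when it is null. */
static const char *name_for_warning(const demo_string *function_name)
{
    if (function_name == NULL) {
        return "(unknown)"; /* assumed fallback, for illustration */
    }
    return function_name->val;
}
```

If the pointer can never be null at that point, the alternative is to drop the later check so that the code no longer suggests otherwise.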

Insidious optimization

V597 The compiler could delete the 'memset' function call, which is used to flush 'final' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. php_crypt_r.c 421

/*
 * MD5 password encryption.
 */
char* php_md5_crypt_r(const char *pw,const char *salt, char *out)
{
  static char passwd[MD5_HASH_MAX_LEN], *p;
  unsigned char final[16];
  ....
  /* Don't leave anything around in vm they could use. */
  memset(final, 0, sizeof(final));  //<==
  return (passwd);
}

The 'final' array may contain private password information, so the code clears it before returning. However, because the array is never read afterwards, the compiler is allowed to remove the 'memset' call as an optimization. To learn more about why this happens and why it is dangerous, see the article "Overwriting memory - why?" and the description of the V597 diagnostic.

Other similar issues:

  • V597 The compiler could delete the 'memset' function call, which is used to flush 'output' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. crypt.c 214
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'temp_result' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha512.c 622
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'ctx' object. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha512.c 625
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'alt_ctx' object. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha512.c 626
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'temp_result' buffer. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha256.c 574
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'ctx' object. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha256.c 577
  • V597 The compiler could delete the 'memset' function call, which is used to flush 'alt_ctx' object. The RtlSecureZeroMemory() function should be used to erase the private data. crypt_sha256.c 578
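Where a platform routine such as RtlSecureZeroMemory() is not available, a widely used fallback is to write the zeros through a pointer to volatile, which the compiler is not supposed to optimize away. The helper name secure_zero below is hypothetical, and the sketch makes no claim about how PHP itself addresses these warnings.

```c
#include <stddef.h>

/* Hypothetical helper: clears len bytes at buf. Each store goes through
 * a volatile-qualified pointer, so it counts as an observable access and
 * should not be removed even if the buffer is never read again. */
static void secure_zero(void *buf, size_t len)
{
    volatile unsigned char *p = (volatile unsigned char *)buf;

    while (len--) {
        *p++ = 0;
    }
}
```

A call like secure_zero(final, sizeof(final)) would take the place of the 'memset' call. Where the platform offers a dedicated routine (for example explicit_bzero on some Unix-like systems), that is generally the safer choice.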

Can we trust the libraries we use?

Third-party libraries contribute a great deal to a project by letting developers reuse already implemented algorithms, but their quality should be checked as carefully as that of the project's own code. I will cite just a few examples from third-party libraries, to stay on topic and to reflect on how much we trust them.

The PHP interpreter employs plenty of libraries, some of them slightly customized by the authors for their needs.

libsqlite

V579 The sqlite3_result_blob function receives the pointer and its size as arguments. It is possibly a mistake. Inspect the third argument. sqlite3.c 82631

static void statInit(....)
{
  Stat4Accum *p;
  ....
  sqlite3_result_blob(context, p, sizeof(p), stat4Destructor);
  ....
}

I guess the programmer wanted to get the size of the object, not the pointer. So it should have been sizeof(*p).
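The difference between the two expressions is easy to see on a small example. The struct demo_accum is a hypothetical stand-in, not the real Stat4Accum layout, and whether the SQLite call actually needs the object's size is not established here.

```c
#include <stddef.h>

/* Hypothetical stand-in for Stat4Accum; the real structure differs. */
typedef struct {
    long rows;
    int  cols;
    int  flags[8];
} demo_accum;

/* sizeof applied to the pointer: the size of an address. */
static size_t size_via_pointer(const demo_accum *p)
{
    return sizeof(p);
}

/* sizeof applied to the pointed-to object: the size of the structure. */
static size_t size_via_object(const demo_accum *p)
{
    return sizeof(*p);
}
```

The first function returns the size of a pointer regardless of what it points to; the second returns the size of the whole structure.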

pcrelib

V501 There are identical sub-expressions '(1 << ucp_gbL)' to the left and to the right of the '|' operator. pcre_tables.c 161

const pcre_uint32 PRIV(ucp_gbtable[]) = {
  (1<<ucp_gbLF),
  0,
  0,
  ....
  (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbL)|    //<==
    (1<<ucp_gbL)|(1<<ucp_gbV)|(1<<ucp_gbLV)|(1<<ucp_gbLVT), //<==

   (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbV)|
     (1<<ucp_gbT),
  ....
};

The expression that computes one array item contains the subexpression (1<<ucp_gbL) twice. Judging by the code that follows this fragment, either one of the ucp_gbL constants was meant to be ucp_gbT, or the repetition is simply redundant.

PDO

V595 The 'dbh' pointer was utilized before it was verified against nullptr. Check lines: 103, 110. pdo_dbh.c 103

PDO_API void pdo_handle_error(pdo_dbh_t *dbh, ....)
{
  pdo_error_type *pdo_err = &dbh->error_code;  //<==
  ....
  if (dbh == NULL || dbh->error_mode == PDO_ERRMODE_SILENT) {
    return;
  }
  ....
}

In this fragment, the received pointer is dereferenced at the very beginning of the function and only afterwards checked for null.

libmagic

V519 The '* code' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 100, 101. encoding.c 101

protected int file_encoding(...., const char **code, ....)
{
  if (looks_ascii(buf, nbytes, *ubuf, ulen)) {
    ....
  } else if (looks_utf8_with_BOM(buf, nbytes, *ubuf, ulen) > 0) {
    DPRINTF(("utf8/bom %" SIZE_T_FORMAT "u\n", *ulen));
    *code = "UTF-8 Unicode (with BOM)";
    *code_mime = "utf-8";
  } else if (file_looks_utf8(buf, nbytes, *ubuf, ulen) > 1) {
    DPRINTF(("utf8 %" SIZE_T_FORMAT "u\n", *ulen));
    *code = "UTF-8 Unicode (with BOM)";                     //<==
    *code = "UTF-8 Unicode";                                //<==
    *code_mime = "utf-8";
  } else if (....) {
    ....
  }
}

The character set name is written to the variable twice in a row. The first assignment is immediately overwritten and has no effect, which suggests a copy-paste leftover; it is worth confirming that the value which remains is the intended one.

Conclusion

Although PHP has existed for a long time and is widely used, a few suspicious fragments can still be found in its core code and in the third-party libraries it employs, even though a project like this has very likely been checked by various analyzers.

Using static analysis regularly saves time that you can then spend on more useful tasks.