PHP Autoloading: Local File Inclusion by Design

In 2009, PHP 5.3 was released, bringing with it major new features like namespaces and lambda functions. At the time, there was nothing like Python’s PEPs for PHP, but that was all about to change with the introduction of the PHP Standards Working Group, later renamed the Framework Interoperability Group (PHP-FIG). The first PSR (PHP Standard Recommendation) released was PSR-0: Autoloading Standard. The standard defined a way to automatically load the correct file containing a class, regardless of which vendor packaged the given code. This was later superseded by PSR-4: Autoloader, which had the same purpose and a similar recommended implementation. These PSRs paved the way for widespread Local File Inclusion (LFI) issues within PHP applications, given the right (or wrong) user code.

The PHP-FIG website gives several example implementations of PSR-4 which are useful to examine. The first example, https://www.php-fig.org/psr/psr-4/examples/#closure-example, is the simplest to review.

The function spl_autoload_register (tip: you can always visit php.net/functionname to view its documentation) is used to tell the PHP engine which function to call if the code attempts to use a class which hasn’t yet been loaded. This function, a closure in the example code, will be called automatically, and be expected to read in the file which contains the class definition. As you might expect, an attacker will be able to use this to their advantage.
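As a minimal sketch of that mechanism, the snippet below registers a closure that only records the class name it is asked for and loads nothing. The class name Acme\Missing\Widget and the $requested variable are made up for illustration.

```php
<?php
// Names the engine asked the autoloader to resolve (illustrative only).
$requested = [];

spl_autoload_register(function ($class) use (&$requested) {
    // A real autoloader would require the file that defines $class here.
    $requested[] = $class;
});

// Referencing an unknown class makes the engine call the closure above.
$found = class_exists('Acme\\Missing\\Widget');

var_dump($found);     // the class was never defined, so this is false
print_r($requested);  // lists the single name the engine asked for
```

Because the closure never defines the class, class_exists still reports false; the point is only that the engine hands the requested name to user-land code.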

The code verifies the class namespace is known to the autoloader, Foo\Bar in this case, and if so, attempts to load a file based on the rest of the class name, relative to a configured base directory. A class of Foo\Bar\Baz would attempt to load the file Baz.php. Similarly, a class of Foo\Bar\Boop\Bap would attempt to load the file Boop/Bap.php, since each namespace separator is converted into a directory separator.
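The mapping step can be sketched as a small pure function. This is my own condensed version of what the closure example does, not the PHP-FIG code verbatim; mapClassToFile, the /app/src/ base directory, and the default prefix are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper mirroring the mapping step of the PSR-4 closure example.
// $prefix and $baseDir are assumed values, not taken from a real project.
function mapClassToFile($class, $prefix = 'Foo\\Bar\\', $baseDir = '/app/src/')
{
    // Ignore classes outside the namespace this autoloader is responsible for.
    if (strncmp($prefix, $class, strlen($prefix)) !== 0) {
        return null;
    }
    // Strip the prefix, then turn namespace separators into directory separators.
    $relative = substr($class, strlen($prefix));
    return $baseDir . str_replace('\\', '/', $relative) . '.php';
}

echo mapClassToFile('Foo\\Bar\\Baz'), "\n";        // /app/src/Baz.php
echo mapClassToFile('Foo\\Bar\\Boop\\Bap'), "\n";  // /app/src/Boop/Bap.php
var_dump(mapClassToFile('Other\\Thing'));          // NULL, prefix does not match
```

Note that nothing in this step inspects the characters of the class name beyond the prefix; it trusts whatever the engine passes in.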

The PHP documentation is precise in defining what a valid class name looks like. Expressed as a regular expression: ^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$. In practice, almost all classes are simply alphanumeric (with the first character not being a number). With this restriction, it seems unlikely an attacker could coerce any autoloading to occur maliciously. Most LFI is not going to occur within a vendor library folder of PHP code. If an attacker were able to write files to that directory, they could probably attack the code in a more direct way, such as overwriting an existing class definition.
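For reference, that regular expression can be applied directly with preg_match. The helper name isValidClassLabel is hypothetical, and the D modifier is my addition so that a trailing newline is not silently accepted.

```php
<?php
// Hypothetical helper: true when $name is a single valid class-name label.
// Namespaced names would need each backslash-separated part checked in turn.
function isValidClassLabel($name)
{
    // D (dollar-end-only) makes $ match only at the very end of the string.
    return preg_match('/^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/D', $name) === 1;
}

var_dump(isValidClassLabel('Baz'));      // bool(true)
var_dump(isValidClassLabel('_Helper2')); // bool(true)
var_dump(isValidClassLabel('2Fast'));    // bool(false), starts with a digit
var_dump(isValidClassLabel('../evil'));  // bool(false), dots and slash not allowed
```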

While this requirement is verified strictly during parsing of PHP code, if a class is instantiated at run-time with a dynamic name, the requirement can be bypassed.

Consider the following examples:

// Example 1
$className = $_GET['className']; // Class name from the user
$object = new $className();
// Example 2
if (class_exists($_GET['className'])) {
    echo "It exists";
} else {
    echo "Invalid class name";
}

In both examples, PHP will invoke the autoloader to determine whether the class exists. If an attacker were to specify a malicious class name, such as one containing ../, the autoloader would attempt to load a PHP file from a user-specified directory.
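To make that concrete, here is a sketch of the path a PSR-4 style autoloader would build for a hostile name. The base directory and the uploads/evil target are hypothetical, and nothing is included; the path is only computed. The name still has to start with the registered prefix to get past the autoloader's namespace check.

```php
<?php
// Assumed autoloader configuration (same shape as the PSR-4 closure example).
$prefix  = 'Foo\\Bar\\';
$baseDir = '/var/www/app/src/';

// Attacker-supplied value, e.g. from $_GET['className'].
$className = 'Foo\\Bar\\../../uploads/evil';

// The mapping step: drop the prefix, swap backslashes for slashes, add ".php".
$relative = substr($className, strlen($prefix));
$file = $baseDir . str_replace('\\', '/', $relative) . '.php';

echo $file, "\n"; // /var/www/app/src/../../uploads/evil.php
```

An autoloader that passes $file to require would step out of the source directory and load whatever evil.php the attacker managed to place under uploads.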

There’s another place that loads user specified classes that might come to mind:

// Example 3
unserialize($_GET['data']);

Unfortunately, each of these examples has problems. To investigate further, we’ll dive into the code of PHP itself.

To begin, let’s review the unserialize code in PHP. The latest release as I write this is 7.3.8. The unserialize source can be found in ./ext/standard/var_unserializer.c, which can be viewed easily on GitHub. The code is not the simplest to read, not least because it lacks comments, but by stepping through it, we arrive at line 1217. If you recall the definition of a valid class name, the set of characters checked there should look familiar. The code verifies, during deserialization, that any given class name is valid, and refuses to continue if it is not. By reviewing the Git history, we can find the commit that introduced this verification: https://github.com/php/php-src/commit/ff8055fc5c9750482aac7a25a074aae0b1e64706. GitHub helpfully tells us which releases contain this commit, and it doesn’t look good for an attacker: the check has been present since PHP 5.1.0, which was released on 24 November 2005. To the best of my knowledge, no distribution still ships a PHP version older than that, not even CentOS!
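The effect of that check can be observed from user-land with a recording autoloader. This is a sketch run against a current PHP, not a reproduction of the C code; serializedObjectOf and the class names are hypothetical.

```php
<?php
$seen = [];
spl_autoload_register(function ($class) use (&$seen) {
    $seen[] = $class; // record only, load nothing
});

// Hypothetical helper: build the serialized form of an empty object of $class.
function serializedObjectOf($class)
{
    return sprintf('O:%d:"%s":0:{}', strlen($class), $class);
}

// A well-formed but undefined class name reaches the autoloader.
$valid = unserialize(serializedObjectOf('Acme\\Missing\\Widget'));

// A name with characters outside the allowed set is rejected before that.
// The @ hides the notice or warning PHP raises for the malformed input.
$hostile = @unserialize(serializedObjectOf('../evil'));

var_dump(get_class($valid)); // string(22) "__PHP_Incomplete_Class"
var_dump($hostile);          // bool(false)
print_r($seen);              // only Acme\Missing\Widget was looked up
```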

Still, this tells us that initially, the protection didn’t exist, and it was only added at a later stage, which bodes well for finding other places to exploit the issue. Let’s turn our attention to the other examples provided.

Both Example 1 and Example 2 end up calling the same code: zend_lookup_class_ex. Once again viewing the code from the latest release on GitHub, we find it in ./Zend/zend_execute_API.c, line 1217. Skimming this code takes us to an ominous comment.

/* Verify class name before passing it to __autoload() */

Once again, rummaging through the Git history, we find some luck. The verification was only added to this function in 2014, and it ships in PHP 5.4.24 and 5.5.8 onward. In my experience, PHP 5.3 is still common enough that you are likely to come across servers using it. For example, CentOS 7 (the latest version) still uses PHP 5.4.16, without any backported patches to resolve the issue.
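When triaging a target, the version thresholds above can be turned into a quick check. This is a rough sketch: predatesAutoloadNameCheck is a hypothetical helper, it relies only on the two release numbers quoted above, and it cannot know about distribution backports. It avoids type declarations so it also runs on old interpreters.

```php
<?php
// Hypothetical helper: does $version predate the class-name check in
// zend_lookup_class_ex? Thresholds are the releases named in the text.
function predatesAutoloadNameCheck($version)
{
    // The 5.5 branch received the check in 5.5.8.
    if (version_compare($version, '5.5.0', '>=')) {
        return version_compare($version, '5.5.8', '<');
    }
    // Everything on 5.4 or older is compared against 5.4.24.
    return version_compare($version, '5.4.24', '<');
}

var_dump(predatesAutoloadNameCheck('5.4.16'));    // bool(true), the CentOS 7 package
var_dump(predatesAutoloadNameCheck('5.5.8'));     // bool(false)
var_dump(predatesAutoloadNameCheck(PHP_VERSION)); // the interpreter running this script
```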

There’s one last function that results in the autoloader being run, but it’s unlikely to be useful in practice: spl_autoload_call. This function directly triggers the registered autoloaders in user-land PHP without any of the engine protections we’ve seen so far.
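A short sketch of that behaviour follows, again with a recording closure instead of a real loader. I am assuming spl_autoload_call performs no name validation on the version you test; verify that on your target before relying on it. The hostile string is made up.

```php
<?php
$seen = [];
spl_autoload_register(function ($class) use (&$seen) {
    $seen[] = $class; // record only, load nothing
});

// Hand a string that is not a valid class name straight to the autoload chain.
spl_autoload_call('Foo\\Bar\\../../uploads/evil');

print_r($seen); // shows whatever the closure was given
```

In real code this only matters if an application passes user input to spl_autoload_call itself, which is why it is rarely a practical vector.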

Here is a non-exhaustive list of ways to trigger zend_lookup_class:

class_exists
method_exists
new $_GET['user_input']();
SoapServer::setClass
DOMDocument::registerNodeClass
get_parent_class
is_a
get_class_vars
get_class_methods
Many more!

As you can tell, if you happen upon a PHP installation that predates that fix (PHP 5.3 or earlier, or a 5.4 or 5.5 build older than 5.4.24 or 5.5.8), you have many ways to trigger the autoloader with a malicious class name. You can always check for more functions by searching the PHP codebase for zend_lookup_class or zend_lookup_class_ex yourself.

Exploitation can be extremely easy in the simplest case, but may require some trickery in more complex cases. For a complete exploitation example, see CVE-2019-19373 (PHP unserialization in Squiz Matrix CMS).