A New Way of Thinking About PHP Security

Develop a new perspective

"If we do not change our direction, we are likely to end up where we are headed."

When thinking about defensive security programming in PHP, it helps to first address some common misconceptions, and then adopt some new ideas about the actual problem domain: correct data processing across PHP, MySQL, HTML, and JavaScript.

Some common notions floating around the net are:

This variable is "safe" because strip_tags() cleaned it.

This input is "clean" because it was passed through mysql_real_escape_string(addslashes(strip_tags())).

This is "safe" because SQL Injection was prevented.

These assumptions are misleading at best and deceptive at worst, because they address neither the actual problems nor the proper remediation. This negatively affects design and coding decisions.

Security does not mean completely safe. It means steps were implemented to add protection, making a breach more difficult. The word "secure" does not mean "can never be broken into". It means "not wide open": processes have been put in place to reduce threat levels and increase protection. While it is pleasant to think that an application is finally tamper-proof, that simply is not true.

Consider adopting a new mindset regarding this problem space. Instead of thinking "Clean, Safe, and Done", think "Reducing Attack Vectors", "Reduced Threats", "Less Vulnerable", "Higher Degrees of Protection". These are more accurate descriptions of the defense design process and its implementations, and as such they are more helpful to the programming mindset. Using prepared statements for database queries, and storing password hashes rather than the passwords themselves, greatly raises the security bar. Changing the way GET is processed can usually reduce the number of attack vectors so the app is less vulnerable.
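As one illustration of raising the bar rather than declaring something "done", the following is a minimal sketch of keeping a password hash instead of the password, using PHP's built-in password_hash() and password_verify(). The helper names hashForStorage and matchesStoredHash are hypothetical, and the actual storage step is left out.

```php
<?php
// Hypothetical helper: derive the value that would be stored instead of the password.
function hashForStorage(string $plainPassword): string
{
    // PASSWORD_DEFAULT lets PHP choose its current default algorithm and generate a salt.
    return password_hash($plainPassword, PASSWORD_DEFAULT);
}

// Hypothetical helper: compare a login attempt against the stored hash.
function matchesStoredHash(string $attempt, string $storedHash): bool
{
    return password_verify($attempt, $storedHash);
}

$stored = hashForStorage('correct horse battery staple');

var_dump(matchesStoredHash('correct horse battery staple', $stored)); // bool(true)
var_dump(matchesStoredHash('wrong guess', $stored));                  // bool(false)
```

The stored value is harder to abuse than a plain password, which is a higher degree of protection and not a guarantee.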

The battle of web security centers largely around the battle of escape characters. The problem is that the interpretation of an escape character changes depending on the parsing engine currently engaged. Every web application involves several parsing engines: the PHP engine, the MySQL parser, the browser HTML parser, and the browser JavaScript parser. Data is constantly moving in and out of all of them.

Web exploits are technical exploits, so defending against them requires being technically correct. PHP is by nature loose about types and not very pedantic, which makes it all the more necessary for the developer to be pedantic.

Technically, it is safe to:

Escape a UTF-8 variable out into a UTF-8 MySQL column with $pdo->quote($variable), using a PDO connection opened with a UTF-8 charset.

There is no other technical 'safety' implied here. This process does not make the variable safe for an HTML parser.
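A minimal sketch of that one narrow guarantee follows. It assumes a MySQL server reachable with placeholder credentials, uses utf8mb4 as the connection character set (an assumption; the text only says UTF-8), and invents a users table with a name column purely for illustration. Both helper names are hypothetical.

```php
<?php
// Hypothetical helper: open a PDO connection whose character set is declared in the DSN,
// so that quote() escapes according to the encoding the connection actually uses.
function openUtf8Connection(string $host, string $db, string $user, string $pass): PDO
{
    $dsn = "mysql:host={$host};dbname={$db};charset=utf8mb4";
    return new PDO($dsn, $user, $pass, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
}

// Hypothetical helper: escape a value out into the SQL context, and only that context.
function buildNameLookup(PDO $pdo, string $name): string
{
    $quoted = $pdo->quote($name);
    if ($quoted === false) {
        throw new RuntimeException('This PDO driver does not support quoting.');
    }
    // "users" and "name" are made-up identifiers for this sketch.
    return 'SELECT id FROM users WHERE name = ' . $quoted;
}

// Usage needs a real server, so it is left commented out:
// $pdo = openUtf8Connection('localhost', 'app', 'app_user', 'secret');
// echo buildNameLookup($pdo, "O'Reilly");
```

Prepared statements, mentioned earlier, are the usual alternative to assembling the query string by hand.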

Technically, it is safe to:

Display a UTF-8 variable out into UTF-8 HTML using echo htmlentities($variable, ENT_QUOTES, "UTF-8");

Again, there is no other technical 'safety' implied here. The variable under this process is not safe for a MySQL parser.
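The HTML case can be sketched the same way. The helper name escapeForHtml and the sample attack string are illustrative assumptions, not part of the original text.

```php
<?php
// Hypothetical helper: escape a UTF-8 string out into a UTF-8 HTML page.
function escapeForHtml(string $value): string
{
    // ENT_QUOTES converts both double and single quotes.
    return htmlentities($value, ENT_QUOTES, 'UTF-8');
}

$untrusted = "<script>alert('x')</script>";

echo escapeForHtml($untrusted), "\n";
// prints: &lt;script&gt;alert(&#039;x&#039;)&lt;/script&gt;
```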

The term "escape out into" is used specifically to describe a variable going out of the PHP parser and into the MySQL parser, or out of the PHP parser and into the browser HTML parsing engine. NOTE* This is why mysql_real_escape_string(), despite commonly being used as one, is not a PHP variable input cleaner. It is a MySQL-aware input preserver for strings that knows the character set of the connection.

The reason safety is achieved in each particular case is that character sets are matched and correct escaping is performed based on the criteria of the appropriate parsing engine. Outside of these particular cases, it is not known what could happen, which opens a potential security hole. This is why a variable cannot be assumed to be safe in any other setting or condition. That is the battle of the data context.

The next battle is the battle of the attack vector. Every input and every output is a potential attack vector. This includes $_POST, $_GET, $_REQUEST, $_COOKIE, $_SERVER, $_FILES, and $_ENV. It also includes any untrusted data the application obtains from database queries and from HTTP requests made to third parties. For the moment, the discussion will focus on POST and GET.

A POST request is no safer than a GET request. Both are direct input attack vectors into an application. The difference is that they are completely different attack vectors. If one simply eliminates the processing of all GET requests from one's application, the result is the closure of that attack vector and the elimination of that category of attack. The net security result is that total threats are reduced and the application is less vulnerable. It does not make the application "safe". POST still has to be dealt with.

If one makes all GET requests truly read-only operations, as for static HTML pages, that also closes off certain attack vectors and decreases threats. It does not make the application completely "safe". If the only data modifications are made through POST requests, defensive programming has a chance of increased effectiveness because the attack vectors are reduced. Fewer attack vectors reduce defensive programming complexity, which is helpful. Seen this way, defensive programming benefits from disabling the $_REQUEST array and using only $_GET for read-only requests and $_POST for write modifications. The problem with the $_REQUEST array is that it merges completely different attack vectors (GET, POST, and, depending on configuration, cookies) into a single attack vector. Source distinction is lost, and the developer loses some direct control over the defense strategy.

In real applications, read-only requests are still fully dangerous, because they dynamically assemble the data to be delivered based on untrusted user input. Therefore, input must be properly filtered and validated, then properly escaped out into the database before lookup, and properly escaped out into the browser for viewing. This process can be simplified from a filtering standpoint when the intent is clear from the request type. The same holds for a POST request that modifies data. Request type makes processing intent clear, and clear intent makes design and implementation clearer.

There are many heated debates about REST (Representational State Transfer) architecture, and how to properly use the HTTP specification when implementing GET and POST. The purpose here is not to end that argument. The purpose is to introduce some additional notions that aid the process of defensive programming.

One reality that seems to fuel the debate is the blurred lines of certain requests. For example, in this book, a GET request is used with a code to activate an account. Is that a read-only intent or a write modification? You may decide for yourself; it could be argued either way. The choice here was not academic. It was made, in this case, because of the email delivery requirement and because a GET request link works so well for this case. The only goal in any endeavor is to achieve the best result for the consumer.

In most cases, this book strives for a clear intent of request, $_GET for reads and $_POST for writes, and makes explicit use of each. $_REQUEST is discarded completely by unsetting the array so it can never be used.
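A minimal sketch of that policy might look like the following. The helper inputForMethod is hypothetical, and rejecting every method other than GET and POST is an assumption made for this illustration.

```php
<?php
// Discard the merged array so that no later code can read input from it.
unset($_REQUEST);

// Hypothetical helper: map the HTTP method to the single input source allowed for it.
function inputForMethod(string $method, array $get, array $post): array
{
    switch (strtoupper($method)) {
        case 'GET':
            return ['intent' => 'read', 'input' => $get];
        case 'POST':
            return ['intent' => 'write', 'input' => $post];
        default:
            throw new InvalidArgumentException('Unsupported request method');
    }
}

// REQUEST_METHOD is absent on the command line, so this sketch falls back to GET.
$method = $_SERVER['REQUEST_METHOD'] ?? 'GET';
$request = inputForMethod($method, $_GET, $_POST);

echo $request['intent'], "\n";
```

Keeping the source explicit means a read handler never sees POST data and a write handler never sees query-string data.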

Lastly, this new mindset brings the notion that data is always in transit and is never "Done". Remove from your mind the concept of "clean all input then done". Instead, get into the mindset of,

"Filter input, Escape output" when required.

PHP Security Context Escaping for Input and Output

The notion of "escaping when required" is important because in any application, at any point in time, data is either at rest inside a variable or in transit, headed to a different parsing engine.

Keep in mind the notion that data in variables are held in stasis, frozen, until something acts upon them. As long as it's just a variable, it's harmless. A dangerous attack string could come in through a GET request, sit inert inside the $_GET array container, and if it is never accessed, it does no harm. The potential for harm is only determined by what parsing engine acts upon it. Different actions have different determinations. This is why output context is so critically important.

For example, the following logic is perfectly acceptable.

To begin, filter/validate an incoming variable according to business criteria.
This has nothing to do with security. This process, at this stage, is to ensure that there is a user name with a max limit of 40 alphabetic UTF-8 characters, so that the name fits, without truncation, inside the 40 character limit UTF-8 table column. Destruction and/or rejection of user data according to application rules is perfectly acceptable here. The decision about what is good data is the choice of the designer.
After this validation is done, this variable is held in stasis.
It doesn't need to be "cleaned", because its destination is not yet known and so it is not known how it should be cleaned. But technically, right now, the data is safe. It isn't doing anything. At this point, it is the job of the code to protect and preserve the data that validation accepted.
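The business-rule validation described above could be sketched as follows. The helper name isAcceptableUserName is hypothetical, and the lower bound of one character is an assumption; the text only states the 40 character maximum.

```php
<?php
// Hypothetical business rule from the text: at most 40 alphabetic UTF-8 characters.
function isAcceptableUserName(string $name): bool
{
    // \p{L} matches any Unicode letter. The /u modifier treats the input as UTF-8,
    // and preg_match() does not return 1 for byte sequences that are not valid UTF-8.
    return preg_match('/\A\p{L}{1,40}\z/u', $name) === 1;
}

var_dump(isAcceptableUserName("M\u{00FC}ller"));        // bool(true)
var_dump(isAcceptableUserName(str_repeat('a', 41)));    // bool(false)
var_dump(isAcceptableUserName("Robert'); DROP TABLE")); // bool(false)
```

The last call is rejected because it breaks the naming rule, not because it was recognized as an attack. Accepted values still have to be escaped later, once their destination is known.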

NOTE* Remember that escaping is preserving. It is not filtering, which is destructive; confusing the two is another common misconception. Escaping preserves the variable into the next context. For example, O’Reilly needs to go into the database, and come back out as O’Reilly. Escaping is what accomplishes this. Filtering would be destructive since it would remove the single quote, resulting in OReilly, which would be unwanted in most cases.
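The difference can be seen in a few lines. This sketch uses HTML escaping as the example context and a straight apostrophe for simplicity; the letters-only filter is an arbitrary illustration.

```php
<?php
$name = "O'Reilly";

// Filtering is destructive: anything outside the allowed set is removed for good.
$filtered = preg_replace('/[^A-Za-z]/', '', $name);

// Escaping is preserving: the value is encoded for the target parser
// and decodes back to exactly what it was.
$escaped  = htmlentities($name, ENT_QUOTES, 'UTF-8');
$restored = html_entity_decode($escaped, ENT_QUOTES, 'UTF-8');

echo $filtered, "\n";          // OReilly
echo $escaped, "\n";           // O&#039;Reilly
var_dump($restored === $name); // bool(true)
```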

Once the decision has been made to take action, escape the output according to the context.

Escape out into the database
Now it is known where the data is going. The escaping rules are determined by the character set of the open database connection and by the MySQL encoding settings, and the data must be escaped according to those rules for the escaping to be effective. The goal of the code here is to preserve the result of the previous business decision. Data destruction is not acceptable here.
NOTE* This is one reason why addslashes() does not equal $pdo->quote() or mysql_real_escape_string(). addslashes() does not know about the database character set requirements.
Escape out into HTML
Now the display destination is known. Data must be escaped as HTML entities, using the character set declared to the browser in the HTML header. The target parsing engine here is the browser HTML parser, not a SQL parser.
Escape out into a URL
Data is now going to the browser URL parser, which is not the same as the browser's HTML parser and has completely different control sequences. Again, data needs to be preserved.
Escape out into JavaScript, then into a URL link
Data is going first to the JavaScript engine parser, then to the browser's HTML parser.
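The URL and JavaScript contexts in the list above can be sketched as below. Both helper names are hypothetical, and using json_encode() with the JSON_HEX_* flags to build a JavaScript string literal is one common approach assumed here, not something the text prescribes. JSON_THROW_ON_ERROR requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: escape a value for use inside a URL query string.
function escapeForUrlParam(string $value): string
{
    return urlencode($value);
}

// Hypothetical helper: escape a value as a JavaScript string literal for an inline
// <script> block. The JSON_HEX_* flags turn <, >, &, ' and " into \uXXXX sequences,
// so the HTML parser never sees those characters.
function escapeForJsString(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

echo escapeForUrlParam('a b&c=d'), "\n";   // a+b%26c%3Dd
echo escapeForJsString('</script>'), "\n"; // "\u003C\/script\u003E"
```

Each function targets exactly one parser, so neither result should be treated as escaped for any other context.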

To repeat, with PHP specifics applied, a typical data processing sequence could look like:

Validate the incoming $_POST string as an integer via ctype_digit(). Destroy/reject any data not acceptable to business rules.
Hold the variable until needed.
Escape into the database for saving via $pdo->quote(). Preserve whatever the variable currently is. Destruction of data is not acceptable here.
Retrieve the data back into an inert variable and hold the variable.
Escape out into HTML via htmlentities($var, ENT_QUOTES, "UTF-8"), as part of an HTML table data element. Preservation, not destruction, is the goal.
Escape out into an HTML hyperlink by applying htmlentities(urlencode()).
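The sequence above might be sketched as follows. All function names, the id form field, and the /item.php path are hypothetical, and the database step is shown only as a helper that expects an already open PDO connection with a matching character set.

```php
<?php
// Step 1: validate against the business rule (digits only); reject anything else.
function validateId(string $raw): ?string
{
    return ctype_digit($raw) ? $raw : null;
}

// Step 3: escape out into the database. The value is preserved, not altered.
function escapeForDatabase(PDO $pdo, string $value): string
{
    $quoted = $pdo->quote($value);
    if ($quoted === false) {
        throw new RuntimeException('This PDO driver does not support quoting.');
    }
    return $quoted;
}

// Step 5: escape out into an HTML table data element.
function toTableCell(string $value): string
{
    return '<td>' . htmlentities($value, ENT_QUOTES, 'UTF-8') . '</td>';
}

// Step 6: escape out into a hyperlink: URL-encode first, then HTML-escape.
function toLink(string $value): string
{
    $href = '/item.php?id=' . htmlentities(urlencode($value), ENT_QUOTES, 'UTF-8');
    $text = htmlentities($value, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $href . '">' . $text . '</a>';
}

// '42' stands in for form input when this runs on the command line.
$raw = $_POST['id'] ?? '42';
$id  = is_string($raw) ? validateId($raw) : null;

// Steps 2 and 4: between actions, $id simply sits inert in a variable.
if ($id === null) {
    echo "Rejected\n";
} else {
    echo toTableCell($id), "\n"; // <td>42</td> when no form input is present
    echo toLink($id), "\n";      // <a href="/item.php?id=42">42</a> when no form input is present
}
```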

Techniques for doing this in PHP, HTML, and MySQL are the focus of the rest of this book.