Wednesday, September 10, 2014

What is the 'Input' in Input Validation?

What is the 'Input' in Input Validation?
By Justin C Miller
Posted 9/10/2014

You are a programmer. You've heard of those nasty words like SQL Injection and Cross-Site Scripting (XSS) and Buffer Overflow. You've learned that a key step to preventing them is to perform Input Validation. You sorta know how to do that with things like ...
- Strongly Typed data
- Whitelisting
- Regular Expressions
- Max and Min length checks
You might even be a bit further ahead in the game and know about Security focused encoding libraries like WPL, Coverity Security Library, or the OWASP Reform library.

But an even more important question that I think a lot of developers overlook is the "What INPUT do I apply this validation to?"

The simple answer is ALL INPUT.

The more complicated question becomes, "What is input?"

You always validate the input of a TEXTBOX. If you are a web developer you probably also know that people like to tinker with QUERY STRING PARAMETERS so you validate them. Is that it? Are we done? Is my application safe?

Hopefully your school, or your PCI auditor, or your teammates taught you better. There's a lot more work to do.

Some of the simpler ones to remember involve websites. COOKIES for example are easy to modify (for Internet Explorer all you need is Notepad) so you better validate them. URLS themself should also be validated before used since they're freely modifiable by any user in the address bar of their browser. Why validate? Imagine if I used a cookie value to query the database for a user record. Sounds ripe for SQL Injection. Or imagine if I displayed the current Url onto the screen of an error page. Sounds like a chance for using XSS to inject an iframe.

If you've ever heard of tools like TamperData, Burp, or Fiddler you'll know that HIDDEN FIELDS on your website can also be modified so you better validate them. And those tools let you modify HTTP HEADERS like REFERRER and USER AGENT so you better validate them before they're used. Not convinced its necessary? Imagine you base a SQL query off a hidden field or recording the referrer in a table for analytics. Sounds like SQL injection is in play. Are you checking the user agent to validate what browsers you support and then displaying the user agent if you rejected them? Sounds like XSS could occur.

Other ones that might get overlooked include CHECKBOXES, RADIO BUTTONS, and DROP DOWNS. Client side it kinda seems like these are non-editable. But truth is if you have any of those free tampering tools, you can change the values. Why care? I'm guessing you are querying the database based on their values so SQL injection is in play and this you better validate them.

In a Windows Form application you might need a DATAGRID that gives the user the feel they are directly editing a SQL table. You better perform input validation here since you touching the database.

VIEW STATE on a web page can be messed with thus you better validate and treat it as input to prevent attacks.

I also would not trust SESSION STATE data. Who knows where it came from or how it got filled. If you're querying you database based on information you had saved in the session, I'd be sure to validate it and prevent SQL Injection.

Another crazy one I've seen before that is worth considering is HTML CONTROL ATTRIBUTES. For example, if you're writing any of your code's logic based on what is in the control's CLASS attribute, be careful ... you might open the door to one of these attacks and this you better validate and treat it as input.

Another one I've seen that is commonly overlooked is WEB SERVICE REQUEST PARAMETERS, XML, etc. To me a call to a web service is the equivalent of a user filling in a bunch of textboxes and hitting submit, thus you better validate it to prevent attacks. Otherwise an attacker may try to cause a buffer overflow and get their malicious code to run on your server.

I've been talking a lot about websites, but there are reasons to be concerned with internal attacks by employees or attacks that can occur once the bad guy got inside your network or even attacks coming in from 3rd parties and their system integrations. Some examples of where this comes into play include a DATA IMPORT you might do from a CSV, or from an FTP'd file, or from data brought in by a job or SSIS package. Imagine if the feed you received and saved into your database table was information you displayed on your e-Commerce product page of your website. Sounds like a ripe opportunity to perform some XSS and inject and iframe to my evil site. Better validate!

s What if your program queries a user-accessible folder and READS IN FILES it finds? I wouldn't trust that data ... It could be used cause a buffer overflow and write over the program counter and execute malicious code.

If you are into writing scripts and command line programs don't forget to validate the COMMAND LINE PARAMETERS and FLAGS because they could be used for buffer overflows.

Have an open mind and remember that input comes in many forms, all of which should be validated. Remember all applications including Web apps, Windows apps, scripts, web services, and many more all require input validation. Remember when coding that no user can be trusted, not a guest, not an employee, not IT staff, not an administrator ... So validate all their input every time! Remember also that no data should be trusted, doesn't matter if it was inputted on a screen, read in from a CSV file, or is sent to you as XML in a nightly job ... It should all be validated.

In summary here is a good start to a list of what input requires validation and protection against things such as SQL Injection and XSS and Buffer Overflow
- Form data such as:
--- Textbox
--- Checkbox
--- Radio button
--- Drop down
--- Data Grids
- Hidden Fields
- Script Parameters
- Command Line Arguments
- Data from a file (e.g. CSV)
- Data from another source
- Data Imported (e.g. SSIS packages)
- Website specifics such as:
--- URLs
--- Query Strings
--- Cookies
--- Session State
--- View State
--- Http Headers such as:
------ User agent
------ Referrer
--- Html attributes such as:
------ Class
------ Style
- Web Service request parameters

Happy coding!

Copyright © 2014, this post cannot be reproduced or retransmitted in any form without reference to the original post.


  1. Great post, Justin!

    With the increasing use of APIs to serve as the backend for content serving a multitude of presentation layers, I need for validating posted data is sometimes overlooked as a malicious user is going to bypass the UI altogether.

    Fortunately both .NET and Java (other stacks to...) provide great tools for model validation to prevent issues when sending data to the server in POST/PUT/PATCH calls.

    .NET provides them via the System.ComponentModel.DataAnnotations namespace and JAX-RS implementations have the hibernator validators to use as well:

    Limiting the posted content to valid content, along with scrubbing the dangerous fields (usually text fields) is key to preventing attacks like XSS.