To understand URL parsing, it helps to understand the typical configuration of a website hosting infrastructure. Websites are usually deployed across multiple servers arranged as a front end, a mid-tier, and a backend. The front end typically consists of several web servers behind some type of load balancer; the mid-tier is usually reserved for application servers running middleware such as Node.js or Tomcat; and the backend is normally a database platform.
The complexity varies with the size and scope of the website, and it is URL parsing that steers traffic through the entire stack: it determines where and when each request is sent among the attached servers. Because URL parsing can direct traffic to any resource, all traffic must be authenticated. Any exploit or development bug risks exposing internal resources to a range of vulnerabilities, in particular Server-Side Request Forgery (SSRF).
URL parsing flaws can give attackers full visibility into the internal network and reveal which server ports are open for use in a cross-site port attack (XSPA). Such a vulnerability could even allow attackers to perform Denial-of-Service (DoS) attacks and, in some circumstances, enable Remote Code Execution (RCE).
Inconsistent Parsing Rules Create Vulnerabilities
Vulnerabilities have been discovered in several URL parsers that are popular with application developers. Many programming languages ship with their own built-in URL parsing libraries, and compatible third-party libraries can be imported as well. Given the range of options, researchers from Claroty and Snyk suggest that these vulnerabilities exist because there is no single consensus or standard for URL parsing: each parser parses differently and handles errors differently.
URL parsing is incorporated into several third-party components, and weaknesses were found in Flask, Video.js, Belledonne, Nagios XI, and Clearance. These projects are written in different programming languages and each relies on a URL parsing library, yet each goes about parsing slightly differently. Weaknesses have been discovered in 16 different URL parsing libraries used with Python, Java, .NET, PHP, and Node.js.
URL parsing is present in nearly every web server configuration conceivable; millions of websites and servers use these libraries throughout their production environments. If a vulnerability were exploited, system engineers would have to scramble to fix the issue, just as they did with the recent and impactful Log4j exploit.
How Poor Parsing Leads to SSRF
As we have already mentioned, each URL parser handles requests differently, and researchers have identified five core points of difference that increase the risk of being vulnerable to attack:
- Scheme Confusion: The scheme tells the parser how to structure and interpret the input data. Different parsers interpret the same data in different ways, creating inconsistencies.
- Slash Confusion: This occurs when a URL contains too many or too few slashes, for example http:///www vs http:/www. Again, the way a parser interprets this malformed data varies significantly.
- Backslash Confusion: The same as slash confusion, but with backslashes in the URL. Too many or too few \ characters cause inconsistencies in how the data is parsed.
- URL-Encoded Data Confusion: Some URLs contain URL-encoded data, where reserved or non-printable characters are written as percent sequences such as %2F, and parsers decode this data inconsistently.
- Scheme Mix-Up: This covers confusion over how to handle URLs that lack a scheme entirely, even though every URL should be constructed with one. Both this case and the encoded-data case are illustrated in the sketch after this list.
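To make two of these concrete, here is a minimal Python sketch using the standard library's urllib.parse, one parser among the many discussed here; other libraries may resolve the same inputs differently:

```python
from urllib.parse import urlsplit, unquote

# Scheme mix-up: with no scheme, urlsplit treats the entire string as a
# path, so the hostname never lands in the netloc field that validation
# code typically inspects.
print(urlsplit("example.com/login"))
# SplitResult(scheme='', netloc='', path='example.com/login', query='', fragment='')

# URL-encoded data confusion: %2F is an encoded "/". A component that
# splits the path before decoding sees one segment; one that decodes
# first sees two.
path = urlsplit("https://example.com/a%2Fb").path
print(path)           # '/a%2Fb' -> one path segment
print(unquote(path))  # '/a/b'   -> two path segments after decoding
```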
If your chosen URL parser runs into these kinds of inconsistencies, it can expose a vulnerable code block. The risk is even greater if developers use more than one URL parser in their code, or use a parser they are not familiar with. The confusion most likely to cause Server-Side Request Forgery is slash confusion.
If a URL contains an additional forward slash / in the address, URL parsers handle it differently: some drop the extra slash and carry on, while others fail and return no-host errors.
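As a hedged illustration, the sketch below shows how one mainstream parser, Python's urllib.parse, falls on the no-host side of that split: the extra or missing slash leaves netloc empty rather than being repaired. A parser that silently "fixes" the slashes would report a host instead, and it is exactly that disagreement between a validating component and a fetching component that opens the door to SSRF.

```python
from urllib.parse import urlsplit

for url in ("http://example.com/x", "http:///example.com/x", "http:/example.com/x"):
    parts = urlsplit(url)
    print(f"{url!r:28} netloc={parts.netloc!r:16} path={parts.path!r}")

# 'http://example.com/x'       netloc='example.com'    path='/x'
# 'http:///example.com/x'      netloc=''               path='/example.com/x'
# 'http:/example.com/x'        netloc=''               path='/example.com/x'
```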
Protecting Against SSRF Attacks
SSRF is just one type of vulnerability arising from URL parsing bugs, and although the risk of code execution is relatively low, the danger is real. There are several recommendations for reducing the threat. The first is to use as few parsers as possible and, whichever parser you choose, to make sure all of the developers are on board and understand its advantages and disadvantages.
Consider offloading URL parsing onto a microservice platform: by configuring a single URL parser within a front-end microservice, all other services can connect via that microservice, reducing the attack surface and removing the temptation to use multiple parsers.
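As a rough sketch of that pattern, the Flask service below is the only component in the stack that ever parses raw URLs; every other service asks it for the already-validated parts. The endpoint name, response shape, and scheme allowlist are illustrative assumptions, not a prescribed design:

```python
from urllib.parse import urlsplit

from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumption: the application only ever needs to fetch web URLs.
ALLOWED_SCHEMES = {"http", "https"}

@app.route("/parse")
def parse():
    raw = request.args.get("url", "")
    parts = urlsplit(raw)
    # Reject anything the single sanctioned parser cannot resolve cleanly,
    # so downstream services never re-parse ambiguous input themselves.
    if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
        return jsonify(error="unparseable or disallowed URL"), 400
    try:
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return jsonify(error="invalid port"), 400
    return jsonify(scheme=parts.scheme, host=parts.hostname,
                   port=port, path=parts.path, query=parts.query)

if __name__ == "__main__":
    app.run()
```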
Another technique to protect against SSRF is to whitelist the IP addresses or DNS names the application requires and route traffic point-to-point only where the application needs it to function. Blacklists can be used but are not recommended.
Consider disabling any URL schemes the application does not use; this greatly reduces the attack surface. Disable schemes such as file://, dict://, or gopher://. If you combine this approach with enforcing local authentication, you create a strong internal security layer. For local services, understand how the database authenticates with the application, lock it down to predefined IP access only, create a strong set of credentials (even for local traffic), and then protect them with KMS keys.
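A minimal sketch combining the two ideas, a host whitelist plus a scheme allowlist, might look like the following; the hostnames in ALLOWED_HOSTS are placeholders for whatever your application genuinely needs, and the check on resolved addresses is one way to keep traffic point-to-point rather than letting it reach internal ranges:

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Placeholder allowlist: only the names this application genuinely needs.
ALLOWED_HOSTS = {"api.partner.example", "cdn.example.com"}
ALLOWED_SCHEMES = {"http", "https"}  # file://, dict://, gopher:// stay disabled

def is_safe_url(raw: str) -> bool:
    parts = urlsplit(raw)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    host = parts.hostname
    if host not in ALLOWED_HOSTS:
        return False
    # Resolve the name and confirm no answer points at internal address
    # space, guarding against an allowed name aliased to 10.x or 127.x.
    try:
        answers = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    for info in answers:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return True
```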
Did you know that you can also subscribe to a third-party security platform to protect against SSRF? In particular, a web application firewall (WAF) is capable of preventing SSRF and a whole host of other threats. Importantly, any zero-day threats can be patched out in minutes with a WAF.
Real-time attacks can be detected using RASP (Runtime Application Self-Protection) software; these tools detect rogue behavior in the URL parser and kill the offending process automatically. Other tools are available for advanced bot protection in case of an SSRF compromise and can provide additional protection against DDoS, where attack traffic is blocked at the network edge.
Another important way to protect yourself is to leverage services that provide analytics of current threats to the web application. It is important to log absolutely everything across the application stack: firewalls, servers, applications, networking, and so on. This generates huge volumes of data that can be used to train machine learning models, potentially revealing patterns and detecting risks within any application.