Taint Analysis

Qryon's taint analysis engine tracks untrusted data from sources to sinks, detecting vulnerabilities that span multiple functions and files.

What is Taint Analysis?

Taint analysis tracks the flow of untrusted ("tainted") data through your code. It identifies when data from external sources (like user input) reaches sensitive operations (like SQL queries) without proper sanitization.

Key Concepts

Term	Definition	Examples
Source	Origin of untrusted data	`req.body`, `request.args`, `stdin`
Sink	Security-sensitive operation	`db.query()`, `shell()`, DOM methods
Sanitizer	Function that makes data safe	`escape()`, `parseInt()`, validators
Propagator	Transfers taint to new values	`concat()`, `slice()`, `+`

How It Works

1. Source Identification

Qryon identifies the expressions that introduce untrusted data:

// JavaScript/TypeScript sources
req.params.id          // URL parameters
req.query.search       // Query string
req.body.username      // POST body
req.headers['x-token'] // HTTP headers
req.cookies.session    // Cookies

# Python sources
request.args.get('id')    # Flask query params
request.form['username']  # Form data
request.json              # JSON body
os.environ.get('INPUT')   # Environment vars

// Java sources
request.getParameter("id")    // Servlet params
request.getHeader("token")    // Headers
System.getenv("INPUT")        // Environment
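
As a rough illustration of source identification, the sketch below checks whether a JavaScript expression is one of the Express.js request sources listed in this document, or a property access on one. The `isSource` helper and its string-prefix matching are assumptions made for illustration only; how Qryon matches sources internally is not described here.

```javascript
// Hypothetical helper, not part of Qryon's API.
// The prefixes are the Express.js sources listed in this document.
const SOURCE_PREFIXES = [
  'req.params',
  'req.query',
  'req.body',
  'req.headers',
  'req.cookies',
];

// Returns true when the expression is a known source or a property access on one.
function isSource(expression) {
  return SOURCE_PREFIXES.some(
    (prefix) =>
      expression === prefix ||
      expression.startsWith(prefix + '.') ||
      expression.startsWith(prefix + '[')
  );
}

console.log(isSource('req.params.id'));          // true
console.log(isSource("req.headers['x-token']")); // true
console.log(isSource('res.locals.user'));        // false
```

Matching on the prefix plus `.` or `[` keeps unrelated names such as `req.paramsCache` from being treated as sources.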

2. Taint Propagation

Taint flows through operations that use or transform the data:

const id = req.params.id;           // id is tainted
const upper = id.toUpperCase();      // upper is tainted
const query = "SELECT * FROM " + id; // query is tainted
const parts = id.split('-');         // parts[0], parts[1] are tainted
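
The propagation rule above can be sketched as a forward pass over assignments: a variable becomes tainted when anything it is computed from is tainted. The statement model (`target`, `uses`) and the `propagate` function are hypothetical simplifications, not Qryon's internal representation.

```javascript
// Hypothetical model: each statement assigns `target` from the names in `uses`.
const statements = [
  { target: 'id', uses: ['req.params.id'] }, // const id = req.params.id
  { target: 'upper', uses: ['id'] },         // const upper = id.toUpperCase()
  { target: 'query', uses: ['id'] },         // const query = "SELECT * FROM " + id
  { target: 'parts', uses: ['id'] },         // const parts = id.split('-')
  { target: 'limit', uses: [] },             // const limit = 10 (no tainted input)
];

// Marks every target whose inputs include an already tainted name.
function propagate(stmts, sources) {
  const tainted = new Set(sources);
  for (const { target, uses } of stmts) {
    if (uses.some((name) => tainted.has(name))) {
      tainted.add(target);
    }
  }
  return tainted;
}

const tainted = propagate(statements, ['req.params.id']);
console.log([...tainted].join(', '));
// req.params.id, id, upper, query, parts
```

A single pass is enough for straight-line code like this; code with loops or branches would need the pass repeated until the tainted set stops growing.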

3. Sink Detection

Qryon alerts when tainted data reaches dangerous operations:

// VULNERABLE: Tainted data in SQL query
const id = req.params.id;
db.query(`SELECT * FROM users WHERE id = '${id}'`);

// Finding: sql-injection - Tainted data flows to db.query()
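
A minimal sketch of the sink check, assuming the set of tainted variable names is already known: every call to a registered sink is inspected, and a finding is recorded for each tainted argument. The `SINKS` table, the call records, and the message format are hypothetical; only the `sql-injection` rule name and the `db.query()` sink come from this document.

```javascript
// Hypothetical sink table: callee name -> rule reported when a tainted value reaches it.
const SINKS = new Map([['db.query', 'sql-injection']]);

// Hypothetical call records: which callee receives which variable names.
const calls = [
  { callee: 'db.query', args: ['query'] },     // tainted argument, sink
  { callee: 'console.log', args: ['query'] },  // tainted argument, not a sink
  { callee: 'db.query', args: ['staticSql'] }, // sink, but argument is not tainted
];

// Returns one message per tainted argument that reaches a registered sink.
function findTaintedSinks(callList, taintedNames) {
  const findings = [];
  for (const { callee, args } of callList) {
    const rule = SINKS.get(callee);
    if (rule === undefined) continue; // not a sink
    for (const arg of args) {
      if (taintedNames.has(arg)) {
        findings.push(`${rule}: tainted '${arg}' flows to ${callee}()`);
      }
    }
  }
  return findings;
}

const findings = findTaintedSinks(calls, new Set(['id', 'query']));
console.log(findings.join('\n'));
// sql-injection: tainted 'query' flows to db.query()
```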

4. Sanitizer Recognition

Qryon understands when sanitization makes data safe:

const id = req.params.id;

// SAFE: Sanitized with parseInt
const numId = parseInt(id, 10);
db.query('SELECT * FROM users WHERE id = ?', [numId]);

// SAFE: Using parameterized query
db.query('SELECT * FROM users WHERE id = ?', [id]);

// SAFE: Using allowlist validation
if (ALLOWED_IDS.includes(id)) {
  db.query(`SELECT * FROM users WHERE id = '${id}'`);
}
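
One caveat when relying on `parseInt` as a sanitizer: the result is always a number, so it cannot carry injected SQL, but `parseInt` ignores trailing characters and returns `NaN` when the input has no leading digits. The `parseId` helper below is a hypothetical sketch of a stricter check applied at the boundary; it is not something Qryon provides or requires.

```javascript
// Hypothetical helper: strict numeric ID parsing before the value reaches a query.
function parseId(raw) {
  // Accept only strings made up entirely of decimal digits.
  if (typeof raw !== 'string' || !/^\d+$/.test(raw)) {
    throw new TypeError('id must be a non-negative integer');
  }
  return Number.parseInt(raw, 10);
}

// parseInt alone stops at the first non-digit character.
console.log(Number.parseInt("7' OR '1'='1", 10));      // 7
// Input without leading digits yields NaN.
console.log(Number.isNaN(Number.parseInt('abc', 10))); // true
// The strict helper returns a number only for clean input.
console.log(parseId('42'));                            // 42
```

Rejecting malformed input outright, instead of silently truncating it, also makes unexpected requests visible in logs and error handling.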

Cross-File Analysis

Qryon tracks tainted data across file boundaries using import resolution and call graph analysis:

// routes/users.js
import { findUser } from '../services/userService';

app.get('/user/:id', (req, res) => {
  const user = findUser(req.params.id);  // Taint flows to findUser
  res.json(user);
});

// services/userService.js
import { db } from '../db';

export function findUser(id) {        // id is tainted from caller
  // VULNERABLE: Tainted data reaches SQL sink
  return db.query(`SELECT * FROM users WHERE id = '${id}'`);
}

Interprocedural Flow Report

[HIGH] sql-injection
  Flow: req.params.id -> findUser(id) -> db.query()

  Step 1: routes/users.js:5
    Source: HTTP request parameter

  Step 2: services/userService.js:4
    Propagation: Function parameter

  Step 3: services/userService.js:6
    Sink: SQL query
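
The reported flow can be pictured as a path through a graph whose nodes are program points and whose edges say where a tainted value travels next. The sketch below runs a depth-bounded search over such a graph; the graph encoding, the `findFlow` function, and the way depth is counted (one unit per hop) are assumptions for illustration, not a description of Qryon's call graph analysis.

```javascript
// Hypothetical flow graph for the two files above: node -> nodes the taint reaches next.
const flowGraph = {
  'req.params.id': ['findUser(id)'], // routes/users.js passes the parameter on
  'findUser(id)': ['db.query()'],    // services/userService.js uses it in a query
  'db.query()': [],
};
const sinkNodes = new Set(['db.query()']);

// Depth-first search from a source to the first reachable sink, following at most maxDepth hops.
function findFlow(graph, sinks, node, maxDepth, path = []) {
  const current = [...path, node];
  if (sinks.has(node)) return current;
  if (current.length - 1 >= maxDepth) return null; // hop budget exhausted
  for (const next of graph[node] ?? []) {
    if (current.includes(next)) continue; // skip cycles
    const found = findFlow(graph, sinks, next, maxDepth, current);
    if (found) return found;
  }
  return null;
}

const flow = findFlow(flowGraph, sinkNodes, 'req.params.id', 10);
console.log(flow.join(' -> '));
// req.params.id -> findUser(id) -> db.query()
```

With a bound of 1 the same search returns `null`, because the sink is two hops from the source. A depth limit trades completeness for analysis time in this way.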

Taint Sources by Language

JavaScript/TypeScript

// Express.js
req.params, req.query, req.body, req.headers, req.cookies

// Node.js
process.argv, process.env, fs.readFileSync()

// Browser
window.location, document.URL, document.cookie
document.getElementById().value, localStorage.getItem()

Python

# Flask
request.args, request.form, request.json, request.headers

# Django
request.GET, request.POST, request.body

# General
sys.argv, os.environ, input(), open().read()

Java

// Servlet
request.getParameter(), request.getHeader(), request.getCookies()

// Spring
@RequestParam, @PathVariable, @RequestBody, @RequestHeader

// General
System.getProperty(), System.getenv(), Scanner.next()

Configuring Taint Analysis

# rma.toml

[taint]
# Enable cross-file analysis
interprocedural = true

# Maximum call depth for tracking
max_depth = 10

# Custom sources
[[taint.sources]]
pattern = "getUntrustedInput()"
languages = ["javascript", "typescript"]

# Custom sinks
[[taint.sinks]]
pattern = "dangerousOperation($ARG)"
languages = ["javascript", "typescript"]
sink_arg = "$ARG"

# Custom sanitizers
[[taint.sanitizers]]
pattern = "sanitize($INPUT)"
languages = ["javascript", "typescript"]
sanitizes = "$INPUT"

Viewing Taint Flows

# Show all taint flows
rma scan . --show-flows

# Interactive TUI (recommended)
rma scan --interactive
# Press 'f' to view cross-file flows

# JSON output with flow details
rma scan . --format json --show-flows | jq '.findings[].flow'
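
For post-processing beyond `jq`, the JSON output can also be read from a Node.js script. The sketch below relies only on the `findings[].flow` path used in the `jq` command above. The sample report and every other field in it (such as `rule`, and the array shape of `flow`) are hypothetical placeholders, since the full output schema is not described here.

```javascript
// Hypothetical sample standing in for the JSON printed by the scan command.
// Only the `findings[].flow` path comes from this document; the contents are made up.
const sampleReport = JSON.stringify({
  findings: [
    { rule: 'sql-injection', flow: ['req.params.id', 'findUser(id)', 'db.query()'] },
    { rule: 'example-without-flow', flow: null },
  ],
});

// Returns the findings that carry flow details.
function findingsWithFlows(jsonText) {
  const report = JSON.parse(jsonText);
  return (report.findings ?? []).filter((finding) => finding.flow != null);
}

const withFlows = findingsWithFlows(sampleReport);
console.log(withFlows.length); // 1
```

In a real pipeline the text would come from standard input, for example via `fs.readFileSync(0, 'utf8')`, instead of the inline sample.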

Limitations

Dynamic code: Dynamic code generation and dynamic `require()` calls may not be tracked
Reflection: Calls made through Java reflection or Python `getattr` may not be tracked
Callbacks: Complex callback chains may lose taint
External libraries: Taint may not propagate through non-analyzed code
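
To make the dynamic-code and callback limitations concrete, the sketch below shows two patterns in which the route from source to sink is only determined at runtime. Whether Qryon reports either pattern is not stated here; `runQuery`, `handlers`, `dispatch`, and `withId` are hypothetical names, and `runQuery` only records the SQL text instead of executing it.

```javascript
const executed = [];

// Stand-in for a SQL sink: records the query text instead of running it.
function runQuery(sql) {
  executed.push(sql);
}

// Dynamic dispatch: the callee is selected by a runtime string,
// so a static analyzer must resolve `handlers[action]` to follow the taint.
const handlers = {
  lookup: (id) => runQuery(`SELECT * FROM users WHERE id = '${id}'`),
};
function dispatch(action, id) {
  return handlers[action](id);
}

// Callback chain: the value passes through nested closures before reaching the sink.
function withId(id, callback) {
  return callback(id);
}

dispatch('lookup', "1' OR '1'='1");
withId('42', (id) =>
  withId(id, (inner) => runQuery(`SELECT * FROM users WHERE id = '${inner}'`))
);

console.log(executed.length); // 2
```

Where such patterns are unavoidable, validating the value before it enters the dynamic path keeps the code safe even if the flow is not detected.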

Best Practices

Use parameterized queries instead of string interpolation
Validate early: sanitize input at the boundary
Use typed parsers: parseInt(), JSON.parse()
Allowlist validation: prefer allowlists over blocklists
Context-aware encoding: HTML-encode for HTML, URL-encode for URLs
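
The last two practices can be sketched as follows. `escapeHtml`, `ALLOWED_SORT_COLUMNS`, and `pickSortColumn` are hypothetical names introduced for illustration; `encodeURIComponent` is the standard JavaScript function. Production code would usually rely on a templating engine or a maintained encoding library instead of a hand-written escaper.

```javascript
// Hypothetical helper: escapes the characters that are significant in HTML text and attributes.
function escapeHtml(text) {
  const replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;',
  };
  return String(text).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

// Hypothetical allowlist: only known column names may be used for sorting.
const ALLOWED_SORT_COLUMNS = new Set(['name', 'created_at']);
function pickSortColumn(requested) {
  return ALLOWED_SORT_COLUMNS.has(requested) ? requested : 'name';
}

// The same input needs different encoding depending on where it is placed.
console.log(escapeHtml('<b>Tom & "Jerry"</b>'));
// &lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;
console.log(encodeURIComponent('a b&c=d'));
// a%20b%26c%3Dd
console.log(pickSortColumn('name; DROP TABLE users'));
// name
```

The allowlist falls back to a fixed default instead of trying to clean the rejected value, which avoids the gaps a blocklist tends to leave.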