What are the security risks of `pickle`, `eval`, and `exec`, and how do you avoid them?

Detailed Answer

Why unpickling untrusted data is a full remote-code-execution risk

import pickle
import os

class Exploit:
    def __reduce__(self):
        return (os.system, ("echo pwned; rm -rf /tmp/demo",))

payload = pickle.dumps(Exploit())

# Anywhere this runs on untrusted input:
pickle.loads(payload)   # actually executes os.system(...) during unpickling!

`__reduce__` is a legitimate protocol that pickle uses to learn how to reconstruct an object, but it can name any importable callable, and the unpickler calls that callable with the supplied arguments. `pickle.loads()` has no safe mode. Overriding `Unpickler.find_class` with an allow-list narrows what a payload can reach, but it is a mitigation rather than a sandbox, and the official docs state plainly: never unpickle data received from an untrusted or unauthenticated source. This is why cache backends, message queues, and APIs that use pickle for convenience are a known attack surface whenever external input can reach them.
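To make the mechanism concrete without running a shell command, the sketch below swaps `os.system` for the harmless builtin `len` and then rejects the same payload with an allow-list unpickler built on `pickle.Unpickler.find_class`. The names `HarmlessPayload`, `AllowListUnpickler`, `restricted_loads`, and the contents of `ALLOWED` are illustrative assumptions, not part of the original answer, and the allow-list approach should be read as a mitigation, not as a way to make pickle safe for hostile input.

```python
import io
import pickle


class HarmlessPayload:
    """Stand-in for a hostile object: names a benign builtin instead of os.system."""

    def __reduce__(self):
        return (len, ("abc",))


class AllowListUnpickler(pickle.Unpickler):
    # Hypothetical allow-list: only these (module, name) globals may be resolved.
    ALLOWED = {("builtins", "set"), ("builtins", "frozenset")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but refuses globals outside the allow-list."""
    return AllowListUnpickler(io.BytesIO(data)).load()


payload = pickle.dumps(HarmlessPayload())

# Plain loads() calls len("abc") and returns its result, not a HarmlessPayload.
result = pickle.loads(payload)

# The restricted unpickler refuses to resolve builtins.len at all.
try:
    restricted_loads(payload)
    blocked = False
except pickle.UnpicklingError:
    blocked = True

# Plain containers need no globals, so they still load.
plain = restricted_loads(pickle.dumps({"ids": [1, 2, 3]}))
```

The point of `result` is that unpickling returned whatever the named callable returned: the payload, not the receiving code, chose what ran.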

`eval()`/`exec()`: running arbitrary source directly

user_input = "__import__('os').system('rm -rf /')"
eval(user_input)   # executes it -- catastrophic if user_input is attacker-controlled

`eval` (expressions) and `exec` (statements) execute Python source text directly, with the full power of the language. Passing any externally influenced string to either one effectively gives the author of that input code execution inside your process.
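A common follow-up question is whether stripping the builtins makes `eval` safe. The sketch below, which is an illustration added here and not part of the original answer, suggests it does not: name lookups fail, but plain attribute access on a literal still reaches `object` and every class loaded in the interpreter. The variable names are arbitrary.

```python
# Attempted "sandbox": evaluate with no builtins available.
restricted_globals = {"__builtins__": {}}

# Name lookups for builtins now fail...
try:
    eval("__import__('os')", restricted_globals)
    builtins_blocked = False
except NameError:
    builtins_blocked = True

# ...but attribute access on a literal still reaches `object` and, through it,
# every class currently loaded in the interpreter.
reachable = eval("().__class__.__base__.__subclasses__()", restricted_globals)
```

From that list of classes an attacker can usually work back to something dangerous, so a restricted namespace is best treated as an obstacle, not a boundary.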

The safe alternatives

import ast
import json

# Safe: parsing structured data
user_json_string = '{"user_id": 1, "action": "login"}'   # stand-in for external input
data = json.loads(user_json_string)              # only produces JSON-compatible values

# Safe from code execution: parsing a Python LITERAL (not arbitrary code)
value = ast.literal_eval("[1, 2, {'a': True}]")    # only literals -- no function calls, no imports
try:
    ast.literal_eval("__import__('os').system('x')")
except ValueError:
    print("rejected: not a literal")                # the call is never evaluated

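`json.loads` only ever produces plain data (dicts, lists, strings, numbers, booleans, None); it cannot execute anything. `ast.literal_eval` parses only Python literals (numbers, strings, bytes, tuples, lists, dicts, sets, booleans, None) and rejects function calls, attribute access, and names. That makes it safe against code execution, but not against resource exhaustion: the Python docs warn that a sufficiently large or deeply nested input can exhaust memory or the C stack, so cap the input size before parsing untrusted text.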
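One way to package `ast.literal_eval` for untrusted text is a small wrapper that caps the input length and turns every parse failure into a single exception type. This is a minimal sketch: the function name `parse_literal` and the `MAX_LITERAL_CHARS` limit are assumptions chosen for illustration, and the right limit depends on your application.

```python
import ast

MAX_LITERAL_CHARS = 10_000  # assumed limit; tune for your own inputs


def parse_literal(text: str):
    """Parse a Python literal from untrusted text, or raise ValueError."""
    if len(text) > MAX_LITERAL_CHARS:
        raise ValueError("input too large")
    try:
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError) as exc:
        # Collapse every failure mode into one error the caller can handle.
        raise ValueError(f"not a valid Python literal ({type(exc).__name__})") from exc


parsed = parse_literal("[1, 2, {'a': True}]")
```

Callers then only need to handle `ValueError`, whether the input was too long, syntactically broken, or contained something other than a literal.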

For serialization across a trust boundary, avoid pickle entirely

# Instead of pickling to send data between services / store in a shared cache:
import json
data = json.dumps({"user_id": 1, "action": "login"})

# For richer/faster binary serialization with the same "no code execution" safety:
# msgpack, protobuf, or a schema-validated format (pydantic models -> JSON)

`pickle` is appropriate only for data that you produced yourself and that nobody else could have modified (e.g., caching your own computed Python objects to local disk). Never use it for data crossing a trust boundary: anything received from a network request, a third-party queue, user uploads, or any other source you don't control end to end.
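The schema-validated approach can be sketched with only the standard library: encode a message as JSON, then check its shape and types explicitly before building an object from it. The `LoginEvent` type and the `encode_event`/`decode_event` helpers are hypothetical names for this example; a library such as pydantic would replace the hand-written checks.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LoginEvent:  # hypothetical message type for this example
    user_id: int
    action: str


def encode_event(event: LoginEvent) -> str:
    return json.dumps(asdict(event))


def decode_event(raw: str) -> LoginEvent:
    """Parse and validate an untrusted JSON message; raise ValueError if malformed."""
    obj = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict) or set(obj) != {"user_id", "action"}:
        raise ValueError("unexpected message shape")
    user_id, action = obj["user_id"], obj["action"]
    # bool is a subclass of int, so exclude it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        raise ValueError("user_id must be an integer")
    if not isinstance(action, str):
        raise ValueError("action must be a string")
    return LoginEvent(user_id=user_id, action=action)


wire = encode_event(LoginEvent(user_id=1, action="login"))
event = decode_event(wire)
```

Unlike unpickling, the worst a malformed message can do here is fail validation; the receiver decides which types get constructed.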

Interview-ready summary: `pickle.loads()` on untrusted data is equivalent to arbitrary code execution, because a crafted payload's `__reduce__` can invoke any callable during deserialization. Never unpickle data you don't fully trust. `eval`/`exec` on any externally influenced string is the same class of risk. Use `json` or `ast.literal_eval` for safe parsing, and JSON/msgpack/protobuf instead of pickle for any data that crosses a trust boundary.
