Python Offensive Security 101: Python Fundamentals

Notes based on the 'Introduction to Python for Offensive Security' course by Red Team Leaders (https://courses.redteamleaders.com/courses/ce82d160-2ef7-4f2c-8b15-e5ea742b1877).

Data Types

Summary

Arbitrary-Precision Integers int

Python integers have unlimited precision. No 32/64-bit limits.

Common Security Uses:

Addresses, offsets, opcode fields
Encoding binary values (IP, ports)
Cryptographic math (RSA, ECC)
Payload and packet manipulation

Advantages:

Native support for big numbers

Pitfalls:

You must know how many bytes to allocate in .to_bytes()
Values too large for the requested length raise OverflowError in .to_bytes()

EXAMPLE:

n = 0x1337
n.bit_length()              # → 13
n.to_bytes(2, 'big')        # → b'\x13\x37'
int.from_bytes(b'\x13\x37', 'big')  # → 4919
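The sizing pitfall can be sketched as follows. The helper name int_to_min_bytes is hypothetical (not from the course); it assumes a non-negative integer and derives the byte count from bit_length().

```python
def int_to_min_bytes(n: int, byteorder: str = "big") -> bytes:
    """Encode a non-negative int using the fewest bytes that hold it."""
    length = max(1, (n.bit_length() + 7) // 8)  # round bits up to whole bytes
    return n.to_bytes(length, byteorder)

packed = int_to_min_bytes(0x1337)      # 13 bits fit in two bytes

try:
    (0x1337).to_bytes(1, "big")        # one byte cannot hold 13 bits
except OverflowError as exc:
    error = str(exc)                   # the failure is an exception, not a crash
```

Computing the length up front avoids hard-coding a field width that later values may outgrow.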

Binary Floating Point (IEEE 754) float

64-bit binary floats, prone to rounding errors.

Common Security Uses:

Packet timestamps
Timing-based attacks
Network latency measurements

Pitfalls:

Tiny rounding errors can ruin time-sensitive exploits

EXAMPLE:

0.1 + 0.2   # → 0.30000000000000004 ❌

from math import isclose
isclose(0.1 + 0.2, 0.3)  # → True 

# Precise Alternatives
from decimal import Decimal
Decimal("0.1") + Decimal("0.2")  # → 0.3 (exact)

from fractions import Fraction
Fraction(1, 3) + Fraction(1, 3)  # → 2/3

Booleans bool

Subclass of int. True == 1, False == 0.

Common Uses:

Network state checks
Execution flags
Guard conditions in exploits

Pitfalls:

None is falsy, but it means "no value" rather than False; test for it explicitly with is None

EXAMPLE:

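# Falsy values
for value in (0, "", [], {}, None):
    if not value:
        print(repr(value), "is falsy")   # runs for every item

# Custom truth in classes
class Session:
    def __init__(self, is_alive):
        self.is_alive = is_alive
    def __bool__(self):
        return self.is_alive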
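The difference between "no value" and "empty value" matters when a probe can return either. A minimal sketch, assuming a hypothetical describe() helper that receives a banner read from a socket (None when nothing answered, empty bytes when the peer sent nothing):

```python
def describe(banner):
    if banner is None:          # explicit identity check for "no value"
        return "no response"
    if not banner:              # empty bytes/str is falsy but is still a value
        return "empty response"
    return "got banner"

states = [describe(None), describe(b""), describe(b"SSH-2.0")]
```

A plain `if not banner` would merge the first two cases into one.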

Immutable Byte Sequences bytes

Raw bytes (0–255), immutable.

Common Security Uses:

Shellcode
Network protocol crafting
Hashing, signatures, binary blobs

Pitfalls:

Cannot modify without creating a copy

EXAMPLE:

b = b"\x90\x90\xcc"
b.hex()                     # → '9090cc'
bytes.fromhex("9090cc")     # → b"\x90\x90\xcc"

Mutable Byte Sequences bytearray

Like bytes, but modifiable in-place.

Common Security Uses:

Shellcode patching
Dynamic payloads
Manual field injection in binary headers

Advantages:

You can edit without reallocating

Pitfalls:

Not hashable → cannot use as dictionary keys

EXAMPLE:

payload = bytearray(b"\x90" * 10)
payload[0] = 0xcc  # Replace first NOP with INT3

Unicode Text str

Unicode strings, used for readable text manipulation.

Common Security Uses:

Shell command composition
Fuzzing input
Regex on logs or files
Filenames, domains, paths

Pitfalls:

String concatenation with + in loops is slow

EXAMPLE:

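# Useful Functions
s = "GET /admin/login.php"
s.startswith("GET")         # → True
s.endswith(".php")          # → True
s.find("admin")             # → 5 (index, or -1 if missing)
s.replace("rm", "echo")     # returns a new string; s is unchanged
s.split(" ")                # → ['GET', '/admin/login.php']
"".join(["a", "b", "c"])    # → 'abc'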

# Decode Network Input
text = b"\xe2\x9c\x94".decode("utf-8", errors="replace")

Binary Packing & Unpacking struct

Format numbers into raw byte sequences, or extract them.

Common Security Uses:

Building headers (DNS, TCP/IP, etc.)
Parsing binary formats
Exploit payload structure

Pitfalls:

Incorrect byte order = broken payload

Format codes:

! → Network (big-endian)
H → 2-byte unsigned short
I → 4-byte unsigned int
B → 1-byte unsigned char

EXAMPLE:

import struct
pkt = struct.pack("!H", 0x1337)      # → b'\x13\x37'
val = struct.unpack("!H", pkt)[0]    # → 4919

Quick Function Reference Table

| Function | Data Type | Use Case |
|---|---|---|
| int(x, base) | int | Convert string to int (e.g., hex) |
| int.from_bytes(b, endian) | int ↔ bytes | Bytes → Integer |
| n.to_bytes(n_bytes, endian) | int ↔ bytes | Integer → Bytes |
| hex(n) | int | Int → hex string |
| bytes.fromhex(s) | bytes | Hex string → Bytes |
| b.hex() | bytes | Bytes → Hex string |
| bytearray(b) | bytearray | Mutable byte buffer |
| struct.pack(fmt, ...) | struct | Pack data to binary format |
| struct.unpack(fmt, b) | struct | Unpack binary to values |
| math.isclose(a, b) | float | Approximate float comparison |
| Decimal("0.1") | decimal | Exact decimal arithmetic |
| Fraction(1, 3) | fractions | Rational numbers |
| s.find(sub) | str | Find substring |
| s.replace(a, b) | str | Replace substrings |
| " ".join(list) | str | Join list of strings |
| binascii.hexlify(b) | bytes | Manual hex encoding |
| base64.b64encode(b) | bytes | Encode payload in base64 |
| hashlib.sha256(b).hexdigest() | hashlib | SHA-256 hash for binary |

Mini CheatSheet Snippets

# Int ↔ Bytes
i = int.from_bytes(b"\xde\xad", 'big')
b = i.to_bytes(2, 'big')

# Shellcode from hex
shellcode = bytes.fromhex("fc4883e4f0e8")

# DNS packet header
import struct
header = struct.pack("!HHHHHH", 0x1337, 0x0100, 1, 0, 0, 0)

# Patch shellcode
buf = bytearray(b"\x90" * 8)
buf[0] = 0xcc  # Change first NOP to INT3

Data Structures

Summary

Lists list

Lists = mutable, ordered containers, great for short-lived target batches, scan results, and small queues. Don’t use them as huge persistent stores.

Basic Ops

ports = [21, 22, 80, 443]   # create
ports.append(8080)          # add
ports[0] = 2121             # modify
del ports[1]                # delete
len(ports)                  # count items

Filtering & Transforming

open_ports = [p for p in ports if is_open(p)]     # fast filter
targets = [(ip,p) for ip in hosts for p in ports] # combine
g = (p for p in all_ports if is_open(p))          # generator (less memory)

Stacks & Queues

# Stack (LIFO)
stack.append(x); x = stack.pop()

# Queue (FIFO)
from collections import deque
q = deque(); q.append(x); x = q.popleft()

Use deque instead of list.pop(0) for performance.

Slicing & Copying

first3 = ports[:3]             # shallow copy
rev = ports[::-1]              # reverse
ports.sort()                   # in-place sort
sorted_ports = sorted(ports)   # new list

Performance & Memory

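Each slot stores a pointer (~8 bytes on 64-bit builds); the objects themselves live elsewhere.
Use array, numpy, or generators for large numeric data.
list.clear() empties the list in place, so every reference to it sees the change (rebinding the name to [] does not).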

Security & Concurrency

Not thread-safe → use queue.Queue for parallel scans.
Don’t keep secrets in lists. Clear (clear() / del) sensitive data after use.
Serialize results carefully (avoid leaking credentials or tokens).
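A sketch of the queue.Queue approach for parallel scans. The is_open() function here is a stand-in that pretends ports 22 and 443 are open; a real check would attempt a connection.

```python
import queue
import threading

def is_open(port):
    # Placeholder (assumption): no network traffic, fixed answer
    return port in {22, 443}

def worker(jobs, results):
    while True:
        try:
            port = jobs.get_nowait()   # thread-safe pop
        except queue.Empty:
            return                     # no work left
        if is_open(port):
            results.put(port)          # thread-safe push

jobs, results = queue.Queue(), queue.Queue()
for port in [21, 22, 80, 443]:
    jobs.put(port)

threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

open_ports = []
while not results.empty():             # safe here: all workers have finished
    open_ports.append(results.get())
open_ports.sort()
```

Both queues handle their own locking, so the workers never touch a shared list directly.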

EXAMPLE:

ports = [22, 80, 443]

# filter open ports
open_ports = [p for p in ports if is_open(p)]

# Example output (if 22 and 443 are open):
# open_ports = [22, 443]

Tuples tuple

Tuples = immutable, ordered containers. Great for storing coordinates, IP/port pairs, or any read-only sequences in scans and pipelines.

Basic Ops

coords = (10.0, 20.5)
ip_port = ("10.0.0.5", 22)
len(coords)                  # 2
coords[0]                    # 10.0

Immutable → cannot append, delete, or change elements.

Common Use Cases

Keys in dicts or sets → hashable and safe
Return multiple values from a function: return ip, port, status
Read-only sequences → e.g., default headers, config tuples

Unpacking Tricks

host, port = ip_port           # simple unpack
a, *middle, z = range(10)      # starred unpacking

Handy for extracting first/last elements or splitting ranges in scanners

Namedtuple / Dataclass

from collections import namedtuple
Result = namedtuple("Result", "ip port status")
r = Result("10.0.0.5", 80, "open")
r.ip                                              # "10.0.0.5"
r.port                                            # 80

Gives attribute access
Immutable & memory-light
Perfect for storing scan results, port statuses, or target info
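For comparison, a frozen dataclass offers similar read-only records with type annotations and defaults. A minimal sketch; the ScanResult name and its fields are illustrative, not from the course.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScanResult:
    ip: str
    port: int
    status: str = "closed"

r = ScanResult("10.0.0.5", 80, "open")

try:
    r.port = 8080                 # frozen instances reject assignment
except AttributeError:            # FrozenInstanceError subclasses AttributeError
    frozen = True

# frozen + eq makes instances hashable, so they work as dict keys
seen = {r: "first hit"}
```

Unlike a namedtuple, a dataclass instance is not a tuple, so it cannot be unpacked positionally or indexed.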

EXAMPLE:

def scan_target(ip, ports):
    results = []
    for p in ports:
        status = "open" if is_open(ip,p) else "closed"
        results.append(Result(ip, p, status))
    return results

res = scan_target("10.0.0.5", [22,80,443])

# Example output (repr of namedtuples):
# [
#   Result(ip='10.0.0.5', port=22, status='open'),
#   Result(ip='10.0.0.5', port=80, status='closed'),
#   Result(ip='10.0.0.5', port=443, status='open')
# ]

Dictionaries dict

Dicts = mutable key-value stores. Ideal for mapping services, ports, hosts, and scan results.

Basic Ops

services = {"ssh": 22, "http": 80, "https": 443}
services["dns"] = 53                                 # add/update
port = services.get("ftp", 21)                       # get with default

Iteration View

for key, value in services.items():
    print(key, value)
# items(), keys(), values() are dynamic views reflecting mutations

Useful Variants

from collections import defaultdict, Counter

# Auto-create lists: group hosts by ASN
hosts_by_asn = defaultdict(list)
hosts_by_asn[123].append("10.0.0.1")

# Count frequencies: e.g., top user-agents
ua_count = Counter(["curl","curl","python-requests"])

# OrderedDict (order guaranteed + move_to_end)
from collections import OrderedDict
od = OrderedDict()
od["first"] = 1
od["second"] = 2
od.move_to_end("first")  # move key to end

Security Note

Hash-flood attacks: huge sets of colliding keys can degrade lookups from O(1) to O(n).
CPython 3.3+ randomizes the hash seed for str and bytes keys by default to mitigate this, but the issue is still relevant when fuzzing protocols or parsing untrusted input.

EXAMPLE:

services = {"ssh":22,"http":80,"https":443}
services["dns"] = 53
print(services.items())

# Output (dicts preserve insertion order since Python 3.7):
# dict_items([('ssh', 22), ('http', 80), ('https', 443), ('dns', 53)])

Sets set

Sets = unordered, mutable collections for fast membership and deduplication. Ideal for pruning wordlists and tracking visited hosts.

Basic Ops

seen = {"10.0.0.5","10.0.0.7"}
seen.add("10.0.0.8")
"10.0.0.5" in seen                # True
seen.discard("1.2.3.4")           # no error if missing
# seen.remove(x) raises if x not present

Useful methods

seen.pop()    # remove arbitrary item
seen.clear()  # empty set
A.issubset(B); A.issuperset(B)
len(seen)

Comprehension & math ops

uniq = {x for x in wordlist if valid(x)}
intersection = A & B
difference   = B - A
union        = A | B
symmetric    = A ^ B

Use cases (offsec)

Prune huge wordlists (uniq = set(wordlist))
Track visited nodes in graph/bfs/dfs
Fast membership checks before expensive probes

Perf & security notes

Membership = O(1) average.
Sets use hashing like dicts, so they are conceptually vulnerable to the same hash-collision flooding (and benefit from the same mitigations).
Prefer sets for memory-light dedup of moderate-size lists; for huge datasets consider Bloom filters or disk-backed DB.
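A short worked example of deduplication and set difference on a toy wordlist (the values are made up). Since sets are unordered, the last line shows one common way to deduplicate while keeping the original order.

```python
wordlist = ["admin", "root", "admin", "guest", "root"]
tried = {"root"}

unique = set(wordlist)                          # dedup, order not preserved
remaining = unique - tried                      # candidates not yet tried
ordered_unique = list(dict.fromkeys(wordlist))  # dedup, first-seen order kept
```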

Characteristics Summary

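| Characteristic | list | tuple | dict | set |
|---|---|---|---|---|
| Mutability | Yes | No | Yes | Yes |
| Ordering | Yes | Yes | Yes (insertion order, 3.7+) | No |
| Allows Duplicates | Yes | Yes | Keys unique, values can duplicate | No |
| Hashable | No | Yes (if all elements are hashable) | No (keys must be hashable) | No (elements must be hashable; frozenset is hashable) |
| Typical Use (offsec) | Target batches, scan results, stacks/queues | IP/port pairs, coordinates, read-only constants | Map service→port, results keyed by host/IP | Dedup targets, fast membership checks, visited nodes |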

Control Flow

Boolean Contexts

Any object can be tested for truth:

False: False, None, 0, 0.0, "", [], {}, set()
True: everything else

if items: process(items)  # executes if list non-empty

Conditional Statements

if code == 200:
    status = "open"
elif code == 408:
    status = "timeout"
else:
    status = "closed"

Loops

`while`

The condition is evaluated before each iteration
while ... else runs the else block only if the loop ends without hitting break

ports = [22, 80, 443, 8080]
target = "10.0.0.5"
i = 0

while i < len(ports):
    port = ports[i]
    if is_open(target, port):
        print(f"{target}:{port} -> OPEN")
        break                                  # stop at first open port (optional)
    i += 1
else:
    print(f"No open ports found on {target}")   # only runs if no break

`for`

Iterates over any iterable
Supports enumerate, zip, unpacking
for ... else executes else if loop completes normally

for user in users:
    if user.id == target:
        found = user
        break
else:
    found = None
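The enumerate and zip helpers mentioned above can be sketched like this; the host and port values are arbitrary sample data.

```python
hosts = ["10.0.0.1", "10.0.0.2"]
ports = [22, 443]

# enumerate yields (index, item) pairs; start=1 gives human-friendly numbering
numbered = [f"{i}: {h}" for i, h in enumerate(hosts, start=1)]

# zip pairs items position by position and stops at the shorter input
pairs = list(zip(hosts, ports))
```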

Loop Control

| Statement | Effect |
|---|---|
| break | Exit innermost loop immediately |
| continue | Skip rest of iteration, continue loop |
| pass | Do nothing (placeholder) |

Comprehensions

Build lists, sets, dicts, generators concisely

# List comprehension: build list of target ports to scan
ports = [22, 80, 443, 8080]
open_ports = [p for p in ports if is_open("10.0.0.5", p)]
# Example output (if 22 and 443 are open):
# open_ports = [22, 443]

# Set comprehension: deduplicate discovered hosts
hosts = ["10.0.0.1","10.0.0.2","10.0.0.1"]
uniq_hosts = {h for h in hosts}
# uniq_hosts = {'10.0.0.1','10.0.0.2'}

# Dict comprehension: invert service→port mapping
services = {"ssh":22,"http":80,"https":443}
port_to_service = {v:k for k,v in services.items()}
# port_to_service = {22:'ssh', 80:'http', 443:'https'}

# Generator expression: lazy evaluation for scanning huge IP ranges
targets = (f"10.0.0.{i}" for i in range(1, 255))   # also lazy: no list is built up front
gen_scan = (ip for ip in targets if is_alive(ip))
# yields live hosts one by one

Comprehensions run in their own local scope, so loop variables do not leak into the enclosing code.

Pattern Matching (`match ... case`) 3.10+

Multi-way branch with destructuring

match command.split():
    case ["GET", path]: handle_get(path)
    case ["POST", path, data]: handle_post(path, data)
    case _: print("Unknown")

Functions

Definition

In Python, functions are defined with def followed by a name. The docstring describes what the function does and can be viewed using help(). The return statement sends a value back; if not used, the function returns None.

def check_vulnerability(ip):
    """Check if the IP is vulnerable."""
    return f"IP {ip} is vulnerable."

Argument Handling

Python uses pass-by-object-reference: the function receives a reference to the same object the caller holds. In-place mutations of a mutable object are therefore visible to the caller, while reassigning the parameter only rebinds the local name and leaves the caller's object untouched.

def add_finding(report):
    report.append("Potential vulnerability detected")

pentest_report = []
add_finding(pentest_report)
# pentest_report → ["Potential vulnerability detected"]
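The contrast between rebinding and mutating can be shown side by side. The function names rebind and mutate are made up for this sketch.

```python
def rebind(report):
    report = ["replaced"]        # rebinds the local name only

def mutate(report):
    report.append("finding")     # changes the object the caller also holds

findings = []
rebind(findings)
after_rebind = list(findings)    # snapshot: still empty
mutate(findings)
```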

Types of Calls

Positional: good for simple, fast operations (e.g., quick scan routines).
Keyword: clearer when optional parameters matter.
Mixed: combines both styles.

def scan_port(host, port, verbose=False):
    ...

scan_port("10.0.0.5", 22)                         # positional
scan_port(host="10.0.0.5", port=22, verbose=True) # keyword
scan_port("10.0.0.5", port=80)                    # mixed

Default Argument Values

Default values are evaluated once, when the function is defined. Using mutable defaults can cause unexpected shared state between calls.

Bad example:

def store_target(ip, targets=[]):
    targets.append(ip)
    return targets

Good example:

def store_target(ip, targets=None):
    if targets is None:
        targets = []
    targets.append(ip)
    return targets
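Running both variants twice makes the shared state visible. The functions are renamed store_bad and store_good here only so they can coexist in one snippet.

```python
def store_bad(ip, targets=[]):          # one list shared by every call
    targets.append(ip)
    return targets

def store_good(ip, targets=None):       # fresh list per call
    if targets is None:
        targets = []
    targets.append(ip)
    return targets

first_bad = list(store_bad("10.0.0.1"))
second_bad = list(store_bad("10.0.0.2"))    # still contains the first IP
first_good = store_good("10.0.0.1")
second_good = store_good("10.0.0.2")
```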

Variable-Length Parameters

*args collects extra positional arguments (e.g., multiple hosts).
**kwargs collects keyword metadata (e.g., tool name, scan type).

def log_event(event, *hosts, **details):
    msg = event + " -> " + ", ".join(hosts)
    if details:
        msg += f" [{details}]"
    print(msg)

Calling with unpacking:

log_event("Scan completed", *["10.0.0.1", "10.0.0.2"], tool="nmap")

Positional-Only and Keyword-Only Parameters

Positional-only helps avoid accidental keyword usage.
Keyword-only ensures clarity for functions that require explicit naming.

# The / means: these parameters cannot be used by name.
def normalize_port(a, b, /):
    ...
    return a, b

# The * means: everything that comes after must be called by name.
def connect_to_service(*, host, port):
    ...
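What happens when the rules are violated: both mistakes raise TypeError at call time. In this sketch the function bodies are simplified placeholders, not the course's implementations.

```python
def normalize_port(value, /):            # value is positional-only
    return int(value)

def connect_to_service(*, host, port):   # host and port are keyword-only
    return f"{host}:{port}"

errors = []
for call in (
    lambda: normalize_port(value="22"),          # keyword used for positional-only
    lambda: connect_to_service("10.0.0.5", 22),  # positionals used for keyword-only
):
    try:
        call()
    except TypeError as exc:
        errors.append(type(exc).__name__)

ok = (normalize_port("22"), connect_to_service(host="10.0.0.5", port=22))
```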

Advanced Functions

Type Annotations and Type Hints

Annotations document expected types but do not enforce them at runtime. They help IDEs, linters, and static analyzers.

from collections.abc import Iterable

def flatten_results(results: Iterable[list[str]]) -> list[str]:
    """Flatten nested scan results."""
    return [item for group in results for item in group]

flatten_results([["open:80"], ["open:443"]])
# → ['open:80', 'open:443']

First-Class and Higher-Order Functions

Functions can be passed around like any other value—useful for scan pipelines, dispatch tables, or callbacks.

def twice(f, x):
    return f(f(x))

def increment(n):
    return n + 1

print(twice(increment, 3))
# → 5

Lambda Expressions

Anonymous, single-expression functions. Good for simple sorting, filtering or scoring logic.

sort_by_severity = lambda finding: finding["severity"]

sort_by_severity({"id": 1, "severity": 5})
# → 5
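In practice lambdas are usually passed inline as a key or predicate rather than assigned to a name. A small sketch with made-up findings:

```python
findings = [
    {"id": 1, "severity": 5},
    {"id": 2, "severity": 9},
    {"id": 3, "severity": 2},
]

# Highest severity first
by_severity = sorted(findings, key=lambda f: f["severity"], reverse=True)

# Keep only findings at or above a threshold
critical = list(filter(lambda f: f["severity"] >= 5, findings))
```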

Closures

A closure remembers the environment in which it was created—useful for counters, tracking state, or small in-memory registries.

def make_counter():
    count = 0
    def next_id():
        nonlocal count
        count += 1
        return f"result_{count}"
    return next_id

c = make_counter()
c(), c(), c()
# → ('result_1', 'result_2', 'result_3')

Decorators

Decorators wrap a function to extend behavior—perfect for timing scans, adding logging, or enforcing preconditions.

from functools import wraps
import time

def timing(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.3f}s")
        return result
    return wrapper

@timing
def run_port_scan():
    time.sleep(0.1)  # simulation
    return "scan done"

run_port_scan()
# Output approx.:
# run_port_scan took 0.100s
# 'scan done'

Generator Functions

Generators produce values lazily, ideal for processing large lists of hosts or ports without consuming a lot of memory.

def host_range(start, end):
    for i in range(start, end):
        yield f"10.0.0.{i}"

list(host_range(1, 4))
# → ['10.0.0.1', '10.0.0.2', '10.0.0.3']

Recursion

Python supports recursion, but the stack depth is limited (~1000). For deep structures (e.g., large nested JSON scan results), iterative solutions are safer.
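One way to avoid the recursion limit is an explicit stack. The sketch below uses a hypothetical count_leaves() helper that counts non-container values in nested dicts and lists; the 5000-level test structure is built only to exceed the default limit.

```python
def count_leaves(data):
    """Count non-container values in nested dicts/lists without recursion."""
    stack, leaves = [data], 0
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            stack.extend(node.values())   # visit values later
        elif isinstance(node, list):
            stack.extend(node)            # visit items later
        else:
            leaves += 1
    return leaves

small = count_leaves({"a": [1, 2, {"b": 3}]})

# Nest one value deeper than the default recursion limit
deep = "leaf"
for _ in range(5000):
    deep = [deep]
deep_count = count_leaves(deep)
```

The current limit can be read with sys.getrecursionlimit(); raising it is possible but only moves the problem.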

Useful Standard Library Tools

functools.lru_cache – cache repeated lookups (e.g., DNS lookups, parsing).
functools.partial – pre‑configure a function (e.g., fix a port or timeout).
itertools – lazy pipelines for processing scan data.
contextlib.contextmanager – create with-style context managers (e.g., temporary connections).
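Three of these tools in one short sketch. The resolve(), probe(), and session() functions are placeholders that perform no network activity; they exist only to show the caching, pre-configuration, and setup/teardown behavior.

```python
from contextlib import contextmanager
from functools import lru_cache, partial

calls = []

@lru_cache(maxsize=None)
def resolve(name):
    calls.append(name)               # records real (uncached) lookups
    return f"resolved:{name}"        # stand-in for a DNS query

resolve("example.com")
resolve("example.com")               # second call is served from the cache

def probe(host, port, timeout):
    return (host, port, timeout)

probe_ssh = partial(probe, port=22, timeout=2.0)   # port and timeout fixed

events = []

@contextmanager
def session(name):
    events.append(f"open {name}")    # setup
    try:
        yield name
    finally:
        events.append(f"close {name}")   # teardown runs even on error

with session("c2"):
    events.append("work")
```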

Introspection

Inspect functions at runtime—a must for plugin systems, dynamic dispatch or auto‑generating CLI tools.

import inspect

def analyze(host, retries=3):
    return "ok"

print(inspect.signature(analyze))
# → (host, retries=3)

analyze.__defaults__
# → (3,)

Performance Notes

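Local variables are faster to look up than globals.
Default arguments are evaluated once, at definition time, so binding a global or attribute lookup as a default (as with _merge below) turns it into a fast local name. The gain is small, so reserve the trick for hot loops.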

def merge_results(a, b, _merge=dict.update):
    _merge(a, b)
    return a

merge_results({"open": [80]}, {"open": [443]})
# → {'open': [443]}

Modules & Packages

In offensive security, Python scripts frequently evolve from small proof-of-concepts into full operator toolkits. As the codebase grows, modules and packages become essential for maintaining clarity, reusability, and operational reliability.

Modules

A module is a single .py file containing names such as variables, classes, and functions.

Importing Modules

Importing a full module: import module binds the entire module object.

import encoder

Importing specific functions: from module import x binds only the referenced names.

from recon.dns_enum import brute_force_subdomains

Aliasing for cleaner code: as provides an alias, often useful when avoiding naming collisions in large frameworks.

import c2_client as c2

`__all__` and `import *`

__all__ defines the list of names that a module intentionally exposes when someone uses:

from module import *

Example:

__all__ = ["PublicClass", "public_fn"]

With this, only PublicClass and public_fn are imported.

Without __all__, import * imports every name not starting with an underscore, which may expose internal helpers unintentionally.

Using __all__ makes the module’s public API explicit and controlled.

Modules Execution Context

Every Python module has a specific execution context, and two dunder variables are automatically defined:

__name__: indicates how the module is being executed.
- When the file is run directly: __name__ == "__main__".
- When the file is imported: __name__ becomes the module’s name (e.g., "my_module").
__package__: indicates which package the module belongs to.
- Inside a package, it stores the package name.
- At the top level, it is an empty string (or None when the file is run directly as a script).

This enables the script guard, a pattern that ensures certain code (e.g., test routines or demos) runs only when the file is executed directly, and not when imported as a module.

def _demo():
    ...

if __name__ == "__main__":
    _demo()

Module Search Path

Python resolves imports using the Module Search Path, visible with:

import sys
print(sys.path)

Search order:

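Script directory (or the current directory in interactive mode)
PYTHONPATH entries
Standard library (including its ZIP archive, if present)
site-packages (third-party packages installed with pip)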

sys.path can be edited, but virtual environments or proper packaging are preferred.

Packages and Sub-packages

A package is a directory with an optional __init__.py file.

It can contain:

modules
sub-packages
nested hierarchies (e.g., c2/http/handlers.py)

Packages enable the creation of larger, organized red-team frameworks instead of single-file scripts.

mypkg/
    __init__.py
    core.py
    utils/
        __init__.py
        helpers.py

Key points:

Absolute imports (mypkg.core) are clearer and are the style PEP 8 recommends; PEP 328 defines the absolute/relative import semantics.
Relative imports (using .) only work inside packages.
Without __init__.py, Python creates a namespace package (PEP 420), useful for large plugin-style projects.

from mypkg.core import Engine      # absolute import
from .helpers import slugify       # relative import (inside utils)

The `__init__.py` File

__init__.py defines what a package exposes when imported. It can:

Specify public exports with __all__
Re-export useful names at the package level
Stay lightweight (avoid heavy work on import)

# mypkg/__init__.py
from .core import Engine, Version
__all__ = ["Engine", "Version"]

Resource Files Inside Packages

Python packages can include resource files (e.g., wordlists, templates, payloads). Since Python 3.9, the recommended way to load them is:

from importlib.resources import files

data = files("mypkg").joinpath("data/wordlist.txt").read_text()

This method works even if the package is distributed as a zip or a wheel, making resource access reliable and portable.

Installing External Packages

Python packages are installed with pip:

pip install requests              # installs a package
pip freeze > requirements.txt     # outputs all installed versions into a file
pip install -r requirements.txt   # installs everything listed in the file

requirements.txt captures all installed versions.

For reproducible builds—especially important in security tooling—use exact versions:

requests==2.32.3

Distributing a Python Package

A pyproject.toml defines the metadata and configuration needed to package and publish a Python project:

[project]
name = "mypkg"
version = "0.1.0"
description = "Sample package"
authors = [{ name = "You", email = "you@example.com" }]
requires-python = ">=3.9"
dependencies = ["requests>=2.31"]

[project.scripts]
mypkg = "mypkg.cli:main"     # exposes a CLI command

The [project.scripts] table (the console_scripts entry-point group in setuptools terms) creates a cross-platform command-line entry point (e.g., running mypkg launches mypkg.cli.main()).

Build and publish workflow:

python -m pip install build twine   # installs packaging tools
python -m build                     # generates wheel + source distribution
python -m twine upload dist/*       # uploads the files to PyPI

Each command plays a different role:

install build/twine → prepares the environment
build → packages your project
upload → publishes it to a repository

Virtual Environments

A virtual environment creates an isolated Python interpreter with its own site-packages. This avoids dependency conflicts between projects and ensures repeatable setups.

Creation and activation:

python3 -m venv .venv
source .venv/bin/activate        # Linux/macOS
.\.venv\Scripts\activate         # Windows

Once activated, all pip installs go inside .venv/, not the system Python.

Why it matters:

Keeps tools and libraries separated per project
Avoids version collisions (e.g., offensive-security scripts needing older libs)
Ensures reproducible environments when sharing code or deploying
Prevents breaking system-wide Python packages

File Handling

File handling is a core capability in Python, and it becomes especially important in offensive security tooling, where scripts frequently need to read wordlists, store results, process logs, or manipulate extracted data.

Open Function

open() is the standard interface for working with files in Python. It controls how the file is accessed (whether it's read, written, appended, or opened in binary mode) and also handles encoding and buffering.

fh = open("report.txt", mode="r", encoding="utf-8", newline="")
data = fh.read()
fh.close()

Function signature:

open(file, mode="r", buffering=-1, encoding=None, errors=None, newline=None)

Common modes

| Mode | Meaning |
|---|---|
| "r" | read (default) |
| "w" | write, truncating existing file |
| "a" | append (create if missing) |
| "x" | create exclusively, fail if file exists |
| "+" | update (read + write) |
| add "b" | binary mode ("rb", "wb", etc.) |

A file opened without a with block must be closed manually so that buffers are flushed and the OS handle is released.

The preferred pattern is the context manager form:

# Logging results from a simple offensive scan
with open("scan_log.txt", "a", encoding="utf-8") as log:
    log.write("Found open port 445 on target\n")

open() opens the file; on entering the with block, __enter__ returns the file object, and __exit__ closes it cleanly, even if an exception occurs.

Text vs Binary

Text mode decodes bytes into str using the specified encoding (defaults to the platform’s encoding).
Binary mode ("rb", "wb") returns raw bytes with no decoding applied.

with open("image.jpg", "rb") as img:
    raw = img.read(1024)

Binary mode is appropriate for non-text data (images, executables, packets, compressed files) or in scenarios where exact byte preservation is required.
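The same file read both ways shows the difference. This sketch writes a throwaway file under a temporary directory; the file name and content are arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "note.txt")

# newline="" disables newline translation so the bytes are identical on every OS
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write("✔ ok\n")

with open(path, "r", encoding="utf-8") as f:
    text = f.read()          # str: decoded characters

with open(path, "rb") as f:
    raw = f.read()           # bytes: exactly what is on disk
```

The check mark is one character in text but three bytes in raw, which is why lengths and offsets differ between the two modes.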

Patterns

Reading Patterns

Python provides several file-reading patterns, each suited to different performance and memory needs.

| Method | Description |
|---|---|
| fh.read() | Reads the entire file into memory. Not suitable for very large files. |
| fh.readline() | Reads one line; the returned string includes the trailing newline. |
| fh.readlines() | Reads all lines into a list. |
| for line in fh: | Lazy iteration; memory-efficient for large files. |
| fh.read(size) | Reads up to size characters (text mode) or bytes (binary mode); returns fewer when EOF is reached. |

Example pattern (simple tail implementation):

def tail(path, n=10):
    # Opens the file and reads all lines into a list.
    # If the file contained, for example:
    #   line1
    #   line2
    #   line3
    # Calling tail(path, n=2) would return:
    #   ["line2\n", "line3\n"]
    with open(path, encoding="utf-8") as f:
        return f.readlines()[-n:]
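Because readlines() loads the whole file, a bounded deque is a common alternative for large logs: it iterates lazily and keeps only the last n lines. The tail_lazy name is hypothetical, and the demo file is created just for the example.

```python
import os
import tempfile
from collections import deque

def tail_lazy(path, n=10):
    """Return the last n lines while holding at most n lines in memory."""
    with open(path, encoding="utf-8") as f:
        return list(deque(f, maxlen=n))   # older lines fall off the left

demo = os.path.join(tempfile.mkdtemp(), "demo.log")
with open(demo, "w", encoding="utf-8") as f:
    f.write("line1\nline2\nline3\n")

last_two = tail_lazy(demo, n=2)
```

This still reads the file from the start; seeking from the end in binary mode would be needed to avoid that.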

Writing Patterns

Writing uses the same distinction as reading: text mode writes str, while binary mode writes raw bytes. This is common in offensive tooling when generating payloads, saving extracted artifacts, or storing structured results.

| Method | Description |
|---|---|
| fh.write(data) | Writes a string (text mode) or bytes (binary mode). Returns the number of characters/bytes written. |
| fh.writelines(seq) | Writes a sequence of strings or bytes without adding newlines automatically. |
| fh.flush() | Pushes Python's buffered content to the operating system immediately; add os.fsync(fh.fileno()) to force it onto disk. |
| fh.close() | Flushes buffers and releases the file handle. Automatically handled by with. |

Buffered writing is usually faster; unbuffered mode (buffering=0, allowed only in binary mode) should be used only when strict real-time writing is needed.

with open("payload.bin", "wb") as out:
    # Writes sixteen NOP bytes (\x90) to the file.
    out.write(b"\x90" * 16)

    # The binary file payload.bin now contains:
    #   90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90

File Positioning

File positioning allows precise control over where the next read or write occurs. This is useful when handling logs, binary blobs, or structured data extracted during offensive operations.

seek() moves the file pointer, while tell() reports its current position.

Offsets are measured from:

the start of the file (whence=0),
the current position (whence=1), or
the end of the file (whence=2).

In text mode, seeking is limited to positions previously returned by tell(). Binary mode provides full byte-level control.

with open("sample.bin", "rb") as f:
    f.seek(0, 2)        # Move to end of file
    size = f.tell()     # File size in bytes

    f.seek(0)           # Rewind to the beginning
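Seeking combines naturally with struct when a value sits at a known offset. The sketch uses an in-memory io.BytesIO in place of a real file, and the blob layout (4-byte tag, 4-byte little-endian integer, trailing payload) is invented for illustration.

```python
import io
import struct

blob = io.BytesIO(b"HDR!" + struct.pack("<I", 0xDEADBEEF) + b"payload")

blob.seek(4)                                  # absolute offset (whence=0)
(value,) = struct.unpack("<I", blob.read(4))  # read the 4-byte field
pos_after_read = blob.tell()                  # reading advanced the pointer

blob.seek(-7, io.SEEK_END)                    # 7 bytes before the end (whence=2)
trailer = blob.read()
```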

The `pathlib` Module

pathlib.Path provides an object-oriented interface for working with filesystem paths. It replaces many os.path calls and produces cleaner, safer code (especially helpful when handling logs, wordlists, payloads, or extracted data during offensive operations).

Common `Path` Methods

| Method | Purpose |
|---|---|
| .exists() | Checks if the file or directory exists |
| .stat() | Returns metadata such as size, timestamps, and permissions |
| .iterdir() | Iterates over directory contents |
| .glob(pattern) | Finds files matching patterns (e.g., "*.txt") |
| .read_text() / .write_text() | Read/write text files |
| .read_bytes() / .write_bytes() | Read/write binary files |
| .unlink() | Deletes a file |
| .rename(target) | Renames or moves a file |
| .mkdir(parents=True, exist_ok=True) | Creates directories |
| .resolve() | Returns the absolute, canonical path |

Example

from pathlib import Path

# Construct a platform-agnostic path
report = Path("reports") / "2025" / "summary.csv"

if report.exists():
    size = report.stat().st_size
    print(f"{size} bytes")

# Ensure directory structure exists
report.parent.mkdir(parents=True, exist_ok=True)

# Write a CSV header (useful when generating reports or loot output)
report.write_text("header,value\n")

This approach keeps path handling clean, avoids string concatenation errors, and works consistently across operating systems.

Directory and Metadata Operations

Python provides several modules for interacting with the filesystem beyond simple file reads and writes. os, shutil, and stat expose functions for listing directories, copying files, modifying permissions, and manipulating metadata (operations frequently used in offensive tooling when collecting evidence, staging payloads, or archiving output).

Common Functions

| Module / Function | Purpose |
|---|---|
| os.listdir(path) | Lists directory contents |
| os.makedirs(path, exist_ok=True) | Creates directory trees safely |
| os.chmod(path, mode) | Changes file permissions |
| shutil.copy2(src, dst) | Copies a file while preserving metadata (mtime, permissions) |
| shutil.copytree(src, dst) | Recursively copies an entire directory |
| shutil.rmtree(path) | Recursively deletes a directory tree |
| shutil.make_archive(base, format, root_dir) | Creates zip/tar archives |
| stat.S_IRUSR, stat.S_IXUSR | Permission flags (read, execute for owner) |

Example

import os, shutil, stat

# List files in the current directory
files = os.listdir(".")

# Ensure a backup directory exists
os.makedirs("backup", exist_ok=True)

# Copy a file while preserving metadata
shutil.copy2("source.txt", "backup/source.txt")

# Set file to readable + executable by its owner
os.chmod("run_payload.sh", stat.S_IRUSR | stat.S_IXUSR)

shutil provides higher-level operations that bundle multiple system calls behind one function, which reduces the chance of inconsistent states from hand-rolled copy or delete loops. This is especially important when tooling handles logs, loot files, or output directories during offensive engagements.

Temporary Files and Atomic Writes

Temporary files are essential when a script needs to generate output safely without exposing partially written data. Python’s tempfile module provides utilities for creating secure, uniquely named temporary files.

An atomic write pattern works by writing data to a temporary file and then replacing the target file in a single filesystem operation. This guarantees that other processes never see an incomplete file (a useful property when storing scan results, staging payloads, or updating logs in environments where multiple tools may read the same files).

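from tempfile import NamedTemporaryFile
import os

# Write data to a temporary file in the target's directory,
# so the final rename stays on one filesystem
with NamedTemporaryFile("w", dir=".", delete=False) as tmp:
    tmp.write("draft")
    temp_name = tmp.name

# Atomically replace the final file
os.replace(temp_name, "final.txt")

The os.replace call is atomic on POSIX systems as long as the temporary file and the target live on the same filesystem, which is why the temporary file is created in the target's directory (dir=".") rather than in the system temp directory; across filesystems the call can fail with an OSError. The final file therefore always appears fully formed. This pattern prevents corruption, race conditions, and half-written outputs (issues that can cause inconsistent results during offensive tooling execution).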
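The same pattern can be wrapped in a reusable helper. The sketch below is one possible shape, not a canonical implementation: the function name atomic_write_text is hypothetical, and the fsync call is an extra durability step that goes beyond what the text above describes.

```python
import os
import tempfile
from pathlib import Path

def atomic_write_text(target, text):
    """Write text so readers see the old or the new content, never a partial file."""
    target = Path(target)
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())   # Push the data to disk before the rename
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the leftover temp file if anything failed before the replace.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise

# Usage: the second call replaces the first result in a single step.
demo_dir = Path(tempfile.mkdtemp())
results = demo_dir / "results.txt"
atomic_write_text(results, "scan 1\n")
atomic_write_text(results, "scan 2\n")
print(results.read_text())
```

After both calls only results.txt remains in the directory; no temporary files are left behind.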

Memory-Mapped Files

Memory-mapped files allow a script to treat file contents as if they were byte arrays in memory. This is especially valuable when dealing with large binaries, disk images, or forensic artifacts, since reads and writes occur without repeatedly copying data between Python and the operating system.

A memory-mapped region exposes the underlying file directly through slicing, enabling fast random access, useful when parsing structured binary formats, scanning raw disk sectors, or extracting payloads from large images.

import mmap

with open("disk.img", "r+b") as fh:
    mm = mmap.mmap(fh.fileno(), 0)   # Map the entire file into memory

    chunk = mm[0x100:0x108]          # Read a slice directly
    print(chunk)

    mm.close()

Because the file is mapped into virtual memory, operations are efficient even for multi-gigabyte data sources. This pattern is well-suited for offensive tooling that needs to inspect or manipulate binary data at arbitrary offsets.
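Locating a signature at an unknown offset is a typical use of this pattern. The sketch below builds a small stand-in file instead of assuming a real disk.img exists; the file layout and the use of the ELF magic as a marker are illustrative assumptions.

```python
import mmap
import os
import tempfile

# Build a small stand-in "image": zero padding with a marker at a known offset.
marker = b"\x7fELF"          # ELF magic, used here only as an example signature
fd, image_path = tempfile.mkstemp(suffix=".img")
with os.fdopen(fd, "wb") as fh:
    fh.write(b"\x00" * 4096 + marker + b"\x00" * 4096)

with open(image_path, "rb") as fh:
    # ACCESS_READ gives a read-only mapping, so the file cannot be changed by accident.
    with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        offset = mm.find(marker)              # -1 if the signature is absent
        header = mm[offset:offset + 4]        # Slicing returns a bytes copy
        print(hex(offset), header)

os.remove(image_path)
```

Using the mapping as a context manager closes it automatically, so no explicit mm.close() is needed.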

Common Structured Formats

Several file formats appear frequently in offensive tooling: for configuration, storing results, exchanging structured data, or compressing large logs. The following subsections summarise the most common ones.

JSON (JavaScript Object Notation)

Simple, human-readable, ideal for configuration files or structured outputs.

import json

with open("conf.json") as f:
    cfg = json.load(f)        # Load JSON into a dict

cfg["enabled"] = True         # Modify values

with open("conf.json", "w") as f:
    json.dump(cfg, f, indent=2)   # Write JSON back to disk

json.load / json.dump → work with file handles
json.loads / json.dumps → work with strings
Useful for tool configuration, module settings, or scan summaries.
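The string-based variants can be sketched as follows. The JSON keys (target, ports, enabled) are made up for illustration, and the error handling shows one way to report malformed input rather than a required pattern.

```python
import json

# Parse JSON from a string (for example, a response body or a CLI argument).
raw = '{"target": "10.0.0.5", "ports": [22, 80], "enabled": false}'
cfg = json.loads(raw)

cfg["enabled"] = True
cfg["ports"].append(443)

# Serialize back to a compact, deterministic string.
compact = json.dumps(cfg, sort_keys=True, separators=(",", ":"))
print(compact)

# Malformed input raises json.JSONDecodeError, which carries the error position.
error = None
try:
    json.loads("{not valid json}")
except json.JSONDecodeError as exc:
    error = f"bad JSON at line {exc.lineno}, column {exc.colno}"
print(error)
```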

CSV (Comma-Separated Values)

Common for lists of targets, credentials, hosts, or scan data.

import csv

with open("hosts.csv", newline="") as f:     # newline="" lets the csv module handle line endings itself
    for row in csv.DictReader(f):
        print(row["ip"], row["hostname"])

DictReader provides each row as a dictionary.
Always specify newline="" when opening CSV files.
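Writing follows the same rules. The sketch below round-trips a short host list through csv.DictWriter and csv.DictReader in a temporary directory; the column names match the reading example above, while the host values are invented.

```python
import csv
import tempfile
from pathlib import Path

hosts = [
    {"ip": "10.0.0.5", "hostname": "web01"},
    {"ip": "10.0.0.9", "hostname": "db01"},
]

csv_path = Path(tempfile.mkdtemp()) / "hosts.csv"

# newline="" applies to writing as well, otherwise blank lines can appear on Windows.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["ip", "hostname"])
    writer.writeheader()
    writer.writerows(hosts)

# Read the file back; every value comes back as a string.
with open(csv_path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

print(loaded == hosts)
```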

Compressed Files (gzip, bz2, etc.)

Many logs and packet captures are stored compressed to save space.

import gzip

with gzip.open("audit.log.gz", "rt", encoding="utf-8") as f:   # "rt" decodes to text
    text = f.read()

gzip.open() behaves like open(), including context-manager support, but defaults to binary mode ("rb").
Helpful when processing large log archives, telemetry dumps, or exfiltrated datasets.
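For large archives it is usually preferable to iterate line by line instead of reading everything at once. A minimal sketch, assuming a text log with one event per line; the log contents and the FAILED keyword are invented for the example.

```python
import gzip
import tempfile
from pathlib import Path

log_path = Path(tempfile.mkdtemp()) / "audit.log.gz"

# Create a small compressed log to work with ("wt" = write text).
with gzip.open(log_path, "wt", encoding="utf-8") as f:
    f.write("ok login user=alice\n")
    f.write("FAILED login user=root\n")
    f.write("ok login user=bob\n")

# Stream the file line by line; only one line is held in memory at a time.
failed = []
with gzip.open(log_path, "rt", encoding="utf-8") as f:
    for line in f:
        if "FAILED" in line:
            failed.append(line.rstrip("\n"))

print(failed)
```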

Pickle (Dangerous!)

Binary serialization of arbitrary Python objects.

import pickle

# pickle.load() can execute arbitrary code — unsafe with untrusted data

Should never be used with attacker-controlled data.
Useful only for trusted internal caching.
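The reason is that a pickle stream can name any importable callable and the arguments to call it with. The harmless sketch below illustrates the mechanism with the built-in sorted; the class name Demo is hypothetical. An attacker would substitute a dangerous callable in the same position.

```python
import pickle

class Demo:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))

blob = pickle.dumps(Demo())

# Loading runs the callable chosen by whoever produced the blob.
restored = pickle.loads(blob)
print(type(restored).__name__, restored)
```

The loaded value is whatever the call returned, not a Demo instance, which shows that the call ran during loading.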

Exception Handling

File operations commonly fail due to missing files, permission issues, or invalid paths. Handling these exceptions explicitly helps tools behave predictably, especially when dealing with user-supplied input or external resources.

from pathlib import Path

try:
    data = Path("config.ini").read_text()
except FileNotFoundError:
    print("Missing config; using defaults.")
except PermissionError:
    print("Run with correct privileges.")

It is recommended to catch specific exceptions rather than the broad OSError.
Useful subclasses include: IsADirectoryError, FileExistsError, NotADirectoryError, and others.
Targeted exception handling provides clearer error messages and safer behaviour, particularly in offensive tooling where unexpected paths or permissions are common.
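One way to combine specific handlers with a final OSError fallback is sketched below. The helper name read_config and its (text, status) return shape are assumptions made for the example. The specific subclasses must come before OSError, otherwise the broad handler would catch everything first.

```python
import tempfile
from pathlib import Path

def read_config(path, default=""):
    """Return (text, status); fall back to default when the file cannot be read."""
    try:
        return Path(path).read_text(encoding="utf-8"), "ok"
    except FileNotFoundError:
        return default, "missing"
    except IsADirectoryError:
        return default, "is a directory"
    except PermissionError:
        return default, "permission denied"
    except OSError as exc:
        # Last resort for any other I/O failure (disk errors, bad file names, ...).
        return default, f"os error: {exc.strerror}"

demo_dir = Path(tempfile.mkdtemp())
cfg_file = demo_dir / "config.ini"
cfg_file.write_text("[scan]\nthreads = 4\n", encoding="utf-8")

text, status = read_config(cfg_file)
missing_text, missing_status = read_config(demo_dir / "nope.ini", default="[scan]\n")
print(status, missing_status)
```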

File I/O Best Practices

Performance Guidelines

Read and write in chunks (e.g., read(65536)) for large files.
Use binary mode to avoid per-character encoding overhead when copying raw data.
Prefer Path.read_bytes() and Path.write_bytes() for concise code when the file fits comfortably in memory.
Use memory-mapped files (mmap) for random access on multi-gigabyte files.
Avoid many small writes; accumulate into a buffer or use io.BufferedWriter.
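The chunked-read guideline can be sketched with a file hash, a common task for large artifacts. The function name sha256_file is hypothetical, and the 64 KiB chunk size mirrors the read(65536) figure above rather than a tuned value.

```python
import hashlib
import os
import tempfile

def sha256_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so memory use stays constant."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:          # Binary mode: no decoding overhead
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:                 # Empty bytes object means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()

# Compare against hashing the same data in one go.
fd, sample_path = tempfile.mkstemp()
payload = b"A" * 200_000
with os.fdopen(fd, "wb") as fh:
    fh.write(payload)

chunked = sha256_file(sample_path)
print(chunked == hashlib.sha256(payload).hexdigest())
os.remove(sample_path)
```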

Security Guidelines

| Concern | Mitigation |
| --- | --- |
| Path traversal (../../etc/passwd) | Validate or normalize user-supplied paths with Path.resolve(), then ensure the result stays inside the allowed directory. |
| Untrusted pickle data | Avoid loading untrusted pickle content; use JSON, MessagePack, or custom formats instead. |
| Race conditions (TOCTOU) | Use appropriate open flags (e.g., "xb" for exclusive creation) or write to a temporary file and then atomically rename. |
| Encoding pitfalls | Always specify encoding="utf-8" unless there is a specific need for another encoding. |
| Leaked file handles | Use with blocks consistently to guarantee file handles are closed. |
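Two of these mitigations are sketched below: containing user-supplied paths inside a base directory, and exclusive creation with the "xb" mode. The helper names safe_join and create_exclusive are hypothetical. The containment check is one straightforward approach and does not protect against a symlink being swapped in after the check.

```python
import tempfile
from pathlib import Path

def safe_join(base_dir, user_path):
    """Resolve user_path under base_dir and reject anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return candidate

def create_exclusive(path, data):
    """Create a new file; raise FileExistsError instead of overwriting."""
    with open(path, "xb") as fh:
        fh.write(data)

base = Path(tempfile.mkdtemp())

inside = safe_join(base, "reports/scan.txt")      # Accepted
try:
    safe_join(base, "../../etc/passwd")           # Rejected
    escaped = True
except ValueError:
    escaped = False

create_exclusive(base / "once.bin", b"\x90\x90")
try:
    create_exclusive(base / "once.bin", b"\xcc")  # Second attempt must fail
    overwritten = True
except FileExistsError:
    overwritten = False

print(inside, escaped, overwritten)
```

Because "xb" makes the existence check and the creation a single operation, there is no window between checking and opening for another process to exploit.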

