So you need to let users upload files. Profile pictures, maybe? Documents? "How hard could it be?" you think. Just accept the file, check if it's a JPEG, save it somewhere. Ship it Friday, celebrate with the team.
Congratulations, you've just opened a portal straight to hell.
Let's talk about the OWASP File Upload Cheat Sheet. If you haven't read it, bookmark this blog and go read it right now. I'll wait.
Back? Good. Scared? You should be.
OWASP maintains this cheat sheet because developers keep making the same mistakes, and attackers keep loving them for it.
Let's start with everyone's favorite: client-side validation. You know, that thing that makes you feel like you're doing security while actually doing UX.
// Basic client-side validation in React
function FileUpload() {
  const handleFileChange = (e: React.ChangeEvent<HTMLInputElement>) => {
    const file = e.target.files?.[0];
    if (!file) return;

    // File type check
    const allowedTypes = ['image/jpeg', 'image/png', 'image/webp'];
    if (!allowedTypes.includes(file.type)) {
      alert('Only JPEG, PNG, and WebP images are allowed');
      return;
    }

    // File size check (5MB limit)
    const maxSize = 5 * 1024 * 1024;
    if (file.size > maxSize) {
      alert('File size must be less than 5MB');
      return;
    }

    // Check image dimensions
    const img = new Image();
    img.onload = () => {
      if (img.width > 2000 || img.height > 2000) {
        alert('Image dimensions must be less than 2000x2000');
      } else {
        uploadFile(file);
      }
    };
    img.src = URL.createObjectURL(file);
  };

  return <input type="file" accept="image/*" onChange={handleFileChange} />;
}
This is nice for users! They get instant feedback. They don't accidentally upload their tax returns instead of their profile picture. It's friendly. It's helpful. It's completely worthless for security.
Why? Because attackers don't use your carefully crafted React component. They are calling your API directly.
That beautiful validation you wrote? Attackers are laughing at it while uploading definitely-not-malware.exe.jpg.
Client-side validation is for honest users making honest mistakes. But security? Nah.
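To make "calling your API directly" concrete, here is a minimal sketch of how little it takes to skip the React component entirely. It only builds a multipart request body in memory and sends nothing; the function name `build_spoofed_upload` and the form field name `file` are assumptions for illustration, not part of any real API.

```python
import uuid


def build_spoofed_upload(payload: bytes, filename: str, claimed_type: str):
    """Hand-build a multipart/form-data body, as any HTTP client can."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {claimed_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return headers, head + payload + tail


# The filename and the per-file Content-Type are whatever the sender types in.
headers, body = build_spoofed_upload(
    b"<?php echo 'not an image'; ?>",
    "definitely-not-malware.jpg",
    "image/jpeg",
)
```

None of the browser checks ever ran, and the body still claims to be a JPEG. Everything in it is attacker-chosen text.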
Okay, NOW we're doing security. Everything you did on the client side? Do it again on the server. Every. Single. Check. I don't care if it feels redundant. That's the point.
# Python/Flask example - This alone is basically waving at hackers
import os

def validate_extension(filename):
    allowed_extensions = {'.jpg', '.jpeg', '.png', '.webp'}
    ext = os.path.splitext(filename)[1].lower()
    return ext in allowed_extensions
"Look, it has .jpg in the name, must be safe!" - Developer who's about to learn a valuable lesson
Attackers have been bypassing this since before you and I were writing code:
- malware.php.jpg (which file extension wins? Spoiler: not the one you want)
- malware.php%00.jpg (that .jpg gets truncated and whoops, you're executing PHP)
- malware.PhP (because apparently computers can't read)
- malware.php with invisible zero-width characters (yes, really)

File extensions are suggestions. Suggestions made by the attacker. Would you take security advice from someone trying to hack you? Then why are you trusting their filename?
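A quick sketch of how the extension check fares against those names. It repeats the `validate_extension` logic from above (with the set pulled out into a constant, which is my rearrangement) so it runs on its own:

```python
import os

ALLOWED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.webp'}


def validate_extension(filename):
    # Only the text after the LAST dot is inspected
    ext = os.path.splitext(filename)[1].lower()
    return ext in ALLOWED_EXTENSIONS


results = {
    name: validate_extension(name)
    for name in ['cat.jpg', 'malware.php.jpg', 'malware.php%00.jpg', 'malware.PhP']
}
for name, ok in results.items():
    print(f'{name!r}: {"accepted" if ok else "rejected"}')
```

The double-extension and `%00` names sail through because only the last suffix is examined. The `.lower()` call does catch `malware.PhP`, which is about the only thing this check gets right.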
# Checking Content-Type header - This is what amateurs do
content_type = request.headers.get('Content-Type')
if content_type not in ['image/jpeg', 'image/png']:
    return 'Invalid file type', 400
Oh sweet summer child. The Content-Type header is sent by the client. You know, the client we JUST established is controlled by the attacker who wants to ruin your day?
They can set it to image/jpeg. They can set it to image/cute-puppy. They can set it to definitely-not-malware/promise. The HTTP request doesn't care about your feelings.
The slightly better approach - actually reading the file content:
import magic

def validate_mime_type(file_path):
    # Use python-magic to detect ACTUAL MIME type from file content
    mime = magic.Magic(mime=True)
    detected_mime = mime.from_file(file_path)

    allowed_mimes = ['image/jpeg', 'image/png', 'image/webp']
    if detected_mime not in allowed_mimes:
        return f'Invalid MIME type: {detected_mime}', 400
    return detected_mime, 200
This is better because it reads the actual file content instead of trusting headers. But even this isn't bulletproof - an executable pretending to be an image can fool MIME detection. A PHP web shell cosplaying as a JPEG? Still possible. MIME types are just educated guesses with better accuracy.
Okay, NOW we're cooking. Most binary file formats start with a fixed signature (called "magic bytes" because apparently security people are wizards). JPEG files start with FF D8 FF, PNG files start with 89 50 4E 47.
This is actually harder to fake:
def validate_file_signature(file_stream):
    # Read first bytes
    header = file_stream.read(12)
    file_stream.seek(0)  # Reset stream

    # JPEG signature
    if header[:3] == b'\xFF\xD8\xFF':
        return 'jpeg'

    # PNG signature
    if header[:8] == b'\x89PNG\r\n\x1a\n':
        return 'png'

    # WebP signature
    if header[:4] == b'RIFF' and header[8:12] == b'WEBP':
        return 'webp'

    return None

# Usage
actual_type = validate_file_signature(file)
if not actual_type:
    return 'Invalid file format', 400
This is much better! We're actually checking what the file IS, not what it CLAIMS to be.
But of course, attackers have a counter-move. There's a thing called polyglot files - files that are somehow valid in multiple formats at the same time (like someone who speaks three languages, except evil). An attacker can craft a file that passes your JPEG validation AND contains executable code.
Because file formats are complicated, and where there's complexity, there's exploitation.
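Here is a deliberately crude sketch of why a header check alone can't settle it. The payload below is not a real polyglot (it isn't even a decodable JPEG); it just shows that a signature check only ever looks at the first few bytes and has no opinion about what follows. `sniff_signature` is a trimmed copy of the JPEG and PNG branches above, renamed so it stands alone.

```python
import io


def sniff_signature(stream):
    """Return a format name based on the first bytes only."""
    header = stream.read(12)
    stream.seek(0)  # Leave the stream where we found it
    if header[:3] == b'\xFF\xD8\xFF':
        return 'jpeg'
    if header[:8] == b'\x89PNG\r\n\x1a\n':
        return 'png'
    return None


# JPEG magic bytes, some padding, then something that is clearly not pixels
payload = b'\xFF\xD8\xFF\xE0' + b'\x00' * 8 + b"<?php echo 'hello'; ?>"
detected = sniff_signature(io.BytesIO(payload))
print(detected)
```

The check reports a JPEG, and the PHP is still sitting in the file. A real polyglot does the same trick while also staying a valid image, which is why the next step is to re-encode rather than just inspect.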
Even after checking magic bytes, you should run the file through image libraries and re-encode it. Think of it as putting the file through a car wash, except instead of removing dirt, you're removing malware:
from PIL import Image
from io import BytesIO

def sanitize_image(file_stream):
    try:
        # Open and validate the image
        img = Image.open(file_stream)
        img.verify()  # Verify it's actually an image

        # Reopen: the image object can't be used after verify()
        file_stream.seek(0)
        img = Image.open(file_stream)

        # JPEG can't store alpha or palette modes (RGBA, P), so normalize to RGB
        if img.mode != 'RGB':
            img = img.convert('RGB')

        # Re-encode to strip metadata and potential exploits
        output = BytesIO()
        img.save(output, format='JPEG', quality=85)
        output.seek(0)
        return output
    except Exception as e:
        raise ValueError(f'Invalid image file: {e}') from e
This strips out EXIF data, comments, and any embedded nasties. It's like photocopying a photocopied document - some information gets lost in translation. The good kind of information loss.
But plot twist: even image processing libraries have vulnerabilities. Remember ImageTragick? Yeah. The tools meant to protect you can become attack vectors. Security is a nightmare and we're all just doing our best.
User-provided filenames are basically attack vectors with extra steps. Check out this totally innocent filename:
../../etc/passwd
If you use this directly in a file path, congratulations! You've just let an attacker read (or overwrite) system files. Your /etc/passwd is now their /etc/passwd.
Or how about this beauty:
; rm -rf / ;.jpg
If this filename ends up in a shell command somewhere in your stack, you're not just having a bad day. You're having a "update your resume" kind of day.
NEVER. USE. USER. FILENAMES. I don't care if it makes the UX slightly worse. I don't care if your PM wants users to see their original filename. Generate your own:
import uuid
import os

def generate_safe_filename(original_filename):
    # Extract extension (AFTER validation, not before!)
    ext = os.path.splitext(original_filename)[1].lower()
    # Generate UUID
    unique_name = str(uuid.uuid4())
    return f"{unique_name}{ext}"

# Store mapping in database if you really need to show original names
# user_filename -> generated_filename
Let the user see their original filename in the UI if you must. Store it in the database. But on disk? UUID or bust.
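Even with generated names, it's cheap to add a belt-and-braces check that whatever path you're about to write really lives inside the upload directory. A minimal sketch, assuming the `/var/file_storage/uploads` layout used in this post; `resolve_inside` is a hypothetical helper name:

```python
from pathlib import Path

UPLOAD_ROOT = Path("/var/file_storage/uploads")  # assumed storage location


def resolve_inside(root: Path, name: str) -> Path:
    """Join name onto root and refuse anything that escapes root."""
    root = root.resolve()
    candidate = (root / name).resolve()  # collapses ../ and follows symlinks
    if root not in candidate.parents:
        raise ValueError(f"Path escapes upload directory: {name!r}")
    return candidate


safe = resolve_inside(UPLOAD_ROOT, "3f2b8c1e.jpg")

try:
    resolve_inside(UPLOAD_ROOT, "../../etc/passwd")
    escaped = True
except ValueError:
    escaped = False
```

Resolving first and comparing afterwards is the point: string checks like "does it contain `..`" tend to miss absolute paths and symlinks, while a resolved path either sits under the root or it doesn't.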
Where you store files is the difference between "we had a security incident" and "we're on the front page of Hacker News (not in a good way)".
❌ The "I Like To Live Dangerously" Approach: Storing files in your web application directory
/var/www/app/uploads/user_file.php
# If the webserver executes this... narrator: it will
✅ The "I Read The Manual" Approach: Store files outside the webroot
/var/file_storage/uploads/uuid-123.jpg
# Webserver can't execute these directly. Attacker sad.
✅ The "I Have AWS Credits" Approach: Use cloud storage
// Supabase example - let someone else deal with security
const { data, error } = await supabase.storage
  .from('user-uploads')
  .upload(`${userId}/${uuid}.jpg`, file, {
    cacheControl: '3600',
    upsert: false
  });
And for the love of all that is holy, set proper permissions:
If your files have execute permissions "just in case", I have questions. Mainly: why do you hate your future self?
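What "proper permissions" looks like depends on your setup, so treat this as a sketch under assumptions: a POSIX filesystem, an application user that owns the files, and `0o640` (owner read/write, group read, nobody executes) as the target mode. `write_private_file` is a hypothetical helper name.

```python
import os


def write_private_file(path, data: bytes, mode=0o640):
    """Create a new file with no execute bits and write data to it."""
    # O_EXCL makes creation fail instead of overwriting an existing file
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    # The process umask can alter the mode passed to os.open; set it explicitly
    os.chmod(path, mode)
```

Setting the mode at creation time means there's no window where the file exists with looser permissions, and `O_EXCL` doubles as a guard against two uploads landing on the same name.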
Attackers don't just want to hack you. Sometimes they just want to ruin your day by filling up your entire server with garbage:
# Set hard limits or cry later
MAX_FILE_SIZE = 5 * 1024 * 1024  # 5MB
MAX_FILES_PER_HOUR = 10  # Stop. Uploading. Things.
MAX_TOTAL_STORAGE_PER_USER = 50 * 1024 * 1024  # 50MB total

@app.route('/upload', methods=['POST'])
def upload():
    # Check size BEFORE reading the entire 50GB "image"
    content_length = request.content_length
    if content_length and content_length > MAX_FILE_SIZE:
        return 'File too large', 413

    # Rate limiting - because some people have no chill
    user_uploads = get_user_upload_count(user_id, hours=1)
    if user_uploads >= MAX_FILES_PER_HOUR:
        return 'Rate limit exceeded', 429
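One caveat on the size check: Content-Length is yet another thing the client tells you, and it can be missing altogether. Flask's `MAX_CONTENT_LENGTH` config setting helps, but a framework-agnostic fallback is to count bytes as you read them. A minimal sketch, where `read_limited` and `UploadTooLarge` are names I made up:

```python
import io


class UploadTooLarge(Exception):
    """Raised when a stream yields more bytes than allowed."""


def read_limited(stream, max_bytes, chunk_size=64 * 1024):
    """Read the stream in chunks and bail out once max_bytes is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise UploadTooLarge(f"Upload exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


small = read_limited(io.BytesIO(b"x" * 1000), max_bytes=5 * 1024 * 1024)
```

At most one chunk past the limit is ever held in memory, no matter what the headers promised.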
And then there's zip bombs. Oh boy. A 42KB file that decompresses to 4.5 petabytes. It's like the TARDIS of malicious files - bigger on the inside. Always check decompressed size:
import zipfile

def safe_extract(zip_path, max_size=100 * 1024 * 1024):  # 100MB
    total_size = 0
    with zipfile.ZipFile(zip_path) as zf:
        # infolist() returns ZipInfo objects; namelist() only returns strings
        for member in zf.infolist():
            total_size += member.file_size
            if total_size > max_size:
                raise ValueError('Nice try. Decompressed size exceeds limit.')
        # Okay, it's probably safe
        zf.extractall()
Don't be the person who explains to their boss why AWS charged $50,000 for storage this month.
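The sizes stored in a zip's headers are, you guessed it, written by whoever made the zip. A more suspicious variant counts the bytes that actually come out during extraction and also refuses entries that try to land outside the destination. This is a sketch, not a hardened extractor; `safe_extract_streaming` and its parameters are assumptions of mine.

```python
import os
import zipfile


def safe_extract_streaming(zip_file, dest_dir, max_total=100 * 1024 * 1024,
                           chunk_size=64 * 1024):
    """Extract while counting real decompressed bytes; return the total."""
    dest_root = os.path.realpath(dest_dir)
    total = 0
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            target = os.path.realpath(os.path.join(dest_root, info.filename))
            # Reject entries such as ../../etc/cron.d/evil
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f'Entry escapes destination: {info.filename!r}')
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, 'wb') as dst:
                while True:
                    block = src.read(chunk_size)
                    if not block:
                        break
                    total += len(block)
                    if total > max_total:
                        raise ValueError('Decompressed size exceeds limit.')
                    dst.write(block)
    return total
```

A failed extraction leaves partial files behind, so in practice you'd point `dest_dir` at a temporary directory and delete it on error.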
SVG files are images, right? They go in <img> tags. They're just vectors and paths and... wait, what's that <script> tag doing there?
<svg xmlns="http://www.w3.org/2000/svg">
  <script>
    alert('Surprise! XSS via SVG!');
    // Or: steal session tokens, perform actions as the user, general chaos
  </script>
</svg>
SVG files are XML. XML can contain JavaScript. If you allow SVG uploads and serve them with Content-Type: image/svg+xml, congratulations on your new XSS vulnerability! Your security team will love this.
Solutions (pick your poison):
- Content-Type: text/plain with X-Content-Type-Options: nosniff (they won't render, but they also won't execute)

There's no good answer here. SVG is the file format equivalent of "we have security at home".
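The text/plain option can be sketched as a small helper that builds the response headers. The `Content-Disposition: attachment` header is my addition on top of what's described above (it nudges browsers to download instead of display), and `svg_download_headers` is a hypothetical name. The filename is assumed to be one you generated, not one the user supplied.

```python
def svg_download_headers(stored_filename: str) -> dict:
    """Headers for serving an uploaded SVG as inert text."""
    return {
        # Plain text is displayed as source, never parsed as SVG/XML
        "Content-Type": "text/plain; charset=utf-8",
        # Stop the browser from second-guessing the declared type
        "X-Content-Type-Options": "nosniff",
        # Assumption: prefer a download over inline display
        "Content-Disposition": f'attachment; filename="{stored_filename}"',
    }


headers = svg_download_headers("3f2b8c1e.svg")
```

In Flask these would go onto the response object's `headers`; other frameworks have an equivalent.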
Image files contain EXIF metadata - GPS coordinates, camera model, the date your iPhone decided to save photos with the wrong timezone. But also: malicious payloads that exploit parser vulnerabilities.
# Strip all metadata - burn it with fire
from PIL import Image

def strip_exif(image_path):
    image = Image.open(image_path)
    data = list(image.getdata())
    clean_image = Image.new(image.mode, image.size)
    clean_image.putdata(data)
    clean_image.save(image_path)
Is this overkill? Maybe. Is it necessary? Yes. Metadata is where attackers hide their fun surprises.
If you're processing files with system commands, you're in dangerous territory:
# This is basically asking to get hacked
os.system(f"convert {filename} -resize 800x600 {output}")
# Attacker uploads: "image.jpg; rm -rf /"
# Your server executes: convert image.jpg; rm -rf / -resize 800x600 output
# Your career status: ???
Always use parameterized commands. Always. Always:
# This is how adults do it
import subprocess

subprocess.run([
    'convert',
    filename,
    '-resize',
    '800x600',
    output
], check=True)
The extra syntax is annoying. Know what's more annoying? Explaining why the server's root directory is empty.
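If you want to see the difference for yourself, here is a harmless sketch. Instead of ImageMagick it runs a tiny Python child process that echoes back the arguments it received, so you can watch a hostile filename arrive as one literal string with no shell involved.

```python
import subprocess
import sys

hostile_name = "image.jpg; echo INJECTED"

# Argument list, no shell: the child gets the whole string as a single argument
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", hostile_name],
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.strip()
print(received)  # a one-element list containing the full hostile string
```

The semicolon never reaches a shell, so it's just a character in a weird filename. One caveat: a name that starts with `-` can still be read as an option by the tool you're calling, which is one more reason to use generated filenames.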
Here's the part where I ruin your day: even if you implement EVERYTHING I've mentioned - extension validation, MIME checking, signature verification, content sanitization, safe storage, rate limiting, proper error handling, security headers, and a sacrificial offering to the security gods - you're still not fully protected.
Why? Because:
This is where external security services enter the chat. These are the specialized tools that make security their entire job:
import requests

def scan_with_virustotal(file_path, api_key):
    url = 'https://www.virustotal.com/api/v3/files'

    with open(file_path, 'rb') as f:
        files = {'file': f}
        headers = {'x-apikey': api_key}
        response = requests.post(url, files=files, headers=headers)

    if response.status_code == 200:
        analysis_id = response.json()['data']['id']
        # Poll for results... (not shown because this is already long)
        return check_analysis(analysis_id, api_key)
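For the curious, the polling step might look roughly like this. It's a sketch under assumptions: it uses the VirusTotal v3 `analyses` endpoint, treats any "malicious" or "suspicious" engine verdict as a failure, and the retry count and delay are arbitrary. `verdict_from_analysis` is a hypothetical helper, and the response fields should be checked against the current API documentation before you rely on them.

```python
import json
import time
import urllib.request


def verdict_from_analysis(payload):
    """Return True (clean), False (flagged), or None (still scanning)."""
    attrs = payload["data"]["attributes"]
    if attrs.get("status") != "completed":
        return None
    stats = attrs.get("stats", {})
    return stats.get("malicious", 0) == 0 and stats.get("suspicious", 0) == 0


def check_analysis(analysis_id, api_key, attempts=10, delay=15):
    """Poll until the analysis completes, then return the verdict."""
    url = f"https://www.virustotal.com/api/v3/analyses/{analysis_id}"
    request = urllib.request.Request(url, headers={"x-apikey": api_key})
    for _ in range(attempts):
        with urllib.request.urlopen(request, timeout=30) as response:
            verdict = verdict_from_analysis(json.load(response))
        if verdict is not None:
            return verdict
        time.sleep(delay)
    raise TimeoutError("Analysis did not complete in time")
```

Keeping the verdict logic in its own function means you can tune the threshold (or unit test it) without making a single network call.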
These services provide:
File uploads are a minefield wearing a "kick me" sign. Every validation layer you skip is a door you left open with a welcome mat. Every "it's probably fine" is a future post-mortem.
The OWASP cheat sheet isn't paranoia - it's documented history. These attacks aren't things that "could happen". They do happen. They're happening right now to someone who thought they were being careful.
Your client-side validation is UX theater. Your server-side validation alone is security theater. Even perfect validation isn't enough without external scanning watching your back.
You need defense in depth. You need to trust nothing. You need to validate everything. And even then, you need a plan for when something sneaks through.
Because in security, it's not "if something goes wrong". It's "when".
Now go forth and implement all of this. Your future self (and your security team) will thank you.