Minwin handles a lot of visual content — profile avatars, post images, cover photos, product shots, and video. Every upload needs to be resized, optimized, and stored in multiple formats before it can be served. I built a media processing pipeline using Sharp for images and FFmpeg for video, orchestrated by BullMQ workers and stored in Cloudflare R2.
Image processing with Sharp
Every image that enters the system goes through a preset-based pipeline. Instead of ad-hoc resize calls scattered across the codebase, I defined a set of presets that map directly to how images are used in the app:
| Preset | Dimensions | Quality | Use case |
|---|---|---|---|
| avatar | 400×400 | 80 | Profile pictures |
| post | 1080×1350 | 80 | Feed posts |
| cover | 1920×1080 | 80 | Profile covers |
| product | 600×600 | 60 | Product listings |
| thumbnail | 300×300 | 60 | Preview thumbnails |
All output is WebP. The quality split is intentional — hero content (avatars, posts, covers) gets quality 80 for visual fidelity, while secondary content (products, thumbnails) gets quality 60 to keep payload sizes down.
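The preset table maps naturally to a single lookup object. A minimal sketch, with the object and key names assumed:

```javascript
// Hypothetical preset table mirroring the values above.
const PRESETS = {
  avatar:    { width: 400,  height: 400,  quality: 80 },
  post:      { width: 1080, height: 1350, quality: 80 },
  cover:     { width: 1920, height: 1080, quality: 80 },
  product:   { width: 600,  height: 600,  quality: 60 },
  thumbnail: { width: 300,  height: 300,  quality: 60 },
};
```

Keeping the presets in one place means a new image use case is a one-line addition rather than another scattered resize call.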
The Sharp pipeline for a single image looks like this:
```javascript
await sharp(buffer)
  .resize(preset.width, preset.height, { fit: 'cover', position: 'centre' })
  .webp({ quality: preset.quality })
  .toBuffer();
```
Every post upload generates two outputs: the full-size post image at 1080×1350 and a 300×300 thumbnail. The thumbnail is used in grids and previews where loading the full image would be wasteful.
Video transcoding with FFmpeg
Video is where things get more complex. Raw uploads can be anything — different codecs, resolutions, frame rates. The goal is to produce HLS (HTTP Live Streaming) output that works across all devices and adapts to the viewer’s bandwidth.
Each video gets transcoded into four renditions:
| Rendition | Resolution | Bitrate | Audio |
|---|---|---|---|
| 360p | 640×360 | 800k | 96k AAC |
| 480p | 854×480 | 1400k | 128k AAC |
| 720p | 1280×720 | 2800k | 128k AAC |
| 1080p | 1920×1080 | 5000k | 192k AAC |
The FFmpeg command generates all four renditions in a single pass, producing fragmented MP4 (fMP4) segments with 2-second durations. I chose fMP4 over traditional MPEG-TS because it supports better seeking and is the direction Apple has been pushing HLS.
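The per-rendition flags fall out of the rendition table rather than being hand-written. A sketch of how the argument list for one rendition might be assembled (the names and exact flag set here are my assumptions, not the production command, which multiplexes all renditions in one invocation):

```javascript
// Hypothetical rendition table and per-rendition FFmpeg args.
const RENDITIONS = [
  { name: '360p',  width: 640,  height: 360,  video: '800k',  audio: '96k'  },
  { name: '480p',  width: 854,  height: 480,  video: '1400k', audio: '128k' },
  { name: '720p',  width: 1280, height: 720,  video: '2800k', audio: '128k' },
  { name: '1080p', width: 1920, height: 1080, video: '5000k', audio: '192k' },
];

function renditionArgs(r) {
  return [
    '-vf', `scale=${r.width}:${r.height}`,
    '-c:v', 'libx264', '-b:v', r.video,
    '-c:a', 'aac', '-b:a', r.audio,
    '-hls_time', '2',             // 2-second segments
    '-hls_segment_type', 'fmp4',  // fragmented MP4 rather than MPEG-TS
    '-hls_playlist_type', 'vod',
    `${r.name}/playlist.m3u8`,
  ];
}
```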
The output structure for a single video:
```
video_id/
├── master.m3u8          # Master playlist (points to renditions)
├── 360p/
│   ├── playlist.m3u8    # Rendition playlist
│   └── segment_%03d.m4s # 2-second segments
├── 480p/
│   └── ...
├── 720p/
│   └── ...
└── 1080p/
    └── ...
```
The master playlist lists all renditions with their bandwidth and resolution metadata. The video player picks the appropriate rendition based on the viewer’s connection speed and switches between them seamlessly.
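Generating the master playlist is mostly string templating. A sketch under assumed names, using the video bitrate alone as the BANDWIDTH value (the HLS spec actually wants the peak total stream bitrate, audio included):

```javascript
// Hypothetical master-playlist generator for the four renditions.
const renditions = [
  { name: '360p',  width: 640,  height: 360,  bandwidth: 800000 },
  { name: '480p',  width: 854,  height: 480,  bandwidth: 1400000 },
  { name: '720p',  width: 1280, height: 720,  bandwidth: 2800000 },
  { name: '1080p', width: 1920, height: 1080, bandwidth: 5000000 },
];

function masterPlaylist() {
  const lines = ['#EXTM3U', '#EXT-X-VERSION:7']; // version 7: required for fMP4 segments
  for (const r of renditions) {
    lines.push(`#EXT-X-STREAM-INF:BANDWIDTH=${r.bandwidth},RESOLUTION=${r.width}x${r.height}`);
    lines.push(`${r.name}/playlist.m3u8`);
  }
  return lines.join('\n');
}
```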
Queue architecture
Media processing is CPU-intensive and unpredictable in duration. It can’t happen in the request path. BullMQ handles the orchestration.
There are separate queues for images and video, each with its own worker configuration:
Image queue — processes multiple jobs concurrently. Image resizing with Sharp is fast (sub-second for most presets), so parallelism is fine. Failed jobs retry 3 times with exponential backoff.
Video queue — concurrency locked to 1. Video transcoding is heavy — it saturates CPU and memory. Running multiple FFmpeg processes simultaneously would degrade quality for all of them. Each video job gets a 10-minute lock timeout to accommodate longer videos without the job being considered stale.
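In BullMQ terms, the two workers differ in just a couple of options. A sketch with assumed names and an assumed image concurrency value (`lockDuration` is BullMQ's stale-job lock, in milliseconds):

```javascript
// Hypothetical worker options; the actual job handlers are omitted.
const imageWorkerOpts = {
  concurrency: 4,       // several Sharp jobs in parallel (value assumed)
  lockDuration: 30000,  // images finish in well under 30 s
};

const videoWorkerOpts = {
  concurrency: 1,        // one FFmpeg process at a time
  lockDuration: 600000,  // 10-minute lock for long transcodes
};
```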
The retry strategy uses exponential backoff:
```javascript
{
  attempts: 3,
  backoff: {
    type: 'exponential',
    delay: 1000
  }
}
```
If a job fails three times, it moves to the failed set where I can inspect it manually. Most failures are transient — R2 upload timeouts, corrupted input frames — so the retries handle them.
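BullMQ's built-in exponential strategy computes the wait before retry n as `delay * 2^(n - 1)`:

```javascript
// BullMQ's exponential backoff: delay * 2^(retry - 1), in milliseconds.
function backoffDelay(baseDelayMs, retry) {
  return baseDelayMs * 2 ** (retry - 1);
}
```

With `attempts: 3` and a 1000 ms base delay, a persistently failing job is retried after 1 s and again after 2 s before landing in the failed set.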
Storage in Cloudflare R2
All processed media goes to Cloudflare R2. The key structure is deterministic:
```
media/{mediaId}/post.webp
media/{mediaId}/thumbnail.webp
media/{mediaId}/video/master.m3u8
media/{mediaId}/video/720p/playlist.m3u8
media/{mediaId}/video/720p/segment_001.m4s
```
R2 was chosen over S3 for zero egress fees. When a post goes viral and gets millions of views, the storage cost stays flat. The R2 bucket sits behind Cloudflare’s CDN, so content is cached at the edge and served from the nearest POP to the viewer.
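Deterministic keys mean paths can be derived on demand instead of stored per asset. A sketch of hypothetical key helpers matching the layout above:

```javascript
// Hypothetical R2 key builders for the deterministic layout.
const imageKey = (mediaId, preset) =>
  `media/${mediaId}/${preset}.webp`;

const videoKey = (mediaId, ...parts) =>
  `media/${mediaId}/video/${parts.join('/')}`;
```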
The upload flow end to end
- Client uploads the raw file to a presigned R2 URL
- The API receives a webhook confirming the upload and creates a `media` record with status `uploaded`
- A BullMQ job is enqueued for processing
- The worker downloads the raw file from R2 and processes it through the appropriate preset pipeline
- Processed outputs are uploaded back to R2
- The media record status moves to `processed`
- If the media is a post, a second job fires to generate embeddings (vision analysis → tag extraction → vector embedding)
The status field (uploaded → processing → processed) lets the frontend show appropriate loading states. Posts in processing state show a shimmer placeholder. Failed processing sets an error status that triggers a notification to the uploader.
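The status field is effectively a tiny state machine, and guarding transitions in one place keeps illegal states out of the database. A sketch with assumed names (the `error → processing` edge is my assumption for manual retries):

```javascript
// Hypothetical allowed transitions for the media status field.
const TRANSITIONS = {
  uploaded:   ['processing'],
  processing: ['processed', 'error'],
  processed:  [],
  error:      ['processing'], // assumed: a manual retry re-enters processing
};

const canTransition = (from, to) => TRANSITIONS[from].includes(to);
```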
What I’d do differently
The four-rendition HLS setup is overkill for the current scale. Most viewers are on mobile with decent connections — 720p and 1080p would cover 95% of playback. Dropping 360p and 480p would halve transcoding time and storage.
Sharp’s fit: 'cover' works well for square and portrait content but occasionally crops important details from landscape images. A smarter approach would use Sharp’s attention-based cropping or run a lightweight saliency detection pass before deciding the crop region.
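The change is small in Sharp's API: `position` accepts the string `'attention'`, which biases the crop toward regions of high saturation and luminance frequency. A sketch of the options diff only, not wired into the pipeline:

```javascript
// Current centre crop vs. a hypothetical attention-based crop for resize().
const currentCrop = { fit: 'cover', position: 'centre' };
const smartCrop   = { fit: 'cover', position: 'attention' };
```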
But the core architecture — preset-based processing, queue-driven workers, deterministic storage paths — has held up well. Adding a new image format means adding a preset. Adding a new video quality means adding a rendition config. The pipeline scales by adding workers, not by changing code.