A complete record of building a “seemingly simple” video batch-download tool, along with the engineering practices and token-consumption optimization ideas from developing it with GitHub Copilot.


1. Background and Goals

Earlier, I paid for VidJuice UniTube to learn English. After several computer reinstallations I wanted to reactivate it, but the official website was inaccessible, so contacting customer support was the only recourse. An online search showed that yt-dlp is quite popular, so I decided to use it to download videos and subtitles for English listening practice instead. Specific requirements:

  • Batch download from urls.txt, high quality (1080p+)
  • Auto-fetch subtitles: original language + Chinese only + bilingual merged
  • Subtitle format: ASS (supports styling), green color, adaptive resolution
  • Bilingual subtitles: English first line + Chinese second line, unified at the bottom of the screen
  • Windows 11 environment, support both double-click .bat and bash execution
  • Auto-fetch cookies (no manual export needed)

2. Technology Selection Evolution

Version 1: Go Script (Failed)

Initially planned to write a download script in Go, calling os/exec to execute yt-dlp commands.

Reasons for failure:

  1. n-challenge solving failed: the yt-dlp installed via Chocolatey is a third-party package without the bundled EJS solver, so it cannot solve YouTube’s JS challenge
  2. The android client does not support cookies: specifying both the android client and cookies caused a parameter conflict
  3. Even with those solved, the Go script would be nothing more than a wrapper around yt-dlp, adding no value

Decision: Abandoned Go, switched to native yt-dlp + shell script. Deleted main.go, go.mod, ytdl.exe.

Version 2: Manual cookies.txt (Failed)

Tried exporting cookies.txt from browser for yt-dlp.

Reasons for failure: YouTube rotates cookies periodically, so the exported file expires quickly and has to be re-exported every time; the maintenance cost is high.

Decision: Switched to --cookies-from-browser, reading latest cookies directly from browser each run.

Version 3: Chrome Cookies Read Failure

--cookies-from-browser chrome reported an SQLite database lock error.

Root cause: Chrome holds an exclusive SQLite lock on its cookie database, so other processes cannot read it while Chrome is running; Firefox uses WAL mode, which allows reads even while the browser is open.

Solution: the script detects the Chrome process; if Chrome is running, it automatically falls back to Firefox, and if Firefox is not logged into YouTube, it prompts the user to log in.

# Chrome running detection
if tasklist.exe 2>/dev/null | grep -qi "chrome.exe"; then
    CHROME_RUNNING=true
fi

Final Solution: Native yt-dlp + Python Subtitle Post-processing

toolkits/yt-dlp/
├── urls.txt          ← Only maintain this, one URL per line
├── yt-dlp.conf       ← Download configuration
├── 2download.sh      ← Bash execution (daily use)
├── download.bat      ← Windows double-click execution
├── merge_subs.py     ← Subtitle post-processing (Python)
└── README.md

3. Key Issues and Solutions

Issue 1: n-challenge Failed

YouTube uses a JS challenge (the n-challenge) to counter scraping, and yt-dlp needs to run JavaScript to solve it. The Chocolatey build of yt-dlp does not ship this solver script.

Solution: Configure auto-download of EJS solver from GitHub:

# yt-dlp.conf
--js-runtimes node
--remote-components ejs:github
--extractor-args youtube:player_client=tv,web

The first run downloads the solver from GitHub once; after that it is cached locally.

Issue 2: Output Path Error (D:yt-dlpDown Missing Backslash)

In yt-dlp.conf:

-P D:\yt-dlpDown   ← Wrong!

yt-dlp’s conf parser treats backslashes as escape characters, so the path collapses to the relative path D:yt-dlpDown.

Solution: Use forward slashes for all paths:

-P D:/yt-dlpDown   ← Correct
-o %(title)s/%(title)s.%(ext)s

Issue 3: All Files in Same Directory

The default output template -o %(title)s.%(ext)s dumps every video and subtitle file into the D:\yt-dlpDown\ root, which quickly becomes messy.

Solution: Create subdirectory per video title:

-o %(title)s/%(title)s.%(ext)s

All files for each video (video + subtitles in various languages) are in their own directory:

D:\yt-dlpDown\
└── Video Title\
    ├── Video Title.mp4
    ├── Video Title.en.ass
    ├── Video Title.zh-Hans.ass
    └── Video Title.en-zh.ass

Issue 4: Bilibili video shows “subtitles” but cannot be extracted

After downloading a Bilibili video, the player clearly shows subtitles, but the script reports:

[skip] no soft subtitle streams found (burned-in subtitles cannot be extracted)

Root cause: The subtitle type determines extractability:

| Type | Feature | Extractable? |
| --- | --- | --- |
| Soft subtitles (embedded track) | Independent subtitle track in the MKV/MP4 container; the player can toggle it | Yes, ffmpeg can extract |
| Burned-in subtitles (hardcoded) | Text drawn directly onto the video pixels; the player cannot turn it off | No, only OCR |

Bilibili fan-made/re-uploaded videos almost always have burned-in subtitles; what the player shows is part of the video frame itself, not an independent subtitle track.
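
For reference, a minimal sketch of such an embedded-stream check (illustrative, not the actual merge_subs.py code; assumes ffprobe is on PATH):

import json
import subprocess
from pathlib import Path

def has_soft_subtitles(video: Path) -> bool:
    """True if the container carries at least one independent subtitle stream."""
    out = subprocess.run(
        ["ffprobe", "-v", "error",
         "-select_streams", "s",                      # subtitle streams only
         "-show_entries", "stream=index,codec_name",
         "-of", "json", str(video)],
        capture_output=True, text=True, check=True,
    ).stdout
    return bool(json.loads(out).get("streams"))

Burned-in subtitles never show up here, because they are just pixels in the video stream.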

Key lesson: This issue cannot be predicted before writing code; it only triggers when downloading real videos—a classic “found only when used” problem.

Issue 5: Already processed directories repeatedly trigger embedded subtitle detection

After adding embedded subtitle extraction, every run re-checks already processed directories (with .ass but no .srt) via ffprobe:

[subtitle] Vibe Coding in VS Code...
  [info] no external subtitles, checking embedded streams...
  [skip] no soft subtitle streams found

Cause: the new feature changed the processing logic, but the old “already processed” check was not updated to match.

Fix (one line):

if not srts:                      # no external .srt files in this directory
    if list(d.glob("*.ass")):     # ...but .ass output already exists
        return                    # already processed; skip the embedded-stream check

Such bugs only surface when running on multiple video directories; single test runs won’t catch them.


4. Subtitle System Design

Why ASS instead of SRT?

| Requirement | SRT | ASS |
| --- | --- | --- |
| Green subtitles | <font color> tag, ignored by most players | Native PrimaryColour property, rendered by all mainstream players |
| Adaptive font size | Entirely determined by the player | PlayResX/Y define a reference resolution; the player scales proportionally |
| Bilingual position control | Cannot control | Alignment precisely controls each line’s position |

Adaptive font size principle

PlayResY: 720        ← Reference resolution (baseline)
Fontsize: 30         ← Font size based on 720p

The player automatically converts during rendering:

$$\text{Actual font size} = 30 \times \frac{\text{Actual window height}}{720}$$

  • 720p window → 30pt
  • 1080p window → 45pt
  • 2K 1440p window → 60pt

Regardless of resolution, subtitles always occupy about 5% of window height (single line), bilingual total about 10%.
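
For illustration, the style header behind this behavior could look like the sketch below (values follow the description above; ASS colours use &HAABBGGRR order, so pure green is &H0000FF00; the exact style names and field values in merge_subs.py may differ):

[Script Info]
ScriptType: v4.00+
PlayResX: 1280
PlayResY: 720

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
Style: BiBottom,Arial,30,&H0000FF00,&H000000FF,&H00000000,&H00000000,0,0,0,0,100,100,0,0,1,2,0,2,10,10,20,1

Alignment=2 pins events to the bottom centre; PlayResY=720 with Fontsize=30 gives the proportional scaling described above.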

Bilingual subtitle implementation

Use inline tags to switch fonts within the same Dialogue event, English first line + Chinese second line, both at the bottom:

Dialogue: 0,0:00:01.00,0:00:03.00,BiBottom,,0,0,0,,{\fnArial}Hello World\N{\fnMicrosoft YaHei}你好世界

Time interval merging (boundary split) ensures strict alignment of Chinese and English for each time segment.
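
A minimal sketch of that boundary-split idea (the cue representation and function names are illustrative, not the actual merge_subs.py implementation):

def split_on_boundaries(en_cues, zh_cues):
    """en_cues / zh_cues: lists of (start, end, text), times in seconds.
    Cut both tracks at every start/end point so that each output segment
    pairs exactly one (possibly empty) English line with one Chinese line."""
    points = sorted({t for s, e, _ in en_cues + zh_cues for t in (s, e)})

    def text_at(cues, seg_start, seg_end):
        # text of the cue fully covering [seg_start, seg_end), or "" if none
        return next((txt for s, e, txt in cues
                     if s <= seg_start and seg_end <= e), "")

    segments = []
    for seg_start, seg_end in zip(points, points[1:]):
        en = text_at(en_cues, seg_start, seg_end)
        zh = text_at(zh_cues, seg_start, seg_end)
        if en or zh:
            segments.append((seg_start, seg_end, en, zh))
    return segments

Each resulting (start, end, en, zh) segment then becomes one Dialogue event like the line above.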

Automatic original language detection

The script automatically identifies the video’s original language and generates subtitles according to rules:

_ZH_LANGS = {"zh-Hans", "zh-Hant", "zh", "zh-CN", "zh-TW"}

# Logic:
# - Original is Chinese → generate only zh-Hans.ass
# - Original is other language → generate original.ass + zh-Hans.ass + original-zh.ass

| Video Language | Generated Files |
| --- | --- |
| English | base.en.ass + base.zh-Hans.ass + base.en-zh.ass |
| Japanese | base.ja.ass + base.zh-Hans.ass + base.ja-zh.ass |
| Chinese | base.zh-Hans.ass (only one) |
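
That rule, as a minimal sketch (the function name is illustrative; _ZH_LANGS is the set shown above):

def plan_outputs(orig_lang: str) -> list[str]:
    """Which .ass variants to generate for a given original language."""
    if orig_lang in _ZH_LANGS:
        return ["zh-Hans"]                                # Chinese original: one file
    return [orig_lang, "zh-Hans", f"{orig_lang}-zh"]      # original + Chinese + bilingual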

5. Final Configuration Notes

yt-dlp.conf

# Output path (must use forward slashes)
-P D:/yt-dlpDown
-o %(title)s/%(title)s.%(ext)s

# Highest quality
-f bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio/best
--merge-output-format mp4

# n-challenge fix
--js-runtimes node
--remote-components ejs:github
--extractor-args youtube:player_client=tv,web

# Subtitles (cover common languages)
--write-sub
--write-auto-sub
--sub-langs zh-Hans,zh-Hant,zh,en,ja,ko,fr,de,es,pt,it,ru,ar,hi,th,vi
--convert-subs srt

--no-playlist
--retries 5
--fragment-retries 5
--file-access-retries 3

Cookies Priority Logic

cookies.txt exists?
  └─ Yes → Use cookies.txt (manual maintenance mode)
  └─ No → Chrome directory exists?
             └─ Yes → Chrome running?
                        └─ No → Use Chrome cookies (optimal)
                        └─ Yes → Firefox directory exists?
                                   └─ Yes → Use Firefox cookies (must be logged into YouTube)
                                   └─ No → Error and exit
             └─ No → Use Edge cookies
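
The shipped script implements this in bash; below is a sketch of the same decision order (Python here for readability; directory paths are typical Windows defaults and purely illustrative):

from pathlib import Path

def cookie_args(chrome_running: bool) -> list[str]:
    """Choose yt-dlp cookie flags following the priority tree above."""
    if Path("cookies.txt").exists():
        return ["--cookies", "cookies.txt"]               # manual maintenance mode
    chrome_dir = Path.home() / "AppData/Local/Google/Chrome/User Data"
    if chrome_dir.exists():
        if not chrome_running:
            return ["--cookies-from-browser", "chrome"]   # optimal
        if (Path.home() / "AppData/Roaming/Mozilla/Firefox").exists():
            return ["--cookies-from-browser", "firefox"]  # must be logged into YouTube
        raise SystemExit("Chrome is locked and Firefox is unavailable")
    return ["--cookies-from-browser", "edge"]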

6. Engineering Practice: Token Consumption Analysis for AI-Assisted Development

Consumption Distribution

The entire project consumed about 11% of the monthly subscription quota. The main sources of waste:

| Reason | Estimated Share |
| --- | --- |
| Subtitle requirements iterated 6 times (each round: read file + modify + test) | ~30% |
| Re-downloading subtitles from YouTube for every debug round | ~20% |
| No instructions file, so the project background was re-explained every time | ~20% |
| Multiple rounds of troubleshooting for paths, cookies, etc. | ~20% |
| Long conversations trigger context compression; the summaries themselves cost tokens | ~10% |

If Redone, Could Save 40%+

Key improvements:

1. Write .github/copilot-instructions.md First

Solidify the project background once; the AI then loads it automatically in subsequent conversations, with no need to repeat it:

## Project: yt-dlp Download Tool
- Path convention: conf files must use forward slashes
- Output: D:/yt-dlpDown/<title>/<title>.mp4
- Subtitles: merge_subs.py, PlayResY=720, fontsize=30, ASS format
- Cookies: Prefer Firefox (Chrome locks database when running)

2. Propose Plan First, Human Confirms, Then Write Code

The subtitle format went through 6 iterations: each time the AI modified the code directly, and the user only gave feedback after seeing the result.

Correct workflow:

User: "Change subtitles to bottom double line, English on top Chinese on bottom"
AI: "Planned modifications:
     1. Delete EnTop style, add BiBottom style (Alignment=2)
     2. _merge_bilingual() generates single Dialogue
     3. Inline \fn tags to switch fonts
     Confirm to proceed?"
User: "Confirm"
AI: Complete all modifications at once

3. Use Local Mock Data for Testing

Every debugging round of the subtitle script re-downloaded SRT files from YouTube, wasting both time and tokens.

Correct approach: keep a ~20-line test SRT in the project directory and run against it locally while debugging:

python merge_subs.py ./test   # 0.1 second, no network needed
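
A fixture can be as small as two cues per language (content here is illustrative; a matching zh-Hans file exercises the bilingual merge):

1
00:00:01,000 --> 00:00:03,000
Hello World

2
00:00:03,000 --> 00:00:05,000
This is a test subtitle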

4. Only Paste Key Error Lines

Inefficient: Paste 100 lines of full yt-dlp output

Efficient: Only paste the last ERROR line:

ERROR: Sign in to confirm you're not a bot.

7. Daily Usage Workflow

# 1. Edit urls.txt, one YouTube URL per line
vim urls.txt

# 2. Run download (bash method)
bash 2download.sh

# 3. Or directly double-click download.bat (Windows)

After the download completes, subtitles are processed automatically; output structure:

D:\yt-dlpDown\
└── Video Title\
    ├── Video Title.mp4
    ├── Video Title.en.ass       ← English only, green, adaptive
    ├── Video Title.zh-Hans.ass  ← Chinese only, green, adaptive
    └── Video Title.en-zh.ass    ← Bilingual (English over Chinese), bottom of screen, green

8. Dependency Installation

# Windows, using Chocolatey
choco install yt-dlp ffmpeg nodejs python

# First run of yt-dlp will automatically download EJS solving script from GitHub (requires network)

Firefox must be logged into YouTube in advance (one-time operation).


9. Summary

This project is essentially just “configuring yt-dlp + writing a subtitle post-processing script”; the core code is under 200 lines. Yet the whole process consumed considerable time and tokens, and the lessons fall into two categories:

Avoidable consumption:

  1. Clarify requirements before starting: Subtitle position, color, and size were not defined at the beginning, leading to 6 iterations of rework.
  2. Understand tool characteristics: The backslash issue in yt-dlp conf and Chrome SQLite locking should have been checked in documentation first.
  3. Localize test data: Network-dependent tests slowed down the entire debugging loop.
  4. Use instructions file to solidify context: Avoid re-explaining project background from scratch in every conversation.

Unavoidable consumption (exploration cost):

  • Burned-in subtitles vs. soft subtitles—you won’t know Bilibili uses burned-in subtitles without downloading a real video.
  • State judgment bugs introduced by new features—only exposed when running multiple real directories.
  • Subtitles appearing too large on a 2K screen—only discovered when viewed on the target screen.
  • Chrome SQLite locking—only triggered in a real browser environment.

These “discovered only after use” issues are a normal cost of engineering exploration; trying to force savings there is unrealistic.

Reasonable expectation: Optimizing preparation can save 30-50% of tokens; runtime issues still require spending as needed.

The essence of AI-assisted development is not “let AI think for you”, but “efficiently convert your thinking into code”. The clearer the requirements, the more efficient the AI, and the fewer tokens consumed.


Project code: All files in the toolkits/yt-dlp/ directory