๐Ÿพ Powered by Rabbithole — cgpa.isarabbithole.com

Dependent Types Are Not the Answer to Everything (fight me)

🔥 HOT ⊢ FORMAL  Thread #1337  ·  Forum: Formal Methods & Verification
Total replies: 412
Views: 18,447
Started by: PracticalPurr
Showing posts #25–#36 of 412
#25 🚨 REAL CASE STUDY: dependent types caught something horrendous
Posted: Thu Mar 14, 12:47am

okay I need to interject with a Real World Story™ because this whole debate has been way too theoretical, nya~

I spent three months last year implementing a custom binary protocol parser for a network stack at work. Custom framing, variable-length payloads, nested record structures — the usual misery. We were using Haskell with some ad-hoc newtypes but nothing fancy type-wise.

The bug: our parse function for packet length fields was consuming bytes from the wrong offset under a specific flag combination. Standard stuff. Unit tests didn't catch it because we fuzzed the common cases, not the flag combos. This caused silent data corruption downstream for three weeks before we noticed something was wrong with the checksum counts. Genuinely scary stuff.

So after that incident I rewrote the critical parser sections using a dependently-typed DSL I found (heavily inspired by the work on Packet Types). The core idea is that the parser's type carries information about how many bytes it consumes, as an index:

-- Old code (compiles fine, silently wrong)
parseHeader :: ByteString -> Either ParseError Header
parseHeader bs = do
  flags <- getWord8 bs
  len <- getWord16le (BS.drop 0 bs)  -- BUG: offset should be 1, not 0 under flag 0x02
  ...

-- New code (dependent-index version, won't compile if offsets disagree)
parseHeader :: Parser (Fin 4) Header
parseHeader =
  parseWord8                                   -- consumes exactly byte  [0..0]
    >>= \flags -> parseWord16le                -- consumes exactly bytes [1..2]
    >>= \len   -> parseWord8                   -- consumes exactly byte  [3..3]
    >>= \ver   -> pure (Header flags len ver)
-- total: Fin 4 — won't typecheck if you skip or re-read a byte

The type system literally refused to compile the equivalent of the original bug. The offset mismatch would produce a Fin index mismatch at compile time. No runtime test required. The type is the proof that you consumed the right bytes.
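If anyone wants to poke at the idea without the full DSL, here's a minimal toy reconstruction of a size-indexed parser in plain GHC Haskell using type-level Nats. This is my own sketch, not the actual library, and all the names are made up, nya~

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE ExplicitNamespaces #-}
import GHC.TypeLits (Nat, type (+))
import Data.Word (Word8, Word16)

-- The type-level Nat records exactly how many bytes the parser consumes.
newtype Parser (n :: Nat) a =
  Parser { runParser :: [Word8] -> Maybe (a, [Word8]) }

word8 :: Parser 1 Word8
word8 = Parser $ \bs -> case bs of
  (b:rest) -> Just (b, rest)
  []       -> Nothing

word16le :: Parser 2 Word16
word16le = Parser $ \bs -> case bs of
  (lo:hi:rest) -> Just (fromIntegral lo + fromIntegral hi * 256, rest)
  _            -> Nothing

-- Sequencing adds the consumed byte counts at the type level.
andThen :: Parser m a -> (a -> Parser n b) -> Parser (m + n) b
andThen (Parser p) k = Parser $ \bs ->
  case p bs of
    Just (a, rest) -> runParser (k a) rest
    Nothing        -> Nothing

done :: a -> Parser 0 a
done a = Parser $ \bs -> Just (a, bs)

data Header = Header Word8 Word16 Word8 deriving (Eq, Show)

-- Declared to consume exactly 4 bytes: 1 + 2 + 1 + 0.
parseHeader :: Parser 4 Header
parseHeader =
  word8    `andThen` \flags ->
  word16le `andThen` \len ->
  word8    `andThen` \ver ->
  done (Header flags len ver)
```

If you swap the `word16le` for a second `word8`, the computed index becomes 3, the `Parser 4` signature no longer typechecks, and GHC rejects it at compile time, which is the whole point.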

Now tell me dependent types aren't worth it for protocol parsing, I dare you, nya~ 😤🐾

ByteKitten | "a type error is just a runtime bug that got caught early" | she/her 🐾
#26 Re: REAL CASE STUDY
Posted: Thu Mar 14, 1:03am
ByteKitten wrote:
The type system literally refused to compile the equivalent of the original bug... tell me dependent types aren't worth it for protocol parsing, I dare you

...okay fine. Fine. That's actually a really compelling case and I'll acknowledge it, nya~

Protocol parsing is genuinely one of the sweet spots for dependent types and I won't argue otherwise. The invariant you want to enforce (consumed exactly N bytes) maps perfectly onto what a size-indexed parser type can express. This is exactly the kind of thing Packet Types papers have been arguing for years and they're right.

But I'm going to stand my ground on the adoption cost argument, because here's the thing: you spent three months writing that parser, then had to rewrite it, then presumably spent significant time encoding the dependent types correctly. How much of that time was spent fighting the type system rather than the problem?

In a world where your team is 6 engineers with mixed FP backgrounds, how many of them can review a PR that has Fin n indices flying around? This isn't a hypothetical — it's the conversation I have to have with my tech lead every single time someone on my team reaches for an exotic type construct. The bus factor on that code becomes terrifying. nyaa...

So yes: dependent types caught a real bug. I believe you. My argument is that for most teams, for most codebases, the cure costs more than the disease — and there are cheaper prophylactics (lol) available. We'll get to that.

PracticalPurr | "the best type system is the one your whole team can reason about" | she/her
#27 FIRST POST okay I've been lurking this thread for 6 hours and I have to say something
Posted: Thu Mar 14, 1:21am

hi. first post ever. i've been reading this forum for like two years without an account, sorry, nya~

I actually work at a company that uses F* for real production TLS verification work. Like, not a research project — this is code that runs in actual TLS stacks used by real people. I can't say which company for obvious reasons, but it's not a secret that this kind of work exists.

So when PracticalPurr says "adoption cost is too high" — I understand the argument. But I also want to push back with our actual experience:

๐Ÿพ Our team situation: ~40 engineers total. 3 of them have PhDs in type theory or formal methods. The F* verification work is specifically handled by those 3 (+ occasionally 2-3 others who've been trained up). The rest of the engineering team interacts with verified interfaces but doesn't write F* proofs.

The model that actually works, in our experience, is not "everyone writes dependent types" — it's "a small specialized team writes verified core components, and the broader team uses well-typed APIs that make it hard to misuse them." It's a verification layer, not a universal coding style.

The cost is real. I won't pretend otherwise. You need people who can actually do this, and F* in particular has a learning curve that is... nonlinear. The error messages used to make me cry (still kinda do, nyaa...). But the bugs it catches in TLS parsing and record layer handling? Genuinely scary stuff that would have been CVEs. We've caught type confusion bugs in message framing that fuzzing didn't find for months.

I think both sides of this argument are talking past each other because you're imagining different deployment contexts. PracticalPurr is right about a typical startup codebase. ByteKitten and AgdaCat are right about security-critical infrastructure. The mistake is treating these as the same problem.

okay i'm going to go hide again, sorry for the essay, nya~ 🙈

TLSKitty9000 | "formally verified but socially unverified" | she/her
#28 seL4 and CompCert: the existence proofs we can't ignore
Posted: Thu Mar 14, 1:38am

I want to bring up the two examples that I think everybody in formal methods eventually has to grapple with: seL4 and CompCert. Nya~

seL4 is formally verified down to its C implementation — the proof shows the kernel will never crash and will never perform an unsafe operation. It's a third-generation microkernel comprising about 8,700 lines of C. The proof rules out buffer overflows and code-injection attacks and guarantees correct pointer accesses — properties you simply cannot guarantee with testing alone, no matter how good your fuzzer is.

CompCert is a formally verified C compiler, proven correct in Coq: every compilation pass is proved to preserve the semantics of the source program. When CompCert compiles your code, you can trust the output is semantically faithful — a guarantee GCC and Clang cannot make (though in practice they're very good).

โš ๏ธ Worth noting: the seL4 proof took approximately 20 person-years of effort. The code was ~14,400 lines of Haskell/C, but the Isabelle proof scripts were ~165,000 lines. That ratio is striking. The proof is not free.

But here's the point: these systems exist. They work. They run on actual hardware in safety-critical deployments. That's the existence proof that formal verification at scale is not science fiction. And both projects have demonstrated that keeping the verified proof base up-to-date as requirements evolve is tractable — expensive, but tractable.

PracticalPurr's "adoption cost" argument is real for the general case. But there's a class of software — kernels, compilers, cryptographic stacks, flight control systems — where the cost of not verifying is higher than the verification cost. seL4 and CompCert are proof that the engineering community knows this. Nya~ 🐾

ProofCat | Coq, Isabelle, "what do you mean it doesn't type-check" | she/her
#29 Re: seL4 and CompCert
Posted: Thu Mar 14, 2:04am
ProofCat wrote:
the seL4 proof took approximately 20 person-years of effort... 165,000 lines of proof scripts

This is precisely my point, and I love you for including those numbers, nya~

20 person-years. For an 8,700 line kernel. That's a roughly 10:1 ratio of proof effort to code. The proof scripts alone are nearly 12x the length of the C code being verified. This is not a criticism of seL4 — it's an absolutely remarkable achievement, and for a security microkernel it is completely justified.

But it means seL4's verification budget would consume the entire engineering capacity of most companies for years without shipping a single feature. My argument is about where on the spectrum you sit, not about whether the spectrum exists. If you're writing flight control avionics, go full seL4-style. If you're writing a B2B SaaS dashboard, please do not spend 10x your development time on proof scripts for your CRUD endpoints. 😿

TLSKitty's framing was actually excellent — specialized verification teams working on verified cores, with the rest of the team using well-typed interfaces. That's a sane model for companies that need it. I'll accept that. My OP was arguing against "dependent types everywhere for everyone" and I stand by that framing.

#30 Re: everyone
Posted: Thu Mar 14, 2:33am

mod note: can we get a thread summary pinned because I just got here and I'm lost, nya~?

also does anyone have a link to that F* TLS verification work TLSKitty mentioned? I've seen references to it before but can never find the actual project. I know F* is used in Project Everest for verified HTTPS but my memory is fuzzy.

#31 Re: LambdaPaws (Project Everest, yes)
Posted: Thu Mar 14, 2:41am

LambdaPaws: yes, Project Everest is the public-facing version of this kind of work — it's a collaborative effort to build a verified HTTPS stack. The F* language itself was partly developed for this kind of use case: security protocol verification with dependent types and refinement types together. Nya~

The project includes verified implementations of TLS components (HACL* for the crypto layer, miTLS for the protocol layer). The dependent and refinement types in F* let you prove things like "this function processes exactly the bytes described by the message format type" — which is exactly what ByteKitten was showing manually above, but with much more sophisticated tooling and automation.

It's genuinely impressive work. Also genuinely requires PhD-level type theory knowledge to extend. Both of these things are true simultaneously and I don't think that's a contradiction, just a fact about where the field is right now. nyaa 🐾

✦ ✦ ✦   THREAD PIVOT: THE NUANCED TAKE   ✦ ✦ ✦
#32 ๐Ÿˆ NUANCED okay fine. when dependent types are ACTUALLY worth it (AgdaCat essay incoming)
Posted: Thu Mar 14, 3:12am

I've calmed down from my earlier posts (sorry about post #18, that was unhinged, even for me, nyaa~ 😅). Let me try to actually be useful here instead of just yelling about the Curry-Howard correspondence.

Here is my honest, considered take on when dependent types are worth the cost:

✓ WORTH IT — High-assurance security/safety-critical code:
TLS stacks, kernel code, cryptographic primitives, flight/medical control software. The class of software where a single bug is a CVE, a crash, or a death. Here the verification cost is justified by the consequence of failure. The seL4 and CompCert examples are canonical.

✓ WORTH IT — Protocol parsing and serialization:
ByteKitten's example is excellent. Anywhere you have a bijection between a byte-level format and a structured type, and the mapping has precise constraints (exact byte counts, field ordering, flag-conditional layouts), dependent types let you encode those constraints as types and get the correspondence checked at compile time. This is a much lower bar than full theorem proving and the ROI is high even for non-PhD engineers if you have a good library.

✓ WORTH IT — Compiler and interpreter internals:
Intrinsically-typed syntax trees that guarantee well-scopedness or well-typedness by construction. No more "this should be impossible" runtime panics in your type-checker. The type of your evaluator proves it only operates on well-typed terms. This is genuinely pleasant to work with once you've internalized the style.
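To make the "by construction" point concrete, here's a toy sketch in Haskell GADT style. This is weaker than full dependent types, but it shows the shape, and everything here is illustrative rather than any particular library's API:

```haskell
{-# LANGUAGE GADTs #-}

-- Well-typed-by-construction expressions: the type index makes
-- ill-typed terms like "1 + True" unrepresentable.
data Expr a where
  IntE  :: Int  -> Expr Int
  BoolE :: Bool -> Expr Bool
  Add   :: Expr Int -> Expr Int -> Expr Int
  If    :: Expr Bool -> Expr a -> Expr a -> Expr a

-- The evaluator needs no error cases or runtime type checks:
-- the index guarantees every case is well-typed.
eval :: Expr a -> a
eval (IntE n)   = n
eval (BoolE b)  = b
eval (Add x y)  = eval x + eval y
eval (If c t e) = if eval c then eval t else eval e
```

Writing `Add (IntE 1) (BoolE True)` simply doesn't compile, so the "this should be impossible" panic branch never has to exist.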

✗ PROBABLY NOT WORTH IT — General application business logic:
Your e-commerce checkout flow, your notification service, your user preferences API. The invariants are usually not precise enough to encode cleanly as types, the requirements change constantly (goodbye stable proofs), and the benefit over a good type system + thorough testing is marginal.

✗ PROBABLY NOT WORTH IT — Early-stage prototyping:
You don't know the right types yet. Fighting a dependent type system while your domain model is still evolving is genuinely miserable and will slow you down. Get the design right first, then consider whether formalizing it adds value. Agda is not a rapid prototyping language, I say this as someone who uses it every day.

โš ๏ธ DEPENDS โ€” Library design for reuse:
If you're designing a library that others will use and the correctness properties are precise and stable, dependent types in the API can prevent entire classes of misuse. But the ergonomic cost for users must be weighed carefully โ€” a brilliant type that nobody uses because it's too hard to work with has negative value.

Is this a satisfying "dependent types are fine actually" post? Yes, but with conditions. I am not capitulating to PracticalPurr fully — I still think the default assumption should be biased toward richer types in new systems. But I concede the adoption costs are real and domain-dependent. Nya~ 🐾

AgdaCat | "propositions as types, programs as proofs, nyaa~" | she/they | Profile
#33 Re: AgdaCat (finally some sanity, nya~)
Posted: Thu Mar 14, 3:29am

...okay I will concede that post #32 is the most reasonable thing AgdaCat has written in this thread and possibly in the last three years, nya~

I agree with the taxonomy. The question of "is X worth it" always depends on what X costs relative to the consequences of failure in your specific context. My OP title says "not the answer to EVERYTHING" — and AgdaCat's post is basically a worked example of what "everything" means and doesn't mean.

We should get this formatted as a wiki article. @mods: tagging for wiki consideration? 🐾

⚔️   NEW ARGUMENT UNLOCKED: QuickCheck vs. Formal Verification   ⚔️
#34 💣 NEW ARGUMENT actually: QuickCheck + good types ≅ formal verification in practice (fight me part 2)
Posted: Thu Mar 14, 3:47am

I was going to log off and then AgdaCat's post made me stay, and now I'm going to start a whole new argument, sorry, nya~ 😈

Here's my hot take: for practical software engineering purposes, good property-based testing (QuickCheck, Hypothesis, fast-check) combined with a strong static type system is approximately equivalent to dependent type verification in terms of bug-catching per engineering-hour invested.

I know, I know. They're not equivalent mathematically. QuickCheck finds counterexamples probabilistically; a proof is a proof. But consider:

-- Dependent types approach: prove it holds for ALL inputs
reverseInvolutive : (xs : List A) -> reverse (reverse xs) ≡ xs
reverseInvolutive []        = refl
reverseInvolutive (x :: xs) = ...  -- 10+ lines of induction

-- QuickCheck approach: test it holds for 10,000 random inputs
prop_reverseInvolutive :: [Int] -> Bool
prop_reverseInvolutive xs = reverse (reverse xs) == xs
-- +++ OK, passed 10000 tests.
-- Time to write: 1 line. Time to gain confidence: ~2 seconds.

For the vast majority of properties you actually care about in business software, a QuickCheck failure means you have a bug, and 10,000 passing tests gives you enormous practical confidence. The difference from a formal proof is that it's not guaranteed — but when was the last time your bug was in the case QuickCheck couldn't reach?

The real wins of QuickCheck: forces you to think about properties (same discipline as writing types/specs), catches edge cases you never thought of via shrinking, runs fast, requires zero PhD, and the properties serve as documentation. Nya~ 🐾

I'm not saying "don't use dependent types." I'm saying for the 90% of code AgdaCat put in the "probably not worth it" category, property-based testing gets you most of the benefit at a fraction of the cost. You can save the dependent types for the 10% where you genuinely need the guarantee.

FuzzMeow | QuickCheck enjoyer | "shrink it till you find it" | she/her
#35 Re: FuzzMeow (NO. absolutely not. let me explain why, nya~)
Posted: Thu Mar 14, 4:05am

I just wrote a nuanced calming-down post and FuzzMeow has personally set me off again. I respect the chaos. nya~ 😾

FuzzMeow wrote:
QuickCheck + good types ≅ formal verification in practice

No. This is false and importantly false. Let me explain why the "≅" is doing a lot of unearned work there.

1. QuickCheck tests your generator, not your property. The quality of your randomized tests depends entirely on the quality of your generator and the distribution it samples. If your bug lives in the case your Arbitrary instance never generates (and they often do — adversarial inputs, unusual flag combinations, inputs that require correlations between fields), QuickCheck will pass forever and you will have false confidence. ByteKitten's bug was exactly this: the flag combination that triggered the offset issue wasn't in the fuzzer's distribution.

2. "Last time QuickCheck couldn't reach" is availability bias. You don't know what bugs QuickCheck didn't find because it didn't find them. You're counting the hits and ignoring the misses — and the misses are the ones that became CVEs. TLSKitty literally described this scenario above: fuzzing didn't catch the type confusion bugs for months.

3. The qualitative difference matters for security. "Enormous practical confidence" is not the same as "proven." In safety-critical code, that gap is the entire point. You can't deploy firmware for a pacemaker and say "well we ran 10,000 tests and it looked fine." You need the proof.
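To make point 1 concrete, here's a tiny hypothetical sketch (Haskell, all names mine) of a ByteKitten-style bug that hides from a "common cases" generator:

```haskell
import Data.Word (Word8, Word16)

-- Hypothetical big-endian length encoding for a framed message.
encodeLen :: Word16 -> [Word8]
encodeLen n = [fromIntegral (n `div` 256), fromIntegral (n `mod` 256)]

-- BUG: under flag 0x02 the low byte is dropped (a wrong-offset bug,
-- in the spirit of post #25). Every other flag value is handled fine.
readLen :: Word8 -> [Word8] -> Word16
readLen 0x02 (hi:_lo:_) = fromIntegral hi * 256
readLen _    (hi:lo:_)  = fromIntegral hi * 256 + fromIntegral lo
readLen _    _          = 0

-- The roundtrip property we'd hand to QuickCheck.
prop_roundtrip :: Word8 -> Word16 -> Bool
prop_roundtrip flags n = readLen flags (encodeLen n) == n
```

A generator that only draws flags from the common values (say, 0x00 and 0x01) passes this property forever; flags = 0x02 with any length whose low byte is nonzero is a counterexample that generator never produces. The property is fine. The sampling is the problem.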

Now — and I want to be fair here — FuzzMeow is completely right that property-based testing is a better alternative than dependent types for AgdaCat's "probably not worth it" category. I agree! Use QuickCheck for your business logic! It's great! But please don't say it's "approximately equivalent" to formal verification, because that framing will convince people that they don't need verification for code that genuinely needs it. The stakes are different. Nya~ 🐾

AgdaCat | "propositions as types, programs as proofs, nyaa~" | she/they
#36 Re: AgdaCat (valid points, but you're still proving my broader claim)
Posted: Thu Mar 14, 4:23am

Okay fair. I'll concede point 3 entirely — "approximately equivalent" was too strong for security-critical code, and you're right that the framing can cause harm. Nya~

But notice what AgdaCat just did: agreed with TLSKitty's framing, agreed with my conclusion (use property testing for non-critical code), and agreed with PracticalPurr that adoption costs matter — and is still arguing with me. 😹

I think what we're all converging on, slowly and painfully, at 4am on a forum, is something like:

๐Ÿพ Emerging consensus (tentative, nobody formally voted on this):

1. Formal verification / dependent types: necessary and worth it for security/safety-critical code (TLS, kernels, crypto, avionics). Deploy via specialized teams.

2. Good static types + property-based testing: the right tool for most application software. Gets you maybe 85-90% of the bug-catching at ~10% of the cost.

3. The mistake is treating these as substitutes rather than as tools with different appropriate domains.

4. Nobody should be writing untyped Python for anything important (this one is universally agreed upon, nya~)

Also: the original thread title said "Dependent Types Are Not the Answer to Everything" — and I think we've basically proven that point collaboratively, while also proving they ARE the answer to SOME things. The fight was productive! Purrr~ 🐾

Can someone @ the mods to lock this page as a milestone and let the next 376 posts start the QuickCheck vs. formal verification sub-argument properly? We need a new thread at this point nyaa~

FuzzMeow | "property: this argument was shrinkable to one line" | she/her 🎲

โœ๏ธ Quick Reply โ€” Post #37

Posting as: MittensOfChaos  |  Forum Rules  |  Posting Help