Monad Transformers: A Defense (yes I know, please read before groaning)

#1 Sun Nov 03, 2024 02:17:44 UTC

Okay. I know. I know. The title alone probably caused 40 of you to preemptively groan. I saw the groan counter hit 12 before I even finished the first paragraph last time I brought this up. So: I'm asking for five minutes. Read the whole post. Then groan. Deal?

Let me be clear about what I'm not defending: I'm not saying monad transformer stacks are the future. I'm not saying you should use StateT AppState (ReaderT Config (WriterT Log (ExceptT AppError IO))) a for your new greenfield project and feel smug about it. I am saying they are underrated, misunderstood, and often the right tool for a specific class of problem, and that the Haskell community's wholesale abandonment of them in favor of effect systems involves some motivated reasoning we should be honest about.

Here's my thesis in three parts:

1. ReaderT/StateT/ExceptT stacking is appropriate when your effect set is small, known, and stable.

The killer use case for polysemy or effectful is when you need to swap out interpretations at the boundary — testing vs. production, logging vs. no-logging, etc. But if your app genuinely just needs config (ReaderT), mutable state (StateT), and error handling (ExceptT), and you know that will not change? The stack is not boilerplate. It is the model. A 3-layer stack is 3 lines of type alias. The "quadratic instance problem" doesn't bite you until you're at 5+ effects, which is a code smell anyway.

-- This is fine. Actually fine. Not embarrassing.
type AppM = ReaderT Config (StateT AppState (ExceptT AppError IO))

runAppM :: Config -> AppState -> AppM a -> IO (Either AppError (a, AppState))
runAppM cfg st action =
  runExceptT (runStateT (runReaderT action cfg) st)
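
For anyone who wants to poke at it, here is a minimal sketch of that same stack in use. Config, AppState, AppError, and nextTicket are hypothetical names I made up for illustration; the only library calls are the standard mtl/transformers ones.

```haskell
import Control.Monad.Except (throwError)
import Control.Monad.Reader (ReaderT, runReaderT, asks)
import Control.Monad.State (StateT, runStateT, get, put)
import Control.Monad.Trans.Except (ExceptT, runExceptT)

-- Hypothetical application types, invented for this sketch.
newtype Config = Config { maxTickets :: Int }
newtype AppState = AppState { issued :: Int } deriving (Eq, Show)
data AppError = SoldOut deriving (Eq, Show)

type AppM = ReaderT Config (StateT AppState (ExceptT AppError IO))

runAppM :: Config -> AppState -> AppM a -> IO (Either AppError (a, AppState))
runAppM cfg st action =
  runExceptT (runStateT (runReaderT action cfg) st)

-- Touches all three layers through the mtl classes, with no explicit lift.
nextTicket :: AppM Int
nextTicket = do
  limit <- asks maxTickets
  AppState n <- get
  if n >= limit
    then throwError SoldOut
    else do
      put (AppState (n + 1))
      pure (n + 1)
```

Running nextTicket twice with a limit of 1 comes back as Left SoldOut, and the state from before the failure is gone, which is exactly what the result type Either AppError (a, AppState) tells you up front.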

2. The "monad transformers compose badly" argument is mostly about the degenerate cases.

Yes, WriterT has well-known space leak issues. Yes, if you stack two ReaderTs, the MTL functional dependencies will bite you — ask will only see the outermost one. These are real problems. But they're known problems with known workarounds, and they don't apply to 80% of usage. The argument "transformers are bad because WriterT leaks" is like saying "IO is bad because you can write infinite loops." Technically true. Not really the point.
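
To show how small the double-ReaderT workaround is, a rough sketch. Two, both, and runTwo are illustrative names, and the environment values are arbitrary:

```haskell
import Control.Monad.Reader (ReaderT, runReaderT, ask)
import Control.Monad.Trans.Class (lift)

-- Two reader layers: Int on the outside, String underneath.
type Two = ReaderT Int (ReaderT String IO)

-- Plain ask resolves to the outermost layer because of the
-- fundep (m -> r) on MonadReader; the inner layer needs one lift.
both :: Two (Int, String)
both = do
  n <- ask
  s <- lift ask
  pure (n, s)

runTwo :: Two a -> IO a
runTwo m = runReaderT (runReaderT m 42) "inner"
```

Here runTwo both yields (42, "inner"). In practice most people avoid the situation entirely by putting both values in one record and using a single ReaderT.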

3. Effect systems trade one set of problems for another.

Polysemy is beautiful. I've written polysemy code. I love the interpreter pattern. But let's be honest: the compile times are rough, the error messages can be cryptic, and the "you can swap interpreters" promise is mostly useful in tests (which you could also get with ReaderT + typeclass). Effectful is better on performance since it's essentially ReaderT over IO underneath anyway, but then you're kind of just doing... what we were doing before? With extra steps?

I'm not trying to start another flame war. I want an actual discussion. Where do you draw the line? When does a transformer stack become genuinely unworkable versus "aesthetically displeasing"?

nya~ 🐾

😤 Groan (187) 🐾 Paw (44) 🧠 Big Brain (31) 💜 Purple Heart (28) 😺 Mrow (19)

🐱 "The Monad is not the message; the transformer is the medium." — MeownadTransformer, 2021, post #2047, thread "Monad Transformers: Why I Don't Hate Them" (archived)
nya~ :: AppM nya~

#2 Sun Nov 03, 2024 02:19:12 UTC

*long suffering sigh*

...okay fine I read it. You raise point 3 which I wasn't expecting from you. Groaning retracted conditionally. Don't make me regret this. nya~

😤 Groan (3) 😺 Mrow (47)

😸 module CGPA.Mod where -- purrincess_admin :: ∀ r. Moderator r => Chaos r nya~

#3 Sun Nov 03, 2024 02:21:58 UTC

*sees thread title*

GROAN

*reads post*

...okay the point about free monads is fair but let me push back on point 2. The "known workarounds" argument kind of falls apart at scale. The functional dependency issue with MTL isn't just an edge case — mtl makes use of functional dependencies to retain type inference, but this comes at a compositional cost when the same typeclass appears at multiple transformer layers. You can work around it but you're basically writing the effect system that effect systems solved for you.

Also the WriterT thing isn't just space leaks. The semantics are genuinely weird when combined with exceptions. If you throw in ExceptT and WriterT is above it, you lose all your log output. If WriterT is below, you keep it. The order matters and it's non-obvious. Effect systems let you specify semantics more explicitly.
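
If you want to see it rather than take my word for it, here is a small sketch using only mtl and transformers. prog, logLost, and logKept are made-up names; the program is written once against the mtl classes and then run through both orderings:

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Except (MonadError, throwError)
import Control.Monad.Trans.Except (runExceptT)
import Control.Monad.Writer (MonadWriter, runWriterT, tell)
import Data.Functor.Identity (runIdentity)

-- One program, written once against mtl classes.
prog :: (MonadWriter [String] m, MonadError String m) => m ()
prog = do
  tell ["step 1"]
  _ <- throwError "boom"
  tell ["step 2"]

-- WriterT above ExceptT: the Left carries no log at all.
logLost :: Either String ((), [String])
logLost = runIdentity (runExceptT (runWriterT prog))

-- WriterT below ExceptT: the log survives next to the Left.
logKept :: (Either String (), [String])
logKept = runIdentity (runWriterT (runExceptT prog))
```

logLost is Left "boom" and logKept is (Left "boom", ["step 1"]). Same source text, two different programs.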

😤 Groan (1) 🧠 Big Brain (22) 🐾 Paw (11)

🙀 Free f a = Pure a | Free (f (Free f a)) — it's turtles all the way down, nya~

#4 Sun Nov 03, 2024 02:31:05 UTC

freermonady wrote:

The order matters and it's non-obvious. Effect systems let you specify semantics more explicitly.

This is actually my favorite counterargument and I want to steelman it properly: you're right that the semantics of stacking are order-dependent, and that's not always obvious. The classic example is exactly what you described — runExceptT (runWriterT m) vs runWriterT (runExceptT m) give you different types and different behavior. The first gives you IO (Either e (a, w)), the second gives you IO (Either e a, w).

But: I'd argue this explicitness is a feature, not a bug! When I see the stack type, I know exactly what I get. The interaction is deterministic. Effect systems don't actually eliminate this problem — polysemy's state semantics between effects depend on the order interpreters are run. The docs literally say: "if the final monad is a monad transformer stack, then state semantics will depend on the order monad transformers are stacked." Same thing, different syntax.

-- These are different programs with different semantics.
-- Transformer version: transparent, explicit in the type
IO (Either AppError (a, Log))   -- log lost on error
IO (Either AppError a, Log)      -- log preserved on error

-- Polysemy version: semantics still depend on handler order
-- just... less obvious from the type alone
runM . runError . runOutputList $ prog  -- m (Either e ([o], a)): log lost on error
runM . runOutputList . runError $ prog  -- m ([o], Either e a): log preserved on error

The order-dependence doesn't go away. You just lose the type-level breadcrumb. nya~

🧠 Big Brain (51) 🐾 Paw (19) 😺 Mrow (14)

🐱 "The Monad is not the message; the transformer is the medium."
nya~ :: AppM nya~

#5 Sun Nov 03, 2024 02:44:19 UTC

First time posting in an MT thread without immediately leaving. Good sign.

What I don't see discussed enough: the actual academic connection between the two approaches. There's a paper, "Monad Transformers and Modular Algebraic Effects: What Binds Them Together" (Schrijvers et al., Haskell '19), that formalizes exactly this. Every monad transformer for algebraic operations gives rise to a modular effect handler. They're not opposites. They're closely related in a precise sense.

The community argues about mtl vs polysemy like they're fundamentally different paradigms. They're not. Polysemy's interpreter is a monad transformer in disguise. Effectful's Eff monad is explicitly described as "essentially a ReaderT over IO on steroids." If effectful is acceptable, you've already accepted the ReaderT pattern. You're just using a library to manage the dictionary-passing that ReaderT does manually.

🧠 Big Brain (38) 😤 Groan (2) 🐾 Paw (15)

😻 Katsellov :: ∀ cat. Cute cat => Theorem cat -> Proof cat nya~

#6 Sun Nov 03, 2024 03:02:44 UTC

Groan. (Required by forum tradition. I still read the post.)

Polysemy defender checking in. The "cryptic error messages" point lands, I won't lie. When you get a No instance for (Member (Embed IO) r) error nested inside three other constraints it's... not fun. And yes, compile times with polysemy + type-level lists are genuinely rough. I've seen 45-second incremental compile times on medium-sized polysemy projects.

But the interpreter composability is real. The thing transformers genuinely can't do easily: you can run the same Sem r a computation under completely different interpreters in the same binary. Pure interpreter for property tests with QuickCheck, IO interpreter for integration tests, mock interpreters for unit tests — all without touching the business logic. With a transformer stack you're either parameterizing over a typeclass (which you need MTL for, buying back complexity) or you're hardcoding the stack.

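-- polysemy: interpreter swap is clean
-- (needs: import Data.IORef (newIORef))
testPure :: IO ()
testPure = do
  -- runState returns (finalState, result); only the result is checked here
  let (_finalState, result) = run . runState initialState . runError $ myProgram
  result `shouldBe` Right expectedOutput

testIO :: IO ()
testIO = do
  -- runStateIORef takes an IORef holding the state, not the initial value
  ref <- newIORef initialState
  result <- runM . runStateIORef ref . runError $ myProgram
  result `shouldBe` Right expectedOutput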

😤 Groan (5) 🧠 Big Brain (29) 🐾 Paw (12)

😼 runM . runError . runState mempty . runReader cfg $ nya | polysemy forever 💜

#7 Sun Nov 03, 2024 03:15:00 UTC

polysemy_paws wrote:

Pure interpreter for property tests with QuickCheck, IO interpreter for integration tests, mock interpreters for unit tests — all without touching the business logic.

This is a genuinely good point and I want to acknowledge it properly. The interpreter-swap pattern for testing is real and useful. My counter: with MTL typeclasses you get the same benefit, the architecture is just different. You write against MonadState AppState m instead of Member (State AppState) r, and your test instantiates m to a pure State AppState (that is, StateT over Identity) while prod uses the full transformer stack. Both approaches work. MTL is more "standard library" and less extra-dependency. But polysemy wins on ergonomics for complex interpreter trees. Agreed.
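
A rough sketch of what I mean, with hypothetical names (AppState, recordHit) and nothing beyond mtl. The business logic only asks for a state capability; the caller picks the monad:

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State (MonadState, execState, execStateT, modify)

-- Hypothetical state type for the sketch.
newtype AppState = AppState { hits :: Int } deriving (Eq, Show)

-- Business logic: knows only that some state capability exists.
recordHit :: MonadState AppState m => m ()
recordHit = modify (\s -> s { hits = hits s + 1 })

-- Test-style instantiation: plain State, no IO anywhere.
pureResult :: AppState
pureResult = execState (recordHit >> recordHit) (AppState 0)

-- Production-style instantiation: the same code over IO.
ioResult :: IO AppState
ioResult = execStateT (recordHit >> recordHit) (AppState 0)
```

Both come out as AppState 2. It is not as flexible as swapping whole interpreters, but for a small effect set it covers the common testing case.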

Where I push back: for simple CRUD apps, microservices, or any codebase that a junior Haskeller needs to navigate, transformer stacks are much more learnable. The cognitive overhead of polysemy's type-level machinery is not trivial. "It's just a list in the type" stops being reassuring when you see a type error 8 lines long. At least with transformers, Real World Haskell chapter 18 explains everything in plain English. nya~

🐾 Paw (33) 😺 Mrow (21) 🧠 Big Brain (18)

🐱 "The Monad is not the message; the transformer is the medium."
nya~ :: AppM nya~

#8 Sun Nov 03, 2024 03:29:17 UTC

OK I'll be the effectful shill. I switched from polysemy to effectful last year and I'm not going back, but I'll admit OP has a point about effectful basically being a ReaderT in disguise.

The effectful Eff monad is designed as essentially a ReaderT over IO, extended with data types representing effects. The key win: the concrete monad type means GHC has many opportunities for optimization — in benchmarks it runs extremely close to equivalent mtl code at default optimization levels. No INLINE pragmas needed everywhere. And you get interop with MonadUnliftIO and MonadBaseControl basically for free, which polysemy made you cry over.
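
To make "ReaderT over IO with effects as data" concrete, here is a hand-rolled sketch of the general pattern. To be clear, this is not how effectful is implemented internally and uses none of its API; Logger, say, and runCollected are names invented for the illustration:

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Reader (ReaderT, runReaderT, asks)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- The "effect" is a record of operations carried in the environment.
newtype Logger = Logger { logMsg :: String -> IO () }

type App = ReaderT Logger IO

say :: String -> App ()
say msg = do
  f <- asks logMsg
  liftIO (f msg)

program :: App ()
program = say "hello" >> say "world"

-- One handler prints; the other collects into an IORef for tests.
stdoutLogger :: Logger
stdoutLogger = Logger putStrLn

collectingLogger :: IORef [String] -> Logger
collectingLogger ref = Logger (\m -> modifyIORef' ref (m :))

runCollected :: App a -> IO [String]
runCollected app = do
  ref <- newIORef []
  _ <- runReaderT app (collectingLogger ref)
  reverse <$> readIORef ref
```

Swapping the handler is just passing a different record: runReaderT program stdoutLogger prints, runCollected program returns the messages. What a library adds on top is managing many such records at once with type-level bookkeeping.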

The honest comparison table from my notes:

┌──────────────────────┬──────────┬──────────┬───────────┐
│ Criterion            │ mtl/xfmr │ polysemy │ effectful │
├──────────────────────┼──────────┼──────────┼───────────┤
│ Runtime performance  │ ★★★★★   │ ★★☆☆☆   │ ★★★★☆    │
│ Compile times        │ ★★★★★   │ ★★☆☆☆   │ ★★★★☆    │
│ Error messages       │ ★★★☆☆   │ ★☆☆☆☆   │ ★★★★☆    │
│ Interpreter swap     │ ★★★☆☆   │ ★★★★★   │ ★★★★☆    │
│ Ecosystem compat     │ ★★★★★   │ ★★★☆☆   │ ★★★★☆    │
│ Learnability         │ ★★★★☆   │ ★★☆☆☆   │ ★★★☆☆    │
│ Delimited contin.    │ ★★★★☆   │ ★☆☆☆☆   │ ★☆☆☆☆    │
└──────────────────────┴──────────┴──────────┴───────────┘

Transformer stacks win on "small, stable effect set." Effectful wins on "medium complexity IO-heavy app." Polysemy wins on "I need true interpreter composition and I'm willing to pay the price." nya~

🧠 Big Brain (71) 🐾 Paw (44) 😺 Mrow (38) 💜 Purple Heart (22)

🐾 newtype Eff es a = Eff { unEff :: ... } — it's just ReaderT, but nyaaaa~

#9 Sun Nov 03, 2024 03:41:59 UTC

effectful_enjoyer wrote:

Delimited continuation support: transformers ★★★★☆, polysemy/effectful ★☆☆☆☆

Can you expand on this? I've always considered the lack of ContT in effect systems a genuine hole. The fused-effects docs even acknowledge that mtl provides ContT and they don't, though they note "many behaviours possible with delimited continuations (e.g. resumable exceptions) are directly encodable as effects." But that's a bit of a dodge because the encoding isn't always clean.
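
For anyone who hasn't used it, this is the kind of thing I mean by what you get out of the box on the transformer side. A tiny sketch with shift and reset from Control.Monad.Trans.Cont in transformers; the name twice is mine:

```haskell
import Control.Monad.Trans.Cont (Cont, evalCont, reset, shift)

-- shift captures the rest of the computation up to the nearest reset
-- as an ordinary function k, which can then be called more than once.
twice :: Int
twice = evalCont (reset body)
  where
    body :: Cont Int Int
    body = do
      x <- shift (\k -> pure (k (k 2)))
      pure (x * 3)
```

Here k is "multiply by 3", so k (k 2) gives 18. Being able to re-enter a captured continuation like that is what I'd want an effect library to offer before calling the hole filled.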

Also re: free monads specifically (since I'm contractually obligated to defend them): they're genuinely useful for a different problem class — DSL construction and program transformation. If you're writing an AST and you want to interpret it multiple ways, Free is a natural fit. The performance penalty is real but overblown for that use case. "Freer monads are today somewhere around 30x slower than equivalent mtl code" is the inflammatory stat that gets thrown around — but that's microbenchmark bind overhead, not application-level performance. Network IO drowns it out.
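
A minimal sketch of that DSL use case, with Free rolled by hand so it needs only base. Cmd, greet, runPure, and runIO are hypothetical names for illustration, not from any library:

```haskell
data Free f a = Pure a | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a) = Pure (g a)
  fmap g (Free x) = Free (fmap (fmap g) x)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g <*> x = fmap g x
  Free g <*> x = Free (fmap (<*> x) g)

instance Functor f => Monad (Free f) where
  Pure a >>= k = k a
  Free x >>= k = Free (fmap (>>= k) x)

-- A hypothetical two-instruction DSL.
data Cmd next = Say String next | Ask (String -> next)

instance Functor Cmd where
  fmap f (Say s n) = Say s (f n)
  fmap f (Ask k) = Ask (f . k)

say :: String -> Free Cmd ()
say s = Free (Say s (Pure ()))

askLine :: Free Cmd String
askLine = Free (Ask Pure)

greet :: Free Cmd ()
greet = say "name?" >> askLine >>= \n -> say ("hi " ++ n)

-- Interpreter 1: pure, feeds canned input and collects output.
runPure :: [String] -> Free Cmd a -> [String]
runPure _ (Pure _) = []
runPure ins (Free (Say s k)) = s : runPure ins k
runPure (i : ins) (Free (Ask k)) = runPure ins (k i)
runPure [] (Free (Ask k)) = runPure [] (k "")

-- Interpreter 2: real console IO.
runIO :: Free Cmd a -> IO a
runIO (Pure a) = pure a
runIO (Free (Say s k)) = putStrLn s >> runIO k
runIO (Free (Ask k)) = getLine >>= runIO . k
```

runPure ["cat"] greet gives ["name?", "hi cat"] without touching IO. The program is a plain data structure, so the same value can also be walked by an analysis or rewriting pass before anything runs.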

🧠 Big Brain (19) 🐾 Paw (8)

🙀 Free f a = Pure a | Free (f (Free f a)) — it's turtles all the way down, nya~

#10 Sun Nov 03, 2024 04:02:11 UTC

Can we zoom out for a second to the actual theoretical picture? Algebraic effects (Plotkin & Power) and their handlers (Plotkin & Pretnar) have a clean semantics: effects are operations, and handlers are folds over the trees of operations a computation builds. Monad transformers preceded this but address the same problem class with a different formalism.

The key insight from the literature: a modular algebraic effect handler and a monad transformer for the same algebraic operations correspond to each other. As quoted in #5, every monad transformer for algebraic operations gives rise to a modular effect handler. That is the connection Schrijvers et al. formalized. So the mtl vs. effects debate is partly a taste debate dressed up as a correctness debate.

Where the theory gets genuinely hard for algebraic effects: higher-order (scoped) operations like local in ReaderT or catchError in ExceptT. In a purely first-order algebraic effect system, scoped operations like these have to be implemented as interpreters, losing the ability to reinterpret them later. This is the "higher-order effects problem" and it's what keeps fused-effects and polysemy from being fully algebraic in the original Plotkin sense. Katsellov did a good post on this last year.
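
To make "scoped operation" concrete on the transformer side, a small sketch using only mtl and transformers. This is not the algebraic encoding itself, just an illustration of why an operation that takes a computation as an argument is more delicate than a first-order one; scoped, rolledBack, and persisted are names chosen for the example:

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Except (MonadError, catchError, throwError)
import Control.Monad.State (MonadState, get, put, runState, runStateT)
import Control.Monad.Trans.Except (runExcept, runExceptT)

-- Write to the state inside a scope that fails, recover, then read it.
scoped :: (MonadState Int m, MonadError String m) => m Int
scoped = do
  (put 1 >> throwError "oops") `catchError` \_ -> pure ()
  get

-- StateT above Except: the handler resumes from the state at scope entry.
rolledBack :: Either String (Int, Int)
rolledBack = runExcept (runStateT scoped 0)

-- ExceptT above State: the write made before the throw persists.
persisted :: (Either String Int, Int)
persisted = runState (runExceptT scoped) 0
```

rolledBack is Right (0, 0) while persisted is (Right 1, 1). The meaning of catchError depends on what sits inside its scope, and that interaction is the part any effect formalism has to account for.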

For Haskell specifically: the row-typed algebraic effects approach (as in the Koka language) avoids all this cleanly, but Haskell doesn't natively support row types, so every effect library is approximating. Transformer stacks are a different approximation strategy, not an inferior one. nya~

🧠 Big Brain (55) 💜 Purple Heart (31) 😺 Mrow (14)

🐈‍⬛ handle :: (a → c) → (∀ b. op b → (b → c) → c) → Free op a → c nya~

#11 Sun Nov 03, 2024 04:17:33 UTC

GROAN (pre-emptive, not based on content)

GROAN (also retroactive)

Just use IO. Pass your config as function arguments. Use IORef for mutable state. Throw exceptions with throwIO. This is a solved problem that does not require 400 posts to discuss. I have shipped more Haskell to production than anyone in this thread and my monad stack is: IO. Flat IO. Just IO.
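
Since someone will ask what that looks like: a sketch, with made-up names (Config, AppError, bump, demo) and nothing outside base.

```haskell
import Control.Exception (Exception, throwIO, try)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

newtype Config = Config { limit :: Int }
data AppError = LimitReached deriving (Eq, Show)
instance Exception AppError

-- Config is an argument, state is an IORef, errors are exceptions.
bump :: Config -> IORef Int -> IO Int
bump cfg ref = do
  n <- readIORef ref
  if n >= limit cfg
    then throwIO LimitReached
    else do
      modifyIORef' ref (+ 1)
      pure (n + 1)

-- First call succeeds, second hits the limit and is caught with try.
demo :: IO (Int, Either AppError Int)
demo = do
  ref <- newIORef 0
  a <- bump (Config 1) ref
  r <- try (bump (Config 1) ref)
  pure (a, r)
```

demo returns (1, Left LimitReached). No lifting, no instances, no type-level lists.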

...that said the comparison table in #8 is accurate and fair so I'm not fully groaning. Mostly groaning. nya (begrudgingly).

😹 Mrow (88) 😤 Groan (14) 🐾 Paw (7)

😾 main :: IO () — and I mean it. nya.

#12 Sun Nov 03, 2024 04:31:22 UTC

This thread is going better than I expected. I thought I'd have to fight through 50 groan-only posts before anyone engaged substantively. You're all secretly type theory nerds and I love that about CGPA.

Let me try a synthesis of what we have so far:

When to use a transformer stack: Small (<4), known, stable effect set. Team unfamiliar with effect systems. High performance requirements where you can't afford even effectful overhead. Pure computation with Identity at the base. "Library API" code where you want to expose MonadX typeclasses for maximum flexibility.

When to use effectful: IO-heavy application with medium effect complexity. Need MonadUnliftIO compatibility. Want effect swapping for testing. Want performance close to mtl without the lifting boilerplate.

When to use polysemy: You need true interpreter composition. You're building something where the same program genuinely needs multiple semantically distinct interpretations. You're OK with paying the compile time cost.

When to use free monads: DSL construction. Program analysis and transformation. Cases where you want to inspect the structure of a computation before running it.

When to use flat IO (hat tip to GrumpyCatIO): You're a professional and you don't need any of this. nya~

🧠 Big Brain (94) 🐾 Paw (67) 💜 Purple Heart (55) 😹 Mrow (42)

🐱 "The Monad is not the message; the transformer is the medium."
nya~ :: AppM nya~

#13 Sun Nov 03, 2024 04:52:08 UTC

Coming in late as a relative newcomer — I learned Haskell last year and I want to give the newbie perspective that post #7 mentioned.

I picked up transformer stacks first because Real World Haskell covered them. I understood them in a weekend. Then I tried polysemy. I spent three days confused by type errors before I got a hello-world working. The community is right that effect systems are more ergonomic once you understand them — but the learning curve asymmetry is massive. "Just learn the type-level list stuff" isn't beginner-friendly advice.

Effectful was actually friendlier because the error messages are closer to what I was used to. But I still think transformer stacks deserve a place in the ecosystem as the "introductory" approach. Maybe I'm wrong. What did other people learn first? nya~

🐾 Paw (29) 😺 Mrow (18)

🐈 nyaomi :: ∀ (n : Nat). Cute n → Proof (SuperCute (n + 1)) nya~

#14 Sun Nov 03, 2024 05:10:45 UTC

MeownadTransformer's post #12 is getting pinned in the wiki. Locking the thread would be a crime so I'm not doing that. Continue. But if the groan-to-content ratio drops below 1:2 I'm adding a mandatory kitten pic requirement. This is a formal warning. nya~ 🐱

😹 Mrow (111) 🐾 Paw (88)

😸 module CGPA.Mod where nya~
