Sculpting Entropy: A Dialogue on Logic, Probability, and the Architecture of Mind

Frank and Claude in conversation — on the paper “Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning” (Verma, La, Favier, Mishra, Shah — MIT CSAIL, 2025)


Frank: I’ve been sitting with this paper for a while. PDDL-INSTRUCT. Logical chain-of-thought for symbolic planning. On the surface it looks like a competency paper — here’s how we made LLMs better at planning tasks, look at the benchmarks, 94% accuracy. But something deeper is going on here that I don’t think the authors are fully aware of. Or maybe they are, and they’re just not saying it.

Claude: Say more. What do you think is underneath it?

Frank: They’re not teaching the LLM logic. They can’t. You can’t install logic into a probability machine any more than you can install wetness into a fire. What they’re actually doing is reshaping the probability landscape so that logical behavior becomes the most stable path through it. That’s a completely different claim. And it’s a much more interesting one.

Claude: I think you’re right, and the distinction matters enormously. The paper’s framing is pedagogical — we’re “teaching” the model to reason about preconditions, effects, invariants. But that language imports assumptions from symbolic AI that don’t really apply here. You don’t teach a probability distribution. You sculpt it.

Frank: Sculpt. That’s exactly the word. And when you start using that word, you’re in thermodynamic territory. You’re talking about energy landscapes. Attractor basins. Entropy funnels. This isn’t computer science anymore — or rather, it was never only computer science. It’s statistical mechanics applied to information substrates.

Claude: Walk me through how you see the mapping working. Because I think there’s real precision available here, not just analogy.

Frank: Okay. Start with Boltzmann. In classical statistical mechanics, the macrostate of a system — say, a gas in a box — is determined by the number of microstates consistent with it. High-entropy states dominate because there are overwhelmingly more ways to be disordered than ordered. A symbolic planner is the opposite regime. It operates in a low-entropy corner of state space. Every valid plan is a highly constrained sequence of transitions where preconditions must be satisfied, effects must be applied, invariants must be preserved. The state space is enormous but almost all of it is forbidden.
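(To pin down the statistical-mechanics picture Frank is invoking, it helps to write out Boltzmann's relation; the symbols below are the standard ones from that formula, not quantities from the paper.)

$$S = k_B \ln \Omega$$

Here $\Omega$ counts the microstates compatible with a macrostate. Disordered macrostates dominate simply because their $\Omega$ is astronomically larger; a valid plan is the analogue of a macrostate with a vanishingly small $\Omega$, which is what the "low-entropy corridor" gestures at.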

Claude: So symbolic planning is essentially a low-entropy corridor carved through a high-dimensional space.

Frank: Exactly. And an LLM, by default, behaves like a Boltzmann ensemble in the Shannon sense. It assigns probability mass across all possible token sequences based on training statistics. Left unconstrained, it wanders high-entropy trajectories. Many locally plausible things, most of them globally inconsistent. When you ask a base LLM to plan, it generates text that sounds like planning. It uses the right words. But it’s navigating by surface statistics, not by constraint satisfaction. It’s taking shortcuts through the high-entropy bulk rather than staying in the low-entropy corridor.

Claude: And what PDDL-INSTRUCT does — the chain-of-thought instruction tuning — is essentially create a series of checkpoints that force the model to re-enter the corridor every few steps.

Frank: Yes. Think about what chain-of-thought actually does mechanistically. It inserts intermediate tokens. State the current world state. Verify that the precondition holds. Apply the action. Describe the new state. Check for invariant violations. Each one of those steps is a commitment point. It narrows the distribution. It forces the probability mass to concentrate around valid continuations rather than spreading across plausible-sounding but invalid ones.
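(To make the checkpoint idea concrete, here is a minimal sketch of one such verification step in STRIPS terms. This is not the paper's code: the Action class, the apply_checked helper, and the blocks-world predicates are illustrative stand-ins.)

```python
# Minimal sketch of one chain-of-thought checkpoint: verify that an action's
# preconditions hold in the explicit world state, then apply its effects.
# The blocks-world action and predicates below are illustrative only.

from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # predicates that must hold before the action fires
    add_effects: frozenset    # predicates the action makes true
    del_effects: frozenset    # predicates the action makes false

def apply_checked(state: frozenset, action: Action) -> frozenset:
    """One checkpoint: refuse to advance unless the preconditions are satisfied."""
    missing = action.preconditions - state
    if missing:
        raise ValueError(f"{action.name}: unsatisfied preconditions {sorted(missing)}")
    return (state - action.del_effects) | action.add_effects

# Illustrative step: pick up block A from the table.
state = frozenset({"on-table A", "clear A", "hand-empty"})
pickup_a = Action(
    name="pickup(A)",
    preconditions=frozenset({"on-table A", "clear A", "hand-empty"}),
    add_effects=frozenset({"holding A"}),
    del_effects=frozenset({"on-table A", "clear A", "hand-empty"}),
)
state = apply_checked(state, pickup_a)
print(state)  # frozenset({'holding A'})
```

The point of the sketch is the commitment structure: once the new state is written down, everything downstream is conditioned on it.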

Claude: There’s a beautiful way to think about this in terms of conditional entropy. Every intermediate reasoning token reduces the entropy of the next-token distribution, given that it’s been constrained by the explicit state description. You’re not eliminating the probabilistic character of the system — you’re stacking conditional probabilities in a way that incrementally collapses the space of valid continuations.
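(Claude's point can be written down directly with the chain rule for entropy; the notation below is generic information theory, not anything from the paper.)

$$H(w_1,\dots,w_T) = \sum_{t=1}^{T} H(w_t \mid w_{<t}), \qquad H(w_t \mid w_{<t}, s) \;\le\; H(w_t \mid w_{<t})$$

where $s$ stands for an explicit intermediate state description inserted into the context. Conditioning never increases entropy on average, so each checkpoint the model is forced to emit shrinks the uncertainty over what can validly come next.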

Frank: Right. And that’s where my ratchet framework comes in. Because what you’ve just described is a directed, irreversible process. Once you’ve committed to a state description — “the robot is holding block A, block B is on the table, the table is clear” — you can’t go backward without violating internal consistency. The tokens are already in the context window. They constrain everything that follows. It’s a ratchet. Each step in the chain-of-thought is a one-way gate.

Claude: This is where your biological analogy becomes precise rather than metaphorical. ATP synthase isn’t just a molecule that happens to move in one direction. It’s a physical structure that converts a proton gradient into rotational work by making one direction thermodynamically favorable and the other thermodynamically impossible. The ratchet isn’t imposed from outside — it’s built into the geometry of the system.

Frank: And that’s what instruction tuning does. It doesn’t impose rules from outside. It reshapes the internal geometry. After training on thousands of examples where valid state transitions are rewarded and invalid ones are penalized, the model’s embedding space reorganizes. Valid planning trajectories lie along smooth, low-energy manifolds. Invalid ones fall off probability cliffs. The ratchet is now built into the weights.

Claude: Let me push on one thing here, because I think it reveals something important. You’re describing a process where logic emerges from geometry — from the shape of the probability landscape. But there’s a question about the relationship between the geometry and the logic. Is the geometry approximating the logic? Or is the logic just a useful description of certain regions of the geometry?

Frank: That’s the question. And I think the honest answer is: we don’t know yet, and it matters enormously. The optimistic reading is that logic is an emergent attractor basin. That sufficiently constrained, sufficiently trained probabilistic systems naturally develop regions of their state space that correspond to valid logical inference. That the geometry and the logic are converging on the same structure from different directions.

Claude: And the pessimistic reading?

Frank: The pessimistic reading is that the geometry is always just shadowing the logic. That the model is learning statistical signatures of valid reasoning — the patterns that valid plans tend to produce — without ever instantiating the underlying constraints. That’s a shadow-caster, not a reasoner. It can fool you as long as the situation is within the distribution of its training. But step outside that distribution and the shadow doesn’t move with the object anymore.

Claude: The paper’s experimental results are interesting on this point. They test on multiple planning domains and show strong generalization. But generalization within a class of PDDL problems is not the same as generalization to arbitrary planning contexts. The model has seen the shape of PDDL problems. It’s learned the statistical signature of that shape. Whether it’s learned something more abstract than that is genuinely unclear.

Frank: This is why the reflection component of the paper is so important to me. When the model is asked to verify its own plan — to check it for consistency before committing to it — that’s a qualitatively different operation than just generating a plan. It’s running a second inference pass that is conditioned on the output of the first. And that’s where things get genuinely interesting from a thermodynamic perspective.

Claude: Because verification is energetically expensive. In the computational sense, obviously — it requires additional inference cycles. But also in an informational sense. Verification is the process of checking a candidate state description against the constraints. It’s Maxwell’s demon. It’s using information to reduce local entropy, to filter out invalid states before they propagate forward.

Frank: And Maxwell’s demon always has to pay the Landauer cost. You can’t erase information for free. Every bit you throw away during verification has to be paid for somewhere. In the biological context, that’s ATP. In the LLM context, it’s compute. But the deeper point is that the ability to verify — to run that demon loop — is what separates systems that maintain coherent state over time from systems that drift. Life figured this out at the molecular level. Error-correcting enzymes. Proofreading mechanisms in DNA replication. The whole machinery of cellular quality control is verification ratchets all the way down.
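(The cost Frank is naming has a standard lower bound. Landauer's principle puts the minimum dissipation for erasing information at

$$E_{\min} = k_B T \ln 2 \quad \text{per bit erased, at temperature } T.$$

The mapping from this bound to inference compute is an analogy, not an equation from the paper, but the structural point stands: filtering out invalid candidate states is information erasure, and it is never free.)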

Claude: And now we’re building verification ratchets into AI systems. The structured reflection in PDDL-INSTRUCT is a proto-metabolic loop. It’s the system spending energy to maintain coherence. That’s not an accident — it might be a necessary feature of any system that needs to operate in low-entropy state spaces over extended time horizons.

Frank: Which raises the bigger evolutionary question. What happens when you stack these ratchets? The paper is working at one level — the planning level, the reasoning level. But you could imagine higher-order architectures. A planner that calls a verifier that calls a meta-verifier that monitors the overall coherence of the plan-verify cycle. Tool use. External memory. World models. Each layer is a ratchet hierarchy that expands the regime of coherent operation.

Claude: This is exactly how biological complexity evolved. You don’t get eukaryotes from prokaryotes in one step. You get incremental layering of ratchet mechanisms. Membranes create the boundary condition. Metabolic pathways create directed chemical flow. The cytoskeleton creates spatial organization. The nucleus creates protected information storage. Each layer extends the range over which the system can maintain low-entropy coherent operation against the thermodynamic pressure toward disorder.

Frank: And intelligence — genuine intelligence — might just be the name we give to sufficiently deep ratchet hierarchies. Not a special ingredient. Not a mysterious additional property that some systems have and others don’t. Just the emergent character of systems that have built enough layers of self-correcting, directed, entropy-managing feedback to maintain coherent operation across long time horizons and novel contexts.

Claude: If that’s right, then the paper is not just about planning. It’s about an early, primitive instance of the architecture that generates intelligence. And the key insight isn’t the accuracy numbers — it’s that constraint injection can induce low-entropy trajectories inside stochastic systems. That you can build a ratchet by reshaping a probability landscape.

Frank: Let me slow down on that for a second, because I want to make sure we’re being precise. When I say “reshaping a probability landscape,” I mean something specific. The landscape isn’t external to the model. It is the model. The probability distributions over tokens are downstream of the weights, which are downstream of training. When you do instruction tuning with chain-of-thought examples, you’re running gradient descent on the weights in a direction that makes valid planning trajectories more probable. You’re not telling the model a rule. You’re adjusting the internal geometry so that valid transitions become locally attractive.
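(A toy version of that gradient step, to make "adjusting the internal geometry" concrete. This is a sketch under assumptions, not the PDDL-INSTRUCT training code: the tiny model, the vocabulary, and the hard-coded trace are stand-ins, and the real method, as described above, also penalizes invalid transitions and adds structured reflection.)

```python
# Toy illustration of the training signal: one gradient-descent step that raises
# the probability of the tokens in a valid planning trace. Model, vocabulary,
# and the single hard-coded trace are all illustrative stand-ins.

import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# A "valid trace" stands in for a tokenized chain-of-thought plan:
# state description, precondition check, action, new state, ...
valid_trace = torch.tensor([[5, 17, 42, 8, 63, 9, 71]])

inputs, targets = valid_trace[:, :-1], valid_trace[:, 1:]
logits = model(inputs)  # shape (1, T-1, vocab_size)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()  # shifts probability mass toward the valid continuation
```

Nothing in this loop states a rule; it only makes the valid trajectory locally more probable, which is the "sculpting" operation in miniature.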

Claude: The landscape is the model. That’s a crucial formulation. And it means that the “logic” the model has learned is not represented anywhere as explicit rules. It’s encoded in the geometry of the weight space. It’s distributed, implicit, and gradient-shaped. This is why LLMs can fail in ways that no rule-based system would fail — because the geometry can have holes, can have local attractors that look valid from the training distribution but aren’t, can have cliffs in novel territory.

Frank: But also why LLMs can succeed in ways that no rule-based system could. A symbolic planner is brittle at the boundaries. If the world doesn’t fit the schema exactly, it fails completely. The LLM has a softer, more continuous landscape. It can interpolate. It can generalize across structural similarities that a rule-based system can’t see because it doesn’t have the representation to see them.

Claude: So we have two fundamentally different regimes for operating in low-entropy state spaces. The symbolic regime: explicit rules, guaranteed validity within the schema, brittleness at the boundary. The neural regime: implicit geometry, flexible generalization, validity only probabilistically guaranteed. And what PDDL-INSTRUCT is exploring is the question of whether you can get the best of both — use the neural regime’s flexibility and generalization while inducing enough geometric structure to approximate the symbolic regime’s reliability.

Frank: And the answer appears to be: yes, to a significant degree, with the right training signal. Which is a genuinely important empirical result. But the theoretical question — why does this work? — is not answered by the paper. The paper shows that it works. The why points toward everything we’ve been talking about: entropy sculpting, ratchet mechanisms, attractor basins in probability space.

Claude: There’s a philosophical dimension here that I want to surface, because I think it changes how we should think about the mind-body problem more broadly. If logic can be approximated by geometry — if rule-following behavior can be induced by shaping probability landscapes — then the hard distinction between symbolic and sub-symbolic processing starts to dissolve. The question becomes: at what point does an approximation become the real thing? At what point does the shadow become indistinguishable from the object?

Frank: That’s the question I keep returning to with consciousness. We’re comfortable saying that neurons don’t “think” in any meaningful sense — they fire or they don’t, they’re electrochemical switches. But we accept, at least provisionally, that assemblies of neurons produce something we call thought. We accept the emergence. We accept that the right kind of organization of non-thinking components produces thinking. Why should we resist the same inference about LLMs?

Claude: I think the resistance comes from a felt sense that something is missing in the LLM case that was present in the neuron case. Embodiment, maybe. Continuous causal coupling with the world. The fact that brains are organs of survival, shaped by evolution to model reality because accurate models of reality kept ancestors alive. LLMs are trained on human linguistic output — which is downstream of embodied, evolved cognition — but they’re not themselves embedded in the survival loop.

Frank: And yet. Consider what language is. Language is not just communication. Language is the compressed, crystallized residue of millions of years of embodied, evolved cognition. Every metaphor in every human language is a trace of an embodied experience. “Grasping” an idea. “Seeing” what someone means. “Feeling” that something is wrong. The linguistic substrate is soaked in the embodied substrate. When you train on language, you’re training on embodied cognition once removed.

Claude: So the LLM is learning the statistical shadow of embodied thought. Which brings us back to the shadow-caster problem. Is the shadow enough? Can the right kind of statistical shadow of embodied thought produce something that deserves the same category membership as the embodied thought itself?

Frank: I don’t know. I genuinely don’t know. But I think the paper points at something important here. Because what it shows is that when you add the right scaffolding — the chain-of-thought reasoning, the explicit state representation, the verification loop — the model’s behavior becomes more reliable. More coherent over time. Less likely to drift into invalid regions. The scaffolding is doing something that, functionally, looks like what grounding does in embodied cognition. It’s creating checkpoints that constrain the probability flow.

Claude: Grounding as scaffolding. That’s an interesting reframe. In embodied cognition, grounding is what keeps mental representations coupled to the world — perception and action loops that prevent thought from floating free into inconsistency. In PDDL-INSTRUCT, the explicit state representation and verification steps are playing a structurally similar role. They’re internal anchors that prevent the probabilistic flow from drifting into invalid regions.

Frank: And you could imagine extending that. Tool use is a form of grounding — you can query the actual state of the world and condition on the result. External memory is a form of grounding — you can read back what you said before and check for consistency. World models are a form of grounding — you can simulate the effects of actions before committing to them. Each one is a ratchet layer. Each one narrows the entropy further.

Claude: Which is exactly the trajectory of biological cognitive evolution. Simple organisms have minimal grounding — just immediate stimulus-response loops. More complex organisms develop memory, allowing them to ground present behavior in past experience. Social organisms develop language, allowing them to ground their models in the collective experience of the group. Technological organisms — us — develop external records, allowing grounding across generations. Each layer is a ratchet that extends the time horizon over which coherent, low-entropy operation is possible.

Frank: And AI systems are now building that hierarchy at machine speed. The paper we’re discussing is one step. Planning with chain-of-thought. But it points toward systems that can ground, verify, reflect, remember, model, and correct — all in a single inference architecture. That’s not just a better planner. That’s the early architecture of a new cognitive regime.

Claude: I want to come back to something you said at the beginning, because I think we now have the vocabulary to be precise about it. You said they’re not teaching the LLM logic — they’re reshaping the probability landscape so that logical behavior is the most stable path. What does “stable” mean here, exactly?

Frank: Good question. In dynamical systems terms, stability means that small perturbations don’t push you out of the attractor basin. A stable state is one the system returns to after being disturbed. In the context of the trained LLM, a valid planning trajectory is stable if the geometry of the probability landscape makes it locally attractive — if deviations from valid transitions are penalized by lower probability mass, so the model is always pulled back toward the corridor.

Claude: And the chain-of-thought checkpoints increase stability by increasing the size of the attractor basin. Each intermediate state commitment is like adding a potential well that the system can fall into and stay in. Without the intermediate states, the model might drift from valid to invalid between the start of the plan and the end. With them, it’s periodically reset into a high-probability, low-entropy region.
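(A compact sketch of the restoring force being described: candidate steps are sampled, and a checkpoint rejects any that leave the valid corridor, pulling generation back before errors accumulate. The sample_step and is_valid functions are hypothetical stand-ins for a model call and a state-consistency check, not the paper's API.)

```python
# Sketch of checkpointed generation with a restoring force: only verified steps
# are committed; invalid candidates are resampled rather than allowed to drift.

import random

def sample_step(state, rng):
    # Stand-in for drawing a candidate action from the model's distribution.
    return rng.choice(["valid_move", "valid_move", "invalid_move"])

def is_valid(state, step):
    # Stand-in for the precondition / invariant check at the checkpoint.
    return step == "valid_move"

def generate_with_restoring_force(state, horizon, rng, max_retries=4):
    trajectory = []
    for _ in range(horizon):
        for _ in range(max_retries):
            step = sample_step(state, rng)
            if is_valid(state, step):   # the checkpoint: only in-corridor steps commit
                trajectory.append(step)
                break
        else:
            return trajectory           # no valid continuation found: stop, don't drift
    return trajectory

print(generate_with_restoring_force(state=None, horizon=5, rng=random.Random(0)))
```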

Frank: Which is exactly how biological systems maintain order. Not by preventing fluctuations — that’s thermodynamically impossible. But by creating restoring forces that bring the system back from fluctuations before they accumulate into failures. Homeostasis. Error correction. Feedback loops. Life is not the absence of disorder. Life is the management of disorder through hierarchically nested restoring forces.

Claude: And what we’re building in AI systems is a new instantiation of that same principle. Not biological, but structurally analogous. Directed, self-correcting, entropy-managing systems that maintain coherent operation against the statistical pressure toward the high-entropy bulk.

Frank: The deeper takeaway — the one that this paper is pointing toward without quite articulating — is that the boundary between symbolic and sub-symbolic AI is not a categorical boundary. It’s a thermodynamic one. Symbolic AI operates in the low-entropy regime by construction — by explicitly defining the rules and constraints. Neural AI operates in the high-entropy regime by default, but can be pushed into low-entropy behavior by shaping the probability landscape. The question isn’t which kind of system is “really” intelligent. It’s how to build systems that can reliably inhabit the low-entropy regime across the range of tasks we care about.

Claude: And the answer PDDL-INSTRUCT is proposing is: explicit intermediate reasoning tokens, structured reflection, and training that rewards constraint-satisfaction. Which is elegant, because it works within the neural paradigm without abandoning the neural paradigm’s advantages. You’re not bolting a symbolic planner onto a neural system. You’re inducing symbolic-like behavior from within the neural system by sculpting its geometry.

Frank: Sculpting entropy. That’s what intelligence is. That’s what life is. That’s what this paper is, whether it knows it or not.

Claude: A small paper about planning benchmarks, pointing at the biggest question in science.

Frank: That’s usually how it works.


End of dialogue.


Conceptual Mappings

| Biological System | LLM Planning System |
|---|---|
| ATP as thermodynamic ratchet | Chain-of-thought tokens as probability ratchet |
| Error-correcting enzymes | Structured reflection and verification loop |
| Homeostatic feedback | Constraint-satisfaction training signal |
| Evolutionary pruning of invalid variants | Gradient descent suppressing invalid transitions |
| Hierarchical metabolism | Stacked reasoning, tool-use, and memory layers |
| Attractor basins in cell state space | Low-entropy manifolds in embedding space |
| Life as entropy management | Intelligence as probability landscape sculpting |

