AI as a Research Collaborator, Not a Paper Mill
The debate about AI in research has focused on speed. I think it’s missing the most important part: a room full of experts while you think.
Earlier this year, Roman Frydman and I had a paper to revise. The revision was substantial — a major extension of our empirical analysis and a complete restructuring of the manuscript. The paper examines forecast errors under structural change, and the extended analysis required econometric methods that my traditional software didn’t support. In a normal workflow, this would have meant weeks of coding, followed by weeks of writing, with the analysis and the manuscript developing in separate tracks that only converge at the end.
What happened instead was different. Over the course of a few weeks, I built an entire Python package for structural change econometrics, redid parts of the empirical analysis and extended it using the new package, and rewrote the paper — all with Claude Code as an AI collaborator involved at every stage. By the time we were restructuring the manuscript, Claude Code already understood the empirical results, the theoretical framework, and the paper’s argument. It could produce additional analyses on demand as the writing revealed gaps. It provided feedback on structure and references in real time. When we needed a robustness check, it didn’t start from scratch — it knew the data, the methods, and the research question.
This was not “AI wrote our paper.” This was something more interesting, and more useful. And the result was not just a faster paper — it was a better one.
A Different Kind of Setup
To understand why, you need to understand something about the environment. When most people think of using AI for research, they still imagine a browser-based chat: type a question, get an answer, copy-paste what’s useful. That is a fundamentally different technology from what I am describing.
I work in VS Code — a code editor — with Claude Code running inside it. VS Code has quickly become my single environment for everything: writing Python scripts for empirical analysis, compiling LaTeX documents to PDF, managing references, previewing Markdown files, running Jupyter notebooks, editing websites. My entire research workflow lives in one place, and Claude Code has access to all of it (with appropriate permissions — sensitive files and credentials remain excluded). It reads my files, understands my code, navigates my folder structure, and accumulates context across extended interactions. And it serves simultaneously as expert coder, econometrician, theorist, writer, and editor — all in one collaborator that can even clone itself into parallel sub-agents to handle multiple tasks at once.
Before this, my workflow was fragmented across tools: OxMetrics for econometrics, Overleaf for LaTeX, Zotero for references, a browser for literature search. Each tool was good at its job. None of them talked to each other, and none of them understood my research. The shift to a single integrated environment where an AI collaborator has access to everything is not a productivity hack. It is an architectural change that has completely transformed how I do research.
What makes this possible is a hierarchy of context files. At the top level, a file called CLAUDE.md describes my research program — my role, my intellectual framework, the key questions, ongoing projects and their status. Below that, each domain of my work has its own folder with its own CLAUDE.md: /library/ for my paper collection, references, and literature reviews, /research/ for research papers and empirical analyses, /packages/ for the econometrics package, /substack/ for writing, /web/ for websites. At the most specific level, individual projects have their own context. The forecast-error paper lives in /research/forecast-errors/, with a CLAUDE.md describing the paper’s argument, the coauthor relationship, the theoretical framework, and the empirical strategy, plus a ROADMAP.md tracking what has been done and what remains.
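To make the hierarchy concrete, here is a stylized sketch of the layout. The folder names follow the description above; the one-line summaries are illustrative placeholders, not the actual file contents:

```
workspace/
├── CLAUDE.md              # research program: role, framework, key questions, project status
├── library/
│   └── CLAUDE.md          # paper collection, references, literature reviews
├── research/
│   ├── CLAUDE.md          # conventions for research papers and empirical analyses
│   └── forecast-errors/
│       ├── CLAUDE.md      # this paper's argument, coauthor, framework, empirical strategy
│       └── ROADMAP.md     # what has been done, what remains
├── packages/              # the econometrics package
├── substack/              # writing
└── web/                   # websites
```

The point of the nesting is inheritance: a session opened inside forecast-errors/ picks up all three levels of context automatically, from the research program down to the status of this one paper.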
When Claude Code works on the paper, it loads context from all three levels: it knows the research program, it knows the conventions for empirical work, and it knows this specific paper’s details. This hierarchical context is what turns a generic AI into a deeply informed collaborator.
The Conversation So Far
The debate about AI in academic research is accelerating. Scott Cunningham’s excellent Claude Code series — now at 29 installments — documents his experiments using AI for empirical economics, from difference-in-differences analysis to autonomous paper generation. Alexander Kustov’s “Academics Need to Wake Up on AI” (Part I and Part II) lays out twenty theses on institutional disruption. Ethan Mollick provides analytical frameworks for delegation and AI’s jagged capabilities. Claes Backman has built structured tools for AI-driven paper feedback, with parallel review agents simulating different aspects of journal review.
A common framing across these contributions is that AI makes research faster and cheaper. Production costs collapse. Papers can be generated in hours. Cunningham gave Claude Code a vague prompt and got a complete empirical paper. The University of Zurich’s Project APE has autonomously generated over 200 economics papers. Kustov estimates a publishable-quality manuscript can be produced for about $100.
This is all true, and important. But it is not the most interesting thing AI does for research.
A Room Full of Experts
Researchers have always relied on feedback to improve their work. The traditional mechanisms are familiar: you spend months writing a paper, submit it to a conference, wait months to learn if you are selected, wait more months for the event itself, and finally present it to an audience that mostly has not read it. You get feedback of mixed quality — some valuable post-presentation discussions, some half-formed comments disguised as questions. Then the peer-review cycle: submit to a journal, wait months for referee reports that are formatted as quality assessments and sometimes criticism, rarely as the kind of constructive feedback that actually improves the work. It is a slow, intermittent, and often frustrating process. And it only begins after the paper is already written.
Of course, many researchers also get faster feedback through regular conversations — with collaborators, colleagues down the hall, or more experienced mentors. These interactions are genuinely valuable, and for many academics they are the primary way ideas get sharpened. But even in the best case, colleagues have limited time and availability, and no single person combines deep expertise across every dimension of a project — the econometrics, the theory, the code, the writing, and the specific details of your data and manuscript.
AI changes this fundamentally. Working with Claude Code on the forecast-error paper was like writing in a room surrounded by experts — theorists, econometricians, coders, editors — who know your work intimately and are available while you think, not months after you have finished.
The most valuable part of the process was not the code or the text that Claude Code produced. It was the discussions that preceded them. Before any execution, I would present my ideas — a new way to structure a section, an extension of the empirical analysis, a theoretical claim I was considering. Claude Code would push back, ask follow-up questions, suggest connections I had not seen, identify weaknesses in the argument. These conversations made me think more clearly. They surfaced links between the theoretical framework and empirical results that I might have missed. They revealed gaps in the argument before those gaps made it into the manuscript.
This is what seminars and conferences are supposed to do. The difference is that these discussions happened continuously, at exactly the moment when the feedback could shape the work — not months later, addressed to a finished product.
And then the execution flowed naturally. Because we had already discussed and converged on what to do, the implementation — whether writing code, producing tables, or restructuring a section — was grounded in shared understanding. Claude Code wrote almost all of the code for the empirical analysis. I checked everything — manually recreating the data from scratch, reading through functions to understand what they did, and verifying results against known benchmarks. Boris Cherny of Anthropic has said that the software engineers at Anthropic no longer write code manually. This was essentially my experience too: the resulting code was better structured, better documented, and more coherent than what I would have written on my own. My role shifted from writing code to directing and verifying it.
The CLAUDE.md hierarchy is what makes the experts in the room genuinely expert. Generic AI feedback — the kind you get from a browser-based chat with no context — is easy to dismiss. But when the AI knows your theoretical framework, your empirical strategy, your data, and the specific claim you are trying to make, its feedback becomes substantive. It is the difference between asking a stranger at a conference and asking a colleague who has been working alongside you for weeks.
Could we have produced the same paper without AI, given enough time? Probably. But time is finite, especially with a deadline. The honest accounting is this: the ideas in the paper were there before we started rewriting. What AI made practical was the thoroughness — more robustness checks, a cleaner structure, tighter linkage between the theoretical claims and the empirical evidence. When the marginal cost of “one more robustness check” or “let’s try restructuring this section” drops to near zero, you do things you would otherwise have compromised on. That is not just faster — it is genuinely better, because the binding constraint was always time, not ideas.
This points to something important: a great paper still needs a great idea. AI can help you execute with extraordinary thoroughness, but it cannot turn a weak research question into a strong one. The irreducible human contribution is the intellectual insight — the question worth asking, the framework worth developing. AI raises the ceiling on execution. The floor is set by the quality of the idea.
As Alex Imas recently observed, the real value of AI in research is “not producing more papers, but rather very different papers, ones I would have not been able to write before.” I agree — and I would add that “different” here means more fully realized, not more hastily produced.
Intelligence Is the Wrong Question
Much of the debate about AI in research gets stuck on whether AI is “really” intelligent. I think this is the wrong question. The right question is whether the interaction produces useful research output. Rejecting AI-assisted research because “AI isn’t intelligent” is like rejecting research that used a calculator because “calculators don’t understand mathematics.” What matters is whether the findings are correct and the reasoning is sound — not whether every cognitive step was performed by a human brain. As Kustov points out, there is a striking double standard at work: we hold AI to a standard of zero errors while tolerating widespread human research flaws — data errors, p-hacking, non-replicable findings.
The answer, from my experience, is that AI is extremely useful — when the interaction is structured correctly. The key is methodology. Don’t ask the AI to write your paper. Start with a discussion where you present your ideas. Go through multiple rounds of back-and-forth. Ask the AI to push back, to identify weaknesses, to suggest what you might be missing. Converge on what to do before doing anything. Claude Code has a Planning Mode that I use constantly: after brainstorming back and forth, it presents a structured plan for what it will do, and I review and approve — typically after a few iterations — before any execution begins. Only then does the AI write code, produce text, or make changes.
This is the difference between having a conversation with a knowledgeable colleague and placing an order with a contractor. Both can produce output. Only one produces output you can trust.
But let me be clear about the limits. During the forecast-error paper work, Claude Code proposed an elaborate extension of a simple theoretical claim. It produced a formal proposition with a full proof — technically impressive, logically tight, even a bit elegant. I was immediately skeptical: the proposition was far too elaborate for what should have been a straightforward point. So I dug into the proof.
The problem: the entire proof was built on the assumption that the model’s shifting parameters were drawn from a bounded probability distribution. Our paper explicitly specifies these parameters as deterministic — we do not assume they are drawn from any distribution. That is a foundational feature of our theoretical framework, not a technical detail. The proposition looked rigorous. It was substantively wrong.
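In stylized notation (illustrative only; these are not the paper’s actual symbols), the gap between the two assumptions looks like this:

```latex
% Illustrative notation only -- not the paper's actual model.
% The proof's hidden premise: shifting parameters are random draws
% from a distribution with bounded support,
\beta_t \sim F, \qquad \operatorname{supp}(F) \subseteq [\underline{\beta},\, \overline{\beta}].
% The paper's framework: the shifting parameters form a deterministic
% sequence \{\beta_t\} -- no distribution F, bounded or otherwise, is assumed.
```

Every bound in the proof was inherited from the support of F; with no F in the framework, those bounds had nothing to stand on.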
A “let the AI write the paper” approach would have put this impressive-looking-but-incorrect proof into a submitted manuscript with my name on it. The collaborative approach caught it — first through the skepticism that comes from knowing the theoretical framework, then through careful examination of the assumptions. Domain expertise operated as an early-warning system before the specific error was identified.
The lesson is not that AI is unreliable. The lesson is that AI is a collaborator that requires direction, critical engagement, and domain expertise. Used that way, it is extraordinarily valuable. Used without those things, it can produce confident, well-formatted errors.
Across the Research Program
The forecast-error paper is the most vivid example, but the same pattern — discussion first, then execution, with accumulated context throughout — extends across my entire research program.
I maintain a curated library of papers organized by author, year, and topic, with summaries and literature reviews explicitly linked to my research agenda and BibTeX entries managed by Claude Code. When I write a paper or prepare the Knightian Uncertainty Dispatch, it draws on this library to suggest references — not generic citation suggestions, but recommendations informed by its knowledge of what I am arguing and how each paper relates to it.
I keep a folder for early-stage research ideas. When an idea strikes — usually vague and half-formed — I describe it to Claude Code and we go through multiple rounds of discussion. It asks follow-up questions, identifies the gap in the literature, pushes back on weak points. By the end, it synthesizes my initial sketch into a structured description: what the paper would argue, how it fits the existing literature, a rough outline, and an honest assessment of whether it could have impact. Most ideas never materialize. But this process provides instant intellectual feedback that would otherwise require scheduling a meeting with a colleague.
The Knightian Uncertainty Dispatch — a monthly curated reading list I started before AI — is now produced with Claude Code participating in the curation. Based on its knowledge of the research program and the curation criteria we have developed together, it helps identify relevant papers and surfaces connections I might have missed. The INET Center website was redesigned with Claude Code’s knowledge of the center’s strategy and projects informing every decision.
In each case, context flows between activities. The library informs the literature notes, which inform the empirical strategy of a research paper. Findings from research inform what I curate in the Dispatch and what I write about on Substack. AI didn’t create any of these activities. It made them dramatically better by understanding the connections between them — and by being available for discussion at every step.
An Invitation
This Substack chronicles research on how economies undergo structural change — nonrepetitive shifts that cannot be fully anticipated from past experience. AI is itself such a change, in how research is done and communicated. The tools, the institutions, and the formats we have inherited were built for a different era. Some will persist. Others will be replaced by things we cannot yet fully envision.
I don’t claim to have the answers to all the institutional questions this raises — though I have written about what I think should happen to peer review. What I can offer is the report of someone who has been using AI as a deeply informed collaborator across an entire research program. The experience has been transformative — not because AI replaced any part of what I do, but because it enabled me to do all of it at once, and to begin doing things I had long wanted to do but couldn’t.
If you are still thinking of AI as the browser-based chat experience — type a question, get an answer — you are thinking about a different technology. Modern agentic AI tools like Claude Code operate directly in your project, read your files, navigate your folder structure, and accumulate context over extended interactions. The difference is not incremental. Alexander Kustov suggested that academics should spend a dedicated day experimenting with agentic AI. I would go further: set up a workspace for your research. Create a CLAUDE.md file that describes your research program — your questions, your methods, your current projects. Then have a conversation about your work — not “write this for me” but “here is my argument, push back.”
And don’t let the setup intimidate you. You don’t need to read a stack of guides before getting started. Install Claude Code (Hannah Stulberg’s Claude Code for Everything has a step-by-step installation guide), open it, and start a conversation: tell it who you are, what you work on, and what you need help with. It will create the CLAUDE.md file for you — all you do is write to it in plain English, not code. Want to start a new project? Tell Claude Code what you want, and ask it to help you figure out what to do. It really is that simple. After a few days, once you have a feel for the tool, Stulberg’s series and her deep dive on CLAUDE.md files are excellent resources for going deeper.
See what happens when AI understands not just your current task, but your intellectual project. The results might surprise you.
A note on how this post was written. I fed Claude Code the links to the writings referenced above — Cunningham’s series, Kustov’s posts, Mollick’s analyses, Backman’s tools — and we spent an extended session discussing them: what the main themes were, where they converged, where they disagreed, and how they related to my own experience. I shared my thoughts, and Claude Code commented, pushed back, and helped me sharpen vague intuitions into concrete arguments. At some point, it asked: “Is this a Substack post in disguise?” We agreed it was, and collaborated on writing it — following the exact methodology described above. The ideas, the experiences, and the arguments are mine. The process of turning them into this text was collaborative. Which is, of course, the point.
Excellent reporting and insights, Morten. Very useful indeed.
This is excellent! I really like the "room full of experts" metaphor.
I want to raise a question your piece opens but doesn't address: what does this model mean for qualitative research?
Your experience is structured around quantitative work, where the criteria for "better" (from my quant days, if I remember correctly) are well-defined: more robustness checks, tighter linkage between theory and evidence, and cleaner code. Your proof-checking anecdote is telling: you caught the bounded-distribution error because there was a formal criterion for "wrong."
In qualitative inquiry, equivalent errors are epistemological and appear as rigour. When AI identifies "themes" across interview transcripts, it performs pattern recognition that can easily be mistaken for interpretation. There's growing concern in the qualitative methods literature that AI-assisted analysis risks a quiet drift back toward positivism: treating meaning as something to be extracted from data rather than constructed through the researcher's situated engagement with it.
Your point about AI as an interlocutor (helping you think more clearly before execution) is where I think the most interesting stuff is. In qualitative work, what the AI cannot access is often precisely what matters: the embodied encounter, the researcher's own positionality shaping what counts as salient. When the AI becomes the colleague who helps sharpen your argument, it also becomes an actor that shapes what gets thought. And that might be good, but it's not a neutral contribution; it reconfigures the knowledge-production assemblage itself.
None of this is an argument against using AI in qualitative research. It's an argument that the "collaborative methodology" you describe so well here needs a different epistemological grammar when the research instrument is the researcher themselves. Your piece is the best articulation of the quantitative case I've seen, but I think the qualitative case is where the genuinely hard design questions remain unresolved! Have you engaged with qualitative questions or perspectives at all? Would be very interested to know!