Perhaps against my better judgement I’ve started working on a paper studying the use of LLMs as stand-ins for humans for, e.g., polling, A/B testing, and social science research. It’s been interesting to read the literature; in social science it seems that a number of prominent members of the field are interested in using LLMs to “fully automate” social science.1 If you think I’m exaggerating, here’s an abstract of a paper by Manning, Zhu and Horton:
We present an approach for automatically generating and testing, in silico, social scientific hypotheses. This automation is made possible by recent advances in large language models (LLM), but the key feature of the approach is the use of structural causal models. Structural causal models provide a language to state hypotheses, a blueprint for constructing LLM-based agents, an experimental design, and a plan for data analysis. The fitted structural causal model becomes an object available for prediction or the planning of follow-on experiments.
While I have many qualms about using LLMs to stand in for humans in the context of social science research, let’s suppose that we do, in fact, have a magic box that will tell us what humans will or will not do in every possible combination of scenarios, and that we can list those scenarios in the form of structural causal models. Of course, humans are not all the same, so we should presume that the black box can also tell us how often a given behavior occurs, what types of people tend to exhibit it, and how the context of the situation generates or changes the outcome.
Most of the hypothesis space we can hand over to the oracle is made up of things like “would American men between ages 35 and 45 from Idaho on average bet 4 dollars out of 10 when they have a 50% chance of tripling their money” or “would a person be more likely to lie about the age they gave up sleeping with stuffed animals after I made them watch an animated movie” or “do mice grow faster when you read them Chaucer.”
This is why I say that the hypothesis space is boring.2 The oracle that promises and threatens to “automate social science” quickly becomes data! All that’s left to do is the social science.
Footnotes
-
The most charitable read of the endeavor is as a kind of multiverse analysis; the scientists would list every possible hypothesis or combination thereof and produce estimates for every plausible causal model within the system. The less charitable read is that such papers are a prayer to the machine god who will inevitably come to free us of our worldly burdens, in this case that of running randomized controlled trials on MBAs. ↩
-
For the reader interested in things that are not boring, go check out Murray Davis’s classic article “That’s Interesting!” There he notes that “Interesting propositions all have the form ‘What seems to be X is actually non-X’ or ‘What is accepted as X is actually non-X.’ ” Such propositions challenge a socially or scientifically constructed understanding of the world. Fields of science have models that we use to understand and interpret findings, and interesting propositions are those that are grounded in, challenge, and react to our current understanding. ↩