Two people can interview the same expert and produce completely different value.
One conversation feels flat. The expert repeats familiar points, gives generic explanations, and never quite reaches the interesting layer.
Another conversation with the same expert feels dense and specific. Suddenly there are trade-offs, edge cases, stories, contradictions, and insights you had not heard before.
The expert is the same person. What changed was the quality of the interview.
That is the "interviewer problem": what becomes visible depends heavily on the person asking. You can see this in journalism, education, coaching, consulting, research, hiring, medicine, therapy, and now daily AI use.
When we talk to a large language model, we are interviewing a system, whether we think of it that way or not. The same model can give shallow, generic answers or useful reasoning depending on how we frame the interaction.
I do not buy the tidy version of this idea, where "the answer is only as good as the question." Some experts are better than others. Some AI models are stronger than others. Some topics require prior knowledge before you can even recognize a good answer.
Still, the core point holds: question quality is not a soft communication skill. It is a thinking skill. And it may be one of the most important skills for getting value from AI.
The hidden variable: the interviewer
Survey researchers have studied interviewer effects for decades. The basic finding is simple: interviewers are not neutral pipes through which information flows. Their wording, timing, tone, behavior, demographic characteristics, and follow-up choices can influence what respondents say and how data is collected.
A major research synthesis on interviewer effects describes human interviewers as influencing multiple parts of survey data collection, including recruitment, measurement, and data processing. [1]
This becomes especially visible when the topic is sensitive. In public health surveys, for example, interviewer effects are more likely when people are asked about racial attitudes, substance use, or other topics where social desirability matters. [2]
The practical implication is obvious, but often ignored: the same person may give different answers depending on how, when, and by whom the question is asked.
That does not mean people are dishonest by default. It means human answers are built in context. We respond to the wording of the question, the perceived intent of the questioner, the social setting, the level of trust, the available time, and our own mental state.
Now translate that to everyday life.
When someone asks an expert, "What should people know about nutrition?" they get broad advice. When they ask, "What are the three most common mistakes intelligent amateurs make when interpreting nutrition studies, and how would you correct them?" they get a different answer.
The expert's knowledge was there in both cases. The second question simply reached deeper.
Expertise is not automatically available
There is a naive model of expertise where the expert "has knowledge," you ask a question, and the knowledge comes out.
Real expertise is messier than that.
Expert knowledge is often tacit. Experts know what matters, but they may not be able to articulate how they know. They may skip steps that feel obvious to them. They may compress years of pattern recognition into a sentence that sounds simple but hides a complex decision process.
This is why knowledge elicitation became a serious research area. Hoffman and colleagues describe knowledge elicitation as a central part of studying and preserving expertise. Their review places elicitation methods into categories such as task analysis, different types of interviews, and designed tasks that reveal reasoning processes without simply asking experts to introspect directly. [3]
That last point matters. Sometimes the best way to understand an expert is not to ask, "How do you think?" A case walkthrough, a comparison, an exception, a prediction about what might go wrong, or a critique of a flawed example will often reveal more.
A weak interviewer asks for conclusions. A stronger interviewer elicits structure.
For example:
"What do you think about remote work?"
is much weaker than:
"In which types of work does remote work improve outcomes, in which does it degrade outcomes, and what signals would tell you a team is managing it badly?"
The second question asks the expert to classify, compare, diagnose, and identify signals. It does not merely request an opinion.
That is the difference between extracting surface statements and eliciting judgment.
Good questions are designed, not improvised
Research on cognitive interviewing makes another useful point: even the questions themselves need testing. Cognitive interviewing is used to identify whether survey questions are actually producing the information their designers intend. Beatty and Willis define it as administering draft questions while collecting additional verbal information about how respondents interpret and answer them. [4]
That idea transfers well beyond formal survey design. We often assume that a question is clear because it is clear to us. But the listener may interpret a key word differently. They may answer a narrower version of the question than we intended. They may miss the hidden assumption. They may comply with the literal wording while ignoring the real issue.
This is one reason "bad AI answers" are not always pure AI failures. Sometimes the model did exactly what was requested; the request was just vague, under-specified, or aimed at the wrong target.
A bad question asks:
"Is this a good idea?"
A better question asks:
"Evaluate this idea from three perspectives: user need, implementation risk, and long-term maintenance. Identify the strongest argument for it, the strongest argument against it, and the assumption that would most likely make the whole idea fail."
The first question invites reassurance. The second gives the other side something to analyze.
Learning is also an interrogation problem
Education research points in the same direction. Questions do not merely test what students know; they shape what students think about.
A review of questioning as a teaching tool notes that teachers often ask lower-order, convergent questions focused on recall, while higher-order, divergent questions are better suited to analysis, evaluation, and deeper thinking. [5]
There is also a specific learning technique called elaborative interrogation. In plain language, it means asking "why?" questions that force the learner to explain why a fact or idea is true. Dunlosky and colleagues reviewed multiple learning techniques and included elaborative interrogation and self-explanation among the approaches with promise, while also noting limits and context-dependence. [6]
The mechanism is intuitive: when you ask "why does this make sense?" you force yourself to connect new information to prior knowledge. The answer becomes less like a sentence you memorized and more like a structure you understand.
Self-explanation research makes a similar point. Chi and colleagues found that prompting students to explain material to themselves improved understanding, with stronger gains among students who generated more self-explanations. [7]
This is where the connection to AI use gets interesting. If you ask an AI tool:
"Explain this."
you often get a fluent summary. If you ask:
"Explain this, then ask me three diagnostic questions to test whether I actually understood it."
you create a learning loop. If you ask:
"Show me where my current understanding is probably incomplete."
you move from passive consumption to active interrogation.
That is when AI stops being only a convenient answer machine. Used carefully, it becomes a way to expose weak assumptions, missing distinctions, and shallow understanding.
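To make that concrete, here is a minimal sketch in Python of the pattern. The function and its wording are illustrative assumptions, not part of any library; it just assembles a prompt you could paste into any chat tool.

```python
def learning_loop_prompt(topic: str) -> str:
    """Turn a plain explanation request into a learning loop.

    Illustrative sketch: the wording is an assumption, not a tested recipe.
    """
    return (
        f"Explain {topic} to a motivated non-expert.\n"
        "Then ask me three diagnostic questions that would test whether I "
        "actually understood it.\n"
        "Finally, point out where my current understanding of the topic is "
        "probably incomplete, and which distinction I am most likely missing."
    )


print(learning_loop_prompt("confidence intervals"))
```

The code is trivial on purpose. The work is in deciding that you want diagnosis and gap-finding, not just a summary.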
AI makes the interviewer problem visible
Large language models make the interviewer problem obvious because the feedback loop is so immediate.
Ask a vague question and you get a vague answer. Ask a structured question and the answer often improves. Ask for assumptions, trade-offs, counterarguments, examples, edge cases, and verification criteria, and the model has more opportunity to produce useful structure.
There is research behind this pattern too. Chain-of-thought prompting showed that asking models to generate intermediate reasoning steps can improve performance on complex reasoning tasks such as arithmetic, commonsense, and symbolic reasoning. [8]
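As a minimal illustration of the pattern, not the paper's exact protocol, a chain-of-thought-style request simply makes the intermediate steps an explicit part of the task:

```python
# A minimal chain-of-thought-style request; an illustrative sketch, not the
# paper's exact protocol. The instruction makes intermediate steps part of
# the task instead of asking only for the final answer.
question = (
    "A cafe sells coffee at $3 and pastries at $4. "
    "I buy 2 coffees and 3 pastries. What do I pay?"
)
cot_prompt = (
    question
    + "\nWork through this step by step, showing each intermediate "
      "calculation, then state the final answer on its own line."
)
print(cot_prompt)  # paste into any chat model; the correct total is $18
```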
More broadly, prompting has become a large research area. The Prompt Report, a systematic survey of prompting techniques, describes prompt engineering as widely used but still fragmented, and organizes dozens of prompting techniques into a structured taxonomy. [9]
For ordinary users, the lesson is simpler than the technical literature: a prompt is a small piece of task design. It tells the model what role to play, what context matters, what output format is useful, what constraints apply, what trade-offs to consider, and how much uncertainty to expose.
A weak prompt says:
"Tell me about sleep."
A stronger prompt says:
"I want to understand sleep quality as a non-expert. Explain the main factors that affect it, separate strong evidence from weaker claims, identify common myths, and give me a practical checklist that does not require buying devices."
That is not "prompt engineering" in the gimmicky sense. It is closer to good interviewing.
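One way to internalize "a prompt is a small piece of task design" is to write the parts down explicitly. The sketch below is illustrative, not a standard API: a small Python class whose fields mirror that list, rendered into a plain prompt.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """A prompt as explicit task design. Illustrative sketch, not a library."""

    role: str           # what role the model should play
    context: str        # what it needs to know about the situation
    task: str           # the actual request
    output_format: str  # what shape the answer should take
    constraints: list[str] = field(default_factory=list)
    expose_uncertainty: bool = True  # ask it to separate solid from shaky claims

    def render(self) -> str:
        parts = [
            f"Role: {self.role}",
            f"Context: {self.context}",
            f"Task: {self.task}",
            f"Output format: {self.output_format}",
        ]
        if self.constraints:
            parts.append("Constraints: " + "; ".join(self.constraints))
        if self.expose_uncertainty:
            parts.append(
                "Separate well-established claims from weaker or uncertain ones."
            )
        return "\n".join(parts)


# The sleep example from above, restated as explicit task design.
print(PromptSpec(
    role="patient explainer for a non-expert",
    context="I want to understand sleep quality; I own no tracking devices.",
    task="Explain the main factors that affect sleep quality and common myths.",
    output_format="Short sections plus a practical checklist.",
    constraints=["no device purchases", "flag weak evidence explicitly"],
).render())
```

The point is not the code. It is that every field is a decision you are making anyway, either explicitly or by omission.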
Better questions do not remove the need for judgment
There is a danger here. Once people discover that better prompts produce better answers, they sometimes overcorrect and start treating every AI failure as a prompting failure. Some failures are simply limits of the model, the evidence, or the task.
A better question can reveal more of what a system can do. It cannot create knowledge that is not there. It cannot guarantee truth. It cannot remove hallucination risk. It cannot replace evidence, domain expertise, or verification in high-stakes contexts.
Hallucination is a known problem in natural language generation. A survey by Ji and colleagues describes how modern generation systems can produce fluent and coherent text while still generating unintended or nonfactual content, which degrades reliability in real-world use. [10]
The mature position is more boring and more useful: ask better questions so you can get better structure, then verify what matters.
That distinction matters. AI is useful for exploration, reframing, comparison, drafting, summarizing, and generating candidate explanations. But when consequences are high, whether medical, legal, financial, safety-related, or reputational, the output must be checked against reliable sources or expert judgment.
Better questions increase leverage, but they do not remove responsibility.
The three ceilings of any answer
A useful way to think about this is that every answer has three ceilings.
The first is the source ceiling. The expert, book, teacher, or AI model must have access to relevant knowledge. If the source is shallow, outdated, biased, or unreliable, excellent questioning can only do so much.
The second is the question ceiling. The question must activate the right kind of response. A vague question usually triggers a generic answer. A precise question can trigger comparison, diagnosis, synthesis, or critique.
The third is the listener ceiling. You need enough prior knowledge to understand the answer. This is uncomfortable but unavoidable. A beginner may receive a high-quality explanation and still miss the most important nuance because they lack the mental schema to interpret it.
This is why learning with AI should not be only about getting answers. It should also build your ability to ask better next questions.
A good learning conversation usually moves through stages:
- Map the territory.
- Clarify the vocabulary.
- Identify the core mechanisms.
- Compare competing explanations.
- Apply the idea to concrete cases.
- Look for exceptions.
- Test your understanding.
Most people stop at stage one, which is why they get summaries instead of insight.
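For readers who want to operationalize those stages, here is a minimal sketch: seven stage templates you could walk through in order with any chat tool. The wording of each template is an assumption chosen for illustration, not a validated protocol.

```python
# Stage prompts for a learning conversation. Illustrative sketch; the
# phrasings are assumptions, not a validated protocol.
STAGES = [
    ("Map the territory", "Give me a map of {topic}: the main subtopics and how they relate."),
    ("Clarify the vocabulary", "Define the key terms in {topic} that beginners most often confuse."),
    ("Identify core mechanisms", "What core mechanisms make {topic} work the way it does?"),
    ("Compare explanations", "Compare the main competing explanations in {topic} and where they disagree."),
    ("Apply to concrete cases", "Apply {topic} to two concrete cases, one typical and one borderline."),
    ("Look for exceptions", "Where does the standard account of {topic} break down or need caveats?"),
    ("Test my understanding", "Ask me three questions that test whether I understood {topic}."),
]


def stage_prompts(topic: str) -> list[str]:
    """Return the seven stage prompts for a given topic, in order."""
    return [template.format(topic=topic) for _, template in STAGES]


for prompt in stage_prompts("inflation"):
    print(prompt)
```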
A practical framework for better questions
Here is a simple framework you can use with experts, teachers, coaches, colleagues, and AI tools.
1. Start with context
Bad question
"What should I do?"
Better question
"Here is the situation, the goal, the constraints, and what I have already tried. Given that context, what are the most reasonable next options?"
Context prevents generic advice. It gives the other side something to reason with.
2. Ask for distinctions
Bad question
"Is this good?"
Better question
"When is this good, when is it bad, and what conditions change the answer?"
Distinctions are where expertise lives. Experts rarely think in universal rules. They think in conditions, thresholds, exceptions, and trade-offs.
3. Ask for mechanisms
Bad question
"Does this work?"
Better question
"What mechanism would make this work, and what evidence would we expect to see if that mechanism is real?"
Mechanism questions prevent shallow pattern matching. They force causal thinking.
4. Ask for counterarguments
Bad question
"Why am I right?"
Better question
"What is the strongest case that I am wrong?"
This is one of the most useful questions to put to an AI system, because models are often too agreeable by default. Asking for friction changes the shape of the answer.
5. Ask for uncertainty
Bad question
"Give me the answer."
Better question
"Separate what is well-established, what is plausible but uncertain, and what would need verification."
This is especially important with AI. A confident tone is not the same as evidence that deserves your confidence.
6. Ask for reframing
Bad question
"Answer my question."
Better question
"Before answering, check whether my question is poorly framed. If it is, rewrite it into a better question and answer that."
This gives the expert, or the AI system, permission to challenge the frame. Often, the useful answer starts there.
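If you use these moves often, it can help to keep them around as a literal checklist. The sketch below is illustrative: the phrasings are assumptions, and none of the names come from any library.

```python
# The six question upgrades from the framework, as reusable suffixes.
# Illustrative sketch; the phrasings are assumptions, not tested recipes.
UPGRADES = {
    "context": "Here is my situation, goal, constraints, and what I have tried: {context}",
    "distinctions": "When is this good, when is it bad, and what conditions change the answer?",
    "mechanism": "What mechanism would make this work, and what evidence would confirm it?",
    "counterargument": "What is the strongest case that I am wrong?",
    "uncertainty": "Separate what is well-established, what is plausible, and what needs verification.",
    "reframing": "If my question is poorly framed, rewrite it into a better one and answer that.",
}


def upgrade_question(question: str, moves: list[str], context: str = "") -> str:
    """Append the selected framework moves to a raw question."""
    lines = [question]
    for move in moves:
        lines.append(UPGRADES[move].format(context=context))
    return "\n".join(lines)


print(upgrade_question(
    "Should I migrate this service to a new database?",
    moves=["context", "distinctions", "counterargument", "uncertainty"],
    context="small team, high read load, migration window of one weekend",
))
```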
Examples: weak questions vs stronger questions
Here are a few general examples.
Weak
"How do I get healthier?"
Stronger
"Assume I have limited time, inconsistent sleep, and no interest in extreme routines. What are the highest-leverage health changes, what trade-offs do they involve, and what should I avoid because it looks productive but probably is not?"
Weak
"Explain inflation."
Stronger
"Explain inflation as a system of causes and feedback loops. Separate demand-side, supply-side, monetary, and expectation-driven explanations. Then give a simple example of how each one would show up in daily life."
Weak
"Should I change careers?"
Stronger
"Help me evaluate a career change. Compare upside, downside, reversibility, skill transfer, income risk, identity risk, and opportunity cost. Then list the questions I should answer before deciding."
Weak
"Summarize this article."
Stronger
"Summarize this article, identify its main claim, list the assumptions behind that claim, explain what evidence would strengthen or weaken it, and point out what a skeptical reader should notice."
The stronger versions are longer, but length is not the point. They work because they define the kind of thinking required.
The real AI skill is not prompting. It is interrogation.
"Prompting" is often presented as a technical trick: use the right phrase and get the magic output.
This framing misses the harder part. The deeper skill is knowing how to interrogate a topic: how to move from vague curiosity to precise inquiry, ask for structure instead of content, request uncertainty instead of confidence, use follow-up questions to expose assumptions, and check whether you understood the answer.
None of this is new. It draws from survey methodology, expert interviewing, education, cognitive psychology, and scientific reasoning. AI simply makes the skill more visible because almost everyone now has access to a responsive system that rewards better questioning immediately.
The same principle shows up outside AI as well. Better questions can make a podcast sharper, a meeting shorter, a teacher more useful, an expert more precise, and an AI answer less generic.
And perhaps most importantly, a better question can make your own thinking less passive.
A simple rule to end with
Before asking any expert or AI tool for an answer, ask yourself:
What kind of thinking do I want to trigger? Maybe you need a definition, a comparison, a diagnosis, a critique, a causal explanation, a decision framework, a list of risks, a plan, or a test of your understanding. Those are different requests, and they produce different answers.
Many disappointing answers come from asking for one thing while secretly hoping for another. Answer quality is not only about the person, expert, or model on the other side. It also depends on whether your question gives them a path to the level of thinking you actually need.
That is the interviewer problem. In the age of AI, more people are running into it every day.
References
1. West, B. T., & Blom, A. G. Explaining Interviewer Effects: A Research Synthesis. Journal of Survey Statistics and Methodology, 2017.
2. Davis, R. E., Couper, M. P., Janz, N. K., Caldwell, C. H., & Resnicow, K. Interviewer Effects in Public Health Surveys. Health Education Research, 2010.
3. Hoffman, R. R., Shadbolt, N. R., Burton, A. M., & Klein, G. Eliciting Knowledge from Experts: A Methodological Analysis. Organizational Behavior and Human Decision Processes, 1995.
4. Beatty, P. C., & Willis, G. B. Research Synthesis: The Practice of Cognitive Interviewing. Public Opinion Quarterly, 2007.
5. Tofade, T., Elsner, J., & Haines, S. T. Best Practice Strategies for Effective Use of Questions as a Teaching Tool. American Journal of Pharmaceutical Education, 2013.
6. Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. Improving Students' Learning With Effective Learning Techniques. Psychological Science in the Public Interest, 2013.
7. Chi, M. T. H., de Leeuw, N., Chiu, M.-H., & LaVancher, C. Eliciting Self-Explanations Improves Understanding. Cognitive Science, 1994.
8. Wei, J., Wang, X., Schuurmans, D., et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv, 2022.
9. Schulhoff, S., Ilie, M., Balepur, N., et al. The Prompt Report: A Systematic Survey of Prompting Techniques. arXiv, 2024.
10. Ji, Z., Lee, N., Frieske, R., et al. Survey of Hallucination in Natural Language Generation. ACM Computing Surveys, 2023.
