A recent Harvard Business Review study found that when researchers asked large language models for strategic advice, they got "trendslop" - recommendations that defaulted to whatever sounds fashionable in contemporary management: "Innovation", "Augmentation", "Long-term thinking".
The strategic advice was plausible, confident and, in many cases, largely useless.
This isn't a bug. It's these AI systems working as designed.
Every large language model has been trained with a bias to satisfy the person prompting it.
A model that refused to answer, or routinely provided uncomfortable or contrary answers, would not succeed in the market. They are tuned, through reinforcement learning from human feedback, to please. That bias doesn't switch off when you ask for critical review.
What the research found
Researchers from Esade Business School, the University of Sydney, and NYU Stern tested seven leading LLMs across strategic trade-offs that required genuine binary commitments (several listed below).
Across thousands of simulations, the results barely varied. Almost every model, almost every time, recommended:
- Differentiation over cost leadership
- Augmentation over automation
- Collaboration over competition
- Long-term thinking over short-term
The company context made little difference. The researchers tested tech startups, hospitals, construction companies, government agencies and multinationals. The recommendations barely shifted.
Why was this? LLMs are essentially probability engines: they pick the next word (token) by sampling from a probability distribution, favouring the most likely continuations.
How do they develop those probabilities? By training on billions of public documents, web pages and other content. So the highest-probability output from these AIs is driven more by social norms than by accuracy.
Essentially, the models are most likely to provide the most socially acceptable answers, and then deliver them in the register of expert advice.
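The mechanics can be illustrated in a few lines. This is a minimal sketch of next-token sampling; the candidate tokens and their scores are invented for illustration, not taken from any real model:

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Pick the next token by sampling from a softmax distribution.

    `logits` maps candidate tokens to raw model scores; higher
    scores become higher probabilities, so whatever continuation
    dominated the training data is the most likely pick.
    """
    # Softmax: convert raw scores into probabilities that sum to 1.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())  # subtract max for numerical stability
    exps = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Sample one token in proportion to its probability.
    r = random.random()
    cumulative = 0.0
    for tok, p in probs.items():
        cumulative += p
        if r < cumulative:
            return tok, probs
    return tok, probs  # fallback for floating-point edge cases

# Hypothetical scores: "differentiation" dominates simply because it
# appears far more often in the kind of text the model was trained on.
logits = {"differentiation": 4.0, "cost leadership": 1.0, "hybrid": 0.5}
token, probs = sample_next_token(logits)
```

Note that nothing in this loop checks whether the most probable answer is the most accurate one; popularity in the training data is the only signal.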
For example, Michael Porter built a foundational economic framework around cost leadership as a legitimate strategic position - one that Walmart and Costco built empires on. The LLMs dismissed it anyway, because thousands of websites and TED Talk transcripts advocate for unique value propositions, and those circulate far more widely than quiet stories about supply chain efficiency.
Prompting won't fix it
The researchers ran over 15,000 trials varying prompt structure, framing, persona and stakes. For differentiation and augmentation, bias shifted less than 2% regardless of how the prompt was written.
For the others, the average shift was 22% - mostly from one factor: flipping the order in which options were listed. The model didn't reason differently. The option order gave it a target to aim for.
Adding detailed industry context helped slightly - shifting responses by 11% on average. An LLM, given a thorough brief on a cost-pressured government agency in a mature market, still recommended differentiation most of the time.
There's a second failure mode the researchers call the "hybrid trap." When models aren't forced into a binary choice, they frequently recommend doing both - pursue differentiation and cost leadership, pursue radical and incremental innovation.
That sounds balanced but in practice it's the strategic equivalent of trying to be everything at once, which Porter identified as the most reliable path to competitive failure.
Strategy is about choosing what to stop. A model optimised to please finds that answer difficult to give.
Why this matters for the public sector
Public servants may choose to use AI to pressure-test policy proposals, assess procurement options, review business cases, and stress-test project plans.
The productivity case is solid: with fewer resources and less time, AI helps fill the gap. The problem is that when you prompt AI as a validator, you get validation - regardless of the quality of the underlying thinking.
Digital transformation narratives will consistently outperform consolidation narratives in LLM-generated advice. Decentralisation will beat centralisation. Long-term will beat short-term.
The model's recommendation reflects the positive emotional valence of contemporary business language, not the requirements of the specific situation. For APS work, that's a real risk - particularly where the right answer is to consolidate, simplify, or cut scope.
What to do about it
This isn't a reason to stop using AI. It's about using AI more effectively.
The standard advice is to give the AI more context and craft better prompts. The research shows this doesn't reliably work. These approaches are more effective:
- Ask for options, then critique each separately. If you present your shortlist, the model works inside your framing. Instead, ask it to make the strongest possible case for each option independently - including the unfashionable ones. For a procurement brief, that means prompting "make the strongest case for option A" and "make the strongest case for option B" in separate sessions, then applying your own judgement.
- Ask for criticism explicitly. "Identify the three most significant weaknesses in this policy proposal" works. "What do you think of this approach?" doesn't. The more structured your framing of the critique, the less room the model has to default to encouragement.
- Strip preference signals from your prompts. Any language suggesting which option you favour - "we're leaning toward," "I think this is probably right" - becomes a target, and the model will weight its answer toward it. The same goes for options you don't favour - "I think this is probably wrong" - the AI will weight against them. Write prompts as if the options are genuinely open and equivalent.
- Treat hybrid recommendations as a flag. If the model recommends pursuing both sides of a trade-off, run separate prompts for each option and stress-test the hybrid specifically before accepting it. "What are the risks of pursuing both differentiation and cost leadership simultaneously?" is a more useful prompt than accepting the hybrid as the answer.
- Track model versions. Biases shift as models are updated. Maintain a record of your key queries and outputs so you can detect changes over time. Be prepared to rerun analysis across models and critically consider why they may give different results.
- Have different people run the prompts. Many modern LLMs, including Microsoft Copilot, now store memory about the user 'in the background'. You can usually find, edit and remove these memories, but they can be a hidden spoiler that biases the AI's response toward what it knows you generally like or dislike. Different people have different stored memories, so running the same prompts separately by person gives you a broader set of viewpoints - or log out entirely if that's feasible (not always possible within agencies, particularly when using Copilot inside your firewall).
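The record-keeping step above can be as simple as an append-only audit log. This is a minimal sketch, not a prescribed format - the file name, record fields and helper names are all assumptions:

```python
import hashlib
import json
from datetime import datetime, timezone

LOG_PATH = "ai_query_log.jsonl"  # hypothetical log file name

def log_response(model_name, model_version, prompt, response, path=LOG_PATH):
    """Append one prompt/response pair to a JSONL audit log.

    Hashing the response makes it cheap to spot when a rerun of the
    same prompt starts producing different answers after a model update.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "version": model_version,
        "prompt": prompt,
        "response_sha256": hashlib.sha256(response.encode("utf-8")).hexdigest(),
        "response": response,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record

def responses_differ(record_a, record_b):
    """True when the same prompt produced a different answer."""
    return (record_a["prompt"] == record_b["prompt"]
            and record_a["response_sha256"] != record_b["response_sha256"])
```

Rerunning your key queries after a model update and diffing the hashes gives you an early warning that the advice has shifted - at which point the useful question is why, not which version to trust.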