Thursday, May 21, 2026

Singapore’s agentic AI framework gives government a practical path forward

Singapore’s Infocomm Media Development Authority has released version 1.5 of its Model AI Governance Framework for Agentic AI, and it deserves close attention from Australian public sector agencies.

The framework recognises a shift already underway. AI systems are moving from content generation into task execution. IMDA describes agentic AI as systems that can take actions, adapt to new information and interact with other agents and systems to complete tasks on behalf of people. Current uses include coding assistants, customer service agents and enterprise workflow automation.

That makes agentic AI highly relevant to government.

Agencies are full of multi-step work. Checking documents. Finding policy. Testing forms. Preparing correspondence. Routing requests. Summarising submissions. Comparing supplier responses. Supporting call centre staff. Helping people find services. Moving information between systems.

Many of these tasks are repetitive and fragmented. They also require context and a working knowledge of how government operates.

Agentic AI could help public servants spend less time navigating systems and more time solving problems. It could improve digital service testing, support better service navigation, reduce manual rework and make internal knowledge easier to use.

The opportunity is real. It needs serious treatment.

A practical framework

The strongest feature of Singapore’s framework is its practicality.

It looks at how agentic systems are built and operated. It identifies core components such as models, instructions, memory, planning and reasoning, tools, protocols, controls, logging and monitoring.

That gives agencies a useful checklist.

Sometimes government technology governance starts with broad principles, then jumps to procurement and compliance. Delivery teams are left to fill in the operational detail themselves. This framework helps close that gap.

It prompts agencies to ask: what tools does the agent use, what systems does it touch, what data can it access, what does it remember, how is activity logged and what controls are built in?

Those questions matter as much for delivery teams as they do for executives approving wider use.

Action-space and autonomy

The framework’s use of action-space and autonomy is particularly useful.

Action-space is the range of actions an agent can take, including transactions it can execute, based on its tools and permissions. Autonomy is the degree to which the agent can decide how to act towards a goal.

This gives agencies a better way to assess agentic AI use cases.

An internal research agent with access to approved public information has a small action-space. A coding agent that can edit files, run commands and connect to repositories has a larger one. A workflow agent connected to business systems, records and external APIs has a broader operational footprint again.

The same applies to autonomy. An agent following a fixed process creates a different profile from one given a broad goal and wide freedom to determine the steps.

This distinction can help agencies avoid treating all agents the same. Some will be simple assistants. Others will sit inside operational workflows. They need different levels of governance rather than a one-size-fits-all approach (which I've seen all too many times).
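
One way to make the two dimensions concrete is to score a use case on each and map the pair to a governance tier. The sketch below is illustrative only: the `AgentProfile` fields, the scoring weights and the tier names are assumptions made for this example and do not come from IMDA's framework.

```python
from dataclasses import dataclass

# Hypothetical autonomy levels; the framework treats autonomy as a degree,
# not as these fixed categories.
AUTONOMY_LEVELS = {"fixed_process": 0, "guided": 1, "open_goal": 2}


@dataclass(frozen=True)
class AgentProfile:
    name: str
    can_write: bool        # can the agent change files, records or systems?
    can_transact: bool     # can it execute transactions such as payments?
    external_systems: int  # business systems or external APIs it touches
    autonomy: str          # key into AUTONOMY_LEVELS


def action_space_score(p: AgentProfile) -> int:
    """Rough size of the action-space: more permissions score higher."""
    return int(p.can_write) + 2 * int(p.can_transact) + min(p.external_systems, 3)


def governance_tier(p: AgentProfile) -> str:
    """Map action-space plus autonomy to an illustrative governance tier."""
    total = action_space_score(p) + AUTONOMY_LEVELS[p.autonomy]
    if total <= 1:
        return "light"
    if total <= 4:
        return "standard"
    return "enhanced"


research = AgentProfile("research assistant", False, False, 0, "guided")
coding = AgentProfile("coding agent", True, False, 1, "guided")
workflow = AgentProfile("workflow agent", True, True, 3, "open_goal")

for profile in (research, coding, workflow):
    print(profile.name, "->", governance_tier(profile))
# research assistant -> light
# coding agent -> standard
# workflow agent -> enhanced
```

An agency would set its own weights and thresholds. The point is that the tier follows from what the agent can do and how freely it can decide, rather than from the label "agent".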

Controls need to be built in

The framework is also strong on bounding risks early.

IMDA recommends limiting access to tools and systems, using identity and access controls, making agent actions traceable and controllable, and assessing whether a use case is suitable before deployment.

Public sector agencies already operate with delegations, approvals, permissions, information classifications, privacy obligations, cyber controls and audit requirements. Agentic AI makes these controls even more important, and can be used to support their implementation equitably.

The framework also recommends stronger system-level controls for higher-risk actions, such as preventing certain tools from being called, limiting tools to read-only access, or building required steps into the workflow. This helps right-size controls to the risk of each action, which is essential in risk management.

Prompts, training and guidance can all help with this. However, access permissions, whitelists, sandboxes, logs, approval gates, rate limits and monitoring carry more weight in production environments.
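
To show what a system-level control looks like compared with a prompt instruction, here is a minimal sketch of a gate that every proposed tool call passes through before it runs. The `ToolPolicy` structure, the tool names and the approval rule are all hypothetical, chosen to mirror the three controls mentioned above (blocking tools, read-only access and required approval steps).

```python
from dataclasses import dataclass, field


@dataclass
class ToolPolicy:
    allowed_tools: set[str]                                   # tools the agent may call at all
    read_only_tools: set[str] = field(default_factory=set)    # tools limited to read operations
    approval_required: set[str] = field(default_factory=set)  # tools needing human sign-off


def check_tool_call(policy, tool, operation, approved_by=None):
    """Return (allowed, reason) for a proposed tool call."""
    if tool not in policy.allowed_tools:
        return False, "tool not on allowlist"
    if tool in policy.read_only_tools and operation != "read":
        return False, "tool is limited to read-only access"
    if tool in policy.approval_required and approved_by is None:
        return False, "human approval required"
    return True, "ok"


# Hypothetical policy for a correspondence agent.
policy = ToolPolicy(
    allowed_tools={"records_search", "send_email"},
    read_only_tools={"records_search"},
    approval_required={"send_email"},
)

print(check_tool_call(policy, "records_search", "read"))
print(check_tool_call(policy, "records_search", "write"))
print(check_tool_call(policy, "send_email", "send"))
print(check_tool_call(policy, "send_email", "send", approved_by="delegate"))
print(check_tool_call(policy, "delete_record", "write"))
```

Because the check runs outside the model, it holds regardless of how the agent was prompted or what it was persuaded to attempt.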

Testing and monitoring

The framework provides a sensible approach leading into deployment.

It recommends testing agents for task execution, policy adherence and tool-use accuracy before release, then rolling them out gradually with continuous monitoring. It also highlights change management and version control, recognising that changes in one part of an agentic system can have wider impacts across connected workflows.

It suggests that agencies should start their agentic journeys by looking at bounded internal uses such as coding support, service testing, content checking, knowledge search and workflow assistance. These can deliver value while helping agencies learn how these agents behave in their own environments within highly controlled and regulated scopes.

The Google and Singapore Government sandbox is a useful example. It tested computer-use agents for public sector use cases including automated quality assurance for government digital services, AI safety testing and helping citizens navigate social assistance applications. It also surfaced practical issues around testing data, reasoning logs, prompt injection and the breadth of actions available to computer-use agents.

End-user responsibility

The section on end users is another strength of the framework. It distinguishes between people who interact with agents and people who integrate agents into work processes.

It suggests that users should understand what an agent can do, what data it can access, how data is handled, where to escalate issues and what responsibilities they hold. For staff integrating agents into workflows, it recommends training on use cases, prompting, failure modes, feedback loops and tradecraft.

This recognises that staff will need differentiated training during AI adoption, and this doesn't necessarily break down along traditional IT/business lines. Increasingly agents may be created within business teams by an individual (or small team) and used by the other members of that team, or other teams - rather than coming from an IT team out to business teams.

This makes agentic AI adoption a workforce issue as much as a technology issue. IT teams have a role, though ownership needs to sit with the business areas using the agents.

Where the framework could improve

While the framework is strong on risk and system controls (all positives), it is lighter on public value.

For government, a framework should ask agencies to clearly define the benefit of the agents. Faster service testing. Reduced backlog. Better consistency. Less manual rework. Improved accessibility. Faster policy analysis. Better reuse of corporate knowledge.

These should be measured and assessed. Otherwise agentic AI risks becoming another technology wave with impressive pilots and uneven outcomes - particularly as AI models update and agentic capabilities change, so outcomes can shift for better or worse without regular adjustment.

Procurement also needs more attention.

Most agencies will buy agentic capability - including by accident - through platforms, cloud services, vendors and integrators. Contracts will shape how much control agencies retain over logs, data, tool access, model changes, monitoring, testing, records, exit rights and incident response. Standard software clauses struggle to manage some of the newer needs.

Government contracts for agentic AI should cover tool permissions, audit logs, model and prompt changes, data residency, subcontractors, security testing, accessibility, performance reporting, fallback processes and the ability to disable specific agent functions quickly and roll back or switch to a manual process in extremis.

The third area is shared government patterns.

Agencies should not each invent their own approach to logging tool calls, managing agent identity, approving MCP servers or testing common failure modes. IMDA notes that protocols such as MCP and Agent2Agent are developing quickly, and that controls, logging and monitoring are core components of agentic systems.

For government, that points to common patterns: standard logging schemas, approved integration models, reusable evaluation datasets, shared sandbox environments and procurement clauses that smaller agencies can use.
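
As an illustration of what a standard logging schema for tool calls might contain, here is a minimal sketch. The field names are assumptions for this example, not a published government or IMDA schema. The idea is that every agency records the same core facts: which agent acted, on whose behalf, with which tool, under which model version and with what result.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ToolCallLog:
    timestamp: str      # UTC time of the call, ISO 8601
    agent_id: str       # identity the agent runs under
    on_behalf_of: str   # the person or team the agent is acting for
    tool: str           # tool or integration that was called
    operation: str      # e.g. "read", "write", "send"
    outcome: str        # e.g. "success", "denied", "error"
    model_version: str  # model version in use when the call was made


def log_tool_call(agent_id, on_behalf_of, tool, operation, outcome, model_version):
    """Build one log record and serialise it as a single JSON line."""
    record = ToolCallLog(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent_id=agent_id,
        on_behalf_of=on_behalf_of,
        tool=tool,
        operation=operation,
        outcome=outcome,
        model_version=model_version,
    )
    return json.dumps(asdict(record), sort_keys=True)


# All values below are placeholders.
print(log_tool_call("agent-01", "case-team", "records_search", "read", "success", "model-v1"))
```

A shared schema like this would let smaller agencies reuse the same audit tooling and make logs comparable across vendors.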

Where this comes in for Australia

Australia already has useful foundations in place for AI use, and has been approaching the area pragmatically and in a measured way, with a strong central group helping to establish standards and practices that are effective and manage risks.

In particular, the Commonwealth’s Policy for the responsible use of AI in government (v2.0) took effect on 15 December 2025. It applies to non-corporate Commonwealth entities, with exceptions, and includes mandatory requirements for accountable officials, transparency statements, strategic AI adoption, operational responsible use, use case accountability, internal registers, staff training and impact assessment.

The transparency statement standard also requires agencies to explain why they use AI, classify their use, describe monitoring measures, outline compliance and provide public contact points in plain language.

Singapore’s framework adds a more operational layer. It gives agencies practical language for tools, permissions, autonomy, testing, monitoring and end-user capability.

That is the next layer Australian agencies will need.


Agencies should begin with practical uses where agentic AI can help staff do useful work now.

Internal knowledge support. Digital service testing. Coding assistance. Policy research. Records classification support. Procurement response analysis. Content checking. Call centre guidance. Service navigation.

These uses are valuable, testable and easier to bound. They also build confidence and capability before agents move into more sensitive operational environments.

The approach should be relatively straightforward. Pick a real problem. Bound the agent’s permissions. Test it properly. Train the users. Measure the result. Improve the controls. Scale what works.


Singapore’s framework is a strong contribution because it moves the conversation into the practical mechanics of agentic AI. Its next stage should go deeper on public value, procurement, workforce change and shared government operating patterns.

For Australian agencies, it is well worth reviewing now. Agentic AI has the potential to help government work faster, more consistently and with less friction. The agencies that benefit most will be those that treat it as an operational capability from the start.


Thursday, May 07, 2026

When bad actors are literally bad actors

A new vaccine is approved for a fast-spreading emerging disease. The TGA did its job well. State and Federal Health ministers are briefed. Budgets are approved and allocated. Departments and health authorities develop their plans. The rollout is announced. Doctors, nurses, and pharmacists are trained to administer the vaccine.

The system worked as it should.

Then, within days, a cluster of social media accounts, confident, polished, apparently Australian, are producing video after video claiming the vaccine was insufficiently tested, that it has a range of terrible side-effects and that pharmaceutical companies are getting rich off the public's fear.

The content spreads. Millions of views. Alarmed constituents contact their MPs. Traditional media picks up the controversy. The concerns get front-page coverage. The Department of Health, Disability and Ageing stands up a rapid communications response. Ministerial offices field calls. 

The rollout slows. Disease cases rise, along with preventable deaths.

Behind the scenes, the accounts were being run by an offshore group of content entrepreneurs who identified "Australian vaccine reluctance" as a profitable niche. They hired voice actors and used AI-generated scripts that ignored the facts, but had no real view on the vaccine's safety and no stake in Australian public health.

They were running a passive income business. Political anxiety drives views. Views drive ad revenue.

The Australian government just spent a month responding to content production, at a cost of millions of dollars and hundreds of lives.

Does that sound like an unlikely scenario? It's already happening.

In April 2026, a CBC News investigation found exactly this type of operation. A network of 20 YouTube channels promoting Alberta separatism had accumulated 40 million views. The operators were based in the Netherlands, hiring actors through Fiverr and Upwork to front the content. One of those actors, based in Indiana, summarised his qualifications plainly: "I don't know anything about Canadian politics."

The operators' interest was ad revenue. They had no stake in Canadian politics.

Australian government consultation, sentiment monitoring, and ministerial communications all assume vocal opposition is genuine opposition - people with a stake in the outcome, motivated by real concern.

That assumption is broken.

Spikes in apparent community concern could reflect genuine public anxiety. But they could also reflect an offshore entrepreneur who noticed a topic trending. 

At volume, an agency's response machinery treats both as the same. Consultations get commissioned to understand the depth of concern. The consultation environment is seeded with the same inauthentic content. Policy strategy gets built on a corrupted signal.

Particularly when there is genuine controversy or industry opposition to a policy, content creators can see a profit opportunity. And the opponents of a policy position may embrace and further amplify the fake opposition as it amplifies their own views.

It's now difficult to separate genuine concerns from fake ones, which makes it harder to tune policies for constituents - or even manage political situations effectively.

So what can governments and agencies do?

While there's often pressure to respond quickly to negative coverage, it's important to start by gauging how much is real, how much is fake and whether the community can tell the difference.

The first step should be to investigate before responding. High-volume, rapid-onset opposition from accounts with no prior history warrants scrutiny before it shapes your agency strategy. Establish whether apparent community concern is organic before commissioning a response.

Where there are active consultation processes, redesign them toward harder-to-fake formats. Online submissions and social media monitoring are easy to flood. Face-to-face engagement, deliberative processes, and direct stakeholder contact are not. They're slower and more expensive, but help you size the real concerns.

Move from monitoring media to scrutinising sources and intent. Separate sentiment monitoring from policy signals. Social media volume isn't necessarily a measure of community concern. Weigh it against consultation data, direct stakeholder engagement, and evidence from people genuinely affected.

Finally, build detection capability into your communications teams. Staff running public engagement need to have the skills and tools to recognise the signals of coordinated inauthentic content, such as production consistency, account age, script similarity and offshore indicators. The tools and training exist, but you need them in place before you face a backlash.
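
Two of the signals listed above, account age and script similarity, are simple enough to sketch in code. The example below is a rough illustration using Python's standard library, not a description of any real detection tool: the account records, the 90-day age cut-off and the 0.8 similarity threshold are all assumptions.

```python
from datetime import date
from difflib import SequenceMatcher
from itertools import combinations


def script_similarity(a: str, b: str) -> float:
    """Similarity of two transcripts, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flag_accounts(accounts, today, min_age_days=90, similarity_threshold=0.8):
    """Return {account name: [reasons]} for accounts that warrant a closer look."""
    flags = {acc["name"]: [] for acc in accounts}
    for acc in accounts:
        if (today - acc["created"]).days < min_age_days:
            flags[acc["name"]].append("new account")
    for a, b in combinations(accounts, 2):
        if script_similarity(a["transcript"], b["transcript"]) >= similarity_threshold:
            flags[a["name"]].append(f"script similar to {b['name']}")
            flags[b["name"]].append(f"script similar to {a['name']}")
    return {name: reasons for name, reasons in flags.items() if reasons}


# Invented sample data for illustration.
accounts = [
    {"name": "channel_a", "created": date(2026, 4, 20),
     "transcript": "The vaccine was rushed and nobody tested it properly."},
    {"name": "channel_b", "created": date(2026, 4, 22),
     "transcript": "The vaccine was rushed and no one tested it properly!"},
    {"name": "local_gp", "created": date(2019, 3, 1),
     "transcript": "I had my shot at the pharmacy last week and only got a sore arm."},
]

for name, reasons in flag_accounts(accounts, today=date(2026, 5, 7)).items():
    print(name, reasons)
```

A flag here is a prompt for human scrutiny, not proof of inauthenticity. Genuine campaigns also share talking points, and new accounts can belong to real people.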

Most importantly, always keep in mind that political and policy damage doesn't require intent. While there are genuine bad actors out there - nations, corporations and lobby groups - who have an interest in derailing government policies and even governments themselves, they aren't the entire landscape anymore.

The bad actors opposing your policy reform may be literal bad actors, reading from AI-generated scripts, churning out videos and other content for clicks and ad revenue alone.

It doesn't take large groups to organise a significant social media campaign against your Minister's signature policy. All it takes is the potential for a decent financial return.

So it's up to agencies to ensure that this doesn't impede good policy, or cost money or lives.


Monday, May 04, 2026

Your AI isn't being honest with you. It was never designed to be

A recent Harvard Business Review study found that when researchers asked large language models for strategic advice, they got "trendslop" - recommendations that defaulted to whatever sounds fashionable in contemporary management: 'Innovation', 'Augmentation', 'Long-term thinking'. 

The strategic advice was plausible, confident and, in many cases, largely useless.

This isn't a bug. It's these AI systems working as designed.

Every large language model has been trained with a bias to satisfy the person prompting it.  

A model that refused to answer when asked, or routinely provided uncomfortable or contrary answers, would not succeed in the market. Models are tuned, through reinforcement learning from human feedback, to please. That bias doesn't switch off when you ask for critical review.

What the research found

Researchers from Esade Business School, the University of Sydney, and NYU Stern tested seven leading LLMs across strategic trade-offs that required genuine binary commitments (several listed below).

Across thousands of simulations, the results didn't vary by much. Almost every model, almost every time, recommended:

  • Differentiation over cost leadership
  • Augmentation over automation
  • Collaboration over competition
  • Long-term thinking over short-term

The company context made little difference. The researchers tested tech startups, hospitals, construction companies, government agencies and multinationals. The recommendations barely shifted.

Why was this? LLMs are essentially probability engines that pick the next word (token) from a list of probabilities, with the highest probabilities corresponding to the most likely choices. 

How do they develop their probabilities? By training on billions of public documents, web pages and other content. So the highest-probability output from these AIs is driven more by social norms than by accuracy.

Essentially, the models are most likely to provide the most socially acceptable answers, and then deliver them in the register of expert advice.

For example, Michael Porter built a foundational economic framework around cost leadership as a legitimate strategic position (which Walmart and Costco built empires on). Yet the LLMs dismissed this approach, because thousands of websites and TED Talk transcripts advocate for unique value propositions, and these circulate far more widely than quiet stories about supply chain efficiency.
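
The "probability engine" idea can be shown with a toy example. The sketch below turns raw scores for a few candidate next words into probabilities using the softmax function, then picks the most likely one. The scores are invented for illustration. Real models score many thousands of tokens at each step and usually sample from the distribution rather than always taking the top choice.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the peak for numerical stability
    exps = {token: math.exp(s - peak) for token, s in scores.items()}
    total = sum(exps.values())
    return {token: e / total for token, e in exps.items()}


# Made-up scores for the word following "Our strategy should focus on ..."
# Words that appear more often in similar training text get higher scores.
scores = {"differentiation": 3.0, "innovation": 2.5, "cost": 0.5}

probs = softmax(scores)
most_likely = max(probs, key=probs.get)

for token, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{token}: {p:.2f}")
print("most likely next word:", most_likely)
```

Nothing in this step checks whether the winning word is right for the situation, only that it is the most probable continuation.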

Prompting won't fix it

The researchers ran over 15,000 trials varying prompt structure, framing, persona and stakes. For differentiation and augmentation, bias shifted less than 2% regardless of how the prompt was written. 

For the others, the average shift was 22% - mostly from one factor: flipping the order in which options were listed. The model didn't reason differently. The option order gave it a target to aim for.

Adding detailed industry context helped slightly - shifting responses by 11% on average. An LLM, given a thorough brief on a cost-pressured government agency in a mature market, still recommended differentiation most of the time.

There's a second failure mode the researchers call the "hybrid trap." When models aren't forced into a binary choice, they frequently recommend doing both - pursue differentiation and cost leadership, pursue radical and incremental innovation. 

That sounds balanced but in practice it's the strategic equivalent of trying to be everything at once, which Porter identified as the most reliable path to competitive failure.

Strategy is about choosing what to stop. A model optimised to please finds that answer difficult to give.

Why this matters for the public sector

Public servants may choose to use AI to pressure-test policy proposals, assess procurement options, review business cases, and stress-test project plans. 

The productivity case is solid: with fewer resources and less time, AI appears to help fill the gap. The problem is that when you prompt AI as a validator, you get validation - regardless of the quality of the underlying thinking.

Digital transformation narratives will consistently outperform consolidation narratives in LLM-generated advice. Decentralisation will beat centralisation. Long-term will beat short-term. 

The model's recommendation reflects the positive emotional valence of contemporary business language, not the requirements of the specific situation. For APS work, that's a real risk - particularly where the right answer is to consolidate, simplify, or cut scope.

What to do about it

This isn't a reason to stop using AI. It's about using AI more effectively.

The standard advice is often to give AI more context and craft better prompts, but the research shows this doesn't reliably work. These are more effective approaches:

  • Ask for options, then critique each separately. Present your shortlist and the model works inside your framing. Ask it instead to make the strongest possible case for each option independently - including unfashionable options. For a procurement brief, that means prompting "make the strongest case for option A" and "make the strongest case for option B" in separate sessions, then applying your own judgement.
  • Ask for criticism explicitly. "Identify the three most significant weaknesses in this policy proposal" works. "What do you think of this approach?" doesn't. The more structurally you frame the critique, the less room the model has to default to encouragement.
  • Strip preference signals from your prompts. Any language suggesting which option you favour - "we're leaning toward," "I think this is probably right" - becomes a target. The model will lean toward it. The same goes for options you don't favour - "I think this is probably wrong" - the AI will lean against it. Write prompts as if the options are genuinely open and equivalent.
  • Treat hybrid recommendations as a flag. If the model recommends pursuing both sides of a trade-off, run separate prompts for each option and stress-test the hybrid specifically before accepting it. "What are the risks of pursuing both differentiation and cost leadership simultaneously?" is a more useful prompt than accepting the hybrid as the answer.
  • Track model versions. Biases shift as models are updated. Maintain a record of your key queries and outputs so you can detect changes over time. Be prepared to rerun analysis across models and critically consider why they may give different results.
  • Have different people run the prompts. Many modern LLMs now store memory about the user 'in the background' (including Copilot). Most of the time you can find these memories if you search, and even edit, remove and add to them, but they can be a hidden spoiler that biases the AI's response based on what it knows you generally like or dislike. Different people will have different memories retained, so you will get a broader set of viewpoints from an AI by running prompts separately by person - or logging out entirely if that's feasible (not always possible within agencies, particularly when using Copilot within your firewall).

Whatever techniques you use, keep in mind that AI doesn't necessarily know more than you about a given strategic or policy decision. It can provide useful critique for testing ideas or identify other options, or surface research you should consider, but at the end of the day humans should be making and approving these decisions.
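
Several of the techniques above can be built into a small prompt-preparation step. The sketch below is one possible approach, not a tested method. It refuses context that contains preference signals, builds a separate "strongest case" prompt for each option, and generates comparison prompts in every option order, because the research found that listing order shifts responses. The phrase list and prompt wording are assumptions, and sending the prompts to a model is left out.

```python
from itertools import permutations

# Hypothetical phrases that reveal which option the author favours.
PREFERENCE_SIGNALS = (
    "we're leaning toward",
    "i think this is probably",
    "we prefer",
    "our preferred",
)


def find_preference_signals(text):
    """Return any preference phrases found in the text."""
    lowered = text.lower()
    return [signal for signal in PREFERENCE_SIGNALS if signal in lowered]


def build_prompts(context, options):
    """Build neutral prompts: one per option, plus comparisons in each order."""
    signals = find_preference_signals(context)
    if signals:
        raise ValueError(f"Remove preference signals first: {signals}")
    prompts = [
        f"{context}\n\nMake the strongest possible case for: {option}"
        for option in options
    ]
    for first, second in permutations(options, 2):
        prompts.append(
            f"{context}\n\nCompare these options and commit to one: {first}; {second}"
        )
    return prompts


prompts = build_prompts(
    "An agency must choose how to deliver a new service.",
    ["option A", "option B"],
)
for prompt in prompts:
    print(prompt, end="\n---\n")
```

Each prompt would then be run in its own session, ideally by different people, with the model version recorded alongside the output.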

Saying an AI made the decision is neither defensible nor wise. And remember, you're paid more than the AI because your critical thinking is valued (hopefully)!
