Thursday, May 07, 2026

When bad actors are literally bad actors

A new vaccine is approved for a fast-spreading emerging disease. The TGA did its job well. State and Federal Health ministers are briefed. Budgets are approved and allocated. Departments and health authorities develop their plans. The rollout is announced. Doctors, nurses, and pharmacists are trained to administer the vaccine.

The system worked as it should.

Then, within days, a cluster of social media accounts - confident, polished, apparently Australian - begins producing video after video claiming the vaccine was insufficiently tested, that it has a range of terrible side-effects, and that pharmaceutical companies are getting rich off the public's fear.

The content spreads. Millions of views. Alarmed constituents contact their MPs. Traditional media picks up the controversy. The concerns get front-page coverage. The Department of Health, Disability and Ageing stands up a rapid communications response. Ministerial offices field calls. 

The rollout slows. Disease cases rise, along with preventable deaths.

Behind the scenes, the accounts were being run by an offshore group of content entrepreneurs who identified "Australian vaccine reluctance" as a profitable niche. They hired voice actors and used AI-generated scripts that ignored the facts, but they had no real view on the vaccine's safety and no stake in Australian public health.

They were running a passive income business. Political anxiety drives views. Views drive ad revenue.

The Australian government just spent a month responding to content production - at a cost of millions of dollars and hundreds of lives.

Does that sound like an unlikely scenario? It's already happening.

In April 2026, a CBC News investigation found exactly this type of operation. A network of 20 YouTube channels promoting Alberta separatism had accumulated 40 million views. The operators were based in the Netherlands, hiring actors through Fiverr and Upwork to front the content. One of those actors, based in Indiana, summarised his qualifications plainly: "I don't know anything about Canadian politics."

The operators' interest was ad revenue. They had no stake in Canadian politics.

Watch the CBC investigation:


Australian government consultation, sentiment monitoring, and ministerial communications all assume vocal opposition is genuine opposition - people with a stake in the outcome, motivated by real concern.

That assumption is broken.

Spikes in apparent community concern could reflect genuine public anxiety. But they could also reflect an offshore entrepreneur who noticed a topic trending. 

At volume, an agency's response machinery treats both as the same. Consultations get commissioned to understand the depth of concern. The consultation environment is seeded with the same inauthentic content. Policy strategy gets built on a corrupted signal.

Particularly when there is genuine controversy or industry opposition to a policy, content creators can see a profit opportunity. And the opponents of a policy position may embrace and further amplify the fake opposition as it amplifies their own views.

It's now difficult to separate genuine concerns from fake ones, making it difficult to tune policies for constituents - or even manage political situations effectively.

So what can governments and agencies do?

While there's often pressure to respond quickly to negative coverage, it's important to start by gauging how much is real, how much is fake and whether the community can tell the difference.

The first step should be to investigate before responding. High-volume, rapid-onset opposition from accounts with no prior history warrants scrutiny before it shapes your agency's strategy. Establish whether apparent community concern is organic before commissioning a response.

Where there are active consultation processes, redesign them toward harder-to-fake formats. Online submissions and social media monitoring are easy to flood. Face-to-face engagement, deliberative processes, and direct stakeholder contact are not. They're slower and more expensive, but help you size the real concerns.

Move from monitoring media to scrutinising sources and intent. Separate sentiment monitoring from policy signals. Social media volume isn't necessarily a measure of community concern. Weigh it against consultation data, direct stakeholder engagement, and evidence from people genuinely affected.

Finally, build detection capability into your communications teams. Staff running public engagement need to have the skills and tools to recognise the signals of coordinated inauthentic content, such as production consistency, account age, script similarity and offshore indicators. The tools and training exist, but you need them in place before you face a backlash.
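
As a loose illustration, here's a minimal sketch of what screening for two of those signals might look like. The account fields, thresholds and weights are assumptions for illustration only - real detection tooling is more sophisticated, but the underlying checks are this mundane.

Python
from difflib import SequenceMatcher

def coordination_score(account, reference_scripts):
    """Rough heuristic flags for one account; a higher score warrants more scrutiny.
    'account' is an assumed dict with 'age_days', 'video_count' and 'transcript' keys."""
    score = 0

    # New accounts producing content at high volume are a classic signal.
    if account["age_days"] < 30 and account["video_count"] > 10:
        score += 2

    # Script similarity: near-identical transcripts across accounts suggest a shared source.
    for script in reference_scripts:
        if SequenceMatcher(None, account["transcript"], script).ratio() > 0.8:
            score += 3
            break

    return score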

Most importantly, always keep in mind that political and policy damage doesn't require intent. While there are genuine bad actors out there - nations, corporations and lobby groups - who have an interest in derailing government policies and even governments themselves, they aren't the entire landscape anymore.

The bad actors opposing your policy reform may be literal bad actors, reading from AI-generated scripts, churning out videos and other content for clicks and ad revenue alone.

It doesn't take large groups to organise a significant social media campaign against your Minister's signature policy. All it takes is the potential for a decent financial return.

So it's up to agencies to ensure that this doesn't impede good policy, cost money or lives.

Read full post...

Monday, May 04, 2026

Your AI isn't being honest with you. It was never designed to be

A recent Harvard Business Review study found that when researchers asked large language models for strategic advice, they got "trendslop" - recommendations that defaulted to whatever sounds fashionable in contemporary management: 'Innovation', 'Augmentation', 'Long-term thinking'. 

The strategic advice was plausible, confident and, in many cases, largely useless.

This isn't a bug. It's these AI systems working as designed.

Every large language model has been trained with a bias to satisfy the person prompting it.  

A model that refused to answer when asked, or routinely provided uncomfortable or contrary answers, would not succeed in the market. Models are tuned, through reinforcement learning from human feedback, to please. That bias doesn't switch off when you ask for critical review.

What the research found

Researchers from Esade Business School, the University of Sydney, and NYU Stern tested seven leading LLMs across strategic trade-offs that required genuine binary commitments (several listed below).

Across thousands of simulations, the results didn't vary by much. Almost every model, almost every time, recommended:

  • Differentiation over cost leadership
  • Augmentation over automation
  • Collaboration over competition
  • Long-term thinking over short-term

The company context made little difference. The researchers tested tech startups, hospitals, construction companies, government agencies and multinationals. The recommendations barely shifted.

Why was this? LLMs are essentially probability engines: they pick the next word (token) from a distribution of probabilities, with the highest-probability tokens being the most likely choices.

How do they develop those probabilities? By training on billions of public documents, web pages and other content. So the highest-probability output from these AIs is driven more by social norms - by what appears most often - than by accuracy.

Essentially, the models are most likely to provide the most socially acceptable answers, and then deliver them in the register of expert advice.
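
To make that concrete, here's a minimal sketch of the sampling step - not any particular vendor's implementation, just the general mechanism of turning scores over candidate next words into probabilities and picking one. The candidate words and scores are invented for illustration.

Python
import numpy as np

def next_token(logits, temperature=1.0):
    """Convert raw model scores (logits) over candidate tokens into probabilities,
    then sample. Higher-probability tokens are picked more often."""
    scaled = np.array(logits) / temperature
    probabilities = np.exp(scaled - scaled.max())   # numerically stable softmax
    probabilities /= probabilities.sum()
    return np.random.choice(len(probabilities), p=probabilities)

# Illustrative only: the 'fashionable' phrasing has been seen far more often in
# training data, so it gets the higher score and wins most of the time.
candidates = ["differentiation", "cost leadership"]
print(candidates[next_token([3.0, 1.0])])

The model isn't weighing the merits of either option. It's reproducing whichever answer appears more often in the text it learned from.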

For example, Michael Porter built a foundational economic framework around cost leadership as a legitimate strategic position - one that Walmart and Costco built empires on.

LLMs dismissed this approach because thousands of websites and TED Talk transcripts advocate for unique value propositions, and these circulate far more widely than quiet stories about supply chain efficiency.

Prompting won't fix it

The researchers ran over 15,000 trials varying prompt structure, framing, persona and stakes. For differentiation and augmentation, bias shifted less than 2% regardless of how the prompt was written. 

For the others, the average shift was 22% - mostly from one factor: flipping the order in which options were listed. The model didn't reason differently. The option order gave it a target to aim for.

Adding detailed industry context helped slightly - shifting responses by 11% on average. An LLM, given a thorough brief on a cost-pressured government agency in a mature market, still recommended differentiation most of the time.

There's a second failure mode the researchers call the "hybrid trap." When models aren't forced into a binary choice, they frequently recommend doing both - pursue differentiation and cost leadership, pursue radical and incremental innovation. 

That sounds balanced but in practice it's the strategic equivalent of trying to be everything at once, which Porter identified as the most reliable path to competitive failure.

Strategy is about choosing what to stop. A model optimised to please finds that answer difficult to give.

Why this matters for the public sector

Public servants may choose to use AI to pressure-test policy proposals, assess procurement options, review business cases, and stress-test project plans. 

The productivity case is solid: with fewer resources and less time, AI appears to help fill the gap. The problem is that when you prompt AI as a validator, you get validation - regardless of the quality of the underlying thinking.

Digital transformation narratives will consistently outperform consolidation narratives in LLM-generated advice. Decentralisation will beat centralisation. Long-term will beat short-term. 

The model's recommendation reflects the positive emotional valence of contemporary business language, not the requirements of the specific situation. For APS work, that's a real risk - particularly where the right answer is to consolidate, simplify, or cut scope.

What to do about it

This isn't a reason to stop using AI. It's about using AI more effectively.

The standard advice is to give AI more context and craft better prompts. The research shows this doesn't reliably work. These are more effective approaches:

  • Ask for options, then critique each separately. If you present your shortlist, the model works inside your framing. Ask it instead to make the strongest possible case for each option independently - including unfashionable options. For a procurement brief, that means prompting "make the strongest case for option A" and "make the strongest case for option B" in separate sessions, then applying your own judgement (see the sketch after this list).
  • Ask for criticism explicitly. "Identify the three most significant weaknesses in this policy proposal" works. "What do you think of this approach?" doesn't. The more structurally you frame the critique, the less room the model has to default to encouragement.
  • Strip preference signals from your prompts. Any language suggesting which option you favour - "we're leaning toward," "I think this is probably right" - becomes a target. The model will lean toward it. The same goes for options you don't favour - "I think this is probably wrong" - the AI will lean against it. Write prompts as if the options are genuinely open and equivalent.
  • Treat hybrid recommendations as a flag. If the model recommends pursuing both sides of a trade-off, run separate prompts for each option and stress-test the hybrid specifically before accepting it. "What are the risks of pursuing both differentiation and cost leadership simultaneously?" is a more useful prompt than accepting the hybrid as the answer.
  • Track model versions. Biases shift as models are updated. Maintain a record of your key queries and outputs so you can detect changes over time. Be prepared to rerun analysis across models and critically consider why they may give different results.
  • Have different people run the prompts. Many modern LLMs (including Copilot) now store memory about the user 'in the background'. You can usually find, edit, remove and add these memories if you go looking, but they can be a hidden spoiler that biases the AI's responses based on what it knows you generally like or dislike. Different people will have different memories retained, so you will get a broader set of viewpoints by having prompts run separately by different people - or by logging out entirely where that's feasible (not always possible within agencies, particularly when using Copilot inside your firewall).
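
Here's the sketch mentioned in the first technique above - running the strongest case for each option as a separate, preference-free prompt. The llm client and the option wording are assumptions for illustration; any model interface with a simple text-in, text-out call would do.

Python
# 'llm' is an assumed client exposing a generate(prompt) -> str method.
OPTIONS = {
    "A": "Consolidate the existing service channels",
    "B": "Build a new digital channel",
}

def strongest_case(option_text):
    # Each option is argued in its own prompt, with no hint of a preferred answer
    # and no sight of the competing option.
    return llm.generate(
        "Make the strongest possible case for this option, including any "
        f"unfashionable arguments in its favour:\n\n{option_text}"
    )

cases = {label: strongest_case(text) for label, text in OPTIONS.items()}
# A human then compares the two cases and makes the call.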

Whatever techniques you use, keep in mind that AI doesn't necessarily know more than you about a given strategic or policy decision. It can provide useful critique for testing ideas, identify other options, or surface research you should consider, but at the end of the day humans should be making and approving these decisions.

Saying an AI made the decision is neither defensible nor wise. And remember, you're paid more than the AI because your critical thinking is valued (hopefully)!

Read full post...

Wednesday, April 29, 2026

Applying the APS Style Manual as a rules engine

I spoke at the IBR Gen AI Transforming Govt PA & Comms conference last week. 


The event handed out copies of the Government Writing Handbook - the APS Style Manual in distilled form - to everyone in the room.

My immediate thought was: Is this available as a GenAI skill yet? Or as a RAG knowledge base? Or in any form a government agency's chosen GenAI language model can comprehensively use as a checkpoint during document generation?

It wasn't. Paper and PDF only. With a blog and a few web pages replicating part of the manual.

However, I did find the Writing style guide for the Singapore Government Design System (SGDS) as a Skill... 

And here's a PDF of the demo, in case your agency can't access Manus.

So I ran a quick test in the room during the 30 minutes before I spoke. Being mindful of the Government Copyright (which wasn't Creative Commons), I extracted a handful of rules from the Writing Handbook, wrapped them in a prompt, and ran a quickly drafted press release through Claude.

The output tightened immediately. Shorter sentences. Active voice. The point surfaced early. It read like something that would get through clearance without being rewritten three times.

Nothing about the model changed. The constraint did the work.

I've written about Rules as Code on this blog before. The argument has always been the same: take policy, legislation and guidance and express it in a form that systems can apply consistently. We've seen this in eligibility engines, compliance checking, and service delivery. The benefits include consistency, transparency and less reliance on individual interpretation under pressure.

This is the same pattern. Applied to government writing.

The Style Manual is a set of rules that every federal public servant should live by. 

Right now, they're expressed as prose, examples and guidance. People interpret and apply them as best they can. Results vary - depending on the writer, the reviewers, the deadline, and how recently they all last read the manual.

However, translate those writing rules into a form a system can execute, and you get consistency at the point of creation.

The test in the room did exactly that. I didn't attempt to ingest the whole manual. Just a handful of rules — plain language, active voice, short sentences, clear structure — enforced as a second pass over the draft.

Python
# 'llm' is assumed to be any client object exposing a generate(prompt) -> str
# method - for example, a thin wrapper around your agency's approved model.

def draft(prompt):
    # First pass: generate the content the user asked for.
    return llm.generate(prompt)

def enforce_style(text):
    # Second pass: rewrite the draft against a handful of APS writing rules.
    return llm.generate(f"""
    Rewrite this text to comply with APS writing principles:
    - use plain language
    - prefer active voice
    - keep sentences concise
    - make the purpose clear early

    Text:
    {text}
    """)

# Usage: draft first, then run the style pass over the result.
release = enforce_style(draft("Draft a media release announcing the program"))

Rules as code in its simplest form. The rules are explicit. The system applies them. The output is predictable.

To make it useful at scale, you could structure the manual itself - each rule becomes something the system can retrieve and apply based on context rather than running every rule on every document.

JSON
{
  "rule": "Use plain language",
  "check": "Identify complex terms",
  "rewrite": "Replace with simpler words"
}

Store those rules. Tag them. Retrieve the right ones based on the task.

Writing a brief? Apply structure and clarity rules. 
Writing web content? Apply plain language and accessibility. 
Writing an email? Apply directness and action orientation.

That's a rules engine. The underlying pattern is the same one government has used for years in eligibility and compliance systems.
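
Here's a minimal sketch of that retrieval step, using the same assumed llm.generate() client as the earlier example. The rules and tags are placeholders for illustration - the real ruleset would be extracted and structured from the Style Manual itself.

Python
# Placeholder rules - in practice these would be extracted from the Style Manual.
STYLE_RULES = [
    {"rule": "Use plain language", "tags": ["brief", "web", "email"]},
    {"rule": "Prefer active voice", "tags": ["brief", "web"]},
    {"rule": "State the required action up front", "tags": ["email"]},
    {"rule": "Meet accessibility guidance for headings and links", "tags": ["web"]},
]

def rules_for(document_type):
    # Retrieve only the rules tagged as relevant to this kind of document.
    return [r["rule"] for r in STYLE_RULES if document_type in r["tags"]]

def enforce_style(text, document_type):
    rule_list = "\n".join(f"- {rule}" for rule in rules_for(document_type))
    # 'llm' is the same assumed generate(prompt) -> str client as above.
    return llm.generate(f"Rewrite this text to comply with:\n{rule_list}\n\nText:\n{text}")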

Wrapped into tools like Microsoft Copilot - which is now rolling into agency workflows - this becomes part of the drafting workflow. 

The user requests a particular type of content (with appropriate context and inputs). The system generates it, applies the relevant rules, and returns something already aligned to APS expectations. From the user's perspective, nothing special is happening, but editing for style is far easier and faster.

There's one practical constraint. The Style Manual isn't available under a Creative Commons licence, but under Government Copyright 2026. That limits copying and redistribution of the work, not the extraction and structured implementation of its rules.

The source remains authoritative. The system applies a structured interpretation of it. This is, again, exactly what we already do with legislation and policies. We don't expect staff to memorise every clause. We encode the rules and apply them consistently.

Every agency could do this with their Copilot instance - or Finance could implement it into GovAI centrally and share the ruleset as a skill.md file or via RAG (Retrieval-Augmented Generation) - essentially a permanent and updateable memory for generative AIs.

Suddenly everyone using Copilot within your agency applies the APS Style Manual automatically in every generation - or it can be applied selectively, based on what they are seeking to generate, a user setting or a prompt.

The result is that the APS Style Manual stops being guidance people try to remember and becomes a checkpoint that authorised generative AI systems apply every time it is needed - cutting editing and review time.

Plus staff can write their content by hand, and have AI check the style, editing and rewriting where necessary to better meet government writing standards.

Is this a perfect solution? Probably not - yet. AIs still make mistakes and can misapply or fail to apply some style rules. However, it improves a basic Copilot or other GenAI solution for government writing purposes, saving time and raising text quality and productivity.

Read full post...

Tuesday, April 28, 2026

My Presentation from the IBR Conference - The Future of GEN AI for Public Sector Communications and Public Affairs 2026

 Last week, I attended and spoke at the International Business Review (IBR Conferences) event: GEN AI Transforming GOVT PA & COMMS 2026: The Future of GEN AI for Public Sector Communications and Public Affairs 2026 Hybrid Conference.

I've included an excerpt of my presentation notes below for folks who were unable to attend.


The AI Whisperer’s Guide to Practical Deployment

You’re probably here because your organisation is past the question ‘should we use AI?’

Good. Most of us have had that conversation many times.

The question is now: ‘Which battles are worth fighting?’

There are usually more AI opportunities in any organisation than the capacity to pursue them well. 

And the cost of picking the wrong ones isn’t just budget – it’s credibility. Credibility is what gives you the currency to start your next AI project.

I’ve worked with AI in various ways for over 9 years now, building commercial generative AI products, undertaking complex modelling for major infrastructure, and supporting project delivery.

I also have a long history working in Digital, before, during and after Gov 2.0. 

As such I’ve watched some patterns repeat, and others rhyme. Not just in government, in humans.

So let’s dig into practical implementation.


Before anything else, I want to draw a distinction that I think is worth keeping front of mind.

There are often two distinct uses of AI in government work.

There’s AI supporting human judgment – that helps draft, synthesise, analyse and prepare work.

And there’s AI that substitutes for human judgment, that makes or drives decisions.

In practice, it's not always a clean line, but the question is worth asking of every use case: is a human genuinely making the decisions, or are they ratifying what an AI has determined?

Even when an AI excludes or shortlists options, such as when using an AI to screen applicants in a recruitment process, you should consider whether AI bias is driving a decision bias that isn’t defensible.

For different uses, the risks and governance requirements vary. And the consequences of getting them wrong are often very different.

I don’t need to say more than ‘Robodebt’ to make that point.

Human in the loop isn’t just good practice. In the Australian government, it’s increasingly an ethical and legal expectation.

Everything I’m covering today sits on the drafting-and-support side of that line. AI as a contributor. The human still makes the call, still does the editing and still approves the work.

But if you’re working on something that sits closer to the decision-making side, that’s a topic that needs more time than we have today.


So – where do AI investments tend to fall short? In my experience, it comes down to three patterns. And none of them is really about the AI.

The first is the wrong problem.

AI gets applied to a symptom rather than a cause. Or the real problem turns out to be a process gap, a data quality issue, or an ownership question. AI gets suggested as the solution because it’s trendy, safer, easier, less political or more fundable than ‘we need to fix the process.’ We used to see the same with requests to build a website, or create a mobile app. In previous roles in government my job was often to tell people they didn't need to build a new thing, but rather educate them on the digital assets we already had that could be used to meet their goals.

AI won’t necessarily fix a broken process. And it often speeds it up, which can make things far worse, far faster. This is exactly what we saw with digital. Automation can compound success, but also compound failure.

The second is an environment that wasn’t ready.

This can take multiple forms. The data was messier and harder to clean than expected. The permission model created complications that no one had mapped. Governance obligations introduced constraints that only became visible mid-implementation. The workflow proved more sensitive than the business case assumed. Or the people expected to use and engage with the AI weren’t brought along on the journey and may never have wanted it in the first place.

None of these is unusual. They’re normal friction when deploying anything in a complex operating environment. The question is whether you surface them early or late. Some organisations are prepared to take the hit, see staff leave, and things get disrupted before they get better; others pull back and call it a failure.

You really need to know your organisation’s appetite and commitment before leading one of these projects, or you can be left out in the cold.

The third is a lack of an internal owner.

The capability to run, adapt, and govern the AI system either never existed within your organisation or walked out when the people who built it moved on. Nobody inside could improve it later when things changed, or govern it when something went wrong.

That’s a capability-and-ownership question that procurement can’t resolve. And it’s worth thinking about before you sign anything.


I also want to spend a few minutes on something that comes up constantly as a blocker, but where there’s a practical path that many organisations haven’t explored yet.

Data security, but specifically at the front end of the project. How do you test, procure and demonstrate AI systems without exposing sensitive or classified data to vendors?

You can’t load a confidential document into an online AI model to test its capabilities. You can’t hand private citizen data to a vendor for a proof of concept. You often can’t load-test against live operational data or let a bidder build a demo using your grant records or patient data.

This can often kill a potential AI project before it gets started.

But there’s an option worth considering: synthetic data. And AI can build it for you.

That’s not anonymised data, anonymisation has well-documented re-identification risks. Synthetic data is a dataset that is statistically realistic and structurally accurate but contains no real data.

Here’s a concrete example from my prior work.

I needed to load-test a system across a large physical asset network with millions of individual assets. Using real data wasn't an option. What existed was sensitive, and what didn't exist yet - the future-scale data needed for projections - couldn't easily be modelled.

So I used AI to build a city.

Not a digital twin of the actual asset network – a synthetic city, constructed from scratch using known proportions of asset types, realistic density estimates, and plausible growth trajectories.

It could scale to whatever size we needed, model our asset growth over ten to twenty-five years, and produce a test dataset with no connection whatsoever to real infrastructure.

It allowed us to load test at any scale. The vendor never saw real data. The security risk was zero.
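
As a rough sketch of the underlying idea - generating records from known proportions and a growth rate rather than from any real asset register. The asset mix, growth figure and fields here are invented for illustration, not the actual values used.

Python
import random

# Assumed, illustrative proportions and growth rate - in practice these come from
# published asset statistics, not from any sensitive dataset.
ASSET_MIX = {"pipe": 0.62, "valve": 0.28, "pump": 0.05, "meter": 0.05}
ANNUAL_GROWTH = 0.03

def synthetic_assets(count, years_ahead=0):
    """Generate a statistically plausible asset register containing no real-world records."""
    scaled = int(count * (1 + ANNUAL_GROWTH) ** years_ahead)
    asset_types = list(ASSET_MIX)
    weights = list(ASSET_MIX.values())
    return [
        {
            "asset_id": f"SYN-{i:07d}",
            "asset_type": random.choices(asset_types, weights=weights)[0],
            "install_year": random.randint(1970, 2026),
        }
        for i in range(scaled)
    ]

# For example: load-test at a projected ten-year scale without exposing a single real asset.
test_dataset = synthetic_assets(2_000_000, years_ahead=10)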

A similar approach may work for any system dealing with large datasets where the proportions and structures are well understood, but the specific data is sensitive. Health systems. Grant systems. Infrastructure procurement. Regulatory systems. You could even have AI write tens of thousands of synthetic ministerial briefs and the back-and-forth correspondence across parliament on common topics to test and demo new Parliamentary Document Management Systems.

This gives agencies the opportunity to provide vendors with synthetic datasets — reflecting realistic shapes and adjusted for jurisdiction-specific requirements, but containing no sensitive content and creating no security or privacy exposure. Yes they might see the overall shape of the system – but that’s what they’re providing anyway. If they didn’t know the system’s shape to begin with, they wouldn’t have a product to demonstrate.

Vendors build and demonstrate against the synthetic. You evaluate their system properly. You can even use it to test edge cases and load scenarios for existing systems that you couldn’t safely test using real data.

I think of it as a digital cousin. Looking enough like you to size clothing, but not enough to fool your parents.

It helps reframe security from something that blocks deployment into something that enables it safely.


Next I want to go back to my earlier point. Fighting the right battles.

Think of one AI investment your agency has made or is seriously considering. It doesn’t have to be large. It could be a drafting agent, a customer service chatbot, something for data analysis or procurement. Just hold something real in mind.

I’m going to run through four questions. See how your use case sits against each of them.

First: Need.

Is the problem real, recurring and bounded? Is there enough volume to make it worthwhile? Is the domain clear enough to work with?

Or, being honest here, is the underlying problem actually a process gap, a data quality issue, or an ownership question that’s been reframed as an AI opportunity because that’s where the momentum is?

Catching it early is much less painful than catching it after a commitment is made.

Next: Fit.

Does the proposed solution support the systems, content quality, and governance obligations you actually have, rather than the environment a vendor's demo assumes?

And, coming back to the distinction I drew earlier, is this clearly a drafting and support use? Or has it drifted toward AI making or influencing a decision?

That drift tends to happen quietly. The language shifts from ‘AI will help officers assess’ to ‘the system will flag for review’ to ‘applications below the threshold will be automatically declined.’

Each step is incremental. The cumulative effect is that there’s no human genuinely in the loop anymore.

That’s dangerous territory to step into.

Third: Ownership.

Who inside your organisation will run this, refine it and govern it once it is built and any vendor relationship ends?

Finance’s GovAI initiative is doing solid work building foundational AI capability across the APS - that’s genuinely useful infrastructure, like GovCMS.

For GovCMS, your agency has to own the website at the end of the process. You can’t hand off responsibility to Finance. They’ll keep the system up, but won’t update and improve your content and navigation.

For AI, even when using a GovAI platform, your agency still needs to own the outcome. You need people internally who understand the system well enough to know when it’s going wrong or not keeping up with changing policy and internal needs.

If the ownership question is still being worked out, that’s probably the first thing to resolve. Without internal ownership, the other three questions become someone else’s problem and often get dropped.

Finally: Effect – which you can also read as value.

What actually changes if this initiative works?

Not ‘people will use it more.’ Not ‘staff will find it helpful.’

What changes in citizen experience, policy delivery, workload, risk, consistency or quality? And by how much? And how will you measure that change over time?

If the most compelling metric you can name right now is an adoption rate, it’s worth going a level deeper. Adoption tells you people tried it. It doesn’t tell you what difference it made.


One thought to leave with you that I don’t think anyone has fully worked out yet.

When you build teams that include AI contributors alongside humans, you’re managing a new kind of workforce diversity. Just as changing a single individual in a team can radically alter team dynamics, adding an AI contributor sufficiently advanced to produce decent work and take some load off human team members can also radically alter team dynamics.

This is new, different to any workforce change we’ve seen before in human history. And it requires new management practices.

It isn’t a technology challenge. Your IT leaders can’t offer qualified guidance on how to manage humans and AIs as a unified team. It’s a people leadership challenge.

Organisations that work this out deliberately will get more from the same tools and people.

There isn’t a handbook for it yet. But it is worth thinking about.


Read full post...

Friday, December 05, 2025

BOM – Flying Above the Radar

Australia's Bureau of Meteorology (BOM) has copped a downpour of headlines for spending A$96.5 million on what some gleefully call “a website.”

That framing is catchy - but it’s wrong.

The public‑facing interface - the bit we click - cost about A$4.1 million.

The heavy lifting was a complete rebuild and testing of the systems and technology behind it: ingesting vast volumes of observations and model outputs, securing them, and serving them at national scale, reliably, every minute of every day.

In other words, the BOM didn’t buy a slick homepage; it rebuilt the foundation that gets critical environmental intelligence to Australians on a timely basis - whether they’re at the airport, on the farm, or fighting a fire (as I did yesterday).


This effort didn’t happen in isolation. It sits alongside BOM’s seven‑year ROBUST technology program, which upgraded cybersecurity, networks, data centres, a disaster‑recovery supercomputer, and the national observing network (including dual‑polarisation radars). By closure in mid‑2024, ROBUST had invested A$866 million to harden Australia’s environmental intelligence backbone after the bureau’s 2015 cyber breach and outages. That’s the plumbing you don’t see when you load a forecast.

So, what failed? Not the core project goal—modernising national infrastructure—but the communication and release strategy.

The new site launched into severe weather, and ordinary users and power users alike found basic tasks harder: radar colours felt off, local details were buried, familiar navigational cues had shifted.

The backlash was immediate; ministers demanded fixes; BOM reverted radar visuals and committed to rapid improvements. Timing, messaging and usability were misjudged—and they matter.

This is where “low‑hanging fruit” counts. When you’re redeveloping an entire weather reporting and dissemination system for a country, the public expects familiar wins to land early: clearer local pages, consistent radar palettes, quick paths to wind, rain and temperature. BOM missed several of these, and the community told them—loudly. The lesson isn’t “don’t modernise”; it’s design with users, ship the obvious wins first, and narrate the journey.

User testing should never be a checkbox; it’s your pre‑flight weather check. That means beta programs, co‑design with sector users (farmers, SES crews, pilots), and A/B testing for navigation and visuals so you learn what works before you re‑platform at scale.

If industry best practice says test early, test often, test live - heed it. A handful of controlled experiments can surface friction months before a national launch.

But why was there so much fuss over plumbing? Because weather intelligence is critical infrastructure. Agriculture uses it to time planting and spraying; fisheries plan around marine forecasts; aviation and defence rely on tailored briefings and feeds for safe operations; emergency services need resilient ingest and rapid warnings when floods and fires escalate. It’s a national capability upgrade serving safety, the economy and national security.

In a world of constrained trust, how you deliver matters almost as much as what you deliver. Publish the cost breakdown early (front end vs back end). Explain the architecture in plain English. Stage releases. Keep the radar legible and familiar.

And treat feedback as flight data, not turbulence: respond visibly and fast. That’s how you reduce reputational risk while raising technical resilience. It’s also how governments maintain confidence in big tech investments.

If I were coaching an agency on a transformation of this scale, my pragmatic checklist would be:
  • Mission first, interface second. Lead with the purpose: warnings, safety, continuity. Then show the pixels.
  • Ship the obvious wins early. Keep users fed with the basics - clear radar, local conditions, one‑click access to wind/rain - while deeper systems evolve.
  • Co‑design and A/B test. Put farmers, SES, pilots and fishers in the cockpit; instrument the site; run controlled trials before national rollout.
  • Stress‑test comms like the platform. Prepare explainer packs, diagrams and FAQs; brief ministers and media ahead of launch; communicate changes and reversions promptly.
  • Be resilient and transparent. When a severe weather season collides with deployment, delay updates that add risk, and explain why. That’s good ops and good public service.

Bottom line: the BOM didn’t just build a website. It rebuilt resilience.

The next time you check the radar before a weekend BBQ, remember: that little map is powered by one of the most sophisticated and secure data platforms in the world. And that’s worth every cent.

That said, the public frustration is real, and instructive. The BOM tried to fly below the radar, but forgot that it IS the radar. A critical service Australians rely on and mostly love. The criticism becomes that much harsher when things appear to go wrong with something we love.

However, when we combine world‑class infrastructure with world‑class user practice - co‑design, beta, A/B testing, signalling changes, staged releases and plain‑English comms - we can get a platform that stands up to cyclones, cyber threats and criticism alike.

That’s the point of flying above the radar: you see any storms sooner, and can steer accordingly.

Read full post...
