By now, most people have likely used generative AI for fun, research, or work: generating images, writing content, reviewing code, or getting answers to questions. Generative AI is not new, but the public exposure from ChatGPT and other Large Language Models (LLMs) has changed expectations overnight for creating, retrieving, and summarizing content. Following the White House's recent fact sheet on promoting responsible AI, those supporting State, Local, or Federal government organizations are starting to think about how to safely and effectively leverage technology like Generative AI.
What Is Generative AI?
Generative AI is powered by large language models (LLMs), which are prediction algorithms. Using machine learning, these models are trained on enormous datasets to predict the next output following a series of inputs. That output could be a word, a sentence, an image, or other content. The goal of these models is to mimic how humans approach language.
For example, look at these words:
"it's raining cats and ____"
Because this is a common colloquial phrase, most people know the next most probable word in the sequence is "dogs." In the same manner, generative AI uses LLMs to make probabilistic predictions about the next word in a sentence or the next pixel in an image. And it can do more than predict the next item in a sequence: it can deliver complete responses to a query or other input.
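The "raining cats and ____" example can be sketched in a few lines of code. The toy model below simply counts which word follows which in a tiny, made-up corpus (a bigram model) – real LLMs use neural networks trained on web-scale data and much longer contexts, but the underlying idea of predicting the most probable next token is the same.

```python
from collections import Counter

# Tiny made-up corpus standing in for the web-scale data real LLMs train on.
corpus = (
    "it's raining cats and dogs . "
    "the cats and dogs played . "
    "it's raining hard today ."
).split()

# Count which word follows each word (a simple bigram model).
next_words = {}
for prev, nxt in zip(corpus, corpus[1:]):
    next_words.setdefault(prev, Counter())[nxt] += 1

def predict_next(word):
    """Return the most frequent next word observed in training."""
    return next_words[word].most_common(1)[0][0]

print(predict_next("and"))  # "dogs" -- the most frequent continuation
```

The model can only predict continuations it has seen before; scaling this idea up to billions of parameters and documents is what gives modern LLMs their fluency.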
As these models continue to scale – thanks to larger datasets and better infrastructure – the complexity, accuracy, and "human likeness" of the responses are becoming more robust. Today, generative AI solutions are writing content, creating new songs, and passing complicated standardized tests like the Bar Exam, LSAT, and GRE.
Commercial organizations are already using generative AI to build more immersive digital experiences, reduce time spent producing or maintaining content or code, and service customers in more cost-effective manners. These organizations are exploring a variety of use cases for incorporating LLMs.
Common corporate use cases for generative AI include:
Creating new text, image, or video content
Summarizing and styling content
Reviewing and editing content, code, or images
Supporting question and answer dialogues
What can generative AI do for government organizations?
Like commercial organizations, government organizations are facing increased expectations of their digital experiences from their constituents and beneficiaries. And, just like corporate customers, these users want to find, engage with, and trust federally approved information across government websites and applications.
Although most would argue generative AI is nowhere near full maturity, the most recent releases of generative AI technologies have shifted consumers' expectations, particularly around asking questions and receiving robust, direct answers. Corporate or government, these consumers now expect aggregated, summarized information delivered in a structure and style that is both easy to consume and "human-like."
Chatbots and other conversational AI solutions have been the status quo for fielding consumer questions for years, and many State and Federal organizations have already implemented these solutions.
During the COVID-19 pandemic alone, almost three-quarters of states launched chatbots to help resolve COVID-19 related inquiries. Technologies like generative AI can bring legacy chatbots – which are often difficult to manage and frustrating to converse with – into the future.
But without the proper safeguards, generative and conversational AI can also come with risks.
The Dangers of AI in Government
"The best model was truthful on 58% of questions. Models generate many false answers that mimic popular misconceptions and have the potential to deceive humans." (TruthfulQA: Measuring How Models Mimic Human Falsehoods)
When it comes to delivering authoritative, accurate information, generative AI models have had some trouble. On benchmarks like TruthfulQA, even leading models answer a substantial share of questions incorrectly. These models are extremely good at piecing together information from across the web to create convincing – but often wrong – answers to users' questions.
This is because LLMs are built on probabilities: they do not technically "know" anything. The result is what the AI industry calls "hallucinations" – confident responses that are inaccurate or unsupported by the model's training data. That can mean misrepresenting information, or fabricating answers to questions the system was never scoped to handle.
In addition to the fabrications in AI-generated content, there are other issues that have caused a series of difficulties and odd responses during the roll-out of use cases:
The sources LLMs draw from are often from the open web; as such, even a response that is "correct" might not be complete, up-to-date, or the best answer for a particular organization.
What each model deems "important" might not align with your priorities – when building your site content, you typically weigh engagement, freshness, and other factors before returning information to a user.
The data used to train these models predominantly comes from the open web. As such, informal and colloquial language from sources like social media can guide the structure and syntax these models use in formulating their responses.
These concerns, along with risks around content governance, can make generative AI seem like a technology that has no role in government. But used properly, and with the appropriate safeguards, generative AI can add capacity to government organizations by producing and serving information to their constituents – at scale.
Methods for Safeguarding Generative AI
Although generative AI is unlikely to create the scenarios depicted in movies like The Matrix or I, Robot, there have already been a number of concerning conversations recorded between users and various conversational AI applications.
To safeguard against some of the "weirdness" that can ensue from unfettered conversations, and more importantly, protect users from inaccurate, misleading, or non-authoritative information, the following approaches are being used to enforce stronger governance over generative AI:
A Curated Knowledge Base
One of the biggest issues with the Generative AI technologies available today is where they draw their knowledge from.
As agencies explore various Generative AI use cases, they will have to consider solutions that enable each organization to "gate" the sources of data or content that the models leverage. This practice will ensure that users receive information that is accurate and relevant to their query.
Organizations have invested resources into building out robust government content management systems. Generative AI should be a tool that works with this existing approved and structured data, rather than relying on information from the web.
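The idea of "gating" a model's sources can be illustrated with a minimal retrieval step: before generating anything, the solution looks up passages only from an agency-approved knowledge base and refuses to answer when nothing relevant exists. The documents and the word-overlap scoring below are hypothetical placeholders – production systems typically use a CMS-backed index and semantic search instead.

```python
# Hypothetical agency-curated knowledge base ("gated" sources).
APPROVED_DOCS = {
    "benefits-faq": "Applications for unemployment benefits are processed within 21 days.",
    "dmv-hours": "DMV offices are open Monday through Friday, 8am to 5pm.",
}

def retrieve(query: str):
    """Return the best-matching approved passage, ranked by simple word overlap."""
    q_words = set(query.lower().split())
    scored = [
        (len(q_words & set(text.lower().split())), doc_id, text)
        for doc_id, text in APPROVED_DOCS.items()
    ]
    scored.sort(reverse=True)
    best = scored[0]
    # Refuse rather than guess when no approved source matches.
    return None if best[0] == 0 else best

print(retrieve("when are DMV offices open"))
```

Because the model only ever sees approved content, the answer it generates is grounded in information the organization has already vetted.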
Clear Citations and Sources
It's important that users know their information is coming from verified, reputable sources. Even when you've fully curated the sources a generative model can pull from, users may still want to reference the source material or dive into it themselves. Generative models also have a tendency to "generate" citations, regardless of whether the cited source actually exists.
To promote transparency and encourage navigation of other resources and content, government organizations should ensure that conversational AI solutions leveraging generative AI include detailed citations, as well as a direct pathway to the content location. This enables users to further browse through or engage with that content.
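One way to guard against "generated" citations is to attach a citation only when it resolves to a known source, giving the user a direct pathway back to the content. The catalog and URLs below are illustrative placeholders, not real endpoints.

```python
# Hypothetical catalog of approved sources and their public locations.
SOURCE_CATALOG = {
    "dmv-hours": "https://example.gov/dmv/hours",
    "benefits-faq": "https://example.gov/benefits/faq",
}

def cite(answer: str, doc_id: str) -> dict:
    """Attach a citation only if it points at a real, known source."""
    if doc_id not in SOURCE_CATALOG:
        # Never emit a citation the catalog cannot verify.
        raise ValueError(f"unknown source: {doc_id}")
    return {"answer": answer, "source": doc_id, "url": SOURCE_CATALOG[doc_id]}

response = cite("DMV offices are open 8am to 5pm, Monday through Friday.", "dmv-hours")
print(response["url"])
```

Failing loudly on an unknown source is a deliberate choice: a missing citation is easier to audit than a fabricated one.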
Diverse Large Language Models (LLMs)
Different large language models have various strengths, biases, and optimal use cases. Relying on a single model can limit the ability for a particular solution to effectively service all users, and can also put that solution at risk of an outage or other issue with that particular model.
For example, different LLMs are designed by their creators with specific purposes in mind.
Regardless of the use case and desired outputs, government organizations should consider leveraging multiple LLMs to ensure a more reliable solution, as well as more tailored outputs.
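A simple way to picture a multi-model setup is a fallback chain: try a primary model, and if it errors out or returns nothing, move to the next one so a single provider outage does not take down the whole solution. The model functions below are stand-ins that simulate this behavior, not calls to any real API.

```python
def model_a(prompt: str) -> str:
    """Stand-in for a primary model that is currently down."""
    raise TimeoutError("provider outage")

def model_b(prompt: str) -> str:
    """Stand-in for an alternate model that is healthy."""
    return f"answer from model B to: {prompt}"

def ask(prompt: str, models=(model_a, model_b)) -> str:
    """Try each model in order; return the first usable response."""
    errors = []
    for model in models:
        try:
            reply = model(prompt)
            if reply:  # skip empty responses as well as errors
                return reply
        except Exception as exc:
            errors.append(exc)
    raise RuntimeError(f"all models failed: {errors}")

print(ask("When is the DMV open?"))  # falls back to model B
```

The same routing layer can also pick a model per task – one tuned for summarization, another for question answering – rather than only for failover.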
Representative and Appropriate Training Data
As mentioned above, LLMs formulate their responses based on the training data available to them; steering a model's behavior toward an organization's intended goals is often referred to as "AI alignment". Most LLMs are pre-trained on a large, all-encompassing corpus of data that may or may not be relevant to your organization.
If training data comes from public sources, like social media, a conversational AI solution could return answers in a language or tone that does not align with the organization's standards and policies.
As government organizations explore utilizing LLMs for conversational use cases and perform fine-tuning, they should ensure that the data used to train the model – both initially and on an ongoing basis – aligns with the tone and type of language the organization seeks to convey to its customers.
Detailed Conversational Logs
Like any solution that interacts with customers, an organization should know how and why users are utilizing it – and if they are getting the outputs they expect and need. This information is important for audits and traceability, and also to improve responses and content over time.
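A conversation log for audits and traceability can be as simple as recording each exchange with a timestamp, the question, the answer, and which curated sources were used. The field names below are illustrative; a production system would persist this to a database rather than an in-memory list.

```python
from datetime import datetime, timezone

conversation_log = []

def log_exchange(session_id: str, question: str, answer: str, sources=None):
    """Record one question/answer exchange for later audit and review."""
    conversation_log.append({
        "session": session_id,
        "time": datetime.now(timezone.utc).isoformat(),
        "question": question,
        "answer": answer,
        "sources": sources or [],  # which curated documents informed the answer
    })

log_exchange(
    "s-001",
    "When is the DMV open?",
    "8am to 5pm, Monday through Friday.",
    sources=["dmv-hours"],
)
print(len(conversation_log))  # 1
```

Logs like these let administrators spot questions the solution answers poorly – and feed those gaps back into content and model improvements over time.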
Considerations for Government When Using LLMs for Conversational and Generative AI
As government organizations explore use cases and solutions for conversational and generative AI, the following considerations can help guide their assessment of different technologies and approaches:
Linkages to Content: Does the user know where the provided response was sourced from and can they easily navigate to that content?
Response Rationale: Can the user easily understand or infer why they received the response provided?
Knowledge Scope: Are there limitations on where the solution can draw answers and information from?
Multi-Model Approach: Is the solution utilizing a single model or a single call to a model's API, or is the response drawn from multiple models and sources?
Model Training: What data sets were used to train the models?
Conversation Logs: Is there the ability to understand how users interacted with the solution? Who owns the content and responses produced by the solution?
Personalization: Are the responses of the solution tailored to the user and/or the context of the conversation?
Model Improvement: Can an admin influence or override the responses that the solution provides to correct for errors or match to organizational priorities?
For more on the challenges and benefits of AI in the federal government, register for the virtual event with featured guest Emily Vose, partner at IBM. For specific examples of how state governments can safely take advantage of AI-enabled products, download the ebook.
Bringing AI and Natural Language to State Government
Exceed public users' expectations with this step-by-step guide on how state governments can deliver a better citizen experience.