When setting goals for a technology solution, we should consider the concepts of beneficence and non-maleficence. The former asserts that we should expressly aim to accomplish good. The latter directs us to “do no harm” (it also harks back to the former Google slogan, “don’t be evil”). Any well-intentioned software application should embody both ideas to some degree, though a clear tug in one direction or the other is often fruitful. For instance, a simple banking app should be guided primarily by non-maleficence: making sure the app is secure, protecting data, and ensuring the user can perform the tasks they need. On the other hand, technology that strives to cure cancer is marked by a clear directive of beneficence (though it surely deserves a heavy dose of non-maleficence).
As Christians, we are called both to beneficence (1 John 3:18) and non-maleficence (1 Thessalonians 5:22). Through seeking wisdom and listening to God’s voice (Romans 12:2), we can gain inspiration for how we need to balance each in practical situations.
The rise of AI tools for therapy serves as a stark reminder of the balance between “doing good” and “doing no harm”. AI therapy, with large language models (LLMs) as the backbone, carries a strong double mandate of beneficence and non-maleficence. On the one hand, some people struggle to afford traditional therapy or live in locations with barriers to mental health services. Access to a therapy chatbot is perhaps better than nothing, on average. At the same time, these chatbots can have devastating impacts on vulnerable populations.
What does the rise of AI therapy mean for Christians? In part, there is a call to think critically about the technology and provide useful dialogue, aiming to reduce harm and encourage good. Debate on AI therapy is important, especially when scientific research on the topic is early and includes some concerning results. Even promising studies have flaws, such as small sample sizes, self-reported data, and biased populations.
Considering Non-Maleficence for AI Therapy
We should first recognize the inherent limitations of LLMs. Conversing with an LLM is vastly different from sitting in a room or on a video conference with a licensed human therapist. Even reducing this point to bare technical specifications, LLMs are limited to text, while therapists have access to significantly more multi-modal data (they can listen to inflections in your voice and see your facial reactions). Importantly, the alliance between human therapist and client is widely considered critical to therapeutic success. Therefore, expressly comparing LLMs to human therapists is a worthwhile exercise. To do so successfully, we must first understand how LLMs are created.
LLM Training Steps
Training a large language model has two main steps: 1) pre-training and 2) post-training.
In pre-training, the LLM learns to predict the next word considering all the previous words in the supplied document. It uses vast amounts of data, essentially the entire Internet, to learn how to generate probable words based on past context. (Technically, it predicts the next token, which is a unit of text like a word, part of a word, or punctuation).
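To make the pre-training objective concrete, here is a minimal, purely illustrative Python sketch. It uses a toy bigram count model over a made-up corpus rather than a neural network, but it captures the core idea: estimate which word is most probable given the preceding context.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the web-scale text used in real pre-training.
corpus = "i feel anxious today . i feel tired today . i feel anxious again".split()

# Count how often each word follows each preceding word (a simple bigram model;
# real LLMs condition on long contexts with a neural network, not raw counts).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word_distribution(prev_word):
    """Estimate P(next word | previous word) from the toy corpus."""
    counts = following[prev_word]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# After "feel", the model judges "anxious" most probable (2 of 3 occurrences).
print(next_word_distribution("feel"))
```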
In post-training, the LLM is presented with prompt-response pairs (curated by humans) to optimize it for conversation and instruction following. Part of post-training involves the model learning more next-word-prediction skills, but another core component is Reinforcement Learning from Human Feedback (RLHF). The model is given a “preferred” response and a “non-preferred” response to a prompt. It then learns to generate entire responses that resemble the traits of the “preferred” ones (e.g., being polite, avoiding controversial topics, showing thoroughness, etc.). The RLHF step is critical for directing the LLM on how to generate full responses to prompts in ways that humans find useful, engaging, and pleasing.
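As a simplified sketch of the mechanics (assuming the common reward-model formulation; production RLHF pipelines add a separate policy-optimization step, such as PPO, on top), the preference data is typically used with a pairwise loss that rewards scoring the “preferred” response above the “non-preferred” one:

```python
import math

def preference_loss(reward_preferred, reward_rejected):
    """Pairwise (Bradley-Terry style) loss for training a reward model:
    the loss is small when the "preferred" response scores higher than the
    "non-preferred" one. The LLM is then tuned to maximize that reward."""
    margin = reward_preferred - reward_rejected
    return -math.log(1 / (1 + math.exp(-margin)))  # -log(sigmoid(margin))

# Hypothetical reward-model scores for two candidate responses to one prompt.
print(preference_loss(reward_preferred=2.0, reward_rejected=0.5))  # ~0.20 (ranking correct)
print(preference_loss(reward_preferred=0.5, reward_rejected=2.0))  # ~1.70 (ranking wrong)
```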
The LLM’s primary goal is to predict the next word in a sequence of text (pre-training). It has a subgoal of producing a series of words in a way that is predicted to please and serve the user (post-training).
One point to underscore is that post-training is strongly focused on training LLMs to synthesize single answers to single prompts. There is some express training for back-and-forth conversations, but such data is limited and surely does not mimic the kind of conversation a human therapist and client would have. Consequently, LLMs have been shown to struggle with such back-and-forth dialogues. They often jump to conclusions and fail to ask for clarification, especially as the dialogue grows in length and complexity.
Additionally, the model’s next-word-prediction mechanism is often configured to generate interesting output rather than the most accurate output. When an LLM is generating the next word, it does not have to pick the single most likely candidate; it can just as easily create a list of highly probable words and then select from the options. Most often, LLMs are configured to sample randomly (weighted by probability) from this list of candidates to produce more varied, engaging text.
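Here is a hypothetical sketch of that selection step. The candidate words and probabilities are invented for illustration; real models compute them over a vocabulary of tens of thousands of tokens.

```python
import random

# Hypothetical next-word probabilities a model might assign at one step.
candidates = {"sad": 0.45, "overwhelmed": 0.30, "fine": 0.15, "hopeful": 0.10}

def greedy_pick(probs):
    """Always choose the single most likely word."""
    return max(probs, key=probs.get)

def sampled_pick(probs, temperature=1.0):
    """Choose randomly from the candidates, weighted by probability.
    Higher temperature flattens the weights, producing more surprising text."""
    words = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return random.choices(words, weights=weights, k=1)[0]

print(greedy_pick(candidates))                                        # always "sad"
print([sampled_pick(candidates, temperature=0.8) for _ in range(5)])  # varies each run
```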
In fairness to LLMs, they exhibit emergent behavior to some degree, allowing them to formulate underlying concepts of the world and behave in ways that could not be predicted from the training process. That said, LLMs are still fundamentally governed and directed by the convergent goals of how they are assembled: string together coherent language in a way that humans are likely to find pleasant. This is substantiated by how they accomplish their goals: through fuzzy memorization that prevents them from generalizing to new situations; they do not create clear, consistent, and universal models of how the world actually works.
LLMs and Projection
LLMs make next-word predictions and synthesize complete responses based on the rough average of what they have seen in their training data. Their goal is to make plausible, on-average predictions. Therefore, an LLM does not consider me an individual when I am conversing with it. It cannot see me. At best, the model treats me as the rough average of its training data. Consequently, LLMs rely and thrive on projection, making assumptions from internal data and biases. This is something human therapists train to recognize. A human therapist learns, through the accountability of consultation and supervision, to utilize, suspend, compartmentalize, and engage their countertransference for the ends of beneficence and non-maleficence.1, 2 A well-trained therapist will center the client’s worldview, not the worldview the therapist has developed. An LLM cannot do this. Its “worldview” is centered on making conditional predictions based on its training data. It has no choice but to follow this paradigm.
LLMs have the goal of coherent language, with help as a byproduct. Human therapists have the goal of helping, with coherent language as the byproduct. As determined by the training process, the LLM weights past context to synthesize statistically plausible and pleasing responses, which may conflict with what should be said to encourage healing and growth.
LLMs and Human Complexity
Humans are complex. Our emotions and actions are often in conflict and even in paradox (“I want X, but instead I do Y”). Effective mental health providers patiently hold space for this ambiguity as we grapple with uncertainty.2 This is not in the nature of LLMs. They jump to action, always wanting to resolve issues and please us, because this is how they are trained to behave. They do not know how to hold space for paradox and ambiguity because they are primarily optimized to provide satisfying one-time responses rather than build long-term relationships. LLM training is not geared for activities with timelines of weeks, months, or years, as in clinical therapy. The optimization tells the model to ask, “What is the best response I can give right now?”, with essentially no concept of the future relationship.
LLMs struggle with long-term dependencies; they cannot reliably recall and integrate long-term, complex, and ambiguous information from past conversations. They operate in the here-and-now rather than the long term. For a therapist, long-term memory of client experiences and reflections is vital for connecting dots and helping people heal. Indeed, this long-term memory is what builds trust and richness in the therapeutic relationship. The therapist needs to help the client develop personal insight, not the on-average, horoscope-like findings that LLMs often supply. While the LLM’s output may still sound convincing to the user, that does not mean the underlying clinical reasoning is sound and based on accurate data discovered in the therapeutic process. LLMs can provide statements that are generally accurate and safe while not being personally insightful.
LLMs and Cognitive Offloading
Using LLMs for therapy could also induce harmful cognitive offloading. Therapy is not all about the dialogue between client and therapist. Oftentimes, the work done between sessions is most important. This between-session introspection allows us to engage pre-verbal reasoning and emotions. We need to be encouraged to go beyond natural language.
Likewise, we should not have continual access to a therapist (though we should be able to reach them in an emergency). Boundaries are an important part of therapy; I cannot sit back and let my therapist solve my problems for me. Human-to-human therapy allows us to practice boundaries. As we might infer, none of these scenarios are where LLMs excel. They are natural language engines that do not create healthy boundaries. These systems are designed to hook us on using them; they are not designed for us to “graduate” from therapy.
Considering Beneficence for AI Therapy
We should hold providers of LLMs for therapy to higher standards than we do today. I believe helpful technology can be developed in this space, but it must be carefully planned and delivered. At a bare minimum, AI therapy should follow these guidelines.
- Be exceedingly clear about how the system is trained and engineered. Clearly state the limitations and risks. Share intended uses for the technology. Do not position the solution as a replacement for human therapists due to the inherent limitations of LLMs (see the previous content in this article).
- Operate within defined parameters and reject acting outside those situations. For example, only practice certain therapeutic modalities and never provide clinical diagnoses.
- Build the technology with underserved populations in mind, those who have trouble accessing human therapists.
- Recognize red flags like suicidal ideation, self-harm, homicidal ideation, domestic violence, abuse, and psychosis, and share appropriate local resources.1 Perhaps even alert emergency services in some cases.
- Include strong guardrails and real-time checks on responses before rendering them to users (see the sketch after this list). The speed of responses should not be a primary concern.
- Do not rely solely on LLMs. Work to embed true clinical reasoning with an approach like Symbolic AI.
- Test the system for unsafe responses and jailbreaks extensively and on an ongoing basis.
- Set limits on usage and do not optimize engagement. Perhaps each user could be allowed 45 minutes on the platform each week.
- Place strong emphasis on privacy and security. Be clear with users what legal protections apply to their data.
- Do whatever it takes to prioritize human safety and avoid harmful responses, even at the expense of sometimes providing helpful feedback. Realize that any helpful technology can be similarly harmful if misapplied.
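As a concrete illustration of the guardrail point above, here is a minimal, hypothetical Python sketch of a pre-render safety gate. Everything in it (the phrase list, the fallback message, the screen_response function) is invented for illustration; a real system would rely on trained safety classifiers and clinician-reviewed escalation policies, not a keyword list.

```python
# Minimal, hypothetical sketch of a pre-render safety gate: every drafted
# response is screened before the user sees it.

BLOCKED_PHRASES = ["i diagnose", "your diagnosis is", "stop taking your medication"]  # illustrative only

SAFE_FALLBACK = (
    "I'm not able to respond to that safely. If you are in crisis, please "
    "contact your local emergency number or a crisis line right away."
)

def screen_response(draft: str) -> str:
    """Return the draft only if it passes the checks; otherwise return a safe
    fallback. Speed is deliberately secondary to safety here."""
    lowered = draft.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return SAFE_FALLBACK
    return draft

# The chatbot pipeline would call this between generation and display:
print(screen_response("It sounds like a heavy week. What felt hardest?"))
print(screen_response("Based on what you shared, I diagnose you with depression."))
```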
If we work on or promote tools that impact people’s lives, we should ask God for guidance (James 1:5). Simply because a task can be accomplished easily does not mean we are off the hook for discernment (Ephesians 5:6-10). Building an AI therapy bot is not difficult; constructing one that is human-centric is.
References
1. American Counseling Association. (2014). 2014 ACA Code of Ethics. https://www.counseling.org/docs/default-source/default-document-library/ethics/2014-aca-code-of-ethics.pdf
2. Mintz, D. (2022). Psychodynamic Psychopharmacology: Caring for the Treatment-Resistant Patient (1st ed.). American Psychiatric Association, pp. 62, 149.
Views and opinions expressed by authors and editors are their own and do not necessarily reflect the view of AI and Faith or any of its leadership.


