There has been much discussion of the effectiveness of AA in recent weeks and months.
Because AA does not lend itself easily to being studied, its effectiveness is a complicated area to research. To address it, we cite excerpts from an excellent study of AA published a number of years ago ( ).
This is one of the most definitive studies of how and why AA works.
“Research on the effectiveness of Alcoholics Anonymous (AA) is controversial and is subject to widely divergent interpretations. The goal of this paper is to provide a focused review of the literature on AA effectiveness that will allow readers to judge the evidence for AA effectiveness themselves.
The review organizes the research on AA effectiveness according to six criteria required for establishing causation: (1) magnitude of effect; (2) dose-response effect; (3) consistent effect; (4) temporally accurate effects; (5) specific effects; (6) plausibility.
The evidence for criteria 1, 2, 3, 4 and 6 is very strong: rates of abstinence are about twice as high among those who attend AA (criterion 1, magnitude); higher levels of attendance are related to higher rates of abstinence (criterion 2, dose-response); these relationships are found for different samples and follow-up periods (criterion 3, consistency); prior AA attendance is predictive of subsequent abstinence (criterion 4, temporal); and mechanisms of action predicted by theories of behavior change are present in AA (criterion 6, plausibility).
The Cochrane Group review recommended that people considering attending AA or twelve-step facilitation (TSF) should be made aware that there is a lack of experimental evidence on the effectiveness of such programs. This is despite optimal outcomes for TSF at 1 and 3 years for outpatients in the Project MATCH trial [2, 3].
At the other end of the spectrum, 12-step scholar Rudy Moos has recommended that referral agencies consider referring people to AA first, rather than to treatment first. This recommendation is based on his own observational studies, which have found that longer duration of AA attendance is associated with less drinking at 8 and 16 years, and that those who attend AA before attending treatment tend to attend AA longer than those who attend treatment first.
Prior efforts to summarize the findings on AA effectiveness have included literature reviews [6, 7] and meta-analyses [8–10]. The most recent meta-analysis concluded that attending AA led to worse outcomes than no treatment at all. An earlier meta-analysis focusing on moderating effects found that the evidence for AA effectiveness was stronger in outpatient samples, and that poorer-quality studies (based on volunteers, self-selection rather than random assignment, no corroboration of self-report, etc.) somewhat inflated the case for AA effectiveness.
A review summarizing the state of the literature 7 years later argued that there was a consistent, rigorous body of evidence supporting AA effectiveness. Again, there seems to be something for everybody, and the literature really does seem to be open to widely differing interpretations. This may stem from the criteria being used to judge effectiveness.
At the heart of the debate is the quality of the evidence. The critics' concern is well-founded: as will be evident from this review, experimental studies represent the weakest of the available evidence.
However, the review also will highlight other categories of evidence that are overwhelmingly convincing with respect to AA effectiveness, including the consistency with established mechanisms of behavior change.
This review will organize the research on AA effectiveness according to six formal criteria for establishing causation, which should help readers integrate the sometimes conflicting conclusions discussed above.
These criteria were first introduced to help policymakers evaluate the totality of the evidence for a causal effect of smoking on lung cancer in the absence of experimental data [13, 14]. The criteria offer a framework for judging the “totality” of the evidence [12, p.191], implicitly acknowledging that the evidence may not be strong for all criteria, and leaving the final decision to the individual evaluator. These are the criteria:
The relationship between an exposure (here, exposure to AA) and the outcome (abstinence, as AA does not recommend any drinking for alcoholics) must be strong. According to this criterion, weak relationships between AA and abstinence would not be as convincing of causality as strong ones.
There should be a dose-response relationship, such that more involvement in AA relates to higher levels of abstinence.
The consistency of the association matters. If some studies find a strong relationship between the number of AA meetings attended and the rate of abstinence, but many do not, this would call into question whether the dose-response relationship can be trusted as evidence.
The timing of the purported influence must be correct. This means that the measurement of AA exposure must be prior to the period of abstinence that is being studied; otherwise, it could mean that abstinent people tend to go to AA, rather than AA causing people to be abstinent.
The specificity of the association must be demonstrated. One must be able to rule out explanations other than AA exposure for the abstinence observed. This addresses the concern that those who attend AA are a select sample who would be sober anyway, without ever going to AA. For example, if those who attend AA are highly motivated to do something about their drinking, it could be that this motivation is the cause of their abstinence; it would be unfair to credit AA for their successful outcome. Evidence of specificity ideally requires experimental manipulation of exposure to AA. For example, individuals in a study might be randomized to attend AA or to attend psychotherapy, so that they do not select their treatment. Because of randomization, motivated people would end up in both the psychotherapy and the AA conditions, so it would not be the case that the “deck was stacked” in favor of AA. If those randomized to attend AA were more likely than those randomized to psychotherapy to be abstinent 2 years later, this would demonstrate an effect specific to AA that could not be due to a selection bias in which only motivated people attend AA.
Coherence with existing knowledge is needed to establish causation. The notion of theoretical plausibility is suggested as a way of addressing coherence with existing knowledge; that is, are the mechanisms of action that explain behavior change present in AA? Several theories and different aspects of AA exposure will be considered in addressing this final criterion.
Criterion 1 – strength of association
As shown in Figure 1, which draws on a longitudinal study of male inpatients in Veterans Administration programs, rates of abstinence are about twice as high for those who attended a 12-step group such as AA following treatment… The rates of abstinence were about twice as high among those who had attended AA or another 12-step group (but no other form of aftercare).
Criterion 2 – dose response relationship
Do higher levels of AA attendance or involvement relate to higher levels of abstinence? There is evidence of a dose-response relationship for number of 12-step meetings (Figure 2a), frequency of 12-step meetings (Figure 2b), and duration of AA meeting attendance (Figure 2c).
In the same population of male VA residential patients, with AA meeting attendance measured over the 90 days before the 1-year follow-up, the dose-response curve looks almost linear (Figure 2a), with more 12-step meetings associated with higher rates of alcohol and drug abstinence.
In a smaller outpatient sample, over 70% of those attending 12-step groups weekly during the 6 months before the 2-year follow-up were abstinent from alcohol, while alcohol abstinence rates among those attending less than weekly were the same as among those who never attended during that period; this suggests a threshold dose-response effect for weekly attendance at 12-step groups (Figure 2b).
In a longitudinal study of previously untreated problem drinkers, 70% of those with 27 weeks or more of sustained AA meeting attendance in any given period (whether year 1, years 2–3, or years 4–8) were abstinent from alcohol at the 16-year follow-up; those with shorter duration of attendance had lower rates of abstinence, with the dose-response relationship most evident for AA attendance in year 1 and years 4–8 (Figure 2c). This study underlies Moos’ recommendation to send people to AA first, because those who went to AA first were more likely to stay involved in AA for longer.
Criterion 3 – consistency of association
The similarity in abstinence rates among weekly or near-weekly AA attenders (70%) in these two latter studies, which involved different populations and follow-up periods, is relevant to this criterion, consistency of association.
Another example is shown in Figure 3, which presents rates of abstinence for those who attended AA but no other treatment (third bar, labeled ‘AA only’), in two different samples (VA inpatients, and previously untreated problem drinkers in the general population), with different follow-up periods (1, 3, and 8 years).
The 1-year study considered alcohol and drug abstinence as a function of 12-step group attendance, while the 3- and 8-year data focused specifically on AA attendance and alcohol abstinence. About 50% of those who had attended AA/12-step meetings only were abstinent at 1 year and at 3 and 8 years, and about one-fifth of those who did not attend AA/12-step meetings or treatment were abstinent at the corresponding follow-up interviews.
Another study of the general population found that individuals with lifetime alcohol dependence who went to 12-step meetings but no formal treatment were more likely to be abstinent than those who did nothing (not shown).
Criterion 4 – temporally correct association
Moos’ work, which studied 16-year alcohol abstinence in a previously untreated sample of problem drinkers as a function of AA attendance during years 2–3 and years 4–8 (Figure 2c), meets the fourth criterion for evidence of causality.
Criterion 5 – specificity
We return to this issue in the conclusion, because it concerns the mixed results of the experimental studies that have tested for specificity. It is here that methodological differences help to cloud the results.
Criterion 6 – coherence with existing knowledge
To evaluate the literature on AA effectiveness according to this criterion, theoretical plausibility will be discussed; that is, does AA work in a way that is consistent with major theoretical perspectives on health behavior and behavior change?
For example, a recent interpretation of contemporary psychodynamic theory has characterized alcoholism as an interaction between one’s abilities to express feelings and to self-regulate one’s behavior. The theory argues that despite low self-esteem, many alcoholics have a narcissistic personality and a sense of omnipotence. They drink to self-medicate, as a way of addressing unmet needs and uncomfortable psychological states.
AA solutions consistent with this characterization of the problem are evident at meetings, in the AA steps, and through people in the AA fellowship.
Meetings provide an opportunity to share one’s own struggles (and learn how to talk about one’s feelings), to increase one’s motivation to abstain, and to get outside of one’s self (and change one’s mood) by hearing others talk about their problems and how AA helped them.
The steps help with self-governance, narcissism and omnipotence: accepting powerlessness over alcohol (step 1); recognizing that one cannot do it alone (but that a higher power, which can be operationalized as the AA group, is there to help; steps 2–3); realizing how one’s behavior affected and affects others (steps 4–9); treating other people better (step 10); finding meaning in life (step 11); and relinquishing one’s negative self-focus by helping others (step 12). Through the people in AA, one learns how to live a sober life, and how to regulate one’s behavior one day at a time.
Bandura’s social learning theory adds to the psychodynamic perspective the problems that arise from social influences and from low self-efficacy: if everyone around you drinks, and if you don’t think it is within your ability not to drink, you will be unable to abstain. The antidote includes changing environmental cues (such as staying away from bars), role modeling (seeing others succeed at not drinking), and self-efficacy (believing you can abstain).
AA meetings, and spending time with people in AA, represent changes in environmental cues; that is, you’re not at a bar, seeing alcohol and seeing people drink alcohol, when you’re at a meeting or out with AA friends.
At an AA meeting, you are exposed to successful role models, instead of current drinkers, who suggest a new approach to abstinence: not drinking 1 day at a time (instead of saying you are “quitting forever”). Seeing yourself able to abstain for one day begins to build self-efficacy, which accumulates with the passage of every sober day.
Spending time at AA meetings and with people in AA also engages the relapse prevention mechanisms put forward by standard behavior modification techniques. These include learning how to say no to a drink when offered, having a plan of action when confronted with likely drinking conditions, and choosing alternative behaviors to take the place of drinking.
Several studies offer empirical support for these mechanisms. The positive relationship between AA involvement and abstinence has been shown to be partially mediated (explained) by (a) psychological and spiritual mechanisms including finding meaning in life, greater motivation for abstinence, and changes in religious beliefs and spiritual experiences; (b) social influences such as fewer pro-drinking influences, more friends in general, having AA friends supportive of abstinence, and enhanced friendship networks; and (c) social learning and behavioral mechanisms including improved self-efficacy [31, 37] and effective coping and relapse prevention skills [34, 36] to abstain. These mechanisms (and theories) are inter-related. For example, AA friends represent a particularly effective source of social support, because they provide expertise in preventing relapse.
The goal was not to provide an exhaustive review of the evidence, but rather to present representative studies that address AA effectiveness according to six accepted criteria for establishing scientific causation. This framework may be especially appropriate for considering AA effectiveness, because it acknowledges the value and limitations of experimental evidence in the context of other criteria for determining treatment effectiveness.
As stated at the outset, the experimental evidence for AA effectiveness (addressing specificity) is the weakest among the six criteria considered crucial for establishing causation. Only two studies provided strong proof of a specific AA or TSF effect: the outpatient arm of Project MATCH (with effects at 1 and 3 years) [2, 3], and the intensive referral condition in Timko’s trial (with effects for abstinence at 6 months and 1 year). The effect sizes were similar, with the TSF/intensive referral conditions having a 5–10% advantage in abstinence rates. It is noteworthy that neither of these studies attempted to randomize patients to AA per se; instead, they focused on interventions intended to facilitate AA involvement.
One reason that several of the other trials may not have found positive effects for AA/TSF is that many individuals randomized to the non-AA/non-TSF conditions also attended AA; thus, the AA or TSF condition ended up being compared to a condition consisting of an alternative treatment plus AA. This was the case in Walsh’s hospital inpatient treatment vs. AA study and in the aftercare arm of Project MATCH, and arose because the patients in the non-AA/non-TSF conditions had also attended 12-step-based inpatient treatment, which in turn engendered strong participation in AA. Thus, AA attendance levels were high in the inpatient hospital condition in the former study, and in the CBT and MET conditions among the Project MATCH aftercare subjects. In fact, CBT and MET aftercare patients attended more meetings than the TSF outpatients, and the aftercare patients overall attended twice the number of meetings at every follow-up compared to the outpatients [22, see pp.191–192].
There are other concerns with the Brandsma trial that call its experimental results into question. The control condition allowed for participation in actual AA meetings, while those in the AA condition attended a weekly AA-like meeting administered by the study (that was not an actual AA meeting). The description of the AA condition states that the steps were used for discussion content, the group focused on newcomers, and patients were told about sponsors [25, p.34], but it is not clear whether the meetings were led by AA members, whether crosstalk was allowed, whether the meeting leader shared their story as part of the meeting, or whether the meeting format was what one would encounter at an actual AA meeting. The meetings may not have been open to other AA members in the community, and may not have been listed in the AA meeting directory, which would mean that a potentially important therapeutic ingredient of AA, the experience of longer-term members, would not have been present in the AA condition. This is of special concern because the control condition did allow for attendance at such meetings.
Given these challenges in conducting rigorous randomized trials of AA effectiveness, researchers have turned to statistical methods to address the selection bias associated with AA attendance in observational studies. These efforts are intended to address criterion 5, specificity of the AA effect. The goal with these methods is to statistically adjust for study participants’ likelihood or propensity to attend AA, prior to evaluating AA’s impact on subsequent drinking.
One approach, used in two studies of AA effectiveness, is an econometric method using so-called “instrumental variables” to parse out the effect of AA attendance.
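The instrumental-variable idea can be sketched in a few lines of Python. This is a minimal illustration on simulated data, not the method or data of the studies discussed here: the instrument `z` is a hypothetical binary factor assumed to make AA attendance more likely while affecting drinking only through attendance, `u` stands in for unmeasured motivation, and the effect of -3 heavy-drinking days is invented for the demonstration. With a single binary instrument, the IV estimate reduces to the Wald ratio.

```python
import random
from statistics import fmean


def simulate(n, true_effect=-3.0, seed=1):
    """Simulate hypothetical rows (z, d, y): instrument, AA attendance, heavy-drinking days.

    u (unobserved motivation) raises attendance AND lowers drinking, which is
    what biases a naive comparison of attenders with non-attenders.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = 1 if rng.random() < 0.5 else 0       # unobserved confounder
        z = 1 if rng.random() < 0.5 else 0       # instrument, independent of u
        p_attend = 0.2 + 0.4 * z + 0.3 * u       # both z and u push attendance up
        d = 1 if rng.random() < p_attend else 0
        # z has no direct path to y (the exclusion restriction holds by construction)
        y = 10 + true_effect * d - 4 * u + rng.gauss(0, 1)
        rows.append((z, d, y))
    return rows


def naive_difference(rows):
    """Mean outcome of attenders minus non-attenders (confounded by u)."""
    attend = [y for z, d, y in rows if d == 1]
    no_attend = [y for z, d, y in rows if d == 0]
    return fmean(attend) - fmean(no_attend)


def wald_iv_estimate(rows):
    """Wald IV estimate: effect of z on y divided by effect of z on d."""
    y1 = fmean([y for z, d, y in rows if z == 1])
    y0 = fmean([y for z, d, y in rows if z == 0])
    d1 = fmean([d for z, d, y in rows if z == 1])
    d0 = fmean([d for z, d, y in rows if z == 0])
    return (y1 - y0) / (d1 - d0)


if __name__ == "__main__":
    data = simulate(200_000)
    print(f"naive difference: {naive_difference(data):.2f}")
    print(f"IV (Wald) estimate: {wald_iv_estimate(data):.2f}")
```

Because simulated motivation raises attendance and lowers drinking at the same time, the naive attender-versus-non-attender gap overstates the benefit (about -4.2 here), while the Wald ratio uses only the variation in attendance driven by `z` and lands near the simulated -3. The estimate is only as trustworthy as the assumption that the instrument has no direct path to drinking.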
Using different instrumental variables (perceived seriousness of drinking, and having a coping style tending towards information-seeking solutions), another study found that AA’s impact on heavy drinking was significant and doubled in magnitude after correcting for the instrumental variables. A third study adjusted for baseline motivation and psychopathology as potential confounders, and found that those with more AA involvement at 1 year had fewer alcohol problems at the 2-year follow-up interview. Another statistical study of selection bias, now under review, used propensity scores to adjust for study participants’ propensity to attend AA, and found that the odds of abstinence associated with AA attendance were reduced, but remained significant, after adjusting for individuals’ propensity to attend AA. The method allowed investigators to study whether the selection bias operationalized by the propensity scores varied based on whether an individual had a low versus a high propensity to attend AA. Among those with a high propensity to attend AA, AA’s effect was minimal (e.g., OR=1.3); however, among those with a lower propensity to attend AA, the odds of abstinence associated with AA attendance were significant and of considerable magnitude, ranging from 2.7 to 6.9.
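Propensity-score adjustment can be illustrated with a stratified odds-ratio calculation. The sketch below is a hypothetical example, not the analysis from the study under review: the counts are invented, and instead of estimating propensity scores with a regression model it assumes participants have already been grouped into a high-propensity and a low-propensity stratum. It compares the crude odds ratio for abstinence with the stratum-specific ratios and a Mantel-Haenszel pooled ratio, one standard way of combining stratified 2x2 tables.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stratum:
    """Hypothetical 2x2 counts for one propensity stratum (illustrative, not study data)."""
    attend_abstinent: int
    attend_drinking: int
    no_attend_abstinent: int
    no_attend_drinking: int

    @property
    def n(self) -> int:
        return (self.attend_abstinent + self.attend_drinking
                + self.no_attend_abstinent + self.no_attend_drinking)

    @property
    def propensity(self) -> float:
        # Share of people in this stratum who attended AA
        return (self.attend_abstinent + self.attend_drinking) / self.n


def odds_ratio(s: Stratum) -> float:
    """Odds of abstinence for attenders divided by odds for non-attenders."""
    return (s.attend_abstinent * s.no_attend_drinking) / (s.attend_drinking * s.no_attend_abstinent)


def crude_odds_ratio(strata: list[Stratum]) -> float:
    """Odds ratio after collapsing all strata, i.e. ignoring propensity."""
    pooled = Stratum(
        sum(s.attend_abstinent for s in strata),
        sum(s.attend_drinking for s in strata),
        sum(s.no_attend_abstinent for s in strata),
        sum(s.no_attend_drinking for s in strata),
    )
    return odds_ratio(pooled)


def mantel_haenszel_odds_ratio(strata: list[Stratum]) -> float:
    """Propensity-adjusted odds ratio pooled across strata (Mantel-Haenszel)."""
    numerator = sum(s.attend_abstinent * s.no_attend_drinking / s.n for s in strata)
    denominator = sum(s.attend_drinking * s.no_attend_abstinent / s.n for s in strata)
    return numerator / denominator


if __name__ == "__main__":
    high = Stratum(160, 40, 30, 10)   # hypothetical: most people here attend AA
    low = Stratum(20, 20, 40, 160)    # hypothetical: few people here attend AA
    strata = [high, low]
    for name, s in [("high propensity", high), ("low propensity", low)]:
        print(f"{name}: P(attend)={s.propensity:.2f}, OR={odds_ratio(s):.2f}")
    print(f"crude OR: {crude_odds_ratio(strata):.2f}")
    print(f"adjusted (MH) OR: {mantel_haenszel_odds_ratio(strata):.2f}")
```

With these made-up counts the crude odds ratio is about 7.3, the stratum-specific ratios are about 1.3 (high propensity) and 4.0 (low propensity), and the Mantel-Haenszel adjusted ratio is 2.4. The pattern resembles the one reported in the text: adjustment shrinks the association without erasing it, and the association is weakest among those who were most likely to attend anyway.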
What, then, is the scorecard for AA effectiveness in terms of specificity? Among the rigorous experimental studies, there were two positive findings for AA effectiveness, one null finding, and one negative finding. Among those that statistically addressed selection bias, there were two contradictory findings, and two studies that reported significant effects for AA after adjusting for potential confounders such as motivation to change.
Readers must judge for themselves whether their interpretation of these results, on balance, supports a recommendation that there is no experimental evidence of AA effectiveness (as put forward by the Cochrane review).
As for the scorecard for the other criteria, the evidence for AA effectiveness is quite strong: rates of abstinence are about twice as high among those who attend AA (criterion 1, magnitude); higher levels of attendance are related to higher rates of abstinence (criterion 2, dose-response); these relationships are found for different samples and follow-up periods (criterion 3, consistency); prior AA attendance is predictive of subsequent abstinence (criterion 4, temporal); and mechanisms of action predicted by theories of behavior change are evident at AA meetings and through the AA steps and fellowship (criterion 6, plausibility).”