“Dear Program Director”: An Analysis of Gender Bias in Internal Medicine Letters of Recommendation from 2009 and 2019

Abstract

Purpose

The majority of United States Internal Medicine (IM) programs use letters of recommendation (LOR) as part of their holistic review of applicants in the residency selection process.  It is important to determine if there are gender differences in IM LOR for frequency of agentic (e.g., assertive, confident) and communal (e.g., compassionate, kind) descriptors.

Methods

The authors retrospectively reviewed LOR from applicants who matched at the University of Utah IM residency program in 2009 and 2019.  Text analysis was used to determine the frequency of agentic and communal descriptors, which was then compared by applicant gender, letter writer gender, and year using ANOVAs.

Results

Letter writers used more communal terms in men applicants’ LOR relative to women applicants’ LOR in 2009, F(1,158) = 9.80, P = 0.001, np2 = 0.06, and there was more communal presence in women writers’ LOR relative to men writers’ LOR in 2009, F(1,158) = 8.97, P = 0.003, np2 = 0.04; neither difference persisted in 2019.  Agentic terms were used more often in 2019 relative to 2009, F(1,383) = 4.49, P = 0.035, np2 = 0.01, as were communal terms, F(1,383) = 28.07, P < 0.001, np2 = 0.07.

Conclusion

It is unclear how the equivalent use of descriptors in LOR impacts women in the residency selection process.  Further research is needed to understand how IM residency programs use LOR for resident selection and how this could be impacted by gender bias. 

Keywords: bias; residents; letters of recommendation; gender; text analysis

Introduction

According to the 2018 National Resident Matching Program (NRMP) survey of United States Internal Medicine (IM) program directors, letters of recommendation (LOR) are used by 74% of program directors in selecting applicants to interview.1  Recently, the Invitational Conference on United States Medical Licensing Examination (USMLE) Scoring called for a more holistic review of applicants, which takes into account information regarding an applicant’s attributes and future potential, such as those mentioned in LOR.  The goal of holistic review is to address biases and avoid over-reliance on USMLE Step scores.2  Yet, if elements of the application like LOR are themselves biased, using them may not help address bias in the residency selection process.  Given the potential for implicit bias, some specialties have transitioned to a standardized letter of recommendation (SLOR), which includes standard evaluative and comparative data in an effort to reduce gender-based differences in LOR.3,4  To date, no studies have investigated how the content of LOR may vary by gender of IM applicants.  Understanding whether there is gender bias in LOR and whether this has changed over time is essential for the fair assessment of students’ abilities during the IM residency selection process.

One way to evaluate gender differences in LOR has been to look at how agentic and communal terms are used to describe applicants.  Agentic descriptors (e.g., assertive, confident, dominant, aggressive) are more often associated with men, as a man’s social role has historically been to be strong, dominant, and self-reliant.5  In contrast, women are often described in communal terms (e.g., affectionate, kind, compassionate) that focus on the welfare of others because, traditionally, a primary role of women has been to care for others.5,6

The study of communal and agentic descriptors within academic medicine has demonstrated varying differences between how men and women are described, particularly in how standout adjectives and agentic terms are used.  Recent studies of LOR for urology and general surgery residency as well as transplant surgery fellowship concluded that standout adjectives and/or agentic terms were used to describe men applicants more often than women applicants, who were more likely to be described using communal terms.7–9  In contrast, another study of LOR for a general surgery residency program found that letters for women applicants were more likely to contain standout adjectives than letters for men applicants, but overall, the LOR contained similar descriptors for men and women.10  In radiology, LOR for women were more likely to include agentic descriptors than LOR for men.11

Given the paucity of data in IM, we studied LOR for men and women residents accepted to our IM residency program in 2009 and 2019.  We chose to evaluate LOR over a 10-year span to determine if growing recognition of gender bias12 has impacted the language used to describe applicants.  Based on prior research showing gender bias in LOR7–9,13, we hypothesized that IM residency program LOR for men applicants relative to women applicants would contain more agentic terms, but that the difference would decline from 2009 to 2019 with increasing awareness of gender bias. 

Methods

Participants and Design

This was a text analysis study of LOR entered in the Electronic Residency Application Service (ERAS) for all categorical and preliminary IM applicants who matched at the University of Utah IM residency program in 2009 and 2019.  These LOR represented all Association of American Medical Colleges Group on Education Affairs United States geographic regions: 35% (135) were from the central region, 27% (106) were from the western region, 21% (82) were from the southern region, 6% (22) were from the northeast region, and 11% (42) were for international graduates.  For each LOR, the gender of the letter writer, gender of the applicant, length of letter as measured by word count, and year of application were recorded by a research assistant who was unaware of the study hypotheses.  If there were multiple letter writers, only the gender of the first letter writer listed was included, as it was assumed this was the primary author.  While the authors acknowledge that gender is not a binary construct, because ERAS only allows applicants to select from one of two options, we used a binary approach to assign gender in this study.  Since ERAS does not require letter writers to identify their gender when submitting a LOR, we assigned the gender of each letter writer based on the name’s historical association with a man or woman.  If the gender was not apparent, an Internet search of the faculty member was conducted.  In this paper, we use the terms man and woman to refer to the residency applicants and letter writers because we are exploring the potential effect of gender bias, rather than sex, on applicants.14  LOR were missing for one applicant in 2009.

Letter of Recommendation Analysis

A communal and agentic dictionary of terms was created based upon previously defined lists of agentic and communal words in LOR for surgery and radiology applicants.9,11  In addition, we used the software program R (v2.6.2)15 to capture the 20 most frequent terms that were used to describe our IM residency applicants in 2009 and 2019 and reviewed these terms for inclusion as agentic or communal descriptors.  Two researchers independently reviewed these terms and categorized each as communal or agentic and then met to review and resolve differences.  Through this process we identified 18 additional terms that were used frequently in IM LOR of applicants applying to our institution.  Our initial set of terms included 63 agentic terms and 52 communal terms (Appendix).
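The frequent-term capture step can be illustrated with a short sketch.  The authors used R; the Python below is only an assumed equivalent, and the stop-word list, the tokenizer, and the three sample sentences are hypothetical stand-ins rather than study data.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "to", "of", "in",
              "with", "her", "his", "she", "he"}

def tokenize(text):
    """Lowercase a letter and split it into alphabetic (optionally hyphenated) tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def top_terms(letters, n=20):
    """Return the n most frequent non-stop-word tokens across all letters."""
    counts = Counter()
    for letter in letters:
        counts.update(tok for tok in tokenize(letter) if tok not in STOP_WORDS)
    return counts.most_common(n)

# Invented sample sentences, not text from any real letter.
letters = [
    "She is a compassionate and hardworking student.",
    "He was a confident, hardworking, and compassionate intern.",
    "A hardworking and kind student.",
]
top = top_terms(letters, n=3)
```

The resulting list would then be handed to human reviewers, as described above, to decide which frequent terms count as agentic or communal descriptors.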

To ensure accuracy and appropriate use of context, three researchers were each assigned one third of the terms and performed a manual review of the LOR in the de-identified file to ensure the agentic and communal terms were used in the appropriate context.  A term was deemed appropriate if it was a direct descriptor of the applicant or a description of an applicant’s attributes or skills in the past, present, or future.  If the term did not meet the preceding criteria, it was removed and not included in the final analysis.  For example, the term “aggressive” used in the context of “he handled a situation with an aggressive patient” was excluded from the analysis as it referenced an attribute of a patient and not the applicant.  The three researchers calibrated their shared definition for appropriate use before reviewing their assigned terms.  If a researcher had a question regarding whether a term met the defined criteria, it was discussed amongst all three researchers for agreement.  The data was corrected based on manual review before any analyses. 

Statistical Analysis

Frequencies and percentages were computed for demographic variables.  Word count was computed for each LOR, and average word count was compared by year with the Mann-Whitney U test.  An average agentic percentage and an average communal percentage were computed for each LOR by first determining whether each agentic and communal term was present or absent in that LOR.  The presence indicators were then averaged across all agentic words and across all communal words, respectively, for each LOR.
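As a minimal sketch of the presence calculation described above, the following Python computes, for one letter, the percentage of dictionary terms that appear at least once.  The term lists are only the example descriptors named in the Introduction, not the full dictionary in the Appendix, and exact whole-word matching is an assumption; the sample sentence is invented.

```python
import re

def presence_percentage(letter, terms):
    """Percentage of dictionary terms that appear at least once in the letter."""
    tokens = set(re.findall(r"[a-z]+", letter.lower()))
    # Each term contributes 1 if present and 0 if absent, however often it repeats.
    present = sum(1 for term in terms if term in tokens)
    return 100.0 * present / len(terms)

# Illustrative subsets only; the study dictionary had 63 agentic and 52 communal terms.
AGENTIC = ["assertive", "confident", "dominant", "aggressive"]
COMMUNAL = ["affectionate", "kind", "compassionate"]

letter = "She is a confident and compassionate clinician; confident in every setting."
agentic_pct = presence_percentage(letter, AGENTIC)    # 1 of 4 terms present
communal_pct = presence_percentage(letter, COMMUNAL)  # 1 of 3 terms present
```

Because presence is scored as present or absent, the repeated use of "confident" in the sample sentence does not raise the agentic percentage.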

To determine if the presence of each word type varied by applicant gender and application year, 2 (applicant gender: woman, man) x 2 (year: 2009, 2019) ANOVAs were run on agentic presence averages and communal presence averages.  To determine if presence of each word type varied by the letter writer’s gender, 2 (letter writer gender: woman, man) x 2 (applicant gender: woman, man) ANOVAs were run on agentic presence averages and communal presence averages in 2009 and in 2019.  These ANOVAs were run separately for 2009 and 2019 due to small Ns of women letter writers for each level of applicant gender.  This study was deemed exempt by the University of Utah Spencer Fox Eccles School of Medicine Institutional Review Board.
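To make the 2 x 2 design concrete, the sketch below computes F statistics for a between-subjects two-factor ANOVA in plain Python.  It assumes equal cell sizes, which was not the case in this study (unbalanced cells are normally handled by statistical software), and the scores are invented toy values rather than study data.  The function name and data layout are hypothetical.

```python
from statistics import mean

def two_way_anova_balanced(cells):
    """F statistics for a two-factor between-subjects ANOVA with equal cell sizes.

    cells[(a, b)] holds the scores for level a of factor A and level b of factor B.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch only handles equal cell sizes")

    grand = mean(x for v in cells.values() for x in v)
    cell_mean = {k: mean(v) for k, v in cells.items()}
    a_mean = {a: mean(cell_mean[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell_mean[(a, b)] for a in a_levels) for b in b_levels}

    # Sums of squares for the two main effects, the interaction, and error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_error = len(a_levels) * len(b_levels) * (n - 1)
    ms_error = ss_error / df_error
    return {
        "F_A": (ss_a / df_a) / ms_error,
        "F_B": (ss_b / df_b) / ms_error,
        "F_AxB": (ss_ab / (df_a * df_b)) / ms_error,
        "df_error": df_error,
    }

# Toy presence percentages (invented): factor A = applicant gender, factor B = year.
cells = {
    ("woman", "2009"): [10, 12],
    ("woman", "2019"): [14, 16],
    ("man", "2009"): [11, 13],
    ("man", "2019"): [15, 17],
}
result = two_way_anova_balanced(cells)
```

In this toy data the year effect is large relative to the gender effect and the interaction is zero, which mirrors the shape of the question being asked rather than the study's actual results.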

Results

A total of 387 LOR were analyzed: 146 LOR (58 women and 88 men applicants) from 2009 and 241 LOR (104 women and 137 men applicants) from 2019.  Table 1 provides letter writer gender, number of applicants, average number of LOR per applicant, and average word count per LOR.  The average word count per LOR increased by 102 words (CI 58-146) from 2009 to 2019, P < 0.001.

After manual review, the percentage of agentic terms used in the appropriate context was 91% (4062/4484) and the percentage of communal terms used in the appropriate context was 79% (1551/1972), for an overall accuracy rate of 87% (5613/6456).
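The accuracy rates follow directly from the reported counts, as the short arithmetic check below shows.  The variable and function names are illustrative only.

```python
def accuracy_pct(in_context, total):
    """Percentage of term occurrences judged to be used in the appropriate context."""
    return round(100 * in_context / total)

# (occurrences kept after manual review, occurrences flagged by text analysis)
agentic = (4062, 4484)
communal = (1551, 1972)
overall = (agentic[0] + communal[0], agentic[1] + communal[1])

agentic_rate = accuracy_pct(*agentic)
communal_rate = accuracy_pct(*communal)
overall_rate = accuracy_pct(*overall)
```

The overall counts are simply the sums of the agentic and communal counts, so the overall rate is weighted toward the more numerous agentic occurrences.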

Table 2 provides the average presence of agentic and communal terms by applicant gender, letter writer gender, and year.  There was 2% more communal presence (CI 0.005-3.0%) in men applicants’ LOR relative to women applicants’ LOR in 2009, F(1,158) = 9.80, P = 0.001, np2 = 0.06, and 2% more communal presence (CI 0.4-4%) in women writers’ LOR relative to men writers’ LOR in 2009.  There was 1% more agentic presence (CI 0.2-3%) in 2019 relative to 2009, F(1,383) = 4.49, P = 0.035, np2 = 0.01, and 2% more communal presence (CI 1-3%) in 2019 relative to 2009, F(1,383) = 28.07, P < 0.001, np2 = 0.07.
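Reading np2 as partial eta squared (an assumption about the notation), the effect size can be recovered from an F statistic and its degrees of freedom as F x df_effect / (F x df_effect + df_error).  The sketch below applies that identity to the F statistics for the 2009 applicant-gender effect and the two year effects; the function name and dictionary labels are illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# (F, df_effect, df_error) as reported in the paragraph above.
reported = {
    "communal, applicant gender, 2009": (9.80, 1, 158),
    "agentic, year": (4.49, 1, 383),
    "communal, year": (28.07, 1, 383),
}
effect_sizes = {
    label: round(partial_eta_squared(*stats), 2)
    for label, stats in reported.items()
}
```

By common rules of thumb these are small to medium effects, with the increase in communal terms over time being the largest.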

Discussion

To our knowledge, this is the first study using text analysis to examine potential gender differences in LOR for IM residency applicants.  Overall, there was an increase in the presence of agentic and communal terms used in LOR from 2009 to 2019, irrespective of gender.  There was also an increase in word count for LOR from 2009 to 2019.  We found no difference in the presence of agentic terms by gender in 2009 or 2019.  Finally, communal terms were used more often to describe men applicants than women applicants in 2009, and, more specifically, women letter writers used these terms to describe men applicants more often than men letter writers did.  There was no difference in the use of communal terms to describe men and women applicants in 2019.

Our results show that over the 10 years studied, IM applicants’ LOR increased in length and contained more agentic and communal descriptors, irrespective of applicant gender.  It is unclear what accounts for the increased use of agentic and communal terms.  While we did not look at faculty rank in association with frequency of use of terms, Grimm et al. found that junior faculty were more likely to use agentic and communal terms.11  It is possible that, with the rise over the last 20 years in the number of hospitalist providers, whose median age is 41,16 more junior faculty are being asked to write LOR, as students frequently interact with these providers on their inpatient clerkship and sub-internship rotations.  Without clear guidelines for the structure and content of LOR in IM, junior faculty, with fewer years of experience and less comparative performance data on residents, may be more likely to use agentic and communal descriptors for all applicants.

We found that communal terms were used more often by women letter writers to describe men applicants in 2009, a finding that was not seen in 2019 LOR.  This is in contrast to prior work evaluating medical student performance evaluations (MSPE), which found that women authors used fewer “positive emotion” words to describe men students.17  However, studies looking at LOR for radiology and general surgery applicants found that women LOR writers are more likely to use agentic and communal terms than men letter writers, irrespective of applicant gender.10,11  That our findings differ from those in radiology and general surgery raises the possibility of specialty-specific differences in the use of, and value placed on, agentic and communal terms in LOR.  This may depend upon the percentage of women practicing in a specialty, as agentic traits are valued in male-dominated fields and communal traits are valued in female-dominated fields, and these values may change over time as more women enter a specialty.18,19  These differences highlight that use of agentic and communal descriptors may depend upon specialty and local or institutional cultural norms.  Further research is needed to explore how letter writers’ demographics (i.e., location, rank, age, gender) and institutional gender bias training impact LOR.

It should be noted that our manual review highlighted a lower accuracy rate for communal terms relative to agentic terms using the text analysis approach.  Other studies assessing communal and agentic terms in residency LOR do not comment on the context of communal and agentic term use and if it pertains directly to the applicant.7–9,11,20,21 Future studies should comment on whether communal terms are describing the applicant or features of something or someone else, like the patient.  

The Invitational Conference on USMLE Scoring called for a holistic approach in the review of residency applicants to best identify applicants who align with individual residency programs’ strengths and guiding principles.2  One of the goals of holistic review is to address potential biases, including gender bias.  Given the potential for implicit bias in the narrative LOR, Emergency Medicine (EM) residency programs implemented a SLOR in 1997 to improve resident selection based on evaluative and comparative data.3  Other residency programs, mainly the surgical subspecialties, have followed suit.  SLOR contain standard evaluative and comparative data in addition to a short narrative component, and, in several studies, they have shown little to no gender-based differences in comparison to narrative LOR.4,20,21  In May 2020, the Alliance for Academic Internal Medicine (AAIM) released recommendations for the Department of Medicine summary letters to follow a format similar to SLOR in order for program directors to have “standardized, objective data to facilitate holistic review.”22  It is unclear what impact the recommended SLOR format will have on the use of agentic and communal descriptors for candidates.

Several limitations of this study should be considered.  First, it is a single-center study with a small data set consisting of LOR from applicants who matched at our program.  Second, while LOR represent one resource in the residency selection process, we only analyzed LOR of applicants who matched into our residency program.  Excluding non-matched applicants may have inherently created bias and limits the generalizability of the results to matched IM applicants.  We purposely limited the sample to matched IM applicants because a sample of unmatched and matched applicants would have had a broader range of abilities, making it more difficult to know whether the results were due to gender bias or to performance characteristics.  Third, much of the prior work on gender bias in letters of recommendation has relied on Linguistic Inquiry and Word Count, which has a predefined dictionary, whereas we created a dictionary of terms from several recent sources, which may not be exhaustive; our manual review, however, allowed for confirmation that terms were used in the context of the applicant.  In addition, while the frequency of agentic and communal terms has been widely used in studies to evaluate for gender bias, it only represents two linguistic domains; thus, it is possible that our methods were insufficient to detect all possible types of bias.6,8–11  Finally, we describe the differences and trends in agentic and communal word use in LOR over time, but in this retrospective study we were unable to ascertain how these words were interpreted and acted on by the reader.

Conclusion

While we found no difference in the frequency of agentic terms used to describe women and men applicants, the presence of agentic and communal terms in LOR increased significantly from 2009 to 2019.  Despite the lack of gender difference in the frequency of agentic terms, it is unclear whether equivalent use affects women in the residency selection process, as prior studies have shown that women are less likely to get male-stereotyped jobs even when their qualifications are equivalent to those of their male counterparts.23  Further research is needed to understand how IM residency programs use LOR for the residency selection process and how this could be impacted by gender bias.  In addition, institutions should review narrative components of the residency application, like the LOR, to determine if bias exists and to better focus faculty development efforts for letter writers and for those reviewing LOR to make residency selection decisions.



2022 Journal of the Academy of Health Sciences: A Pre-Print Repository

“Dear Program Director”: An Analysis of Gender Bias in Internal Medicine Letters of Recommendation from 2009 and 2019 by Katie Lappé, MD, Sonja Raaum, MD, FACP, Mariah Sakaeda, MS, Candace Chow, PhD, Caroline Milne, MD & Jorie Colbert-Getz, PhD, MS