Let's continue our discussion of studies in Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" listed as "finding bias". See here for the first entry in the series and here for other entries.

---

16.

Huston 2006 "Race and Gender Bias in Higher Education: Could Faculty Course Evaluations Impede Further Progress toward Parity" is a review that, as far as I can tell, does not report novel data on unfair sex or race bias in student evaluations of teaching.

Sandler 1991 "Women Faculty at Work in the Classroom: Or, Why It Still Hurts To Be a Woman in Labor" is a review/essay-type of publication.

---

17.

Miles and House 2015 "The Tail Wagging the Dog; An Overdue Examination of Student Teaching Evaluations" [sic for the semicolon] reported on an analysis of student evaluations from the College of Business at a southwestern university, covering 30,571 cases from 2011 through 2013, 255 professors, and 1,057 courses with class sizes from 10 to 190. The mean rating for the 774 male-instructed courses did not statistically differ from the mean rating for the 279 female-instructed courses (p=0.33), but Table 7 indicates that the 136 male-instructed large required courses had a higher mean rating than the 30 female-instructed large required courses (p=0.01). I don't see results reported for a gender difference in small courses.
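For readers who want a concrete sense of this type of mean comparison, below is a minimal sketch using simulated course-level ratings; the data are hypothetical, and the specific test that Miles and House 2015 used might differ from the Welch t-test shown here.

```python
# Minimal sketch of a mean-rating comparison by instructor gender.
# The ratings below are simulated, not the Miles and House 2015 data,
# and the study's exact test might differ from the Welch t-test used here.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
male_course_means = rng.normal(4.2, 0.3, size=774)    # hypothetical course means
female_course_means = rng.normal(4.2, 0.3, size=279)  # hypothetical course means

# Welch's two-sample t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(male_course_means, female_course_means,
                                  equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```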

For what it's worth, page 121 incorrectly notes that scores from male-instructed courses range from 4.96 to 4.26; the 4.96 should be 4.20 based on the lower bound of 4.196 in Table 4. Moreover, Hypothesis 6 is described as regarding a gender difference for "medium and large sections of required classes" (p. 119) but the results are for "large sections of required classes" (p. 122, 123) and the discussion of Hypothesis 6 included elective courses (p. 119), so it's not clear why medium classes and elective courses weren't included in the Table 7 analysis.

---

18.

Martin 2016 "Gender, Teaching Evaluations, and Professional Success in Political Science" reports on publicly available student evaluations for undergraduate political science courses from a southern R1 university from 2011 through 2014 and a western R1 university from 2007 through 2013. Results for the items, on a five-point scale, indicated little gender difference in small classes of 10 students, a mean male instructor rating 0.1 and 0.2 points higher than the mean female instructor rating for classes of 100, and a mean male instructor rating 0.5 points higher than the mean female instructor rating for classes of 200 or 400.

The statistical models had predictors only for instructor gender, class size, and an interaction term of instructor gender and class size. No analysis was reported that assessed whether ratings could be accounted for by plausible alternate explanations such as course or faculty performance.
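For concreteness, here is a minimal sketch of a model of that form, written with the statsmodels formula interface; the data and variable names are hypothetical, not Martin 2016's, and the comment at the end indicates the kind of additional predictors that addressing alternate explanations would require.

```python
# Minimal sketch of a rating model with a gender-by-class-size interaction.
# The data and variable names are hypothetical, not Martin 2016's.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "male_instructor": rng.integers(0, 2, size=n),          # 1 = male, 0 = female
    "class_size": rng.choice([10, 100, 200, 400], size=n),  # enrollment
})
# Simulated ratings on a five-point scale (noise only, no built-in bias)
df["rating"] = np.clip(4.2 + rng.normal(0, 0.4, size=n), 1, 5)

# Predictors limited to instructor gender, class size, and their interaction,
# matching the model form described above
model = smf.ols("rating ~ male_instructor * class_size", data=df).fit()
print(model.summary())

# Addressing alternate explanations would require additional covariates,
# e.g., "rating ~ male_instructor * class_size + C(department) + course_level",
# assuming such data were available.
```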

---

Comments are open if you disagree, but I don't think that any of these three studies report a novel test for unfair sex or race bias in student evaluations of teaching using a research design with internal validity, with internal validity referring to an analysis that adequately addresses plausible alternate explanations. That said, the interaction of instructor gender and class size that appeared in Miles and House 2015 and Martin 2016 seems worth further consideration in a research design that does adequately address plausible alternate explanations.


Let's continue our discussion of studies in Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" listed as "finding bias". See here for the first entry in the series and here for other entries.

---

13.

Smith and Hawkins 2011 "Examining Student Evaluations of Black College Faculty: Does Race Matter" analyzed "undergraduate student ratings data for tenure-track faculty who used the 36-item student evaluation form adapted by the college" (p. 152), over a three-year period for the College of Education at a southeastern research university. Mean ratings, ordered from lowest to highest, were those for Black faculty, White faculty, and nonwhite nonblack faculty.

No analysis was reported that assessed whether ratings on the items could be explained by plausible alternate explanations such as course or faculty performance.

---

14.

Reid 2010 "The Role of Perceived Race and Gender in the Evaluation of College Teaching on RateMyProfessors.com" reported on RateMyProfessors data for faculty at the 25 highest ranked liberal arts colleges. Table 3 indicated that the mean overall quality ratings by race were: White (3.89), Other (3.88), Latino (3.87), Asian (3.75), and Black (3.48). Table 4 indicated that the mean overall quality ratings by gender were: male (3.87) and female (3.86).

No analysis was reported that assessed whether ratings on the overall quality item or the more specific items could be explained by plausible alternate explanations such as faculty department, course, or faculty performance.

---

15.

Subtirelu 2015 "'She Does Have an Accent but…': Race and Language Ideology in Students' Evaluations of Mathematics Instructors on RateMyProfessors.com" reported that an analysis of data on RateMyProfessors indicated that "instructors with Chinese or Korean last names were rated significantly lower in Clarity and Helpfulness" than instructors with "US last names", that "RMP users commented on the language of their 'Asian' instructors frequently but were nearly entirely silent about the language of instructors with common US last names", and that "RMP users tended to withhold extreme positive evaluation from instructors who have Chinese or Korean last names, although this was frequently lavished on instructors with US last names" (pp. 55-56).

Discussing the question of whether this is unfair bias, Subtirelu 2015 indicated that "...a consensus about whether an instructor has 'legitimate' problems with his or her speech...would have to draw on some ideological framework of expectations for what or whose language will be legitimized [that] would almost certainly serve the interests of some by constructing their language as 'without problems' or 'normal'...while marginalizing others by constructing their language as 'containing problems' or 'being abnormal'" (p. 56).

In that spirit, I'll refrain from classifying as "containing problems" the difference in ratings that Subtirelu 2015 detected.

---

Comments are open if you disagree, but I don't think that any of these three studies report a novel test for unfair sex or race bias in student evaluations of teaching using a research design with internal validity, with internal validity referring to an analysis that adequately addresses plausible alternate explanations.


Let's continue our discussion of studies in Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" listed as "finding bias". See here for the first entry in the series and here for other entries.

---

10.

Huston 2006 "Race and Gender Bias in Higher Education: Could Faculty Course Evaluations Impede Further Progress toward Parity" is a review that, as far as I can tell, does not report novel data on unfair sex or race bias in student evaluations of teaching.

---

11.

Baldwin and Blattner 2003 "Guarding Against Potential Bias in Student Evaluations: What Every Faculty Member Needs to Know" doesn't report novel data about unfair bias in student evaluations. I don't know why that publication would be classified under "academic articles, book chapters, and working papers finding bias".

---

12.

Pittman 2010 "Race and Gender Oppression in the Classroom: The Experiences of Women Faculty of Color with White Male Students" discusses interviews with 17 nonwhite female faculty. Here is a sample from one of the interviews, from page 190:

Now I can't prove that these are racial events, OK. But I have some supposition that they may be racially motivated...the occurrence of...white males...much more predominantly white males, are coming into my class and questioning my expertise...whereas I don't believe, and I can't prove this, but I don't believe that they go into their chemistry class and challenge their chemistry white male,...now that may be gender as well as race. Because I just don't think that they'd go to some of their other classes and question or challenge their professors in ways that I've been questioned or challenged.

This study doesn't provide much evidence of unfair bias in student evaluations, although the article does note that "Several women faculty of color talked about low course evaluation ratings from race- and gender-privileged students and expressed their fear of how these might affect their departmental merit reviews" (p.191).

---

Comments are open if you disagree, but I don't think that any of these three studies report a novel test for unfair sex or race bias in student evaluations of teaching using a research design with internal validity.


Let's continue our discussion of studies in Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" listed as "finding bias". See here for the first entry in the series and here for other entries.

---

7.

Spooren et al. 2013 "On the validity of student evaluation of teaching: The state of the art" is a review that, as far as I can tell, does not report novel data on unfair sex or race bias in student evaluations of teaching.

---

8.

Laube et al. 2007 "The impact of gender on the evaluation of teaching: What we know and what we can do" is a review that, as far as I can tell, does not report novel data on unfair sex or race bias in student evaluations of teaching.

---

9.

Stark and Freishtat 2014 "An evaluation of course evaluations" is a discussion that, as far as I can tell, does not report novel data on unfair sex or race bias in student evaluations of teaching.

---

Comments are open if you disagree, but I don't think that any of these three studies report a novel test for unfair sex or race bias in student evaluations of teaching using a research design with internal validity. I think that these publications would be more appropriate in a separate section of Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" rather than in their list of academic articles, book chapters, and working papers finding bias.


Let's continue our discussion of studies in Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" listed as "finding bias". See here for the first entry in the series and here for other entries.

---

4.

El-Alayli et al. 2018 "Dancing backwards in high heels: Female professors experience more work demands and special favor requests, particularly from academically entitled students" does not present novel evidence about bias in student evaluations of teaching. Instead: "The current research examined the extra burdens experienced by female professors in academia in the form of receiving more work demands from their students" (p. 145).

---

5.

Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" lists as "finding bias" Hessler et al. 2018 "Availability of cookies during an academic course session affects evaluation of teaching". I'm not sure why this study is included in a list that one of the Holman et al. 2019 coauthors described as a "list of 76 articles demonstrating gender and/or racial bias in student evaluations". The Hessler et al. 2018 experimental design focused on the provision or non-provision of cookies; the study also had variation in which Teacher A handled 10 groups of students and Teacher B handled the other 10 groups of students, but the p-value was 0.514 for this variation in teacher in the Table 3 regression predicting the summation score.

---

6.

The Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" list doesn't provide a summary for Uttl et al. 2017 "Meta-analysis of faculty's teaching effectiveness: Student evaluation of teaching ratings and student learning are not related", so I'm not sure why this study is included in a list that one of the Holman et al. 2019 coauthors described as a "list of 76 articles demonstrating gender and/or racial bias in student evaluations".

For what it's worth, I don't know that student evaluations of teaching being uncorrelated with learning is much of a problem, unless student evaluations of teaching are used as a measure of student learning. For example, if an instructor received a low score on an item asking about the instructor's availability outside of class because the instructor is not available outside of class, then I don't see why responses to that instructor availability item would need to be correlated with student learning in order to be a valid measure of the instructor's availability outside of class.

---

Comments are open if you disagree, but I don't think that any of these three studies report a novel test for unfair sex or race bias in student evaluations of teaching using a research design with internal validity.


My prior post on Holman et al. 2019 "Evidence of Bias in Standard Evaluations of Teaching" indicated that:

I think there would be value in a version of "Evidence of Bias in Standard Evaluations of Teaching" that accurately summarizes each study that has tested for unfair bias in student evaluations of teaching using a research design with internal validity and plausibly sufficient statistical power, especially if each summary were coupled with a justification of why the study provides credible evidence about unfair bias in student evaluations of teaching.

Pursuant to a discussion with Holman et al. 2019 co-author Dr. Rebecca Kreitzer, I thought that it might be a good idea for me to occasionally read and discuss a study that Holman et al. has categorized as finding bias.

---

1.

I have already posted about Peterson et al. 2019 "Mitigating gender bias in student evaluations of teaching". Holman et al. 2019 includes that article in the list of academic articles, book chapters, and working papers finding bias, so let's start there...

I do not see how the results in Peterson et al. 2019 can be read as finding bias. Feel free to read the article yourself or to read the Holman et al. 2019 summary of the article. Peterson et al. 2019 indicates that their results "indicate that a relatively simple intervention in language can potentially mitigate gender bias in student evaluation of teaching", but their research design does not permit an inference that bias was present among students in the control group.

---

2.

Given that I am familiar with the brilliance research discussed in this Slate Star Codex post, let's move on to Storage et al. 2016 "The frequency of 'brilliant' and 'genius' in teaching evaluations predicts the representation of women and African Americans across fields", which reported evidence of a difference found in RateMyProfessors data:

Across the 18 fields in our analysis, "brilliant" was used in a 1.81:1 male:female ratio and "genius" in a 3.10:1 ratio...In contrast, we found little evidence of gender bias in use of "excellent" and "amazing" in online evaluations, with male:female ratios of 1.08:1 and 0.91:1, respectively.

But is the male/female imbalance in the frequency of "brilliant" and "genius" an unfair bias? One alternate explanation is that male instructors are more likely than female instructors to be in fields in which students use "brilliant" and "genius" in RateMyProfessors comments; that pattern appears in Storage et al. 2016 Figure 2. Another alternate explanation is that a higher percentage of male instructors than female instructors are "brilliant" and "genius"; for what it's worth, my analysis here indicates that male test-takers are disproportionately at the highest scores on the SAT-Math test, even accounting for the higher number of female SAT test-takers.
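To make the first alternate explanation concrete, here is a small worked example with hypothetical numbers (not Storage et al. 2016's data): if "brilliant" is used at identical rates for male and female instructors within each field, an aggregate male:female imbalance can still arise purely from which fields male and female instructors are concentrated in.

```python
# Hypothetical illustration of a field-composition confound: within each
# field, "brilliant" is used at the same rate for male and female
# instructors, yet the aggregate male:female rate ratio exceeds 2 because
# more male instructors are in the field where the word is common.
# These numbers are invented, not Storage et al. 2016's data.
fields = {
    #             (male instructors, female instructors, "brilliant" rate)
    "philosophy": (800,               200,                0.050),
    "education":  (200,               800,                0.010),
}

def pooled_rate(sex_index):
    """Overall 'brilliant' rate for one sex, pooled across fields."""
    uses = sum(n_by_sex[sex_index] * rate
               for *n_by_sex, rate in fields.values())
    total = sum(n_by_sex[sex_index] for *n_by_sex, _ in fields.values())
    return uses / total

male_rate, female_rate = pooled_rate(0), pooled_rate(1)
print(f"male: {male_rate:.3f}, female: {female_rate:.3f}, "
      f"ratio: {male_rate / female_rate:.2f}")  # ratio: 2.33
```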

It's certainly possible that, accounting for these and other plausible alternate explanations, student comments are unfairly more likely to refer to male instructors than female instructors as "brilliant" and "genius". But it's not clear that the Storage et al. 2016 analysis permits such an inference of unfair bias.

From what I can tell, the main implication of research on bias in student evaluations of teaching concerns whether student evaluations of teaching should be used in employment decisions. The Storage et al. 2016 data are from RateMyProfessors, so another hurdle for anyone using Storage et al. 2016 to argue against the use of student evaluations of teaching in employment decisions is producing a plausible argument that the "brilliant" and "genius" pattern in RateMyProfessors comments is representative of comments on the student evaluations conducted by colleges and universities and used in employment decisions.

Another hurdle is establishing that any instructor's employment would be nontrivially affected by a less-frequent-than-deserved use of "brilliant" and "genius", whether in comments on student evaluations conducted by a college or university or in comments on the RateMyProfessors site.

---

3.

Let's move on to another publication that Holman et al. 2019 has listed as finding bias: Piatak and Mohr 2019 "More gender bias in academia? Examining the influence of gender and formalization on student worker rule following".

It's not clear to me why an article reporting on a study of "student worker rule following" should be included in a list of "Evidence of Bias in Standard Evaluations of Teaching".

---

Comments are open if you disagree, but I don't see anything in Peterson et al. 2019 or Storage et al. 2016 or Piatak and Mohr 2019 that indicates a test for unfair bias in student evaluations of teaching using a research design with internal validity: from what I can tell, Peterson et al. 2019 had no test for unfair bias, Storage et al. 2016 did not address plausible alternate explanations, and Piatak and Mohr 2019 isn't even about student evaluations of teaching.


"Evidence of Bias in Standard Evaluations of Teaching" (Mirya Holman, Ellen Key, and Rebecca Kreitzer, 2019) has been cited as evidence of bias in student evaluations of teaching.

I am familiar with Mitchell and Martin 2018, so let's check how that study is summarized in the list, as archived on 20 November 2019. I count three substantive errors in the summary and one spelling error, highlighted below, not counting the "fgender" in the header or the singular "RateMyProfessor":

1. The summary referred to the online courses as being from different universities, but all of the online courses in the Mitchell and Martin 2018 analysis were at the same university.

2. The summary referred to "female instructors" and "male professors", but the Mitchell and Martin 2018 analysis compared comments and evaluations for only one female instructor to comments and evaluations for only one male instructor.

3. The summary indicated that female instructors were evaluated differently in intelligence, but no Mitchell and Martin 2018 table reported a statistical significance asterisk for the Intelligence/Competency category.

---

The aforementioned errors in the summary of Mitchell and Martin 2018 can be easily fixed, but that would not address a flaw in a particular use of the list: from what I can tell, Mitchell and Martin 2018 itself has errors that undercut the inference that students use different language when evaluating female instructors than when evaluating male instructors. Listing that study and other studies based on an uncritical reading of their results shouldn't be convincing evidence of bias in student evaluations of teaching, especially if the categorization of studies does not indicate whether "bias" is operationalized as an unfair difference or as a mere difference.

I think there would be value in a version of "Evidence of Bias in Standard Evaluations of Teaching" that accurately summarizes each study that has tested for unfair bias in student evaluations of teaching using a research design with internal validity and plausibly sufficient statistical power, especially if each summary were coupled with a justification of why the study provides credible evidence about unfair bias in student evaluations of teaching. But I don't see why anyone should be convinced by "Evidence of Bias in Standard Evaluations of Teaching" in its current form.
