Is there hidden bias in performance evaluations?

A NewsGuild-led employee delegation, formed after the enormous response from our membership to the Tom Cotton Op-Ed published in June 2020, has met twice with management to discuss diversity, equity and inclusion and to ask about the concrete steps management is taking to address the issues raised. We found that management was making more of an effort, but its willingness to collaborate with us was lacking.

As part of our ongoing work on diversity, equity and inclusion, we requested performance review data from the company, and we are presenting our findings here. Long story short: We found what looks like systemic hidden bias in the performance evaluations The Times conducted in 2018, when widespread performance reviews began, and in 2019.

Although the data accessible to the Guild is limited, the results are strong enough that we believe they warrant a clear call for additional investigation and transparency from management.

What did we find?

  1. There is a strong pattern of racial disparity in reviews. Employees of color were disproportionately likely to receive low ratings, while white employees were more likely to be rated highly.
  2. The discrepancies most clearly affect employees who identify as Black and Hispanic.
  3. The imbalances appear in both 2018 and 2019, in roughly the same pattern each year.

What We Looked At

  • We analyzed how ratings from Guild members' performance reviews corresponded to racial and ethnic self-identification categories, gender, age and department. 
  • We got data on reviews for Guild members only; the Guild does not have access to non-Guild employee data. There were 986 Guild members in 2018 and 1,010 Guild members in 2019.
  • Each year, the vast majority of members — about 86 percent — were in Newsroom or Opinion. 
  • About 70 percent of members each year were white, and the racial makeup was stable from year to year.

The Rating System Over Time

  • The ratings system changed slightly from its first year to its second. In 2018, there were five categories, and more than 40 employees received no rating.
  • In 2019, there were six categories, and almost all employees received a rating. 
  • We mapped the qualitative ratings used by The Times onto a scale of 1 to 5 in 2018, and 1 to 6 for 2019, when an extra category was added. You can see these categories below.

The specifics:

In 2018, Black and Hispanic employees were 16.3 percent of membership but 43.9 percent of people scoring a (2). White members were 71.6 percent of the Guild but 83.1 percent of people with a (5), which was the top score.

In 2019, Black and Hispanic employees were 15.9 percent of membership but 33.3 percent of people scoring a (2). White members were 70.3 percent of the Guild but 77.8 percent of people receiving a (6).

What the disparity looks like: Compared with their share of the Guild population, workers of color are under-represented at high score levels and over-represented at lower ones.
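
One simple way to put a number on this comparison is a representation ratio: a group's share of a given score divided by its share of the membership. A ratio of 1.0 means proportional representation, above 1.0 means over-representation and below 1.0 means under-representation. The report does not say it used this exact measure; the Python sketch below, with the hypothetical helper name `representation_ratio`, simply applies it to the percentages quoted above.

```python
def representation_ratio(share_at_score: float, share_of_membership: float) -> float:
    """Ratio of a group's share at one score level to its share of all members.

    1.0 means proportional representation; above 1.0 means over-represented,
    below 1.0 means under-represented. Both arguments must use the same unit
    (both percents or both fractions).
    """
    if share_of_membership <= 0:
        raise ValueError("share_of_membership must be positive")
    return share_at_score / share_of_membership


# Percentages quoted in the report: (share at the score, share of membership).
figures = {
    "2018, Black and Hispanic, score (2)": (43.9, 16.3),
    "2018, white, score (5)": (83.1, 71.6),
    "2019, Black and Hispanic, score (2)": (33.3, 15.9),
    "2019, white, score (6)": (77.8, 70.3),
}

for label, (at_score, overall) in figures.items():
    print(f"{label}: {representation_ratio(at_score, overall):.2f}")
```

By this measure, Black and Hispanic members were about 2.7 times as common among (2) ratings in 2018 as in the membership overall, and about 2.1 times as common in 2019.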

[Chart: How Each Racial Group Was Over- or Under-Represented at Each Score]

In 2018, on average, Black employees scored 0.58 points less than white employees. The difference was significant at the 95 percent confidence level. Hispanic members scored 0.21 points less, on average, than white employees, and this too was statistically significant. Other nonwhite groups also scored lower, but the difference in those categories was not significant.

In 2019, Black members scored 0.32 points less than white members, on average, a difference that was again statistically significant at the 95 percent level. Hispanic workers scored 0.23 points less; this was significant at the 90 percent level but not at the 95 percent level. Most other groups also had lower average scores than white members, but those differences were not statistically significant.
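
The report does not say which statistical test produced these figures; a regression on race indicators and a two-sample t-test are both common choices. As a rough, hypothetical illustration of what "significant at the 95 percent level" means for a gap in average scores, the Python sketch below compares two groups' mean ratings using a Welch-style standard error and a normal approximation, which is only reasonable for fairly large groups. The function names and the toy scores are assumptions for illustration, not Guild data.

```python
import math
from statistics import NormalDist, mean, variance


def mean_gap(group: list[float], reference: list[float]) -> tuple[float, float]:
    """Return (difference in mean score, standard error of that difference).

    The difference is group minus reference, so a negative value means the
    group scored lower on average. The standard error uses each group's own
    sample variance (Welch style), so the groups need not have equal spread.
    """
    diff = mean(group) - mean(reference)
    se = math.sqrt(variance(group) / len(group) + variance(reference) / len(reference))
    return diff, se


def is_significant(diff: float, se: float, level: float = 0.95) -> bool:
    """Two-sided test of diff != 0 using a normal approximation."""
    # Critical value: about 1.96 for the 95% level, about 1.645 for 90%.
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return abs(diff / se) > z_crit


# Hypothetical toy ratings (120 people per group), for illustration only.
reference = [3, 4, 5] * 40
group = [2, 3, 4] * 40
diff, se = mean_gap(group, reference)
print(f"gap={diff:.2f}, se={se:.3f}, significant at 95%: {is_significant(diff, se)}")
```

The `level` parameter shows how one gap can clear the 90 percent bar but not the 95 percent bar: a gap of 0.18 points with a standard error of 0.10 gives a z-score of 1.8, which is above 1.645 but below 1.96.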

As for employees who didn’t receive ratings…

In 2018, a large number of employees did not receive a rating. Employees of color were slightly over-represented in this category. However, almost all employees received a rating in 2019.

Two people, both Black employees, received a rating of "Not Meeting Expectations," a (1), in 2018. In 2019, this category was combined with employees who received no review at all, including some who were leaving the company, which made the (1) rating difficult to analyze. Five people received the 2019 version of the (1) rating: three white employees, one Hispanic employee and one whose race was not specified.

Gender and Age Analysis

Gender analysis indicated that women were slightly less likely to receive the highest rating. In 2018, they were 48 percent of the membership but only 40 percent of those in the top category. In 2019, women were 49 percent of Guild membership and 44 percent of those with the highest rating. Other ratings showed more balanced results. (Membership includes a small number of non-binary employees, but none of them received the highest or lowest rating.)

We also analyzed discrepancies by age. Employees over 61 were less likely to receive the highest rating, but this is hard to treat as compelling evidence of bias, because age, seniority and job expectations may reasonably be correlated.

Analysis by Department

Some departments and desks may tend to give skewed ratings. If employees of particular races are concentrated in those departments, that could affect the overall results. However, such departmental explanations would not let the company off the hook. Rather, the immediate question should be: Why are departments with large numbers of employees of color rated more harshly?

Technology and Advertising, for example, showed markedly different distributions in 2019. The large proportion of (3) ratings in Technology is particularly notable because that department has a relatively large percentage of employees who are Black.

Additional Caveats

Because the data covers only Guild members, it may not give a full picture of how employees of different races and ethnicities are treated across the company as a whole.

The need to anonymize the data means that we can see only one attribute at a time. For example, we can tell how many Hispanic employees received a (4) and how many newsroom employees received a (4), but not how many Hispanic women in the newsroom received a (4). This means we cannot perform the sort of multivariate analysis that could help isolate potential biases.
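
A small hypothetical example shows why one-attribute-at-a-time counts are not enough. The two tables in the Python sketch below are invented counts of people rated (4), split by ethnicity and department. In one, all Hispanic employees rated (4) are in the newsroom; in the other, they are evenly spread. Both tables produce identical one-attribute totals, so the totals alone cannot tell them apart. The helper name `marginals` and all counts are assumptions for illustration.

```python
from collections import Counter


def marginals(joint: dict[tuple[str, str], int]) -> tuple[dict[str, int], dict[str, int]]:
    """Collapse a joint table {(attr_a, attr_b): count} into two one-attribute tables."""
    by_a: Counter = Counter()
    by_b: Counter = Counter()
    for (a, b), n in joint.items():
        by_a[a] += n
        by_b[b] += n
    return dict(by_a), dict(by_b)


# Hypothetical counts of people rated (4), by (ethnicity, department).
clustered = {("Hispanic", "Newsroom"): 10, ("Hispanic", "Other"): 0,
             ("Not Hispanic", "Newsroom"): 0, ("Not Hispanic", "Other"): 10}
spread = {("Hispanic", "Newsroom"): 5, ("Hispanic", "Other"): 5,
          ("Not Hispanic", "Newsroom"): 5, ("Not Hispanic", "Other"): 5}

# Same one-attribute totals, even though the underlying tables differ.
print(marginals(clustered) == marginals(spread))
```

Multivariate analysis needs the joint table itself, which is exactly what anonymization withholds.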

Guild members should be aware of the caveats above, but the pattern of racial disparities is so stark that it warrants additional attention by leadership in spite of any limitations in the analysis.

You may view the full report here. If you have questions, contact Local Representative Barbara Davis. 

Where We Go From Here

Clearly, there’s a lot of work to be done, and many of you have stepped up to engage with management and hold it accountable for creating a workplace that strives to eliminate a culture of systemic inequity and racism. It’s up to us to ensure that the conversation continues. Now’s the time to get involved. If you want to join our discussion on how to take action around these findings, email Unit Chair Bill Baker to receive an invite.