When Is Big Data Bad Data? When It Causes Bias

Daily Labor Report®

By Kevin McGowan

July 28 — Employers are turning to computer-driven algorithms to find, recruit and hire job candidates online, but one negative output could be unintentional discrimination.

“It's a bit of a black box,” said Commissioner Victoria Lipnic (R) of the Equal Employment Opportunity Commission, referring to the formulas data analysts and programmers develop to aid employers in their talent searches.

Vendors that promote the algorithms say using a neutral formula that eliminates the human element, at least in the early stages of searching for and recruiting candidates, reduces the risk of unlawful bias.

But others fear the algorithms, depending on how they are constructed and used, could create or perpetuate discrimination based on race, sex or other protected characteristics.

Lipnic, lawyers and an academic interviewed by Bloomberg BNA this summer aren't sure current laws are adequate to address the potential discriminatory effects.

Global Search for Talent

Employers embrace these algorithms because the search for talent has accelerated and become a top corporate priority, said Heather Morgan, a partner with Paul Hastings in Los Angeles.

Employers need to trim the thousands of online applications they receive to a manageable pool of candidates having at least the minimum job qualifications, Morgan told Bloomberg BNA. Algorithms that search for particular terms in online candidates' credentials and “knowledge, skills and abilities” help them do so, she said.

Employers also are using “big data” to find and recruit “passive” candidates online because it's an “increasingly global and mobile world” featuring “fierce” competition for talent, Morgan said.

Industries making the greatest use of algorithms include high technology and financial services, sectors in which there's always been an “affinity” for statistical solutions to employment issues, she said.

But such techniques aren't confined to searches for highly skilled employees, said Christine Webber, a partner with Cohen Milstein Sellers & Toll LLP in Washington, which represents employees.

Workers applying for entry-level service and retail jobs may be asked to complete the online equivalents of old-fashioned personality tests, Webber told Bloomberg BNA. Their answers then may be scanned by an algorithm that makes quick determinations, she said.

Algorithms are used to search the “digital footprints” of potential job candidates, including those who haven't applied for a job and aren't actively seeking new employment.

This “data mining” is intended to unearth all the online information available about a candidate. The formulas use statistical matches between the characteristics of a company's incumbent successful employees and online candidates to predict which ones would also succeed.
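The matching step described above can be sketched in a few lines. This is an illustrative toy, not any vendor's actual model; the feature names and numbers are hypothetical. It ranks candidates by how closely their feature vectors resemble an averaged profile of a company's top-rated incumbents, which is precisely the mechanism that can replicate the current workforce.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two numeric feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

# Hypothetical features (years of experience, skills matched,
# certifications), averaged over a company's top-rated incumbents.
incumbent_profile = [6.0, 8.0, 2.0]

candidates = {
    "candidate_a": [5.0, 7.0, 2.0],
    "candidate_b": [1.0, 3.0, 0.0],
}

# Rank candidates by resemblance to the incumbent profile -- the model
# surfaces people who look like the existing successful workforce.
ranked = sorted(candidates,
                key=lambda c: cosine_similarity(incumbent_profile,
                                                candidates[c]),
                reverse=True)
print(ranked)
```

If the incumbent profile itself reflects historical under-representation or biased appraisals, the ranking inherits that skew even though no protected characteristic appears as an input.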

Multiple Levels of Risk

The factors put into the formula may be statistically correlated with job success, but they aren't necessarily causally related to job performance, said Pauline Kim, a law professor at Washington University in St. Louis. Algorithms may be used to predict things such as job retention or odds of a workplace injury, which differ from the traditional employment test, Kim told Bloomberg BNA.

The algorithms may not actually be measuring an individual's ability to perform the job, she said.

They also may lead employers to replicate their current workforce's demographics. Searching for people who resemble a company's top-rated performers may perpetuate existing under-representation of women, racial minorities or other protected groups. If performance appraisals are affected by unconscious bias, that might be baked into the algorithm.

It's also possible certain identifiable groups of people have less of a “digital footprint” than others and won't be discovered by models that scan the Internet for potential job candidates.

Employers contemplating the use of algorithms should be clear on what they're trying to measure, whether their model actually measures those qualities and what the potential impacts on protected groups might be, Kim said.

Once they apply the formula, employers should retain their data and audit the results for potential bias, she said.

Are Current Laws Sufficient?

Some of those interviewed by Bloomberg BNA questioned whether traditional legal analysis under Title VII of the 1964 Civil Rights Act is adequate to handle the emerging issues presented by employers' use of algorithms.

Workers excluded by an online selection device could allege disparate treatment or intentional discrimination, but the more likely claim is disparate impact. That requires a plaintiff to identify a specific employment practice that has a disproportionate adverse impact on individuals belonging to a protected class.

If disparate impact is shown, an employer can defend the practice as “job related” and “consistent with business necessity.” Even if an employer makes that showing, a plaintiff still could prevail under Title VII if he shows there's a selection device with less discriminatory impact that would achieve the business-related objectives.
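A common first screen for disparate impact is the UGESP's "four-fifths rule": if the protected group's selection rate is less than 80 percent of the highest group's rate, adverse impact is generally inferred. A minimal sketch, with hypothetical applicant counts:

```python
def selection_rate(selected, applicants):
    """Fraction of applicants who were selected."""
    return selected / applicants

def four_fifths_ratio(protected_selected, protected_applicants,
                      reference_selected, reference_applicants):
    """Impact ratio: the protected group's selection rate divided by the
    reference (highest-rate) group's rate. Under the UGESP's four-fifths
    rule of thumb, a ratio below 0.8 suggests adverse impact."""
    protected_rate = selection_rate(protected_selected, protected_applicants)
    reference_rate = selection_rate(reference_selected, reference_applicants)
    return protected_rate / reference_rate

# Hypothetical numbers: 50 of 200 protected-group applicants selected
# vs. 90 of 200 reference-group applicants selected.
ratio = four_fifths_ratio(50, 200, 90, 200)
print(round(ratio, 2))  # 0.56 -- well below the 0.8 threshold
```

The rule is only a screening device; litigation typically turns on statistical significance tests as well, and the calculation presupposes that the employer retained the applicant-flow data in the first place.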

Some of those steps are problematic when the selection device is an algorithm with elements that may be a mystery even to the employer or programmer.

Job candidates may not even know they have been rejected because of an algorithm, plaintiffs' attorney Webber said.

“People don't know that big data has been used on them,” she said. “That makes it quite a challenge” to file discrimination charges or lawsuits challenging its use, she said.

Some algorithms are always changing, as “machine learning” means the formula might be tweaked as a computer evaluates the results, Morgan said.

Discovering the elements of the algorithm could be difficult because vendors might claim trade secret protection for their proprietary formulas. Programmers also may be unable to identify what variable within an algorithm is producing discriminatory effects, Kim said.

One solution would be to identify the entire algorithm as the employment practice producing the alleged disparate impact, said Adam Klein, a plaintiffs' attorney with Outten & Golden in New York.

The affected worker shouldn't have to deconstruct an algorithm that even the employer might not fully understand, he said during an American Bar Association webinar June 21.

Validating Tools

The traditional method of showing business necessity is to validate the selection device as job-related, under the federal government's Uniform Guidelines on Employee Selection Procedures, issued in 1978 and unchanged since then.

An employer's use of algorithms in seeking passive job candidates or recruiting individuals who haven't applied presumably wouldn't trigger the need to validate the device, said Morgan of Paul Hastings.

But if the algorithm is used to choose among applicants, it could be difficult to define what “business necessity” means in this context, she said. Sometimes the algorithm is seeking to predict how long a candidate who is hired would stay in the job, she said. Such focus on retention isn't necessarily related to ability to perform the job, Morgan said.

Regarding the UGESP, Morgan said the “idea” of validating selection devices “may still be applicable” but “there are some open questions out there.”

For example, if disparate impact is shown, what should validation look like for an algorithm that's constantly changing, Morgan asked.

The traditional notions of test validation can be applied to algorithms, plaintiffs' attorney Klein said.

It’s “an employment practice” to use the algorithm, which functionally is “a test like any traditional tool,” he said during the ABA webinar. Provided the output data are available, Klein said a plaintiffs' statistical expert could do a “shortfall analysis” on how many members of the protected group were selected compared with how many should have been chosen absent discrimination.
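The shortfall analysis Klein describes can be sketched as follows. The numbers are hypothetical, and a real expert analysis would control for qualifications and other factors; this sketch only compares actual protected-group selections against the number expected if selections mirrored the applicant pool, using the conventional two-standard-deviation threshold courts apply.

```python
import math

def shortfall_analysis(total_selected, pool_size, protected_in_pool,
                       protected_selected):
    """Return (expected selections, shortfall, standard deviations).

    Expected selections assume the protected group's share of hires
    matches its share of the applicant pool."""
    p = protected_in_pool / pool_size          # protected share of pool
    expected = total_selected * p              # expected selections
    shortfall = expected - protected_selected
    # Standard deviation of a binomial count; a gap of more than about
    # two standard deviations is conventionally treated as significant.
    sd = math.sqrt(total_selected * p * (1 - p))
    return expected, shortfall, shortfall / sd

# Hypothetical: 1,000 hires from a pool that is 30% protected-group,
# but only 250 protected-group members were hired.
expected, shortfall, z = shortfall_analysis(1000, 10000, 3000, 250)
print(round(expected), round(shortfall), round(z, 1))  # 300 50 3.5
```

A gap of roughly 3.5 standard deviations, as in this hypothetical, would be well past the conventional threshold, regardless of what happens inside the algorithm.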

“Very large pools of data aren't new either,” Klein said. For example, he said a recently settled discrimination case against the U.S. Census Bureau regarding its criminal background check policy involved 4 million applicants, with about 1 million hired and 800,000 subjected to background checks.

All the relevant concepts for analyzing an algorithm's potentially discriminatory effects are “pretty established” under Title VII and the UGESP, Klein said.

If an algorithm could be validated, it would be under a “criterion” basis, said David Jones, president and chief executive of Growth Ventures Inc., a technology-based candidate assessment company.

Criterion-related validity asks if a statistical correlation exists between test performance and job performance. It could be done by running the algorithm on current employees whom the employer rates as its best performers and evaluating the results.

That validation approach could work in the big data context, Jones said during the ABA webinar.
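Criterion-related validation boils down to a correlation coefficient. As an illustrative sketch with made-up scores and ratings, an employer could correlate the algorithm's scores for current employees against supervisor performance ratings:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: algorithm scores for five incumbent employees
# alongside their supervisor performance ratings.
algorithm_scores = [62, 75, 80, 88, 95]
performance_ratings = [3.1, 3.4, 3.9, 4.2, 4.6]
print(round(pearson_r(algorithm_scores, performance_ratings), 2))  # 0.98
```

As Kim's critique suggests, a high correlation alone shows only that the scores track ratings for incumbents; it says nothing about whether the underlying features are job-related or whether the ratings themselves carry bias.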

EEOC in ‘Learning Phase'

Another concern in measuring potential discrimination from use of algorithms is whether an employer kept the relevant data, the EEOC's Lipnic told Bloomberg BNA.

Under Title VII, employers generally must keep applicant records for a considerable period after hiring decisions are made. Compliance with that obligation could be an issue in this context, Lipnic said. The EEOC generally pursues record-keeping violations as a complement to a substantive discrimination claim rather than as stand-alone cases, she said.

The EEOC in March 2014 held a public meeting on the impacts of social media on discrimination issues. Since then, all the agency's offices have been educating themselves on the potential bias issues raised by employers' reliance on algorithms and other online tools, Lipnic said.

“Everyone's definitely in a learning phase,” she said.

The agency is “very much trying to understand what is happening,” what's being created by employers and how it's being done, Lipnic said. The issues discussed at the EEOC's 2014 meeting were “the tip of the iceberg” compared with what employers are doing today, she said.

It can be difficult to determine when online job candidates become “applicants” and an employer's obligation to keep their data kicks in, Morgan said.

The Labor Department's Office of Federal Contract Compliance Programs in 2006 began applying its Internet applicant rule to federal contractors, she said. The EEOC also considered defining job applicants in the online context but ultimately didn't do so.

Under the OFCCP's definition, an online candidate must be responding to a posted job opening and possess at least the minimum qualifications to be deemed an applicant. The “basic qualifications” criterion is one way an employer can shrink its online pool of candidates, Morgan said.

But the OFCCP rule is “woefully outdated,” she said. Even after applying the rule, “we're still talking about running these tools against massive numbers of candidates,” Morgan said.

Employers would benefit from the EEOC and the OFCCP “rethinking” their current guidance that affects employers' use of online recruitment and selection tools, Morgan said.

“It's a very different world today,” featuring much more electronic data than when the agency guidance was formulated, she said. “It's time we have a very meaningful discussion,” Morgan said.

Alternative Approach Suggested

The EEOC and the courts also could consider an alternative way to analyze discrimination claims under Title VII, distinct from the disparate treatment and disparate impact paradigms, Kim said.

In a draft law review article, Kim suggested a new approach based on “classification” language found in Section 703(a)(2) of the act.

That provision makes it an unlawful employment practice for an employer to “classify” employees or job applicants in any way that “would deprive or tend to deprive” any individual of employment opportunities because of race, color, religion, sex or national origin.

Considering the issues raised by employers' use of big data through a prism of “classification bias” might be better than trying to shoehorn them into disparate impact analysis, Kim said.

Disparate impact analysis is “a particularly poor fit for addressing the types of harms potentially caused by workplace analytics,” Kim said in her paper. “Rather than providing specific criteria which are justified by clearly stated employer rationales, data models typically involve opaque decision processes, rest on unexplained correlations and lack clearly articulated employer justifications,” she said.

“When an algorithm relies on seemingly arbitrary characteristics or observed behaviors interacting in some complex way to predict job performance, the claim that it is ‘job related' often reduces to the fact that there is an observed statistical correlation,” she wrote. “If a statistical correlation were sufficient to satisfy the defense of job-relatedness, the standard would be a tautology rather than a meaningful legal test. In order to protect against discriminatory harms, something more must be required to justify the use of an algorithm that produces biased outcomes.”

Under her proposed Title VII analysis, an employer would bear the burden of establishing the algorithm's validity and it wouldn't be sufficient to show a statistical correlation exists.

A “bottom line defense” might make sense for employers using algorithms, Kim said. “Because of the difficulty of isolating the effect of particular variables, it will often make sense to treat the algorithm as an undifferentiated whole,” she wrote. “And if its operation does not disproportionately exclude members of protected groups, then it is difficult to identify a discriminatory harm in the absence of any motive directed against particular individuals.”

In any event, she said, the law “will have to depart from traditional disparate impact doctrine in significant ways if it is to respond effectively to these challenges.”

“Whether the discussion is framed in terms of ‘classification bias' or a revised disparate impact theory, the critical point is to recognize that data analytics are fundamentally different from the employer practices subject to challenge in earlier cases,” Kim wrote. “It is certainly possible to interpret Title VII in ways better suited to meet those differing threats to workplace equality.”

To contact the reporter on this story: Kevin McGowan in Washington at kmcgowan@bna.com

To contact the editor responsible for this story: Susan J. McGolrick at smcgolrick@bna.com

Copyright © 2016 The Bureau of National Affairs, Inc. All Rights Reserved.