Science meets bias and diversity

Deborah Belle is a psychology professor at Boston University (BU) who is interested in gender differences in social behavior. She has reported a shocking result about bias.

Today I thought I would discuss the issue of gender bias and also the related issue of the advantages of diversity.

Lately at Tech we have had a long email discussion on implicit bias and how we might do a better job of avoiding it in the future. My usual inclination is to think about such issues and see if there is some science behind our assumptions. One colleague stated:

The importance of diversity is beyond reasonable doubt, isn’t it?

I agree. But I am always looking for “proof.”

Do not get me wrong. I have always been for diversity. I helped hire the first female assistant professor to engineering at Princeton decades ago. And I have always felt that it is important to have more diversity in all aspects of computer science. But is there some science behind this belief? Or is it just axiomatic—something that we believe and needs no argument—that it is “beyond reasonable doubt?”

This is how I found Deborah Belle, while looking on the web for “proof.” I will just quote the BU Today article on her work:

Here’s an old riddle. If you haven’t heard it, give yourself time to answer before reading past this paragraph: a father and son are in a horrible car crash that kills the dad. The son is rushed to the hospital; just as he’s about to go under the knife, the surgeon says, “I can’t operate—that boy is my son!” Explain …

If you guessed that the surgeon is the boy’s gay, second father, you get a point for enlightenment… But did you also guess the surgeon could be the boy’s mother? If not, you’re part of a surprising majority.

In research conducted by Mikaela Wapman […] and Deborah Belle […], even young people and self-described feminists tended to overlook the possibility that the surgeon in the riddle was a she. The researchers ran the riddle by two groups: 197 BU psychology students and 103 children, ages 7 to 17, from Brookline summer camps.

In both groups, only a small minority of subjects—15 percent of the children and 14 percent of the BU students—came up with the mom’s-the-surgeon answer. Curiously, life experiences that might [prompt] the ‘mom’ answer “had no association with how one performed on the riddle,” Wapman says. For example, the BU student cohort, where women outnumbered men two-to-one, typically had mothers who were employed or were doctors—“and yet they had so much difficulty with this riddle,” says Belle. Self-described feminists did better, she says, but even so, 78 percent did not say the surgeon was the mother.

This shocked me. I knew this riddle forever it seems. But was surprised to see that the riddle is still an issue. Ken recalls from his time in England in the 1980s that surgeons were elevated from being addressed as “Doctor X” to the title “Mister X.” No mention of any “Miss/Mrs/Ms” possibility then, but this is now. I think this demonstrates in a pretty stark manner how important it is to be aware of implicit bias. My word, things are worse than I ever thought.

Bias In Diversity Studies

I looked some more and discovered that there was, I believe, bias in even studies of bias. This may be even more shocking: top researchers into the importance of diversity have made implicit bias errors of their own. At least that is how I view their research.

Again I will quote an article, this time from Stanford:

In 2006 Margaret Neale of Stanford University, Gregory Northcraft of the University of Illinois at Urbana-Champaign and I set out to examine the impact of racial diversity on small decision-making groups in an experiment where sharing information was a requirement for success. Our subjects were undergraduate students taking business courses at the University of Illinois. We put together three-person groups—some consisting of all white members, others with two whites and one nonwhite member—and had them perform a murder mystery exercise. We made sure that all group members shared a common set of information, but we also gave each member important clues that only he or she knew. To find out who committed the murder, the group members would have to share all the information they collectively possessed during discussion. The groups with racial diversity significantly outperformed the groups with no racial diversity. Being with similar others leads us to think we all hold the same information and share the same perspective. This perspective, which stopped the all-white groups from effectively processing the information, is what hinders creativity and innovation.

Nice study. But why only choose to study all-white groups and groups of two whites and one black? What about the other two possibilities: all black and two blacks and one white? Did this not even occur to the researchers? I could imagine that all-black do the best, or that two black and one white do the worst. Who knows. The sin here seems to be not even considering all the four combinations.

It Is Even Worse

Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai have a recent paper in NIPS with the wonderful title, “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.”

Again we will simply quote the paper:

The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors, which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving the its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.

Here is one of their examples. Suppose we want to fill X in the analogy, “he is to doctor as she is to X.” A typical embedding prior to their algorithm may return X = nurse. Their hard-debiasing algorithm finds X = physician. Yet it recognizes cases where gender distinctions should be preserved, e.g., given “she is to ovarian cancer as he is to Y,” it fills in Y = prostate cancer. Their results show that their hard-debiasing algorithm performs significantly better than a “soft-debiasing” approach and performs as well or nearly as well on benchmarks apart from gender bias.

Overall, however, many have noted that machine learning algorithms are inhaling the bias that exists in lexical sources they data-mine. ProPublica has a whole series on this, including the article, “Breaking the Black Box: How Machines Learn to be Racist.” And sexist, we can add. The examples are not just linguistic—they include real policy decisions and actions that are biased.

How to Balance Bias?

Ken wonders whether aiming for parity in language will ever be effective in offsetting bias. Putting more weight in the center doesn’t achieve balance when all the other weight is on one side.

The e-mail thread among my colleagues centered on the recent magazine cover story in The Atlantic, “Why is Silicon Valley so Awful to Women?” The story includes this anecdote:

When [Tracy] Chou discovered a significant flaw in [her] company’s code and pointed it out, her engineering team dismissed her concerns, saying that they had been using the code for long enough that any problems would have been discovered. Chou persisted, saying she could demonstrate the conditions under which the bug was triggered. Finally, a male co-worker saw that she was right and raised the alarm, whereupon people in the office began to listen. Chou told her team that she knew how to fix the flaw; skeptical, they told her to have two other engineers review the changes and sign off on them, an unusual precaution.

One of my colleagues went on to ascribe the ‘horribleness’ of many computer systems in everyday use to the “brusque masculinism” of their creation. This leads me to wonder: can we find the “proof” I want by making a study of the possibility that “men are buggier”—or more solidly put, that gender diversity improves software development?

Recall Ken wrote a post on themes connected to his department’s Distinguished Speaker series for attracting women into computing. The series includes our own Ellen Zegura on April 22. The post includes Margaret Hamilton and her work for NASA’s Apollo missions, including the iconic photo of the stack of her code being taller than she. Arguments over the extent of Hamilton’s role can perhaps be resolved from sources listed here and here, but there is primary confirmation of her strong hand in code that had to be bug-free before deployment.

We recently posted our amazement of large-scale consequences of bugs in code at underclass college level, such as overflowing a buffer. Perhaps one can do a study of gender and project bugs from college or business applications where large data could be made available. The closest large-scale study we’ve found analyzed acceptance rates of coding suggestions (“pull requests”) from over 1.4 million users of GitHub (news summary) but this is not the same thing as analyzing design thoroughness and bug rates. Nor is anything like this getting at the benefits of having men and women teamed together on projects, or at least in a mutual consulting capacity.

It is easy to find sources a year ago hailing that study in terms like “Women are Better Coders Than Men…” Ordinarily that kind of “hype” repulses Ken and me, but Ken says maybe this lever has a rock to stand on. What if we ‘think different’ and embrace gender bias by positing that women approach software in significantly different ways—?—where having such differences is demonstrably helpful.

Open Problems

What would constitute “proof” that gender diversity is concretely helpful?

Littlewood’s Law and Big Data

 “Leprechaun-proofing” data source

Neil L. is a leprechaun. He has visited Dick on St. Patrick’s Day or the evening before many times. Up until this night I had never seen him.

Today, Neil’s message is more important than ever.

The breaks keep on coming…

Holly Dragoo, Yacin Nadji, Joel Odom, Chris Roberts, and Stone Tillotson are experts in computer security. They recently were featured in the GIT newsletter Cybersecurity Commentary.

Today, Ken and I consider how their comments raise a basic issue about cybersecurity. Simply put:

Is it possible?

With a little more from Smullyan

Maurice Ashley is an American chess grandmaster. He played for the US Championship in 2003. He coached two youth teams from Harlem to national championships and played himself in one scene of the movie Brooklyn Castle. He created a TEDYouth video titled, “Working Backward to Solve Problems.”

Today we discuss retrograde analysis in chess and other problems, including one of my own.

Serious work amid the puzzles and jokes.

 Amazon source

When Raymond Smullyan was born, Emanuel Lasker was still the world chess champion. Indeed, of the 16 universally recognized champions, only the first, Wilhelm Steinitz, lived outside Smullyan’s lifetime. Smullyan passed away a week ago Monday at age 97.

Today, Dick and I wish to add some thoughts to the many comments and tributes about Smullyan.

A discussion on the famous problem

William Agnew is the chairperson of the Georgia Tech Theoretical Computer Science Club. He is, of course, an undergraduate at Tech with a multitude of interests—all related to computer science.

Today I want to report on a panel that we had the other night on the famous P vs. NP question.

What to do about claims of hard theorems?

 Cropped from source

Shinichi Mochizuki has claimed the famous ABC conjecture since 2012. It is still unclear whether or not the claimed proof is correct. We covered it then and have mentioned it a few times since, but have not delved in to check it. Anyway its probably way above our ability to understand in some finite time.

Today I want to talk about how to check proofs like that of the ABC conjecture.

The issue is simple:

Someone writes up a paper that “proves” that X is true, where X is some hard open problem. How do we check that X is proved?

The proof in question is almost always long and complex. So the checking is not a simple matter. In some cases the proof might even use nonstandard methods and make it even harder to understand. That is exactly the case with Mochizuki’s proof—see here for some comments.

Possible Approaches

Let’s further assume that the claimed proof resolves X which is the P vs. NP problem. What should we do? There are some possible answers:

• Ignore: I have many colleagues who will not even open the paper to glance at it. Ken and I get a fair number of these, but I do at least open the file and take a quick look. I will send a message to the author—it usually is a single author—about some issue if I see one right away.

• Show Me The Beef: I firmly believe that a proof of an open problem should have at least one simple to state new trick or insight that we all missed. I would suggest that the author must be able to articulate this new idea: if they cannot then we can safely refuse to read it. I have worked some on the famous Jacobian Problem. At one time an author claimed they had a proof and it was just “a careful induction.” No. I never looked at it because of the lack of “beef,” and in a few weeks the proof fell apart.

• Money: Several people have suggested—perhaps not seriously—that any one claiming a proof must be ready to post a “bond” of some money. If someone finds an error they get the bond money. If no one does or even better if the proof is correct, then the money can be donated to one of our conferences.

• Hire: I have seen this idea just recently. The author posts a request for someone to work on their paper as a type of consultant. They are paid a fair hourly rate and help find the error.

• Timeout: An author who posts a false proof gets a timeout. They are not allowed to post another paper or submit a paper on X for some fixed time period. Some of the top journals like the Journal of the ACM already have a long timeout in place. The rationale behind this is that very often when an error is found in such a paper the author quickly “fixes” the issue and re-claims the result. In Stanislaw Ulam’s wonderful book Adventures of a Mathematician he talks about false proofs: Here “he” refers to an amateur who often joined Ulam at his habitual coffeehouse:

Every once in a while he would get up and join our table to gossip or kibitz ${\dots}$ Then he would add, “The bigger my proof, the smaller the hole. The longer and larger the proof, the smaller the hole.”

• Knock Heads Together: Oxford University hosted in December 2015 a workshop to examine Mochizuki’s claimed proof, including contact by Skype with Mochizuki himself. A report by Brian Conrad on the MathBabe blog makes for engaging reading—we could quote extensively from its concluding section 6. This shorter news report cited feelings of greater understanding and promise but lack of definite progress on verifying the proof, noting:

…[N]o one wants to be the guy that spent years working to understand a proof, only to find that it was not really a proof after all.

• Share The Credit: Building on the last point, perhaps proper credit can be given to someone who does spent a great deal of time working on trying to understand a long proof. If they find an unfixable error, then maybe they can publish that as a paper—especially if the error is nontrivial and not just a simple one. If they show that the proof is indeed correct, could they be rewarded with some type of co-authorship? Maybe a new type of authorship:

P Does Not Equal NP: A Proof Via Non-Linear Fourier Methods
Alice Azure with Bob Blue

Here the “with” signals that Alice is the main author and Bob was simply a helper. Recall a maxim sometimes credited to President Harry Truman: “It is amazing what you can accomplish if you do not care who gets the credit.”

Open Problems

What do you think about ways to check proofs? Any better ideas?