Many innovators in the A.I and machine learning space face a problem when training their technology. The data they use to teach their algorithms is often limited to what can be sourced from the web or from fair-use resources, which can carry prejudice or fall short by not including a diverse range of people.
For example, here are a few quick Google image searches that help visualize the disparity in inclusiveness and representation. The following results are from searching “man”, “woman”, “hair” and “beautiful smile.”
Representation of Men & Women, According to the Internet.
What People With Hair & Beautiful Smiles Look Like, According to the Internet.
Questions arise when A.I is exposed to the cultural stereotypes and the semantic and systemic biases that have been cultivated for decades on the web. Many of us have been unconsciously conditioned into accepting these biases as ‘norms’. Programming A.I brings these biases to light and puts their problematic nature into context, forcing us to be more equitable and accountable. Many question whether A.I is inherently biased because it is trained by humans, who are themselves imperfect.
The answer lies in asking two key questions:
1) What data is being used to train the A.I, where is it being sourced from, and is it ‘pure’ data?
2) Is a conscious effort being made from the ground up to build systems into the A.I that stop bias from happening?
Why A.I Becomes Biased
When done wrong, machines commonly show racist and misogynistic characteristics, either because they are trained on data sourced from the internet or because no serious effort was made to address bias from the beginning.
The following examples highlight why using the internet as a source for building machine learning data sets is problematic and proliferates pre-existing human biases.
1) Microsoft’s Twitter chatbot, Tay, turned into a pro-Hitler racist in about a day, tweeting statements such as:
“Bush did 9/11 and Hitler would have done a better job than the monkey we have now. Donald Trump is the only hope we’ve got.”
Tay was learning and pulling its data from what people on the internet were tweeting at it.
2) Recent research indicates that A.I trained on a standard body of text from the web associates European American names with pleasant words such as “gift” or “happy”, while associating African American names with unpleasant words.
3) Beauty.AI ran a beauty contest judged by artificial intelligence. There were 6,000 uploaded selfies from 100 different countries. Of the 44 winners, a handful were Asian. The rest were white, excluding one. A.I isn’t intentionally taught that people with white skin are more beautiful; it can, however, come to assume so if the data it’s being fed consists mainly of white people.
Alex Zhavoronkov, Beauty.AI’s Chief Science Officer, explained the importance of diverse data sets in The Guardian:
“While there are a number of reasons why the algorithm favoured white people, the main problem was that the data the project used to establish standards of attractiveness did not include enough minorities…. If you have not that many people of colour within the data set, then you might actually have biased results,” he said at the time.
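The point about under-represented groups in a data set can be made concrete with a simple audit. The following is a minimal sketch, not Beauty.AI's method: the function names, the group labels and the 10% threshold are all assumptions chosen for illustration, and the sample data is invented.

```python
from collections import Counter


def group_shares(labels):
    """Return each group's share of the data set as a fraction of all records."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(labels, min_share=0.10):
    """List groups whose share is below min_share (an arbitrary, assumed threshold)."""
    shares = group_shares(labels)
    return sorted(group for group, share in shares.items() if share < min_share)


# Hypothetical labels, invented purely for illustration.
sample_labels = ["group_a"] * 85 + ["group_b"] * 10 + ["group_c"] * 5

print(group_shares(sample_labels))
print(underrepresented(sample_labels))
```

A check like this only shows who is missing from the data; deciding what counts as enough representation remains a human judgment.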
Here’s how to build an A.I without bias:
1) Use ‘pure data’: a database that has been built from scratch, is free of inherent bias, and is inclusive and diverse in nature.
2) Build the A.I with bias in mind, and holistically implement systems that prevent it from happening.
3) Keep a human touch for quality control, to identify whether any biases are developing when new data is introduced to the A.I.
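One way the quality-control step could be supported is an automated check that flags a batch of results for human review when outcomes differ sharply between groups. This is a rough sketch under stated assumptions: the function names, the record format and the 0.2 gap threshold are hypothetical, and the sample batch is invented.

```python
def selection_rates(records):
    """Map each group to the fraction of its records with a positive outcome.

    records is an iterable of (group, selected) pairs, where selected is a bool.
    """
    totals, positives = {}, {}
    for group, selected in records:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if selected else 0)
    return {group: positives[group] / totals[group] for group in totals}


def needs_human_review(records, max_gap=0.2):
    """Return True when the highest and lowest group rates differ by more than max_gap."""
    rates = selection_rates(records)
    if not rates:
        return False
    return max(rates.values()) - min(rates.values()) > max_gap


# Hypothetical outcomes, invented for illustration.
batch = (
    [("group_a", True)] * 6 + [("group_a", False)] * 4
    + [("group_b", True)] * 2 + [("group_b", False)] * 8
)

print(selection_rates(batch))
print(needs_human_review(batch))
```

The check does not decide whether a gap is unfair. It only tells a person where to look after new data has been introduced.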
For example, Knockri curates video job applications by analysing the applicant’s verbal and non-verbal communication. We have built our own proprietary data set, which emphasizes inclusiveness and diversity. In addition, when quantifying a candidate’s competencies, the algorithm does not account for an individual’s ethnicity, gender, appearance or sexual preference as measures of desirability for a hire. We are, however, able to recognize individual ethnicities when initially pinpointing facial landmarks. This ensures that no one ethnicity is the standard against which all others are analyzed.
A.I is only as good as its data sets. When built mindfully, artificial intelligence can provide businesses with a scalable resource that consistently helps people make more objective decisions.
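As a generic illustration of keeping protected attributes out of a scoring step, the sketch below filters them from a candidate record before anything is scored. It is not Knockri's implementation: the field names, the constant and the function are hypothetical and exist only to show the idea.

```python
# Hypothetical field names for attributes that should never feed the score.
PROTECTED_FIELDS = {"ethnicity", "gender", "appearance", "sexual_preference"}


def scoring_features(candidate):
    """Return a copy of the candidate record with protected fields removed."""
    return {key: value for key, value in candidate.items()
            if key not in PROTECTED_FIELDS}


# Invented record for illustration; the values carry no real meaning.
candidate = {
    "verbal_clarity": 0.8,
    "non_verbal_engagement": 0.7,
    "gender": "undisclosed",
    "ethnicity": "undisclosed",
}

print(scoring_features(candidate))
```

Removing fields does not remove other signals that correlate with them, which is one reason the human review step still matters.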
Author: Maaz Rana