
Diversity: Fully Exploit the Potential of Data Science

Only around 25% of the participants of the 8th Swiss Conference on Data Science are women. This aligns with a World Economic Forum report estimating that women make up only 26% of workers in data and AI roles globally. But why? And how can we change that?

We talked to Christian Hopp from BFH, who has addressed academic careers, gender, and diversity in his research, and Teresa Kubacka, who works as a freelance data scientist at Litix and is a member of the community “Women in Machine Learning and Data Science Zurich”.

Why do you think diversity is important in Data Science?

Christian Hopp: Very generally, why would anyone think that diversity should not be important in Data Science? Aiming for example at gender balance or the equal representation of minorities in Data Science means fully exploiting the talent pool and fairly distributing opportunities.

Software development, algorithm development, or technology development more broadly is first and foremost an interactive process, where various actors are involved and where communication and collaboration help to combine different knowledge bases. Hence, to fully exploit the potential of Data Science, it is paramount to involve individuals from all sorts of backgrounds (gender, ethnicity, age, socio-economic background, etc.). Diversity broadens the search horizon and may help to develop insights and product or service offerings that are more responsible and more ethically and socially acceptable. When the process itself is more inclusive, the final solutions will be too. This may range from making AI applications less racist to including more female-centered views when developing algorithms. If we fail to pay attention to diversity, many of the unconscious biases that have been uncovered may find their way into algorithm development. In the end, we may end up with algorithms that are severely biased and less trusted by potential user groups. Diversity in Data Science can bring these relevant values and viewpoints into the technology development process from the start. Values like diversity and inclusiveness of the development teams and companies need to be front and center to ensure responsible and inclusive technology and innovation development.

Teresa Kubacka: Data science needs diversity because we live in diverse societies. Although we tend to think that “data is the new oil” and “data speaks for itself”, data is not part of the natural world like oil or the force of gravity. It is people who actively create data. People decide which data is important enough to collect and what to leave out. People decide which research question is worth the effort and which projects to invest in. This is why, if our goal is to create data-driven products that make sense for all members of society, we have to include a diverse representation of society in the process of creating those products and defining what is important. We have many examples of data projects that backfired spectacularly because they were designed and developed by a homogeneous group of people (for example, a health monitoring app that has no functionality to track menstruation). Luckily, we also have many examples of inclusive data science projects that led to more empowerment or drove positive change. This holds for all kinds of diversity, not only gender diversity. Last but not least – can we afford to lose talented data scientists only because they don’t have “the right appearance (gender, skin color, etc.) for the job”?

What’s keeping women out of Data Science?

Christian Hopp: To be honest, I do not fully know, but I wish to understand. We have done prior research in STEM fields, focussing on female academic careers. We found that gender stereotyping attributes lower field-specific ability to women. In sum, women aspiring to an academic STEM career with leadership responsibility are confronted with a “double” incongruity: First, they are experts in domains that are clearly male-dominated, subjecting them to severe biases stemming from the perception of their abilities. Second, even an aspiration to leadership is still seen as atypical for women. It could be that careers in data science are fraught with problems because women have to fulfill expectations in very male-dominated environments.

Interactions with colleagues and superiors also played an important role in academic careers in STEM fields. Role models are often important in pursuing a certain career path. Early career stages in particular are sensitive periods in which influential imprints may be left. A lack of prominent role models might keep women from choosing data science as a career. But to answer this more fully, we would need more empirical evidence as to how individual aspects of gender imbalance interact and co-evolve with systemic ones. Formal and informal rules, proximate social structures, organizational culture, professional networks, couple perspectives, prevailing stereotypes, and individual motives may all interact here.

Teresa Kubacka: In my experience, it’s not a lack of interest. I meet plenty of women who are fascinated by data science and are highly competent to become good data scientists. Many of the obstacles they face are the same as for women in tech in general. Here I’ll focus on the ones most characteristic of data science.

One group of obstacles is related to a stereotypical perception of who can be a good data scientist: a person with a formal degree in an area where degrees have historically been awarded predominantly to men (computer science, mathematics, etc.), so women are statistically more likely to be perceived as not having the “right” qualifications. This also happens because a data scientist is often perceived to be a better software engineer, and because there is low awareness among recruiters as well as applicants of the wide variety of data scientist roles. Some roles are more product-oriented, there are many teams where a data scientist needs to be a good communicator and a structured, analytical thinker in the first place, and some data science roles have a strong UX and product design component.

Another group of obstacles is more mundane but can, unfortunately, be a real deal-breaker. One big issue is the lack of data science roles with a 60% workload, together with a relatively small market for freelancers. In the Swiss context, this means that women who carry a large share of family responsibilities cannot easily work as part-time data professionals. It is also not easy to transition into data science gradually on the job, without investing time after work into getting a certification (like a CAS or a bootcamp). Working women with family responsibilities are particularly affected by this because they cannot afford the time.

The third group of obstacles comes from within the existing machine learning/data science community. Some things that used to be perceived as normal in male-dominated communities are perceived as hostile by many women. For example, until not long ago one of the most important AI conferences was called “NIPS”, with a pre-conference event called “TITS”. Only after severe criticism was it hesitantly renamed to “NeurIPS” (link 1, link 2). As a woman wanting to enter the field, you start asking yourself: will I be taken seriously there if they picked a conference acronym that refers to a female body part?

The last thing that I think is relevant is that, on the one hand, the requirements for data scientists written by some recruiters are unrealistic, while on the other hand, many women don’t believe that they can apply and do the job if they fulfill only part of the requirements and learn the rest on the job. This is why it’s important to build up their confidence – for example by creating networks of female professionals who can exchange experiences, by creating inclusive environments that allow for free experimentation and learning by doing, and by engaging in different mentoring programs both as mentor and mentee.

How can you support and push diversity forward?

Christian Hopp: Generally, I think it is important to increase awareness through communication. Organizations need to put the benefits of diversity front and center, not only on the webpage and in other communications but in the hearts and minds of people working in data science. We need to wholeheartedly embrace the notion that diversity leads to better, more inclusive, and more innovative outcomes. Second, and I say this as a middle-aged, white, male professor from a non-academic family background trying to educate the next generation of data scientists, we need to activate and communicate through role models. Career paths in data science need to become more visible for women and for people from diverse or minority backgrounds. Third, we need more mentoring for people from diverse and minority backgrounds, in companies but also in academic training.

Teresa Kubacka: I can give some examples based on the activities in our community. We organize meetups and workshops aimed at supporting women and gender minorities in data science and machine learning. Our community members can talk about their data science projects and learn new skills in a friendly environment. We try to give people with different kinds of data science roles and life situations an opportunity to speak, so that we present a variety of inspiring role models. We put a lot of emphasis on events where the participants can engage in informal coaching, and at least once a year we try to organize a career event. We also support other communities for women in tech, for example by participating in the conference “WeTechTogether”, where more than 20 communities took part, and where WiMLDS, Litix, and Databooster organized a geodata workshop together.

However, we can only do so much, and much more systemic change still needs to happen. There is no single solution, and every organization will have to find its own path. Some solutions come as an answer to the obstacles to diversity I described before. For example, the Swiss job market would need to open up to more part-time work in data science. Over the last few years, we have already seen an increase in positions with an 80% workload. So if you are a manager and have an open position for a data scientist, consider making it a 60-100% position or a job-sharing position. If you have an employee with strong analytical skills who is inspired by data science, consider whether there is a way for this person to learn some data science skills within their current role. As a general rule, we are all biased and use stereotypes, so it’s always good to check your bias and privilege and to question your instinctive hiring choices, because they may act against inclusiveness and diversity. If you design a product or start a project, put effort into assembling a diverse team. If you can, encourage people from historically non-privileged groups to participate in high-profile projects and give them credit and recognition for their effort. Once an organization embraces diversity as one of its values, and not as an option for interested participants, many of these changes will follow naturally as a consequence of this choice.