Diane Gutiw

Vice-President and Global AI Research Lead

Parimal Kulkarni

Director, Consulting Services - Canada

In a world where fact is getting harder to distinguish from fiction, and where information is increasingly retrieved from black box AI solutions, the role of data science is becoming more critical to mastering the magic, or more precisely the math, to ensure organizations can responsibly use and scale AI to achieve business outcomes.

If you scroll through your LinkedIn feed or follow other conversations across tech-focused social media, one trend you will see is users increasingly relying on readily available GenAI models to explain the world around them. With AI expertise increasingly focused on configuring AI models rather than building them, some argue that prebuilt, pre-trained models will make data science and its practitioners obsolete.

While the role of data science is clearly evolving, the need for data scientists to ensure best practices and provide clarity on the models and data behind more accessible AI solutions is critical. As organizations across industries increasingly invest in AI, data science remains essential to understanding how best to use and scale this powerful technology across the enterprise.

Risks posed by black box AI

Best practices in AI ethics, data, and governance lean on ethical principles developed from academic research, which have also served as the foundation for responsible data science for decades. These principles include transparency, reliability, explainability, privacy, security, and fairness in data preparation and analysis.

Black box AI involves highly complex machine learning models that generate outputs from massive amounts of data, but whose decision-making process is a mystery, even to their developers. Black box AI opens the door to accessing information without clarity on its source or the reasoning models behind it, which is a concern for organizations increasingly relying on GenAI and agentic models across their business operations. With black box AI, organizations run the risk of “trusting” solutions that operate using the same logic as autocomplete, which, as we have all seen, is often comically unreliable. As we share below, data science plays an important role in reducing black box AI risks.
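To make this concrete, below is a minimal sketch of one way a data scientist can probe an otherwise opaque model: permutation importance, which shuffles each input feature and measures how much held-out accuracy drops. The gradient-boosted classifier here merely stands in for any black box predictor; the technique only assumes access to a predict interface and that scikit-learn is available.

```python
# Minimal sketch: probing a black box model with permutation importance.
# The fitted classifier stands in for any opaque predictor; we only use
# its predict/score interface, never its internals.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

opaque_model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature in turn and measure the drop in held-out accuracy;
# large drops flag the inputs the model actually leans on.
result = permutation_importance(
    opaque_model, X_test, y_test, n_repeats=10, random_state=0
)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")
```

Even without access to a model's internals, this kind of probing gives a defensible picture of which inputs drive its decisions, evidence a team can review before trusting the system.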

How the role of data science in AI is evolving

While advances in generative AI and agentic AI have changed the role of data scientists in developing and training algorithms to generate data insights, understanding the science behind how AI models work is more important than ever. This is especially true not only as the use of black box AI increases, but also as pre-trained, low-code AI solutions become accessible to both individuals and organizations.

The role of data scientists has evolved from architecting AI models to validating them and ensuring responsible AI principles are embedded. Data scientists help us understand what’s happening “under the hood,” so to speak, especially in terms of the math and mechanics behind GenAI and similar algorithms.

Transparency and explainability are also especially important as AI use cases increase in complexity. For example, more organizations are moving beyond pilots and simple GenAI use cases, such as standalone chatbots, toward agentic AI.

As organizations look to scale their AI usage, they need data scientists with the ability to validate trustworthy models, understand how the models were trained, implement appropriate guardrails, and help manage these solutions—accelerating their use, adjusting course, or putting on the brakes, as needed.
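As a simplified illustration of such guardrails, the sketch below checks a generated answer against auditable rules before release. The length policy and the PII-style patterns are hypothetical choices for this example, not a prescribed rule set.

```python
# Minimal sketch of an output guardrail: block a generated answer that
# violates simple, auditable policies before it reaches a user.
import re

# Illustrative patterns only; a real deployment would use vetted detectors.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # SSN-like number
    re.compile(r"\b\d{16}\b"),             # card-number-like digit run
]

def guardrail(answer: str, max_chars: int = 2000) -> str:
    """Return the answer if it passes all checks; otherwise raise."""
    if len(answer) > max_chars:
        raise ValueError("Answer exceeds length policy; escalate for review.")
    for pattern in PII_PATTERNS:
        if pattern.search(answer):
            raise ValueError("Possible PII detected; blocking release.")
    return answer

print(guardrail("The projected Q3 churn rate is 4.2%."))
```

The same pattern scales up: the agent accelerates, and the guardrail is the brake a data scientist can tune or tighten as the solution evolves.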

Explainability and transparency in deploying AI solutions

Reflecting on school experiences, one recalls that arriving at the correct answer was often insufficient; demonstrating the underlying reasoning was equally essential. Without insight into one’s thought processes, it can be difficult to fully understand or trust an answer. Moreover, if the parameters of a problem shift, the ability to adapt responses becomes crucial.

Some organizational leaders and teams suggest that data scientists are becoming less critical for AI solution deployment. While this observation may hold true for AI model development (for which data scientists were traditionally deployed), complications can arise as solutions scale, integrate, and evolve alongside advancing technologies.

The increased accessibility of AI across varying skill levels is undeniable. Given the limited explainability of contemporary AI—including generative and foundation models—the expertise of data scientists is more vital than ever, even if used at different stages in the AI life cycle.

The task of interpreting the parameters, weights, biases, and mathematical functions within billion-parameter models presents significant challenges. Even more pressing is the issue of maintaining control over these systems.

Notwithstanding portrayals in popular culture, human input remains indispensable for comprehending and managing AI. Fundamentally, AI continues to be a tool rooted in algorithms and data, designed to generate insights and content based on user instructions. Although its complexity has increased, a fundamental principle endures: without diligent human oversight, even the most advanced AI models may deviate from intended outcomes.

Those with a deep understanding of these models can design methods to increase their reliability, as well as help detect when models or outputs are incorrect. This includes designing methods to protect against systemic risks such as the use of AI for unlawful and fraudulent purposes.

To promote fairness, transparency, and alignment with human values, organizations need to rely on robust statistical techniques. Data science methodologies underpin explainability efforts and establish clear criteria for model deployment or decommissioning in cases of unpredictable or unethical behavior.
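One widely used statistical technique for exactly this purpose is the population stability index (PSI), which quantifies how far a production distribution has drifted from its baseline. The sketch below uses synthetic score distributions, and the thresholds in the final comment are common rules of thumb rather than universal standards.

```python
# Minimal sketch: a population stability index (PSI) check that can feed
# a deploy-or-decommission decision rule for a model in production.
import numpy as np

def psi(expected, actual, bins=10):
    """Compare a production distribution to its baseline distribution."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor empty bins at a small constant to avoid log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5000)    # scores at validation time
production = rng.normal(0.4, 1.2, 5000)  # scores observed in production

# Common rules of thumb: < 0.1 stable; 0.1-0.25 investigate; > 0.25 act.
print(f"PSI = {psi(baseline, production):.3f}")
```

A check like this turns “unpredictable behavior” into a measurable trigger: when drift crosses an agreed threshold, the team retrains, recalibrates, or pulls the model.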

Examples of how data scientists bring value to evolving AI

We have numerous examples of how data science is critical to ensuring trustworthy outcomes from AI business strategies and analyses. Here are two examples to underscore our point.

Use case #1: Deploying a predictive analytics model
Use case #2: Determining which AI models to deploy in healthcare

In the healthcare space, machine learning (ML) has been used for decades to advance disease detection, support diagnostics, and improve patient outcomes through decision support and specialized systems. Further, AI modeling provides information to practitioners more quickly and at the point of care.

While GenAI has advanced clinicians’ ability to interrogate structured data and the unstructured text in their clinical notes more quickly, traditional AI remains the best tool to work in concert with GenAI agents to identify, access, and assess information patterns to give better advice and improve patient outcomes. As the integration of low-code GenAI with traditional AI and ML models becomes more critical in this agentic AI healthcare ecosystem, data scientists are key to understanding the data, algorithms, and required outputs.

Understanding the best model to use to answer a question or address a problem reliably requires a deeper understanding of all available models. With agentic AI, where GenAI and traditional AI models work in concert, data scientists are key to designing the interactions. While configurable agentic AI solutions may have a business-focused interface, understanding the underlying model and ensuring its robustness and correctness are important to process and output quality.
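As a simplified illustration of that design role, the sketch below routes requests in a hypothetical clinical workflow either to a validated traditional model or to a generative summarizer. Every name here (risk_model, llm_summarize, the request kinds) is invented for the example; a real deployment would also wrap the GenAI path in the guardrails discussed earlier.

```python
# Minimal sketch of agentic routing: structured, auditable questions go
# to a traditional model; unstructured-text tasks go to a GenAI model.
from dataclasses import dataclass

@dataclass
class Request:
    kind: str      # e.g., "risk_score" or "summarize_notes"
    payload: dict

def risk_model(payload: dict) -> float:
    # Stand-in for a validated traditional model (e.g., a regression).
    return min(1.0, 0.01 * payload.get("age", 0))

def llm_summarize(payload: dict) -> str:
    # Stand-in for a GenAI call; outputs would be checked before use.
    return f"Summary of {len(payload.get('notes', ''))} characters of notes."

def route(request: Request):
    if request.kind == "risk_score":
        return risk_model(request.payload)
    if request.kind == "summarize_notes":
        return llm_summarize(request.payload)
    raise ValueError(f"No handler for request kind: {request.kind}")

print(route(Request("risk_score", {"age": 62})))
print(route(Request("summarize_notes", {"notes": "Patient presents with..."})))
```

The routing logic itself is trivial; the data science work is in knowing which model is trustworthy for which question, and in validating both paths before they reach a clinician.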


Why data science matters more than ever for AI development, deployment, and scaling

Data science has driven the accessibility of AI and GenAI innovation through rigorous research, and assuming it is no longer needed is like suggesting we no longer need to understand or evolve the technology tools that increasingly drive enterprise decisions. In a world of rapid evolution and turbulent AI deployments, data scientists and data engineers have enabled AI innovation, and they are also the ones with the proverbial “kill switch” to contain it.

Data scientists can leverage traditional AI and statistical models to validate, interpret, and identify the appropriate AI models and data to provide relevant and reliable outputs. They can ensure ethical practices, tackle data and algorithmic biases, and make sure the right models are applied to real-world business problems, so the results are trustworthy. Monitoring each stage of the design, development, and operation of AI systems is crucial to avoid unexpected issues and preserve transparency and explainability.

By managing data and algorithms the right way, data scientists bring transparency and trustworthy insights to the table. They help businesses and people deal with the complexity of our tech-filled world, while making sure decisions are backed by solid, evidence-based information. In short, they ensure innovation stays on track and grounded in truth.

Delivering a production-grade AI project today involves more than configuring a chatbot using cloud platforms and ready-to-go models. While AI may seem as simple as an “API call away,” trusted outputs require the collaborative effort of data engineers, AI engineers, and data scientists to get models into production.

Innovation needs experience, and organizational leadership still needs analysts to curate insights and ensure their relevance, despite disclaimers like "generated by AI." AI aids in generating hypotheses, but experienced data scientists are essential for designing experiments that extend beyond out-of-the-box configurable solutions. Remember, AI combines data science, computer science, math, programming, data architecture, and user experience.

The future of data science

In today’s world, where reality is becoming harder to discern, validating the “truth” and distinguishing human-generated content from AI-generated output is becoming increasingly complex. As tools like GenAI and autonomous agentic AI systems become more common, opportunities to spread misinformation proliferate.

Human data scientists are and will remain essential additions to AI infrastructure configuration teams to help provide responsible AI oversight and validation of AI systems. It’s the data scientist who understands the nuances of data quality, bias mitigation, ethical considerations, and rigorous experimentation needed to keep AI in check.

While AI can amplify insights, accelerate innovation, and automate routine tasks, the role of the data scientist has evolved. Rather than just building models and interpreting data, data scientists are now needed to determine when, where, and how AI should be leveraged and scaled and, most critically, when it needs to be halted.

Data science is far from dead; it's more pivotal than ever, acting as both an accelerator and a critical control mechanism in an increasingly AI-driven world. After all, someone needs to know how the models work to drive ongoing innovation, as well as when to turn them off.

For further discussion, reach out to Diane or Parimal. Also, learn more about CGI’s AI work and capabilities.


About these authors

Diane Gutiw

Vice-President and Global AI Research Lead

Diane leads the AI Research Center in CGI's Global AI Enablement Center of Excellence, responsible for establishing CGI’s position on applied AI offerings and thought leadership.

Parimal Kulkarni

Director, Consulting Services - Canada

With a robust foundation in research, Parimal Kulkarni, PhD, blends technical expertise with strategic vision, propelling adoption and innovation in the fast-paced world of AI.