Experts from Cardiff University and CGI discuss how AI and data science are being used to create innovative solutions to cyber security and societal challenges facing higher education.
Professor Pete Burnap from Cardiff University and Carwyn Cook from CGI discuss the integration of AI in higher education and cyber security, highlighting how research needs to be driven by industry demands. This effectively means designing practical application use cases and commercial potential in from the start, and ensuring that there are industry champions to help drive the research.
Our experts discuss the data-driven nature of many applications of artificial intelligence and the challenge of managing changing data, which Professor Burnap’s team is currently researching. They consider how we know when the data being used to make a decision with a machine learning approach has changed, and how to manage those changing scenarios. They also discuss the analogy of an AI “MOT”: just as you take a car to the garage to get it checked over and ensure it is still safe and roadworthy, a similar mindset needs to be applied to the upkeep of AI solutions.
Meet the speakers
- Professor Pete Burnap, Professor of Data Science & Cybersecurity at Cardiff University, Director of the Cardiff Centre for Cybersecurity Research and the Founding Director of the Wales Cyber Innovation Hub
- Carwyn Cook, Vice President, Consulting Expert, CGI
Transcript
Carwyn Cook: Hello, and welcome to today's podcast, which is part of CGI's AI and Industry series, where we speak to digital leaders about how artificial intelligence is transforming their particular industry sector. My name is Carwyn Cook, and I work with CGI's clients to develop digital roadmaps, setting out how they can use technology to improve outcomes for their organizations and for the customers or the citizens they serve.
Today's episode is one of a mini-series focusing on higher education, and I'm very pleased to be joined today by Pete Burnap, Professor of Data Science & Cybersecurity at Cardiff University. Pete's the Director of the Cardiff Centre for Cybersecurity Research and the Founding Director of the Wales Cyber Innovation Hub, which aims to help Wales become a global leader in cybersecurity by 2030. He's also a firm believer that research should be driven by real-world problems, and this is demonstrated by his team's work using AI and data science to create innovative solutions to cybersecurity and societal challenges.
Welcome, Pete. Thank you for joining us today.
Pete Burnap: Hello. Thanks for having me.
Carwyn: As I mentioned in the introduction, I know you're very active in several areas of research, spanning cybersecurity, data science and AI, and that you and your team at Cardiff have already spun out several ventures taking this research out to market. I'd like to start by getting your thoughts on what helps to bridge the gap between a research project and something which is more of a long-term business venture, and perhaps any particular examples you've got that would help to bring that to life?
Pete: Absolutely. I think you captured it quite well earlier on in that as a group, as a research group, we are very applied in our focus. We aim to drive our research through industry demand, essentially, and align the work that we do to the work that industry needs. When we're conceiving developments, for example, in AI-based cybersecurity defense, we spend a lot of time working with the people who would potentially use them, refining the problems and challenges that they face, and trying to develop scenarios that help us understand how such a solution would be used in practice.
Then, of course, the research that we do, some of it's quite fundamental, some of it develops new algorithms and new approaches, but ultimately, those are all done with the application use case in mind. I think that's key to actually moving from a research project to something that's used commercially. It effectively means that we are designing practical use and commercial potential in from the start, and have industry champions to help us drive that.
Carwyn: Yes, definitely. I know one of the areas that you're looking into is how to ensure AI systems remain trustworthy and safe over time, particularly given the nature of things like machine learning models, where they grow and adapt after deployment. When we were speaking recently, I think you used the analogy of an AI MOT: just like taking a car to the garage to get it checked over and ensure that it's still safe and roadworthy, applying a similar kind of mindset to the upkeep of AI solutions. What do you see as being the main risks which need to be monitored with artificial intelligence, and how is your research helping to inform the thinking in that area?
Pete: Ultimately, a lot of the applications of artificial intelligence at the moment are fundamentally data-driven. They're machine learning approaches, which means that effectively there's some applied mathematics behind the scenes trying to identify patterns in the data it sees and understand which elements of the data lead it to a decision: for example, in an image recognition task, that one image is of an orange and another image is of an apple. You've got two circular objects, essentially, but the colour differentiates them, and there may be some slight differentiation in shape. Ultimately, then, the shape and the data that represent that shape are the elements that the machine learning algorithm will use to determine whether something's an apple or an orange.
Now, one of the challenges with this is that data may change over time. That is one of the key risks: an algorithm learns that certain data represents an outcome, but when the data changes, the outcome it suggests isn't actually the case. Perhaps a better example in that instance might be an image recognition system that classifies different types of leaves. An oak leaf versus a hazel leaf, for example.
The challenge there is, say you've got a bunch of data that's used to train that approach, and the leaves are all captured in June and July. They'll all be nice and green, nice and crisp, nice and round. By the time it comes to November, those leaves will have changed in colour and they will have changed in shape. The machine learning approach that's been trained to recognize the difference between those two leaves will no longer work, because the new samples being fed into it later in the year won't look like the data it was trained on.
This can be thought of in pretty much any context you consider. Unless you've got a very closed, deterministic system and set of data being collected, the likelihood is that your data will change regularly over the course of, well, it could be a year. It could actually be a couple of days or even a couple of hours. It leads us to the question of how we know when the data we're using to make a decision with a machine learning approach has changed, and what we do about that. Those are a couple of fundamental areas of our research. Firstly, change point detection: is the data changing sufficiently for us to be concerned that our machine learning approach might be throwing out the wrong decision? Secondly, once you can do that change point detection, what do you do about it?
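As a minimal illustration of the change point detection idea Pete describes, and not the Cardiff team's actual approach, a simple two-sample statistical test can flag when incoming data no longer resembles the training data. The "greenness" feature and the significance threshold below are hypothetical.

```python
# Illustrative drift / change point check using a two-sample Kolmogorov-Smirnov test.
# Assumes numpy and scipy are installed; the feature and thresholds are hypothetical.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Reference data the model was trained on (e.g. leaf "greenness" sampled in June).
train_greenness = rng.normal(loc=0.8, scale=0.05, size=1000)

# New samples arriving later in the year (leaves turning brown).
live_greenness = rng.normal(loc=0.5, scale=0.10, size=200)

# If the two distributions differ significantly, flag the model for review.
stat, p_value = ks_2samp(train_greenness, live_greenness)
if p_value < 0.01:
    print(f"Drift detected (KS statistic={stat:.2f}): the model's decisions "
          "may no longer be trustworthy; trigger data review or retraining.")
```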
Carwyn: In that example, then, what would you see happening there? Would it be triggering some manual intervention to get things back on track, for example?
Pete: Absolutely. One of the things you can do, if you're detecting changes in the data that suggest you might not be getting an optimum output, is manual data collection again. You may need to consider the type of data you're using and whether you need alternative sources of data. There may be ways in which you can automate that approach to an extent. Ultimately, what we're trying to get towards is a hybrid approach where we can better understand the performance of machine learning over time. Obviously, people are always going to have responsibility and accountability for the outcomes of machine learning-based approaches, so it's about flagging to those people when issues occur and then supporting them in making a decision on how to rectify it, much like any traditional risk-based approach.
I think one of the key blockers here is: are people likely to adopt AI and machine learning to their full extent if they can't apply traditional risk assessment approaches and/or have confidence that they would know if the machine learning was doing the wrong thing?
Carwyn: Yes. I think that mitigates a lot of the concerns that people have about this technology. The human in the loop is, I suspect, never going to go away if we want to ensure that these solutions remain safe and keep delivering value over time. Another area that I know you've done a lot of work on is the role of AI in cyber threat detection and response. That's an area where, inevitably, the cyber criminals are also doing their own research and development, so the threat is continually changing. Can you tell us a bit about this and, in particular, how can we keep ensuring that AI deployed for good, if you like, is keeping pace with AI also being developed and used by bad actors?
Pete: Absolutely. Yes, it's an interesting one, and it certainly keeps us on our toes. Some of the work we've done over the last few years has been about understanding how we can use machine learning on, effectively, the system behaviour that exists on a computer network. For example, at a device level: the data that's coming in and out of the network, the processes being created on a computer, the amount of memory and processing power being used at any given time. Can we use those indicators to identify the difference between expected behaviour, or good behaviour, and bad behaviour that might be malicious?
We were successful in developing an approach that was able to use that form of data to detect ransomware starting to run on a Windows-based system. We started by actually running ransomware for 30 seconds up to a minute and collecting all that data retrospectively and then being able to distinguish between ransomware and normal behaviour. We were actually able to then reduce that to just a four-second window of data collection and determine that ransomware was effectively starting to execute on the computer within four seconds of it starting.
Of course, the next step then is what do we do about that? We also developed a follow-on approach that was able to attribute the ransomware behaviour to actual processes on the computer and kill those processes in real time. By doing that, we were able to reduce file encryption by over 90%, with an accuracy of over 94%, which is fantastic. It shows it can be done, and it helps us in that constant battle against the evolving cyber threat.
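As a toy illustration of the kind of window-based behavioural classification Pete describes, and not the research team's actual detection or response code, a standard classifier can be trained on short windows of host telemetry and used to trigger a response when a live window looks like ransomware. The feature set, synthetic data and threshold below are all hypothetical.

```python
# Toy behavioural ransomware detector trained on synthetic 4-second telemetry windows.
# Assumes numpy and scikit-learn are installed; all features and data are made up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Features per window: [cpu_percent, memory_mb, new_processes, bytes_written]
benign = np.column_stack([
    rng.normal(20, 5, 500), rng.normal(800, 100, 500),
    rng.poisson(1, 500),    rng.normal(5e4, 1e4, 500),
])
ransomware = np.column_stack([
    rng.normal(85, 10, 500), rng.normal(1500, 200, 500),
    rng.poisson(8, 500),     rng.normal(5e5, 1e5, 500),
])
X = np.vstack([benign, ransomware])
y = np.array([0] * 500 + [1] * 500)  # 0 = benign window, 1 = ransomware window

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# In deployment, a positive prediction on a live window would trigger the next
# step Pete mentions: attributing the behaviour to a process and killing it.
live_window = np.array([[90.0, 1600.0, 9, 6e5]])
if clf.predict(live_window)[0] == 1:
    print("Suspected ransomware behaviour: attribute to a process and terminate it.")
```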
In terms of keeping up with threat actors, what we've also spent a lot of time doing is trying to build infrastructure that represents real-world systems. In our cyber range, for example, we can build reasonably sized virtual enterprises: virtualized Windows devices, laptops, phones, anything we can virtualize, connected together as they would be in a real office or enterprise-type environment. Then we can, for example, deploy ransomware on it to see how it would spread across the network and how we can perform such analysis on a realistic network. That's another activity we're doing to keep our research current. Threat actors, I suppose, are lucky enough, if you can call it that, to be able to deploy their stuff on real-world systems. We don't have the luxury of deploying our research solutions on real networks because, of course, they don't work straight away and we need to do it in a lab. Trying to build representative systems is a big task for us as well.
Carwyn: Yes, I can imagine. I think the example you gave is a good illustration of the kind of pattern detection that a machine learning solution can carry out but that you couldn't replicate with manual methods. It's one of those examples where AI really comes into its own, I guess. Thank you. I think that brings us on to another important topic, which is the responsible and ethical use of AI in society more broadly. As the technology becomes more mainstream, that brings great opportunities, of course, but also risks. Here at CGI, we've created a responsible use framework, which is now integral to any AI-related project for us. I'm just interested to know your thoughts on this area in particular.
Pete: Yes, absolutely. At the end of the day, it comes back to a risk-based approach and making sure that whatever is deployed is manageable from a risk perspective. That means transparency and understanding how machine learning approaches are making decisions, what's being fed in, how the decision is made and what comes out. That's hard. That's actually not particularly straightforward, particularly with things like deep neural networks where the algorithmic approach isn't as straightforward as being able to dump out a whole bunch of decision points. It gets quite messy quite quickly. Being able to do that is key.
The other point then is being able to test it as much as possible before it's rolled out. Like I said, I'm the director of the Wales Cyber Innovation Hub. In that context, we've developed the infrastructure I was talking about, which can be used by people who want to come in and test AI-based approaches on a realistic system. We can build the system. Say it was being used in a public services setting, for example, we can build the IT infrastructure of that public services setting. They can deploy the AI on it and monitor its behaviour under different scenarios, stress testing, threat actors and potential misuse cases, and observe the outcome. Then, of course, where it goes wrong in that sort of lab environment, which is a safe space, we can make sure it's fixed and securely designed before it's actually rolled out to the public. It's doing a bit of testing and validation in a safe space before it's rolled out, alongside having a strong risk management and transparency framework to understand how it's behaving and being monitored in the real world, much like, as you said earlier, the MOT-based approach for vehicle testing. We call vehicles back in once a year for a test, and we need to do the same thing with AI and machine learning. Of course, we can't call it back in; this is one of the challenges. Once it's out there, it's out there. We need an effective remote monitoring approach that gives us a transparent view of whether it's performing the way it should.
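The "AI MOT" Pete and Carwyn refer to can be pictured as a recurring health check that compares a deployed model's live performance against its baseline and flags degradation to an accountable person. The sketch below is a minimal illustration under assumed thresholds, not a prescribed monitoring framework.

```python
# Minimal "AI MOT"-style health check: compare live accuracy against a baseline
# and flag to a human when it degrades. Thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ModelHealthCheck:
    baseline_accuracy: float      # accuracy measured when the model was deployed
    tolerated_drop: float = 0.05  # how much degradation we accept before flagging

    def assess(self, live_accuracy: float) -> str:
        drop = self.baseline_accuracy - live_accuracy
        if drop > self.tolerated_drop:
            return ("FAIL: performance has degraded; flag to the accountable "
                    "owner for investigation, retraining or rollback")
        return "PASS: model is still within its expected operating envelope"

# Example: a model deployed at 94% accuracy now scoring 86% on labelled spot checks.
check = ModelHealthCheck(baseline_accuracy=0.94)
print(check.assess(live_accuracy=0.86))
```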
Carwyn: Definitely, yes. I think that's a very valuable service; it would be beyond the reach of individual organizations to replicate that setup themselves. Finally, quite a tricky question, I guess, given the pace of change we've been talking about in this area: what do you see as the new and emerging areas of research over the next couple of years, starting off within an academic research context but then getting taken out to market?
Pete: Yes, so it's crystal ball time on that one. It's a challenging one. Certainly what we're seeing is a significant amount of research emerging now into detecting content that's been generated by AI and machine learning. We've obviously had fake news for a significant amount of time now, but with large language models openly available, the ability to create text and visual content is far easier than it's ever been. The ability to detect content that's potentially being created for malicious purposes is a big area of research. We're doing some of that in the cybersecurity research team at Cardiff University, we're seeing bits of it being done in commercial research settings, and the big tech players are working in this space as well. That's certainly going to be one big area: gen AI content detection.
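One widely discussed, and imperfect, heuristic in generated-content detection is to measure a text's perplexity under an open language model, since machine-generated text often scores unusually low. The sketch below assumes the Hugging Face transformers and torch packages and an illustrative threshold; it is not the Cardiff team's method, and practical detectors combine many more signals.

```python
# Simplistic perplexity-based check for possibly machine-generated text.
# Assumes transformers and torch are installed; the threshold is illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2; lower values are more 'model-like'."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return float(torch.exp(out.loss))

SUSPICION_THRESHOLD = 30.0  # hypothetical cut-off; real systems need calibration
sample = "The results of the study were significant and wide-ranging in scope."
if perplexity(sample) < SUSPICION_THRESHOLD:
    print("Flag for review: this text may be machine-generated.")
```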
Of course, the other area that is big and of particular interest is how generative AI can be used to conduct cyber attacks. The typical cyber attack kill chain comprises reconnaissance, understanding what's on a network, where the vulnerabilities are, how they can be probed and exploited, getting a foothold on a system, then pivoting and working through the system to exploit and extract data or conduct other undesirable activities. That's traditionally been reasonably human-led.
If we've got the ability to have that AI-led, with an AI making decisions to traverse networks after collecting some information about what's on a network, that could become quite damaging very quickly. It also moves well beyond the work I was talking about earlier on detecting things like ransomware: trying to detect an autonomous agent on your network that's manipulating it and causing undesirable outcomes is a next level of understanding. That's going to be a big area of research.
Again, I would assume we'll then start to see commercial solutions hit the market, almost like a real-time AI defender that will act on your behalf. It's tricky, of course, because as we said earlier, realistically, are we going to see AI deployed without any human intervention? Probably not. When we get to the point of having multiple AI actors on your network enacting cyber attacks in real time, how do we keep up with that? Are we going to need co-pilots in the security operations center helping the humans make decisions quicker than real time... well, it can't be quicker than real time, but quicker than the time they would normally take, because the AI is moving so quickly across the network? I think that's a big area of research as well.
Carwyn: Yes, certainly. It comes back to that arms race metaphor: the technology is advancing and it's being used for good and bad in parallel. That's been really interesting, Pete. Thank you so much. I'd just like to finish by saying a huge thank you to Professor Pete Burnap from Cardiff University for your time today and for sharing your insights with us. Thanks also to everyone listening in. I hope you enjoyed it; I certainly did. If so, please tune into the rest of the series, where you'll hear other views, both from within higher education and from other industry sectors. Thank you for listening.
[END OF AUDIO]