FCA call for input on synthetic data
About Innovate Finance
Innovate Finance is the independent industry body that represents and advances the global FinTech community in the UK. Innovate Finance’s mission is to accelerate the UK's leading role in the financial services sector by directly supporting the next generation of technology-led innovators.
The UK FinTech sector encompasses businesses from seed-stage start-ups to global financial institutions, illustrating the change that is occurring across the financial services industry. Since its inception in the era following the Global Financial Crisis of 2008, FinTech has been synonymous with delivering transparency, innovation and inclusivity to financial services. As well as creating new businesses and new jobs, it has fundamentally changed the way in which consumers and businesses access finance.
Innovate Finance welcomes the opportunity to respond to the FCA’s call for input on synthetic data to support financial services innovation.
Synthetic data is key to unlocking the development of a number of innovative products and services across a range of industry verticals. Importantly, it allows firms to create solutions that address top-tier priorities for the financial services sector and align with the aims of UK regulators and Government, including tackling economic crime and supporting the transition to a Net Zero economy.
Given the clear benefits of synthetic data, Innovate Finance considers that it should be a strategic priority for regulators to continue to facilitate collaboration on the creation of synthetic data sets. We consider there is a role for the Digital Regulation Co-operation Forum (“DRCF”) to make this a focus of its current workplan, given the clear read-across to privacy and competition. Regulators should consider prioritising use cases where there is a clear and immediate benefit to the public, such as tackling economic crime.
This response has been informed by engagement with a cross-section of our membership base, including firms who generate synthetic data, and others such as RegTechs and challenger banks that use synthetic data to develop innovative propositions to tackle economic crime and climate-related risks. Innovate Finance would be pleased to discuss our response in more detail with the FCA and/or facilitate discussions with our members and the wider ecosystem.
Q1: How important do you think access to data is for innovation within financial services? What else do you view as significant barriers to innovation?
Innovate Finance considers that synthetic data is key to unlocking the development of a number of innovative products and services across a range of industry verticals. Importantly, it allows firms to create solutions that address top-tier priorities for the financial services sector and align with the aims of UK regulators and Government, including tackling economic crime and supporting the transition to a Net Zero economy.
We have seen growing consensus amongst industry players that the use of synthetic data will overtake real data for the purposes of Artificial Intelligence (AI) modelling. See, for example, the Gartner forecast:
The experience of our members and the wider FinTech and RegTech ecosystems points to two key barriers in the context of data usage and innovation:
Data scarcity: access to data is challenging, which can adversely affect both companies looking to design services for new clients and companies trying to serve existing customers better. Critically, for very novel use cases or innovative solutions, the data required is rare, does not exist in sufficient quantities for training purposes, or does not yet exist at all.
A number of our members are pioneering the use of synthetic data sets to tackle financial crime, including fraud. Deep learning models rely on large amounts of high-quality training data, and a dearth of model-relevant data limits the potential of deep learning applications. Firms rightly need to be able to demonstrate to regulators that they have effectively trained their AI models to identify fraud and other forms of financial crime. To train those models, firms need two types of data: genuine and fraudulent customer transactions. However, fraudulent transactions make up only a small percentage of firms’ overall data. By generating large data sets with statistical properties similar to those of the required data, synthetic data can significantly improve the performance and accuracy of deep learning models and allow firms to enhance their systems and controls.
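The idea of generating a large data set that shares the statistical properties of a small set of fraud examples can be illustrated with a deliberately simple sketch. It fits a one-dimensional Gaussian to a handful of hypothetical fraudulent transaction amounts and samples from it. This is an assumption made purely for illustration: the amounts, the function names and the choice of a Gaussian are not drawn from any member's method, and production generators would typically model many correlated features.

```python
import random
import statistics


def fit_gaussian(values):
    """Return the (mean, standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


def synthesise(values, n, seed=0):
    """Draw n synthetic values from a Gaussian fitted to `values`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    mu, sigma = fit_gaussian(values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Hypothetical amounts (GBP) of the few known fraudulent transactions.
fraud_amounts = [950.0, 1200.0, 1010.0, 880.0, 1500.0, 1320.0]

# Generate far more fraud-like examples than exist in the real data.
synthetic_fraud = synthesise(fraud_amounts, n=5000)

real_mu, real_sigma = fit_gaussian(fraud_amounts)
synth_mu, synth_sigma = fit_gaussian(synthetic_fraud)
print(f"real:      mean={real_mu:.1f} stdev={real_sigma:.1f}")
print(f"synthetic: mean={synth_mu:.1f} stdev={synth_sigma:.1f}")
```

The synthetic sample is much larger than the original but has a closely matching mean and spread, which is the property the paragraph above relies on.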
Privacy: much of the data required to develop new, innovative products and services relates to customers and other individuals, so it must be adequately protected and individuals’ privacy preserved. Firms want to innovate in a way that reflects how consumers actually behave, while handling clients’ data responsibly.
Some companies attempt to use privacy enhancing technologies (PETs) and pseudonymisation/anonymisation tools as a way of building data sets from real individuals. However, even with complex pseudonymisation tools, computational reversibility means there is a risk that individuals can be re-identified. This is why pseudonymisation may not be an adequate alternative to synthetic data, and why synthetic data can provide enhanced protection for consumers while still delivering the data needed to innovate.
Synthetic data offers the most effective solution to these privacy challenges, without compromising the security or privacy of consumers and other end users.
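The re-identification risk described above can be shown with a toy sketch. It assumes a hypothetical scheme in which four-digit customer IDs are pseudonymised with an unkeyed SHA-256 hash; the scheme, the IDs and the function names are illustrative assumptions, not a description of any particular tool. Because the space of possible IDs is small, anyone can hash every candidate and match the results.

```python
import hashlib


def pseudonymise(customer_id: str) -> str:
    """Unkeyed SHA-256 pseudonym (assumed scheme, for illustration only)."""
    return hashlib.sha256(customer_id.encode("utf-8")).hexdigest()


def reverse_pseudonyms(pseudonyms, candidate_ids):
    """Recover IDs by hashing every candidate and matching the results."""
    lookup = {pseudonymise(c): c for c in candidate_ids}
    return {p: lookup[p] for p in pseudonyms if p in lookup}


# A "pseudonymised" release containing three hypothetical customer IDs.
released = [pseudonymise(cid) for cid in ("0042", "1729", "9001")]

# The attacker simply enumerates every possible four-digit ID.
candidates = [f"{i:04d}" for i in range(10_000)]
recovered = reverse_pseudonyms(released, candidates)
print(sorted(recovered.values()))
```

Keyed or salted schemes raise the cost of this particular attack, but the sketch shows why a pseudonym derived from a real identifier is not, on its own, a guarantee of anonymity.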
Q2: Do you agree that it is challenging to access high-quality financial data sets? If so, specifically what challenges do you face? (for example, understanding legal requirements around data access, commercially expensive, or technology infrastructure.)
Innovate Finance strongly agrees that it is challenging to access high-quality financial data sets.
Typically, established companies, rather than new market entrants or scale-ups, hold the private data sets which could unlock and facilitate innovation in the financial sector. Established firms have no commercial or other incentive to share these data sets with new market entrants or scale-ups (even if it were permissible under the UK’s data protection regime), as the innovative firms are often the incumbents’ competitors in the UK market.
We recognise that some of the data held by established firms will be personally identifiable information (PII) governed by the UK’s data protection regulatory and legislative framework, and firms will rightly be cautious about using customers’ data in a way that undermines privacy. As such, even firms that hold large PII data sets may not be at an advantage.
Q3: Do you agree with the high-level benefits for synthetic data? Are there any other benefits for synthetic data for your organisation, both now and in the future?
Innovate Finance agrees with the high-level benefits of synthetic data.
Q4: Does your organisation currently generate, use, purchase or otherwise process synthetic data? If possible, please explain for what purpose(s).
A number of Innovate Finance members use synthetic data. However, very few members currently generate synthetic data. Of those who generate synthetic data, the focus is on datasets which facilitate use cases across ESG, SME lending, and tackling financial crime.
Q5: If your organisation generates synthetic data, please describe at a high level the techniques used. Why have you chosen to use this approach?
Q6: What do you see as the difficulties and barriers for firms in creating high-utility, privacy secure synthetic data?
The Innovate Finance members that generate synthetic data have highlighted that comprehensive metadata discovery is crucial for their synthetic data generation techniques. Extracting statistical information for synthetic data creation requires access to analytics and distributions of key features of the data. Although this differs from the requirements of other generative modelling techniques, it still requires either a high-level statistical overview of the real data or full access to it. Data is also often held in silos, which can be another barrier to interlinking synthetic data ecosystems.
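As a rough illustration of what a statistical overview of key features might contain, the sketch below profiles a small table: summary statistics for numeric columns and value frequencies for categorical ones. The column names, rows and function names are hypothetical, and real metadata discovery would also need to cover relationships between columns and tables.

```python
import statistics
from collections import Counter


def profile_column(values):
    """Summarise one column: statistics if numeric, frequencies otherwise."""
    if all(isinstance(v, (int, float)) for v in values):
        return {
            "type": "numeric",
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),  # needs at least two values
        }
    total = len(values)
    counts = Counter(values)
    return {
        "type": "categorical",
        "frequencies": {value: n / total for value, n in counts.items()},
    }


def profile_table(rows):
    """Profile every column of a table given as a list of dicts."""
    return {col: profile_column([row[col] for row in rows]) for col in rows[0]}


# Hypothetical transaction records.
rows = [
    {"amount": 12.5, "channel": "online"},
    {"amount": 250.0, "channel": "branch"},
    {"amount": 40.0, "channel": "online"},
    {"amount": 99.5, "channel": "online"},
]

profile = profile_table(rows)
print(profile["amount"])
print(profile["channel"])
```

A profile of this kind is the sort of high-level view a generator could work from when full access to the underlying records is not available.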
Q7: Does your organisation engage with privacy enhancing technologies or privacy preserving techniques other than synthetic data? How would you assess the utility and benefits of synthetic data in comparison to other techniques?
Please refer to Innovate Finance’s answer to Q1, regarding the shortcomings of PETs.
Q8: What do you see as the highest priority use cases that would benefit from synthetic data?
Innovate Finance considers that the highest priority use cases that would benefit from synthetic data include tackling economic crime, increasing financial inclusion with innovations in lending and credit risk assessments, and easing the transition to a Net Zero economy.
Synthetic data would also provide a powerful tool for government and regulators when assessing policy around data ethics (responsible AI) and when measuring potential biases in data sets and machine learning models.
Q9: Are the synthetic data use cases you have mentioned significant for early business phases or mature operations/processes within your organisation?
The experience of Innovate Finance members has shown that synthetic data can be useful across the lifecycle of a product and the lifecycle of a company.
Synthetic data could enable early stage companies to innovate and develop products.
Established companies can also innovate and develop new products, and they may also have a wider set of internal tools to test with synthetic data.
Q10: How would your organisation make use of synthetic data if it was available (if at all)?
Q11: What synthetic data sets would you find most valuable to have access to? For example, Open Banking, Customer profiles, account to account payments, Credit card transactions, trading data, etc. What challenges would these data sets help your organisation to solve? E.g. AML and fraud detection, ESG, etc. Please be specific.
The experience of Innovate Finance members indicates that the more synthetic data that is available, the more complex and realistic the synthetic data sets can be, and the more useful they are for innovators.
This is important because people are multifaceted, and innovation needs to be built in a way that matches this complexity. Some specific synthetic data sets which could enable current innovation areas include:
- Customer profiles help with developing better, more robust machine learning engines for use by the credit checking industry and lenders, among many others.
- Transaction data is pivotal in identifying fraud and money laundering.
- ESG data can be useful for large organisations to help their vendors and suppliers in their ESG transition. EPC data, insurance, lending data and other data lakes can be useful in this context. Most of the data required is private.
- Mortgage brokerage is another use case where private data has a role to play along with other data.
Q12: What requirements would you need for the synthetic data to feasibly meet your use cases? Please be as specific as possible (for example, details on volume, accuracy, referential integrity between sets).
Innovate Finance members emphasise that relational integrity in the source data is crucial so that the same integrity can be maintained within synthetic data ecosystems. This is what brings utility to the data.
For consistency in time series data, access to data streams over a defined period of time is also needed.
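To make the integrity requirement concrete, the following sketch checks that every synthetic transaction refers to a customer that exists in the synthetic customer set. The table layouts, key names and function name are hypothetical and chosen only for illustration.

```python
def orphaned_rows(child_rows, parent_rows, foreign_key, primary_key):
    """Return child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[primary_key] for row in parent_rows}
    return [row for row in child_rows if row[foreign_key] not in parent_keys]


# Hypothetical synthetic data sets that are meant to link together.
customers = [
    {"customer_id": "C1", "segment": "retail"},
    {"customer_id": "C2", "segment": "sme"},
]
transactions = [
    {"txn_id": "T1", "customer_id": "C1", "amount": 30.0},
    {"txn_id": "T2", "customer_id": "C2", "amount": 75.5},
    {"txn_id": "T3", "customer_id": "C9", "amount": 12.0},  # no such customer
]

orphans = orphaned_rows(transactions, customers, "customer_id", "customer_id")
print([row["txn_id"] for row in orphans])
```

A data set that fails a check like this cannot be joined reliably across tables, which is why integrity between sets matters for utility.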
Q13: Do you agree with our assessment of the potential limitations and drawbacks of synthetic data? Are there any others?
Q14: Do you believe that regulators should play a role in the provision of synthetic data? If so, what do you think the extent of that role should be? (e.g. co-ordination, generation, hosting, etc)
High-level guidance and co-ordination from the UK financial services regulators is needed to foster innovation in synthetic data generation, especially in bringing together large data providers to facilitate referential integrity within data ecosystems.
Given the clear benefits of synthetic data, Innovate Finance considers that it should be a strategic priority for regulators to continue to facilitate collaboration on the creation of synthetic data sets. We consider there is a role for the DRCF to make this a focus of its current workplan, given the clear read-across to privacy and competition. Regulators should consider prioritising use cases where there is a clear and immediate benefit to the public, such as tackling economic crime.
Q15: To what extent would you be willing to collaborate with regulators and/or other organisations to generate synthetic data? For example, would you provide real data samples, or benchmark synthetic data against real data sets?
Q16: Do you think access to synthetic data should be a public utility for the purposes of innovation and research? Would you pay for access if it was delivered at-cost, or monetised?
Innovate Finance members have first-hand experience showing that access to high-quality synthetic data can help drive innovation. It should be noted that creating high-quality synthetic data can be resource intensive, and any authority should carefully consider whether its own resource profile allows it to develop and maintain such data sets.
Innovate Finance members who generate synthetic data have flagged that they often have to generate bespoke synthetic data sets to help clients based on individual use cases. Any authority will need to consider – if it is to provide synthetic data – whether it is ready to resource a near continuous need to produce bespoke synthetic data sets.