
Belated New Year wishes! I hope you and your family are keeping safe. For the first article of the New Year, I am not going to write about trends or the top five healthcare technologies, which you may already have read about on a variety of platforms.
Instead, I am going to write about something which, I believe, has great potential to shape the future of healthcare data analytics. For that reason, I am personally going to spend 6-7 years of my life (the joys of being a part-time learner) researching Federated Learning (FL)!
We have seen significant developments in healthcare data analytics in the last few years. In digital healthcare, the introduction of powerful Machine Learning (ML) models, and particularly Deep Learning (DL) models, has led to innovations in radiology, pathology, genomics and many other fields. But unlike other verticals such as finance and automotive, existing medical data is not fully exploited by ML, primarily because it sits in data silos and privacy concerns restrict access to it. For example, different hospitals may only be able to access the clinical records of their own patient populations. And while regulations such as HIPAA, which governs Protected Health Information (PHI), are great at protecting such sensitive data, they also pose a bigger challenge for modern data mining and ML techniques such as deep learning, which relies on large amounts of training data.
Federated Learning (FL) is a learning paradigm that addresses the problem of data governance and privacy by training algorithms collaboratively without exchanging the underlying datasets. In simple words, it holds great promise for learning from fragmented, sensitive data: instead of aggregating data from different places, or relying on the traditional discover-then-replicate design, it trains a shared model through central coordination while the data stays in the local repositories where it originated. Simple, but not quite!
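To make the idea concrete, here is a minimal sketch of one common FL recipe, weighted federated averaging, written in plain Python. Everything in it is an assumption for illustration only: the three "hospitals" hold synthetic data generated by the hypothetical helper make_site_data, the model is a one-feature linear regression, and the names local_train and federated_average are my own. Real deployments add secure communication, privacy safeguards and much more.

```python
import random

def local_train(weights, data, lr=0.05, epochs=5):
    """Run a few epochs of gradient descent on one site's private data.

    Only the updated weights and the sample count leave the site;
    the records in `data` are never returned.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n

def federated_average(updates):
    """Combine site updates, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    w = sum(wt[0] * n for wt, n in updates) / total
    b = sum(wt[1] * n for wt, n in updates) / total
    return (w, b)

def make_site_data(rng, n, x_low, x_high):
    """Hypothetical synthetic records following y = 2x + 1 plus small noise."""
    xs = [rng.uniform(x_low, x_high) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.05)) for x in xs]

rng = random.Random(0)
# Three illustrative sites, each seeing a different slice of the population.
sites = [
    make_site_data(rng, 50, 0.0, 1.0),
    make_site_data(rng, 80, 1.0, 2.0),
    make_site_data(rng, 30, 2.0, 3.0),
]

global_weights = (0.0, 0.0)
for _ in range(200):
    # The current model travels to each site; the data stays where it is.
    updates = [local_train(global_weights, data) for data in sites]
    # The coordinator only ever sees weights and sample counts.
    global_weights = federated_average(updates)

print(global_weights)
```

The coordinator never touches a single record, yet the shared model ends up close to the slope and intercept used to generate the synthetic data, even though no individual site covers the full range of inputs.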
The diagram below is a good depiction of a simple FL workflow.

Let’s dig a little deeper. Data-driven healthcare, the main foundation of precision medicine, requires models to be trained and evaluated on sufficiently large and diverse datasets. There is no denying that medical datasets are hard to curate, for the reasons mentioned previously. The need for sufficiently large databases for AI training has prompted many initiatives seeking to pool data from multiple institutions. Large initiatives have so far primarily focused on the idea of creating data lakes. Examples include NHS Scotland’s National Safe Haven, the French Health Data Hub and Health Data Research UK. Centralising the data, however, poses not only regulatory and legal challenges related to ethics, privacy and data protection, but also technical ones, such as safely anonymising the data and controlling access to it. Therefore, transferring healthcare data is a non-trivial, and often impossible, task.
Federated Learning (FL), on the other hand, promises a solution to the above challenges. In an FL setting, each data controller (a hospital or other care facility) not only defines its own governance processes and associated privacy considerations but also, by not allowing data to be moved or copied, controls access to the data and retains the ability to revoke it. So, the potential of FL is to provide controlled, indirect access to the large and comprehensive datasets needed for the development of ML algorithms, whilst respecting patient privacy and data governance. Moving the to-be-trained model to the data, instead of collecting the data in a central location, has another major advantage: the high-dimensional, storage-intensive medical data does not have to be duplicated from local institutions into a centralised pool and then duplicated again by every user who uses this data for local model training.
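The governance point can be illustrated with a small sketch as well. This is a toy under stated assumptions rather than a real access-control design: the DataController class, its revoke method and the federated_mean function are hypothetical names of mine, and the "analysis" is just a mean over a few made-up numbers instead of a trained model. Each controller hands out only an aggregate, and once it revokes access it simply stops contributing.

```python
from dataclasses import dataclass, field

@dataclass
class DataController:
    """A care facility that keeps its records local and decides who may use them."""
    name: str
    _records: list = field(repr=False)  # stays on site; never returned to callers
    access_granted: bool = True

    def revoke(self):
        """Withdraw this site's participation in future computations."""
        self.access_granted = False

    def local_summary(self):
        """Return only an aggregate (sum, count), or None if access is revoked."""
        if not self.access_granted:
            return None
        return sum(self._records), len(self._records)

def federated_mean(controllers):
    """Compute a global mean from per-site aggregates, skipping revoked sites."""
    summaries = [s for s in (c.local_summary() for c in controllers) if s is not None]
    count = sum(n for _, n in summaries)
    if count == 0:
        return None
    return sum(total for total, _ in summaries) / count

hospital_a = DataController("Hospital A", [5.0, 7.0])
hospital_b = DataController("Hospital B", [9.0])

before = federated_mean([hospital_a, hospital_b])  # both sites contribute
hospital_b.revoke()
after = federated_mean([hospital_a, hospital_b])   # only Hospital A contributes
print(before, after)
```

Because nothing was ever copied out of Hospital B, revoking access takes effect immediately for every later computation; there is no central copy to track down and delete.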
I am concluding this episode on the note that ML, and particularly DL, has led to a wide range of innovations in the area of digital healthcare. As all ML methods benefit greatly from the ability to access data that approximates the true global distribution, FL is a promising approach to obtain powerful, accurate, safe, robust and unbiased models.
All in all, a successful implementation of FL will represent a shift from centralised data warehouses or lakes, with a significant impact on the various stakeholders in the healthcare domain. One important thing to remember is that the medical FL use-case is fundamentally different from other domains!
To be continued ……
