A Personal Health Train implementation
The Medical Data Works Railway service implements a privacy-preserving federated infrastructure as proposed
by the Personal Health Train. The main principle of the Personal Health Train is to bring questions to the data rather than moving data.
This concept is called Federated Learning.
In this section, the governance of projects based on Railway is discussed.
The Personal Health Train metaphor defines the following components:
- Station: An information system that contains sensitive data, such as from patients.
Typically a hospital or registry provides a Station.
- Train: Software that encodes a question to be asked to a Station and outputs anonymous, statistical and optionally encrypted data.
Typically a university or a company provides a Train.
- Track: Software and infrastructure that allow Trains to enter Stations and ask questions.
Medical Data Works B.V. provides the Track.
A video that explains the Personal Health Train is available on Vimeo.
Medical Data Works has set up a governance framework for projects using Railway with the following principles:
- Lawfulness, fairness and transparency:
Each project using Railway must have a project agreement (also known as a consortium or collaboration agreement).
This project agreement is between the organizations that provide Stations and/or Trains.
Note that Medical Data Works is typically not a party in such an agreement.
A template of such an agreement is provided below.
The project agreement has a number of elements, including:
- A list of organizations involved in the project and who the leading organization is
- A list of Stations and a description of the data elements that they contain
- A description of the Trains being deployed in the project
- How the GDPR or other regulations will be adhered to (see the GDPR section for GDPR-specific aspects)
- Intellectual property, results ownership, revenue sharing, publication, and authorship matters
The project agreement ensures that only projects that all participating organizations have agreed on are conducted on Railway.
- Accountability: Each Station, Track and Train providing organization is responsible and accountable for its own activities.
- Purpose limitation: In the project agreement the goals of the project should be clearly described. The Track user agreement demands that
the Track is only used for specified, explicit and legitimate purposes.
- Data minimization: In the project agreement the data elements needed for the project should be clearly described and
proportional to the project goals. Only these data elements should be placed in a Station.
- Integrity & confidentiality: The Personal Health Train is a privacy-by-design infrastructure.
Data are processed only in the Station, which runs inside the IT environment of the Station provider (e.g. a hospital), under
the governance of the local IT department and with the appropriate security measures it defines. Station data is typically a de-identified export
of data from clinical information systems. Accidental loss of Station data can be corrected by re-exporting the data.
As the Track provider, Medical Data Works
applies user/role-based authorization combined with industry-standard identification and authentication for performing tasks on the Track.
Documentation on the information security of Railway can be found in the Security documentation.
- Digital sovereignty & data control: A Station or Train provider may choose their own digital solution
as long as these choices do not harm interoperability between Stations and Trains. This is a matter for the project partners to decide on.
Medical Data Works takes no position in these matters.
A Station remains in control of its own data at all times. The Station provider has to actively start the node software
that allows a Train to access its data, and it can stop its Station or
deny or revoke access to its Station at any moment and for any reason.
- Open source, open access, open standards: Medical Data Works uses the open-source Vantage6 software
and is a member of its open-source community. Any addition, improvement or modification that Medical Data Works makes in its projects
is released as open source under the same license as Vantage6.
Medical Data Works provides all its legal templates and publications open access under a
CC0 1.0 Universal license.
Medical Data Works is committed to supporting open standards, including open health data standards such as FHIR, DICOM and openEHR
and Semantic Web standards (RDF, RDFS, OWL, SHACL), and subscribes to the FAIR principles.
- Neutrality: Medical Data Works offers Railway as a supported open-source service with no claim on any intellectual property
or other results that are developed using its infrastructure. It does not mandate the use of specific data formats, programming languages
or mathematical methods by Trains or Stations, and these may be proprietary. Users of Railway can come from any part of the world and be
commercial or non-commercial.
The Personal Health Train is based on Federated Learning, in which the question travels to the data rather than the data moving.
Thus, no personal data is shared between organizations. Only statistical, anonymous and optionally encrypted data is shared via the Track.
All data processing takes place in the Station. As such, it is a privacy-by-design infrastructure,
assuming that the Train providers can be trusted, which is ensured by legal and technical governance measures.
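To make "the question travels to the data" concrete, here is a minimal Python sketch, assuming a Train whose question is the mean of one numeric field across all Stations. The names `PartialResult`, `station_partial` and `aggregate_mean` are hypothetical and are not part of Railway or Vantage6. Each Station reduces its own records to a count and a sum locally; only those two numbers would travel over the Track, and the Train provider combines them. Encryption of the partial results is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PartialResult:
    """Aggregate statistics that leave a Station (no individual records)."""
    count: int
    total: float


def station_partial(records: List[Dict[str, float]], field: str) -> PartialResult:
    """Runs inside the Station: reduce local records to a count and a sum."""
    values = [r[field] for r in records if field in r]  # skip records lacking the field
    return PartialResult(count=len(values), total=sum(values))


def aggregate_mean(partials: List[PartialResult]) -> float:
    """Runs at the Train provider: combine per-Station aggregates into one mean."""
    n = sum(p.count for p in partials)
    if n == 0:
        raise ValueError("no records across Stations")
    return sum(p.total for p in partials) / n


if __name__ == "__main__":
    hospital_a = [{"age": 60.0}, {"age": 70.0}]  # stays inside Station A
    hospital_b = [{"age": 50.0}]                 # stays inside Station B
    partials = [station_partial(hospital_a, "age"), station_partial(hospital_b, "age")]
    print(aggregate_mean(partials))  # 60.0
```

The design choice is that the Train provider never sees a row of patient data, only `PartialResult` objects; the global mean is still exact because a mean decomposes into sums and counts.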
Because personal data is processed in every Personal Health Train project, the GDPR roles of (joint) controller
and processor need to be defined, and proper agreements between joint controllers and/or between controller and processor need to be reached.
Under the GDPR, a controller is the legal entity that
"determines the purposes for which and the means by which personal data is processed.
So, if your company/organisation decides 'why' and 'how' the personal data should be processed it is the data controller."
In the case of the Personal Health Train, the Train provider is a controller as it is the legal entity
asking the question ('why') and supplies the software ('how').
However, in Personal Health Train projects, this leads to the peculiar situation that the controller (typically a university or a company)
does not have or control (in the common sense) any data. The Train provider has no access to data in the Station
nor does it control how it gets into the Station. Because of this, the Train provider cannot fulfill some of the responsibilities
that the GDPR puts on it.
Station providers can choose which GDPR role they assume. They have two options:
- Processor: If the Station provider is not involved in defining the Train (i.e. the question being posed to the data)
and is simply a data provider,
then a role as processor seems in order. In the data processing agreement the Station provider needs to take on some responsibilities
of the controller / Train provider (see above). Specifically, it should be recognized that:
  - The Data Controller has access neither to the identity of the Data Processor's patients
  nor to the personal data of the Data Processor's patients.
  - The Data Controller needs to rely on the Data Processor in order to be able,
  by means of appropriate technical and organizational measures,
  to fulfil the obligations imposed on the Data Controller under Applicable Laws.
  - The Data Processor shall therefore respond to requests from the Data Processor's patients (“Data Subjects”)
  pursuant to Applicable Laws, such as the:
    - right of access
    - right of rectification
    - right to erasure
    - right to restrict the processing
    - right to data portability
    - right to object
- Joint controller: If the Station provider is involved in defining the Train then a role as controller seems in order.
The Station provider then becomes a joint controller with the Train provider.
In the joint controller agreement the responsibilities of each party need to be defined, with each controller being responsible
for its own part in the processing.
The above GDPR aspects are implemented in the template project agreement given below.
Track Provider does not have a GDPR role
As stated above, the main principle of the Personal Health Train / Federated Learning is to bring questions to the data rather than moving data.
The role of the Track is thus not
to move data but to bring questions (Trains) to the data (Stations).
As such, no patient/personal data shall be processed on the Track. As a consequence, the Track provider (in Railway this is Medical Data Works)
has no GDPR role: it is neither a controller nor a processor.
The only data shared via the Track is statistical data (the answer to the research question),
which is anonymous and falls outside the scope of the GDPR.
There are a number of legal and technical measures to ensure that no patient/personal data is shared via the Track:
- All providers of Stations and/or Trains using the Track have to sign an Infrastructure User Agreement which explicitly
forbids the use of the Track to share patient/personal data.
- The Infrastructure User Agreement dictates that any project in which a Station or Train provider participates is based on a
project agreement that defines which questions are asked / Trains are deployed and which organizations are involved
in which role (Station/Train/both).
- Medical Data Works will create a project-specific Track whose only members are the organizations in the project.
It is therefore technically impossible to deploy Trains to the wrong Stations.
- At the direction of the Train provider, Medical Data Works will create users who are authorized to create Trains.
- A Station provider actively allows the Train to enter its Station; in other words, this is a pull rather than a push system.
Entering the Station and processing its data is an action that the Station provider actively takes.
At any point and for whatever reason, the Station provider can disconnect itself from the Track and reject or remove the Train.
- The Track may not be used to process patient/personal data, nor for anything other than the project.
Note that, as a consequence of this position, Medical Data Works cannot sign, and will refuse to sign, a data processing or similar agreement,
as it does not process data.
Any sharing of patient/personal data on the Track constitutes a data breach.
At any time, Medical Data Works can shut down the Track if it knows or suspects that patient/personal data is being shared.
The above describes the normal operation of the Track. There are hypothetical scenarios in which the Track is used against its legally
binding terms by internal staff at the Station, Track or Train provider.
This is a breach of contract which has legal consequences. Medical Data Works also has a number of technical measures to prevent this from happening.
There are additional hypothetical scenarios in which the Station, Track or Train is breached by an external agent, for which Medical Data Works
again has a number of technical measures in place to prevent this from happening.
These hypothetical scenarios and their mitigations are described
in the Security documentation.
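The technical measures listed above (a project-specific Track, users authorized to create Trains, and Stations that pull Trains rather than having them pushed) can be illustrated with a minimal Python sketch. It is a toy in-memory model, assuming one Track object that holds project membership and a queue of Trains; the classes `Train` and `Track` and the function `station_poll` are hypothetical and do not reflect the actual Railway or Vantage6 APIs. Authentication, encryption and returning results are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Train:
    project: str  # the project agreement this Train belongs to
    name: str


@dataclass
class Track:
    # project id -> Stations that are members of that project-specific Track
    members: Dict[str, Set[str]]
    # users allowed (at the Train provider's direction) to create Trains
    train_creators: Set[str]
    queue: List[Train] = field(default_factory=list)
    connected: Set[str] = field(default_factory=set)

    def submit(self, user: str, train: Train) -> None:
        """Queue a Train; only authorized users may create Trains."""
        if user not in self.train_creators:
            raise PermissionError(f"{user} may not create Trains")
        if train.project not in self.members:
            raise KeyError(f"unknown project {train.project}")
        self.queue.append(train)

    def connect(self, station: str) -> None:
        """The Station provider actively starts its node software."""
        self.connected.add(station)

    def disconnect(self, station: str) -> None:
        """The Station provider can leave the Track at any moment."""
        self.connected.discard(station)

    def pending_for(self, station: str) -> List[Train]:
        """Trains a Station can see: only its own projects, only while connected."""
        if station not in self.connected:
            return []
        return [t for t in self.queue if station in self.members[t.project]]


def station_poll(track: Track, station: str,
                 accept: Callable[[Train], bool]) -> List[str]:
    """Pull model: the Station fetches pending Trains and decides which to run."""
    ran = []
    for train in track.pending_for(station):
        if accept(train):  # the Station provider may reject any Train
            ran.append(train.name)  # a real node would run the Train on local data here
    return ran


if __name__ == "__main__":
    track = Track(members={"proj1": {"hospital_a"}, "proj2": {"hospital_b"}},
                  train_creators={"alice"})
    track.connect("hospital_a")
    track.submit("alice", Train("proj1", "mean_age"))
    track.submit("alice", Train("proj2", "survival"))
    print(station_poll(track, "hospital_a", accept=lambda t: True))  # ['mean_age']
```

In this sketch the Track never initiates contact with a Station: a Station that has not called `connect`, or has called `disconnect`, simply receives nothing, and the `accept` callback stands in for the Station provider's decision to let a Train in. Trains stay in the queue after running; a real system would track per-Station status.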
Process to implement governance
Below is a typical process for setting up the governance of a Personal Health Train project.
- Write research protocol: Define the research question and data elements needed to answer that research question.
- Write project agreement: Define the lead and other organizations.
Define the role of each party (e.g. which organizations will provide Stations, which organizations will provide Trains).
For organizations subject to GDPR: Define if the party is a (joint) controller or a data processor.
Joint controller agreements and data processing agreements are annexes to the project agreement.
Typically the research protocol is an annex to this agreement.
Optionally, financial, intellectual property, authorship and similar clauses are part of this agreement.
Note that typically Medical Data Works is not a party in the project agreement.
- Review and sign project agreement: The project agreement typically requires legal review and approval before being signed by all organizations.
- Get approval for research protocol: The research protocol typically needs to be reviewed and approved
by internal review boards at organizations providing Stations and (if required) organizations providing Trains.
- For organizations subject to GDPR: Complete a Data Protection Impact Assessment (example below).
- Review and sign the Infrastructure User Agreement: each organization providing a Station or Train signs this agreement with Medical Data Works.
- Get approval from IT: A Personal Health Train project requires the installation of software.
Typically IT departments have a process to review and approve such external software (e.g. cybersecurity) before it can be installed.
Legal agreements are needed for the use of Railway. The figures below show the required
agreements in two common cases.
In the top figure, Medical Data Works is a member of the consortium.
In that case the following agreements are needed:
- Consortium or Project Agreement between all members of the consortium which describes
the project for which Railway will be used.
- Joint Controller Agreement or Data Processing Agreement between Train Providers
and Station Providers. This can also be an annex to the Consortium Agreement.
- Bilateral Infrastructure User Agreement between Medical Data Works
and each member of the consortium that wishes to use the Railway infrastructure
as a Train provider or as a Station provider.
No fee or cost is involved in the Infrastructure User Agreement.
In the bottom figure, Medical Data Works is not
a member of the consortium.
In that case an additional Infrastructure Service Agreement
is needed between Medical Data Works and the lead party in the consortium.
A fee for Railway is charged to the lead party.