Clinical trials in the pharmaceutical industry have always had their challenges, because providing quality data ethically and fairly for submission is difficult. The Clinical Data Management (CDM) team reviews many reports on a daily, weekly, or monthly basis, or at other frequencies, to produce quality data. A failure at a single step can affect the entire effort of a clinical trial. CDM is therefore an important stage in clinical research: it involves generating high-quality, reliable, and statistically sound data from clinical trials. Based on the study protocol and the case report form (CRF), the data management team provides a specification document that clearly states the forms to be referenced, the fields to be included, the records satisfying specific conditions to be flagged, the purpose of each listing, and so on.
The main objective of these data monitoring listings is to bring out the discrepancies in the raw data. Once the listings are created, the data management team reviews the discrepancies, routes the identified issues, and rectifies them with the help of the associated team or the clinical monitors.
The data monitoring listings are rerun at specific intervals until database (DB) lock. There are different types of listings:
- DM
- Coding
- MM
- Recon
Each type gets its name from its reviewers. The listings reviewed by the data management team are called DM listings, and those reviewed by the medical monitoring team are called MM listings. Coding listings are reviewed by the medical coders. The fourth type is the Reconciliation, or Recon, listing. Here we describe these listings from a clinical programmer's perspective.
DM Listing
Every study is different, with its own objectives, so maintaining and validating data quality is of utmost importance to the data managers. DM listings are reports generated on the raw datasets that assist in the process of data validation.
The Data Validation Specification (DVS) describes how many listings are needed, the purpose of each listing, the fields to be presented in the output, the raw datasets to be used, the discrepancies to be flagged, and so on. Most DM listings have a final output column specifically for flagging the discrepancies identified in that listing.
The clinical programmer (CP) is responsible for generating the listings. The program is first tested against dummy data that contains discrepancies and is then passed to the DM team for review. If all discrepancies are populated as expected, the listing passes review, and the program is run against the raw datasets to identify the actual discrepancies.
Discrepancies can be of many types: a mismatch in a date variable, changes in data collection from the CRF, violation of protocol-specific conditions, duplicate records, overlapping dates, or inconsistency of a subject's records across multiple datasets. The listing programs are rerun at specific intervals, and the discrepancies identified are resolved by the investigators. This process continues until the dataset contains only a minimal number of discrepancies. Some examples of DM listings with such discrepancies flagged are shared below.
- Duplicate record
In the adverse event (AE) form, a subject may have multiple records with the same AETERM, start date, and end date. Such records are discrepancies and should be flagged as duplicate records.
In the example, the AETERM “Headache” has the same start date and end date in three records. These are duplicate records that can be consolidated into one. Similar cases are possible in the Medical History and Concomitant Medication forms.
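The duplicate check can be sketched as follows. This is a minimal Python illustration of the logic, not the SAS listing program itself: SUBJID, AESTDT and AEENDT are hypothetical variable names (only AETERM comes from the text), the subject ID and Headache dates are made up, and comparing AETERM case-insensitively is an assumption.

```python
from collections import Counter
from datetime import date

def duplicate_key(rec):
    # Subject, term (case-insensitive by assumption), start date and end date.
    return (rec["SUBJID"], rec["AETERM"].strip().upper(), rec["AESTDT"], rec["AEENDT"])

def flag_duplicates(records):
    """Copy each AE record and set FLAG when another record shares its key."""
    counts = Counter(duplicate_key(r) for r in records)
    return [
        {**r, "FLAG": "Duplicate record" if counts[duplicate_key(r)] > 1 else ""}
        for r in records
    ]

# Hypothetical sample data: three identical Headache records and one distinct record.
ae = [
    {"SUBJID": "ABC-X01-01", "AETERM": "Headache", "AESTDT": date(2018, 1, 5), "AEENDT": date(2018, 1, 7)},
    {"SUBJID": "ABC-X01-01", "AETERM": "Headache", "AESTDT": date(2018, 1, 5), "AEENDT": date(2018, 1, 7)},
    {"SUBJID": "ABC-X01-01", "AETERM": "HEADACHE", "AESTDT": date(2018, 1, 5), "AEENDT": date(2018, 1, 7)},
    {"SUBJID": "ABC-X01-01", "AETERM": "Headache", "AESTDT": date(2018, 2, 1), "AEENDT": date(2018, 2, 2)},
]

for row in flag_duplicates(ae):
    print(row["AETERM"], row["AESTDT"], row["AEENDT"], row["FLAG"])
```

The same counting approach could be adapted to the Medical History and Concomitant Medication forms by swapping in their term and date variables.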
- Overlapping record
For a subject, if the AE start date or AE end date of a record falls between the AE start date and AE end date of another record with the same AETERM, the records should be flagged as overlapping records.
In the example above for subject ABC-X01-06, the three records with AETERM Nausea overlap. The first Nausea record has start date 20 Feb 2018 and end date 24 Feb 2018. The second has start date 23 Feb 2018 and end date 03 Mar 2018; its start date falls between the start and end dates of the first record. The third record, with start date 27 Feb 2018 and end date 03 Mar 2018, overlaps with the second. A single Nausea record with start date 20 Feb 2018 and end date 03 Mar 2018 would therefore suffice instead of these three. In general, overlapping records can be summarised into a single record by taking the earliest AE start date and the latest AE end date.
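The overlap rule and the "earliest start, latest end" summary can be sketched in Python using the Nausea dates from the example. This is an illustration only: SUBJID, AESTDT and AEENDT are hypothetical variable names, both dates are assumed to be present (ongoing events with no end date are not handled), and records are grouped by subject and upper-cased AETERM by assumption.

```python
from collections import defaultdict
from datetime import date

def overlaps(a, b):
    # Two date ranges overlap when each starts on or before the other ends. This is
    # equivalent to a start or end date of one record falling inside the other's range.
    return a["AESTDT"] <= b["AEENDT"] and b["AESTDT"] <= a["AEENDT"]

def by_subject_and_term(records):
    groups = defaultdict(list)
    for r in records:
        groups[(r["SUBJID"], r["AETERM"].strip().upper())].append(r)
    return groups

def flag_overlaps(records):
    """Flag every record that overlaps another record of the same subject and AETERM."""
    flagged = []
    for group in by_subject_and_term(records).values():
        for i, rec in enumerate(group):
            hit = any(overlaps(rec, other) for j, other in enumerate(group) if j != i)
            flagged.append({**rec, "FLAG": "Overlapping record" if hit else ""})
    return flagged

def merge_overlaps(records):
    """Collapse each chain of overlapping records into one: earliest start, latest end."""
    merged = []
    for (subjid, _), group in by_subject_and_term(records).items():
        current = None
        for rec in sorted(group, key=lambda r: r["AESTDT"]):
            if current is not None and rec["AESTDT"] <= current["AEENDT"]:
                current["AEENDT"] = max(current["AEENDT"], rec["AEENDT"])
            else:
                current = {"SUBJID": subjid, "AETERM": rec["AETERM"],
                           "AESTDT": rec["AESTDT"], "AEENDT": rec["AEENDT"]}
                merged.append(current)
    return merged

# Records for subject ABC-X01-06 from the example above.
nausea = [
    {"SUBJID": "ABC-X01-06", "AETERM": "Nausea", "AESTDT": date(2018, 2, 20), "AEENDT": date(2018, 2, 24)},
    {"SUBJID": "ABC-X01-06", "AETERM": "Nausea", "AESTDT": date(2018, 2, 23), "AEENDT": date(2018, 3, 3)},
    {"SUBJID": "ABC-X01-06", "AETERM": "Nausea", "AESTDT": date(2018, 2, 27), "AEENDT": date(2018, 3, 3)},
]

print([r["FLAG"] for r in flag_overlaps(nausea)])
print(merge_overlaps(nausea))
```

Sorting by start date lets the merge follow a chain: the third record does not overlap the first, but it overlaps the already-extended range, so all three collapse into 20 Feb 2018 to 03 Mar 2018.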
- Violation of protocol-specific conditions
If a study is designed for the 18 to 45 age group, the inclusion criteria specify that a subject's age should be between 18 and 45 years (both inclusive). A listing should then be included in the DVS to identify any subject violating this inclusion criterion. A subject aged under 18 or over 45 years should be flagged and investigated to find out why the violation happened.
This listing helps the DM team identify any subject violating the protocol-specified criterion that subject age be between 18 and 45 years. Any subject with this discrepancy needs further investigation.
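A minimal Python sketch of this inclusion check follows. The SUBJID and AGE variable names, the subject IDs and ages are hypothetical, and flagging a missing age separately is an added assumption rather than something the DVS in the text specifies.

```python
def flag_age_violations(subjects, min_age=18, max_age=45):
    """Flag subjects whose age is missing or outside the inclusive inclusion range."""
    flagged = []
    for s in subjects:
        age = s["AGE"]
        if age is None:
            flag = "Age missing"  # assumption: missing ages are flagged too
        elif min_age <= age <= max_age:
            flag = ""
        else:
            flag = "Age outside inclusion range"
        flagged.append({**s, "FLAG": flag})
    return flagged

# Hypothetical subjects; ages 18 and 45 are valid because the criterion is inclusive.
subjects = [
    {"SUBJID": "ABC-X01-01", "AGE": 17},
    {"SUBJID": "ABC-X01-02", "AGE": 18},
    {"SUBJID": "ABC-X01-03", "AGE": 45},
    {"SUBJID": "ABC-X01-04", "AGE": 46},
    {"SUBJID": "ABC-X01-05", "AGE": None},
]

for row in flag_age_violations(subjects):
    print(row["SUBJID"], row["AGE"], row["FLAG"] or "-")
```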
- Inconsistency of subject’s record over multiple datasets
Consider a subject with an ongoing medical history condition for which medication is currently being taken, but the concomitant medication form has no medication recorded for that indication. Such records should be flagged and investigated to identify why the discrepancy happened.
Similar discrepancies are possible between the Adverse Event and Concomitant Medication forms as well.
In the example, the two subjects have different medical history conditions, and both are recorded as taking medication for them. However, no medication corresponding to the medical history is recorded in the concomitant medication form. It is necessary to identify why such inconsistencies between datasets happen.
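The cross-form check can be sketched in Python as below. It assumes hypothetical variables: MHONGO (condition ongoing), MHMEDFL (medication taken) and MHTERM on the medical history form, and CMINDC (indication) on the concomitant medication form. Matching the indication to the medical history term by exact, case-insensitive text is a simplification; real listings may match on coded terms or leave fuzzy cases to manual review.

```python
def flag_mh_without_cm(mh_records, cm_records):
    """Flag ongoing, medicated MH conditions with no CM record for that indication."""
    # Set of (subject, indication) pairs present on the CM form.
    indications = {(c["SUBJID"], c["CMINDC"].strip().upper()) for c in cm_records}
    flagged = []
    for m in mh_records:
        needs_cm = m["MHONGO"] == "Y" and m["MHMEDFL"] == "Y"
        missing = (m["SUBJID"], m["MHTERM"].strip().upper()) not in indications
        flagged.append({**m, "FLAG": "No matching CM record" if needs_cm and missing else ""})
    return flagged

# Hypothetical data: the first two subjects report medication but have no CM record.
mh = [
    {"SUBJID": "ABC-X01-01", "MHTERM": "Hypertension", "MHONGO": "Y", "MHMEDFL": "Y"},
    {"SUBJID": "ABC-X01-02", "MHTERM": "Asthma", "MHONGO": "Y", "MHMEDFL": "Y"},
    {"SUBJID": "ABC-X01-03", "MHTERM": "Diabetes mellitus", "MHONGO": "Y", "MHMEDFL": "Y"},
    {"SUBJID": "ABC-X01-03", "MHTERM": "Appendicitis", "MHONGO": "N", "MHMEDFL": "N"},
]
cm = [{"SUBJID": "ABC-X01-03", "CMTRT": "Metformin", "CMINDC": "Diabetes mellitus"}]

for row in flag_mh_without_cm(mh, cm):
    print(row["SUBJID"], row["MHTERM"], row["FLAG"] or "-")
```

The same pattern applies to the AE and CM forms by replacing the medical history term with the AE term.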
The examples above are just a few of the many discrepancies that can be identified. These listings and discrepancies are common to almost all studies. Once the discrepancies are identified, the respective team investigates and resolves them. The DM listings are rerun at specific intervals until DB lock. The objective is quality data with a minimum number of issues, which supports proper analysis.
Coding listings
Across clinical studies, adverse events and concomitant medications are commonly collected from subjects, and medical coding of this data becomes a priority for standardization. In a multi-centre study, for example, different investigators at different sites are involved, and medical terms are not necessarily recorded uniformly. To analyse the data properly, the medical team uses the two most common dictionaries, MedDRA and WHO-DD, to perform coding.
The coding listings category includes all listings that use standardized dictionaries. The dictionary version must be specified within the listing. This category mostly consists of AE and CM listings.
The CP programs these listings according to the specification shared by the data managers. Two important pieces of information can be derived from them, as described below.
1. Whether the coding is Direct or Manual
Direct coding: the verbatim term recorded by the investigator, or any other responsible person, exactly matches a term in the medical dictionary.
Manual coding: sometimes a term does not have an exact match in the medical dictionary. In such situations, the medical coder needs to code the term manually.
Nowadays most data are captured through electronic data capture, so most terms are auto-coded. In certain situations, however, terms need to be coded manually. Challenges arise when multiple signs and symptoms are reported; in such scenarios, the coder needs to confirm the actual diagnosis with the site investigator. In other cases, a self-evident spelling issue creates a difference between the verbatim and preferred terms. Flagging a column as Direct/Manual therefore helps the reviewer take special care with manually coded terms.
For the CM coding listing, we use the WHO-DD dictionary. The medical coder codes the verbatim term “CMTERM”, labelled as Medication, against the standardized dictionary. If the verbatim term and the preferred term are the same, the record is classed as Direct; if they differ, it is classed as Manual.
For example, in the 1st record both the Medication term and the preferred term are “CANDESARTAN CILEXETIL”, so the variable “Direct_Manual” is populated as Direct. In the 2nd record, Medication = “Magmitt Tab” and Preferred term = “MAGNESIUM OXIDE”. Here the Medication term differs from the preferred term, so the term had to be coded manually by the medical coder and is flagged as Manual.
For the AE coding listing, we use the MedDRA dictionary. The medical coder codes the verbatim term “AETERM”, labelled as Verbatim, against the standardized dictionary. If the verbatim term matches the preferred term or the lowest level term, the record is classed as Direct; otherwise it is classed as Manual.
In the table above, the verbatim term and the Preferred Term/Lowest_level are both Encephalopathy, so the record is flagged as “Direct” under the variable Direct_Manual. In the 2nd row, the verbatim term is “INTERMITTENT NAUSEA” but the preferred term/lowest level is “Nausea”. Since the verbatim term and the Preferred/Lowest level term differ, the record is flagged as Manual under Direct_Manual.
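The Direct/Manual rule can be sketched in Python using the CM and AE examples above. The function name is hypothetical. Comparing terms case-insensitively, accepting a match on either the MedDRA preferred term or the lowest level term, and returning an empty value for terms that are not yet coded are all assumptions about how the specification would define the flag.

```python
def direct_or_manual(verbatim, *coded_terms):
    """Return 'Direct' if the verbatim term equals any coded term (e.g. PT or LLT),
    'Manual' if it differs from all of them, and '' if the term is not coded yet."""
    coded = [t.strip().upper() for t in coded_terms if t]
    if not coded:
        return ""
    return "Direct" if verbatim.strip().upper() in coded else "Manual"

# CM examples (WHO-DD preferred term).
print(direct_or_manual("CANDESARTAN CILEXETIL", "CANDESARTAN CILEXETIL"))  # Direct
print(direct_or_manual("Magmitt Tab", "MAGNESIUM OXIDE"))                  # Manual
# AE examples (MedDRA preferred term and lowest level term).
print(direct_or_manual("Encephalopathy", "Encephalopathy", "Encephalopathy"))  # Direct
print(direct_or_manual("INTERMITTENT NAUSEA", "Nausea", "Nausea"))            # Manual
```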
2. Summary details of coded and un-coded terms
For proper data mapping and analysis, all verbatim terms need to be coded. Summary reports help the medical team and data monitors identify the number of un-coded terms and follow up until no further terms need to be coded.
The summary report shows how many terms were coded automatically (Direct) and how many manually, as well as how many terms are coded and un-coded. The medical coder responsible for the study then codes every pending term in the listing. This process continues until every un-coded term is coded.
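A summary of this kind can be sketched in Python as a simple count over the coding listing. The record layout is an assumption: PT is a hypothetical column holding the coded preferred term (empty when un-coded), Direct_Manual is the flag described above, and the sample terms other than Encephalopathy and INTERMITTENT NAUSEA are made up.

```python
def coding_summary(records):
    """Count coded vs un-coded terms and, among coded terms, Direct vs Manual."""
    summary = {"Total": len(records), "Coded": 0, "Un-coded": 0, "Direct": 0, "Manual": 0}
    for rec in records:
        if rec.get("PT"):
            summary["Coded"] += 1
            summary[rec["Direct_Manual"]] += 1  # assumes the flag is 'Direct' or 'Manual'
        else:
            summary["Un-coded"] += 1
    return summary

# Hypothetical AE coding listing rows; the last term is still waiting to be coded.
ae_coding = [
    {"VERBATIM": "Encephalopathy", "PT": "Encephalopathy", "Direct_Manual": "Direct"},
    {"VERBATIM": "INTERMITTENT NAUSEA", "PT": "Nausea", "Direct_Manual": "Manual"},
    {"VERBATIM": "Headache", "PT": "Headache", "Direct_Manual": "Direct"},
    {"VERBATIM": "Dizzyness", "PT": "", "Direct_Manual": ""},
]

print(coding_summary(ae_coding))
```

Rerunning the summary after each coding cycle shows the Un-coded count falling towards zero.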
Medical Monitoring listing
Medical monitoring is an essential part of the clinical research process. Medical monitors provide medical expertise from the initial stage of the study. MM teams need regular reports to check the safety of the subjects and the clinical integrity of the data. MM listings are usually the raw datasets presented in a simple listing format, which makes them easier for medical reviewers to review. Sometimes minor modifications are also made to the data, as requested in the specification.
Depending on the study, medical monitoring teams need to check for specific conditions or discrepancies in the data. These conditions are specified in the DVS, and clinical programmers program the listings and create the output according to the specified criteria. The medical monitors review these outputs to ensure that the study is progressing as expected.
For example, if the medical monitoring team needs information on the percentage change of lab test values from the baseline value, this is included as a check in the specification. The listing is created from the lab data, and the percentage change is calculated for each lab test within each lab category. If the MM team requests it, values where the percentage change of a particular test exceeds a specified limit are also flagged.
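The percentage change from baseline is (value − baseline) / baseline × 100. A minimal Python sketch of the calculation and the optional flag follows. The variable names (LBCAT, LBTESTCD, VISIT, LBORRES), the use of a visit named "BASELINE" to identify the baseline value, the lab values, and the 20% threshold are all hypothetical; the real baseline definition and limit would come from the specification.

```python
def percent_change(records, threshold=20.0):
    """Add PCHG (percent change from baseline) and FLAG to each lab record."""
    # Baseline value per subject, lab category and test (assumed to be the 'BASELINE' visit).
    baselines = {
        (r["SUBJID"], r["LBCAT"], r["LBTESTCD"]): r["LBORRES"]
        for r in records if r["VISIT"] == "BASELINE"
    }
    out = []
    for r in records:
        base = baselines.get((r["SUBJID"], r["LBCAT"], r["LBTESTCD"]))
        pchg = None
        # Skip the baseline row itself and avoid dividing by a missing or zero baseline.
        if r["VISIT"] != "BASELINE" and base not in (None, 0):
            pchg = round((r["LBORRES"] - base) / base * 100, 1)
        flag = "Change beyond threshold" if pchg is not None and abs(pchg) > threshold else ""
        out.append({**r, "PCHG": pchg, "FLAG": flag})
    return out

# Hypothetical ALT results for one subject.
labs = [
    {"SUBJID": "ABC-X01-01", "LBCAT": "CHEMISTRY", "LBTESTCD": "ALT", "VISIT": "BASELINE", "LBORRES": 40.0},
    {"SUBJID": "ABC-X01-01", "LBCAT": "CHEMISTRY", "LBTESTCD": "ALT", "VISIT": "WEEK 4", "LBORRES": 52.0},
    {"SUBJID": "ABC-X01-01", "LBCAT": "CHEMISTRY", "LBTESTCD": "ALT", "VISIT": "WEEK 8", "LBORRES": 44.0},
]

for row in percent_change(labs):
    print(row["VISIT"], row["LBORRES"], row["PCHG"], row["FLAG"] or "-")
```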
There are many challenges in the tasks performed by the DM team, but the ultimate challenge is to reconcile the vendor data (lab data, SAE data) with the related data in the clinical database. Manual reconciliation is prone to errors, because data can easily be overlooked.
The main objective of Data Management (DM) is to deliver a high-quality database to the SAS Programming and Statistical Analysis teams in a timely manner, which in turn helps them generate error-free reports. The key challenge is managing the third-party vendor data loaded into the database and reconciling it with the related data in our database. The challenges and efficient techniques for lab data and SAE data reconciliation are discussed below, with the aim of an optimized process that avoids a lot of manual effort.
There are two different types of databases:
- Clinical Database: a relational database that allows entry of all data captured on the electronic CRF or paper CRF, e.g., Oracle Clinical, InForm, and Rave.
- Vendor Database: an external database in which the vendor's data, such as the test results for the collected samples, is entered. It is also called a third-party database.
Flow of Lab Data between the Clinical Database and Lab Vendor Database
The fields compared between the two databases during lab reconciliation are:
- Site Identifier
- Subject Identifier
- Gender
- Date of birth
- Visits for lab sample collected
- Lab test name (Optional)
- Date of lab sample
- Time of lab sample (Optional)
Common issues found during lab reconciliation include:
- Incorrect data loaded for subjects
- Mismatch in dates entered in Vendor and Clinical Databases
- Mismatch in time entered in Vendor and Clinical Databases
- Visits incorrectly loaded
- Visit date mismatches when data is collected in 24-hour format
- Data collected for screen failures
Consider an example to understand lab reconciliation, in which the clinical and vendor databases are merged and matched on demographic and procedural data. Display 1 shows the SAS output of the lab data reconciliation, which indicates a mismatch: subject 10011 has a mismatch in ‘DOB’ and ‘Visit’ between the lab data reported in the clinical database and in the vendor database.
Display 1. SAS output for Lab Data Reconciliation
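The comparison behind such a report can be sketched in Python. This is not the SAS program that produced Display 1; it assumes hypothetical variable names (SITEID, SUBJID, LBDTC, SEX, BRTHDTC, VISIT, LBTIM), keys the merge on site, subject and sample date, and uses made-up values for subject 10011 that reproduce the DOB and Visit mismatch. Times are normalized so that 12-hour and 24-hour entries compare equally. Because sample date is part of the key, a date mismatch appears as records missing on one side.

```python
from datetime import datetime

FIELDS = ("SEX", "BRTHDTC", "VISIT", "LBTIM")  # fields compared once records are matched

def normalize_time(value):
    """Convert '14:30' or '02:30 PM' to 'HH:MM' so 12-/24-hour entries compare equally."""
    for fmt in ("%H:%M", "%I:%M %p"):
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%H:%M")
        except ValueError:
            continue
    return value.strip()

def reconcile_lab(clinical, vendor):
    """Match records on site, subject and sample date, then report differences."""
    key = lambda r: (r["SITEID"], r["SUBJID"], r["LBDTC"])
    clin = {key(r): r for r in clinical}
    vend = {key(r): r for r in vendor}
    issues = []
    for k in sorted(clin.keys() | vend.keys()):
        if k not in vend:
            issues.append((*k, "Missing in vendor data"))
        elif k not in clin:
            issues.append((*k, "Missing in clinical database"))
        else:
            c, v = clin[k], vend[k]
            for f in FIELDS:
                cv, vv = c[f], v[f]
                if f == "LBTIM":
                    cv, vv = normalize_time(cv), normalize_time(vv)
                if cv != vv:
                    issues.append((*k, f"Mismatch in {f}: clinical={c[f]!r}, vendor={v[f]!r}"))
    return issues

# Hypothetical data: subject 10011 differs in DOB and visit; times differ only in format.
clinical = [
    {"SITEID": "101", "SUBJID": "10011", "LBDTC": "2018-03-01", "SEX": "M",
     "BRTHDTC": "1975-06-12", "VISIT": "Week 4", "LBTIM": "14:30"},
    {"SITEID": "101", "SUBJID": "10012", "LBDTC": "2018-03-02", "SEX": "F",
     "BRTHDTC": "1980-01-20", "VISIT": "Week 4", "LBTIM": "09:15"},
]
vendor = [
    {"SITEID": "101", "SUBJID": "10011", "LBDTC": "2018-03-01", "SEX": "M",
     "BRTHDTC": "1975-12-06", "VISIT": "Week 2", "LBTIM": "02:30 PM"},
    {"SITEID": "101", "SUBJID": "10012", "LBDTC": "2018-03-02", "SEX": "F",
     "BRTHDTC": "1980-01-20", "VISIT": "Week 4", "LBTIM": "09:15 AM"},
]

for issue in reconcile_lab(clinical, vendor):
    print(issue)
```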
SERIOUS ADVERSE EVENT (SAE) RECONCILIATION
SAE data reconciliation is the process of reconciling the clinical database (i.e., data collected on the CRF) with the safety database (i.e., SAE forms) to ensure the data is consistent and not contradictory. Safety data reconciliation must be performed for every clinical study to ensure completeness and consistency of the safety information.
A serious adverse event (experience) or reaction is any untoward medical occurrence that at any dose:
- Results in death
- Is life-threatening
- Requires inpatient hospitalization or prolongation of existing hospitalization
- Results in persistent or significant disability/incapacity
- Is a congenital anomaly/birth defect
Typical discrepancies found during SAE reconciliation include:
- Cases found in the SAE system but not in the clinical database system
- Cases found in the clinical database system but not in the SAE system
- Deaths reported in one but not the other, perhaps because of updates to the SAE report
- Cases where the basic data matched up but where there are differences, such as in onset date
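The four kinds of discrepancy listed above can be sketched as a set comparison in Python. All names and data here are hypothetical: cases are matched on subject and upper-cased event term (real reconciliation may instead match on case number or coded term), death is detected through an OUTCOME value of "FATAL", and only the onset date is compared for matched cases.

```python
def reconcile_sae(clinical, safety):
    """Compare SAE records from the clinical database with cases in the safety database."""
    key = lambda r: (r["SUBJID"], r["TERM"].strip().upper())
    clin = {key(r): r for r in clinical}
    safe = {key(r): r for r in safety}
    findings = []
    for k in sorted(safe.keys() - clin.keys()):
        findings.append((*k, "In SAE system but not in clinical database"))
    for k in sorted(clin.keys() - safe.keys()):
        findings.append((*k, "In clinical database but not in SAE system"))
    for k in sorted(clin.keys() & safe.keys()):
        c, s = clin[k], safe[k]
        # A death recorded in only one of the two databases.
        if (c["OUTCOME"] == "FATAL") != (s["OUTCOME"] == "FATAL"):
            findings.append((*k, "Death reported in one database only"))
        # Basic data matched, but the onset date differs.
        if c["ONSET"] != s["ONSET"]:
            findings.append((*k, f"Onset date differs: clinical={c['ONSET']}, safety={s['ONSET']}"))
    return findings

# Hypothetical records illustrating each kind of discrepancy.
clinical_sae = [
    {"SUBJID": "1001", "TERM": "Pneumonia", "ONSET": "2018-02-20", "OUTCOME": "RECOVERED"},
    {"SUBJID": "1002", "TERM": "Myocardial infarction", "ONSET": "2018-03-05", "OUTCOME": "NOT RECOVERED"},
    {"SUBJID": "1004", "TERM": "Stroke", "ONSET": "2018-04-01", "OUTCOME": "RECOVERING"},
]
safety_cases = [
    {"SUBJID": "1001", "TERM": "PNEUMONIA", "ONSET": "2018-02-21", "OUTCOME": "RECOVERED"},
    {"SUBJID": "1002", "TERM": "Myocardial Infarction", "ONSET": "2018-03-05", "OUTCOME": "FATAL"},
    {"SUBJID": "1003", "TERM": "Sepsis", "ONSET": "2018-03-10", "OUTCOME": "RECOVERED"},
]

for finding in reconcile_sae(clinical_sae, safety_cases):
    print(finding)
```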
Of the fields compared, some require a one-to-one match with no exceptions, while discrepancies in others may be deemed acceptable based on a logical match. The fields that require an exact match or a logical determination are listed in the table below.
Case report forms feed the clinical database, and adverse event monitoring forms feed the safety database; each database is created by entering data into its respective forms.
Data in the clinical database (for example, Oracle Clinical) is collected to meet the requirements of a specific protocol, while data in the safety database (for example, Argus) is collected to meet regulatory reporting requirements. The CRFs can vary depending on the disease under study, even within the same family of drugs. It is important to define how the data will be reviewed before reconciliation so that the guidelines are consistent across the entire study.
When preparing the data extractions, it is important to map the data fields in the safety database to the corresponding fields in the clinical database to ensure an accurate comparison of the data. This is best accomplished by reviewing the actual CRF page in the clinical database to establish the appropriate field name, and by consulting the data handling guidelines for the study under review.
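Such a field mapping, together with the split between exact and logical matches, can be sketched in Python. Every field name, the exact/logical assignment, and the outcome equivalences below are hypothetical placeholders; they are not actual Argus or Oracle Clinical field names, and the real mapping would come from the CRF pages and the study's data handling guidelines. Here a "logical" match ignores case and extra whitespace and treats listed outcome wordings as equivalent.

```python
# Hypothetical mapping: safety (Argus) field -> (clinical field, match type).
FIELD_MAP = {
    "PATIENT_ID": ("SUBJID", "exact"),
    "ONSET_DATE": ("AESTDT", "exact"),
    "EVENT_VERBATIM": ("AETERM", "logical"),
    "OUTCOME": ("AEOUT", "logical"),
}

# Hypothetical outcome wordings treated as equivalent in a logical match.
OUTCOME_EQUIV = {"RECOVERED": "RESOLVED", "RECOVERED/RESOLVED": "RESOLVED",
                 "RESOLVED": "RESOLVED", "DIED": "FATAL", "FATAL": "FATAL"}

def normalise(value):
    """Upper-case, collapse whitespace, then map equivalent outcome wordings."""
    text = " ".join(str(value).upper().split())
    return OUTCOME_EQUIV.get(text, text)

def compare_case(safety_rec, clinical_rec):
    """Return (safety field, clinical field, match type, result) for each mapped field."""
    results = []
    for s_field, (c_field, match) in FIELD_MAP.items():
        s_val, c_val = safety_rec[s_field], clinical_rec[c_field]
        ok = s_val == c_val if match == "exact" else normalise(s_val) == normalise(c_val)
        results.append((s_field, c_field, match, "Match" if ok else "Discrepancy"))
    return results

# Hypothetical case that matches once logical rules are applied.
argus_case = {"PATIENT_ID": "1001", "ONSET_DATE": "2018-02-20",
              "EVENT_VERBATIM": "Pneumonia ", "OUTCOME": "Recovered"}
oc_record = {"SUBJID": "1001", "AESTDT": "2018-02-20",
             "AETERM": "PNEUMONIA", "AEOUT": "RECOVERED/RESOLVED"}

for row in compare_case(argus_case, oc_record):
    print(row)
```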
Here we refer to the safety database as Argus and the clinical database as Oracle Clinical. Table 1 lists the data points collected in Argus and Oracle Clinical.
Table 1. Comparison of Data Fields using Argus and Oracle Clinical