Enabling precision medicine cohort studies at the community level

By Harsha Rajasimha, Ph.D., Founder and CEO

The precision medicine paradigm is all about targeting medicines to the patient cohorts most likely to benefit from them. Because the one-size-fits-all model has proven inadequate for many patients, the healthcare and life sciences community now needs a community-based approach to systematically gather and analyze real-world health information from study participants. As consumers become increasingly digitally empowered, patients are eager to engage in clinical research that leads to faster and better diagnostics and therapeutics.

The All of Us Research Program, part of President Obama’s Precision Medicine Initiative, is preparing to begin recruiting participants into its ambitious 10-year longitudinal study of one million American residents. More recently, Regeneron and Global Gene Corp announced a similar private-sector precision medicine program involving genome sequencing and study of a large cohort of Indian residents, the largest yet on the subcontinent. These announcements suggest that many more precision medicine cohort studies are to come.

Longitudinal precision medicine cohort programs of this scale and nature cannot rely solely on traditional clinical trial sites for timely patient recruitment, engagement, and retention. Digital health technologies such as mobile apps, electronic consent, surveys, patient-reported outcomes, and wearables can play a critical enabling role in such studies. Because these are long-term longitudinal studies, the savings on labor and manual work are enormous. In fact, community-level precision medicine initiatives would not be feasible without these technologies integrated into a cohesive solution.

Another complexity to address in these studies is the scale and variety of data that must be aggregated and integrated for analytics and insights. The ROI of these massive programs lies in the insights we can draw from the resulting data: the translational research and development that the integrated datasets could enable. Unfortunately, existing open-source and commercial frameworks for translational research, such as tranSMART, i2b2, OHDSI, and Oracle TRC, and standard data models such as OMOP and i2b2, each have their own shortcomings. Integrating these data at scale remains a major unmet need.
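
To make the aggregation and integration problem more concrete, the following is a minimal Python sketch that merges observations from several sources into one time-ordered record per participant. It assumes a hypothetical `Observation` schema and made-up source labels (`ehr`, `wearable`, `survey`) keyed by a study participant ID; a real program would instead map its data into a standard model such as OMOP or i2b2.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One timestamped data point from any source (hypothetical schema)."""
    participant_id: str
    source: str   # illustrative labels: "ehr", "wearable", "survey"
    day: date
    measure: str
    value: float


def integrate(*sources):
    """Merge observations from several sources into per-participant timelines."""
    timelines = defaultdict(list)
    for source in sources:
        for obs in source:
            timelines[obs.participant_id].append(obs)
    # Order each participant's observations by date, then by measure name.
    for obs_list in timelines.values():
        obs_list.sort(key=lambda o: (o.day, o.measure))
    return dict(timelines)


# Hypothetical example records, invented for illustration only.
ehr = [Observation("P001", "ehr", date(2018, 3, 1), "hba1c_percent", 6.1)]
wearable = [
    Observation("P001", "wearable", date(2018, 2, 27), "daily_steps", 8432),
    Observation("P002", "wearable", date(2018, 2, 27), "daily_steps", 5120),
]
survey = [Observation("P001", "survey", date(2018, 3, 2), "fatigue_score", 3)]

timelines = integrate(ehr, wearable, survey)
for obs in timelines["P001"]:
    print(obs.day, obs.source, obs.measure, obs.value)
```

Keeping every source in one timeline per participant is what later makes longitudinal questions (for example, how activity levels change before a lab result) straightforward to ask.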

Once the data aggregation and integration are complete, the resulting integrated longitudinal precision medicine cohort data warehouse, or data lake, becomes a gold mine for data analytics and insights for discovery or translational research. Various AI and cognitive computing algorithms and approaches could be leveraged at this stage for the target use cases and research questions.
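
As one small illustration of the kind of analysis an integrated cohort enables, the sketch below computes Welch's t statistic to compare an outcome score between participants with and without a given biomarker. The cohort rows and fields are hypothetical values invented for illustration; a real study would also report p-values or confidence intervals and adjust for confounders.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two independent groups."""
    va, vb = variance(group_a), variance(group_b)  # sample variances (n - 1)
    se = sqrt(va / len(group_a) + vb / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical cohort rows: (has_biomarker, outcome_score), made up for illustration.
cohort = [(True, 7.1), (True, 6.8), (True, 7.4), (False, 5.9), (False, 6.2), (False, 6.0)]
carriers = [score for has_biomarker, score in cohort if has_biomarker]
non_carriers = [score for has_biomarker, score in cohort if not has_biomarker]

print(round(welch_t(carriers, non_carriers), 2))
```

Welch's variant is used here because it does not assume the two groups share the same variance, which is rarely guaranteed in observational cohort data.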

Finally, clinically relevant results should be returned to individual participants. What does the set of genetic variants identified in an individual’s genome mean clinically? Which new diagnoses or treatment options might be relevant to each participant? Should raw data be returned, or only clinically actionable insights? Are there incidental findings that must be reported? These questions should guide every phase of the study, from study design and informed consent through data collection and analysis.
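
One way informed consent can drive the return of results is to record each participant's preferences and filter findings against them. The minimal sketch below assumes a hypothetical policy in which only clinically actionable findings are ever returned, and incidental findings are returned only to participants who opted in; the class and field names are illustrative and are not drawn from any real return-of-results standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    variant: str                 # hypothetical label, not a real variant name
    clinically_actionable: bool
    incidental: bool             # unrelated to the condition under study


@dataclass(frozen=True)
class ReturnPreferences:
    """Choices a participant records at informed consent (hypothetical fields)."""
    actionable_results: bool
    incidental_findings: bool


def findings_to_return(findings, prefs):
    """Select findings to return, honoring the participant's consent choices."""
    selected = []
    for f in findings:
        if not f.clinically_actionable:
            continue  # illustrative policy: non-actionable findings are never returned
        if f.incidental and not prefs.incidental_findings:
            continue  # participant declined incidental findings
        if not f.incidental and not prefs.actionable_results:
            continue  # participant declined study-related results
        selected.append(f)
    return selected


findings = [
    Finding("variant_A", clinically_actionable=True, incidental=False),
    Finding("variant_B", clinically_actionable=True, incidental=True),
    Finding("variant_C", clinically_actionable=False, incidental=False),
]
prefs = ReturnPreferences(actionable_results=True, incidental_findings=False)
print([f.variant for f in findings_to_return(findings, prefs)])  # ['variant_A']
```

Encoding these choices as data at consent time, rather than deciding case by case later, keeps the return of results consistent with what each participant agreed to.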

A precision medicine cohort study has the following phases and components:

  1. Goals and specific objectives of the longitudinal study: This includes criteria for participant enrollment, such as a specific disease versus healthy controls, geography, genetics, environmental exposures, and other factors. It is important to define the target outcomes of the study at this phase: whether the goal is to measure specific outcomes, environmental exposures, lifestyle information, or response to existing or potential new drugs, or to enable translational research. A qualified clinical investigator must define the protocols, health measures, and other information to be gathered.
  2. Participant selection and recruitment: The most critical and time-consuming aspect of any cohort study is selecting the right participants in the right numbers and recruiting them into the study. Participants should be properly informed about the study protocol, what is expected of them, and what they can (or cannot) expect from the study. Typically, a coordination center either engages multiple clinical sites for participant recruitment or uses digital technologies such as mobile apps and websites to recruit participants directly. A government hospital, a non-profit, or an academic institution would be best suited to serve as the coordination center. If no other options are available, selected hospitals in the geography of interest could serve as recruitment sites.
  3. Sample Collection and Biobanking: Based on the clinical study protocol, appropriate biospecimens must be collected from each participant, properly labeled, and stored in a biobank. Each sample must be characterized at the molecular and biochemical levels to capture all biomarkers relevant to the study.
  4. Aggregation of patient health data, genomics, lifestyle, exposures, and reported outcomes: Throughout the duration of the study, relevant health information and patient-reported outcomes have to be gathered from each patient and aggregated in a centralized location.
  5. Data integration and contextualization: All aggregated study data must be integrated before it is useful for analytics and interpretation. This requires a suitable platform that supports the relevant data models and translational research.
  6. Analytics and Insights: Systematically collected cohort data is a gold mine for data mining, analysis, and discovery research. This is where data quality matters most: if the right data is collected with high quality and accuracy, proper statistical analyses and tools can enable the discovery of disease mechanisms, novel biomarkers, drug targets, or candidates for drug repurposing.
  7. Data Sharing, Security and Privacy: Proper data sharing procedures should be defined in line with relevant laws, regulations, study protocols, and participant consents, and guided by the long-term benefit to patients.
  8. Study Closeout: While data analysis is an ongoing process and novel algorithms may be applied to the data over time, the study itself should close out once its defined term ends and participants have reached their milestones.
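
For the data sharing step, a minimal sketch of preparing an extract for external researchers might exclude participants who did not consent to sharing, drop direct identifiers, and replace study IDs with keyed pseudonyms using HMAC-SHA256 from Python's standard library. The field names and the identifier list are hypothetical, and pseudonymization on its own is not full de-identification; the actual rules must come from the applicable laws, the study protocol, and the consent forms.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers; a real list comes from the protocol and regulations.
DIRECT_IDENTIFIERS = {"name", "email", "phone", "street_address"}


def pseudonym(participant_id: str, secret_key: bytes) -> str:
    """Stable pseudonym via keyed HMAC-SHA256; the key stays with the coordination center."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def prepare_shared_extract(records, secret_key):
    """Keep only consenting participants, drop direct identifiers, pseudonymize IDs."""
    shared = []
    for rec in records:
        if not rec.get("consented_to_sharing", False):
            continue  # no sharing consent on record: exclude entirely
        out = {
            k: v
            for k, v in rec.items()
            if k not in DIRECT_IDENTIFIERS and k not in ("participant_id", "consented_to_sharing")
        }
        out["pseudo_id"] = pseudonym(rec["participant_id"], secret_key)
        shared.append(out)
    return shared


# Hypothetical records, invented for illustration only.
records = [
    {"participant_id": "P001", "name": "A. Example", "consented_to_sharing": True,
     "age": 54, "daily_steps": 8432},
    {"participant_id": "P002", "name": "B. Example", "consented_to_sharing": False,
     "age": 61, "daily_steps": 5120},
]
key = b"replace-with-a-securely-stored-key"
extract = prepare_shared_extract(records, key)
print(len(extract), sorted(extract[0]))  # 1 ['age', 'daily_steps', 'pseudo_id']
```

Because the pseudonym is keyed, the coordination center can link records consistently across extracts, while recipients without the key cannot recover participant IDs simply by hashing candidate values.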

Patient groups and foundations such as Global Genes, the National Organization for Rare Disorders, or other disease-specific patient groups can play a key role in enabling such studies at a community level or national/international scale.