004_ Investigating the association of age at onset in schizophrenia patients with genetic/polygenic risk factors
Research Question and Aims
The aim of this study is to investigate the association of genetic markers and/or polygenic risk scores with age at onset in schizophrenia/schizoaffective patients within the PsyCourse cohort. PsyCourse will be one of the cohorts included in the overall study.
Addendum: In supplementary analyses, we would also like to estimate the influence of copy number variants (CNVs) on age at onset in schizophrenia/schizoaffective patients in PsyCourse.
We hypothesize that genetic/polygenic risk factors will have a significant influence on the age at onset of schizophrenia/schizoaffective patients.
Data from all schizophrenia/schizoaffective patients in PsyCourse who have genotype data available will be included in this study.
Data from the CIBERSAM (Biomedical Research Network in Mental Health) cohort and from the GAIN and nonGAIN datasets (obtained from the dbGaP repository, accession numbers phs000021.v3.p2 and phs000167.v1.p1, respectively) will be integrated with the PsyCourse cohort to perform a case-only genome-wide association study (GWAS) of age at onset (AAO) of schizophrenia (SCZ). This analysis could improve our understanding of the genetic factors that influence disease onset and of the mechanisms underlying SCZ. Quality control will follow standard protocols, and corrections for sex will be applied whenever necessary.
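A case-only GWAS of AAO amounts to one linear regression per variant, with AAO as the outcome among cases only. The Python sketch below illustrates that idea with NumPy on simulated data. The function name `case_only_gwas`, the covariate set (sex plus indicator columns for the four datasets) and the normal approximation used for the p-value are assumptions for illustration; a real analysis would use dedicated GWAS software (e.g. PLINK) rather than this loop.

```python
import math

import numpy as np


def case_only_gwas(dosages, aao, covariates):
    """Per-variant OLS: AAO ~ 1 + dosage_j + covariates (hypothetical helper).

    dosages:    (n_samples, n_variants) imputed effect-allele dosages in [0, 2]
    aao:        (n_samples,) age at onset in years, cases only
    covariates: (n_samples, k) e.g. sex and dataset indicator columns
    Returns one (beta, se, p) tuple per variant. The two-sided p-value uses a
    normal approximation to the t distribution (reasonable for large n).
    """
    n = aao.shape[0]
    results = []
    for j in range(dosages.shape[1]):
        X = np.column_stack([np.ones(n), dosages[:, j], covariates])
        xtx_inv = np.linalg.inv(X.T @ X)
        beta = xtx_inv @ X.T @ aao
        resid = aao - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])  # residual variance
        se = math.sqrt(sigma2 * xtx_inv[1, 1])     # SE of the dosage effect
        z = beta[1] / se
        p = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided normal p-value
        results.append((float(beta[1]), se, p))
    return results


# Simulated example: 4 datasets (hypothetical codes 0-3 for PsyCourse,
# CIBERSAM, GAIN, nonGAIN); only variant 0 shifts AAO (-1.5 years/allele).
rng = np.random.default_rng(2024)
n = 2000
dosages = rng.binomial(2, 0.3, size=(n, 3)).astype(float)
sex = rng.integers(0, 2, size=n).astype(float)
dataset = rng.integers(0, 4, size=n)
dataset_dummies = (dataset[:, None] == np.arange(1, 4)).astype(float)
covariates = np.column_stack([sex, dataset_dummies])
aao = (24.0 + 2.0 * sex + dataset_dummies @ np.array([1.0, -0.5, 0.5])
       - 1.5 * dosages[:, 0] + rng.normal(0.0, 5.0, size=n))

results = case_only_gwas(dosages, aao, covariates)
for j, (beta, se, p) in enumerate(results):
    print(f"variant {j}: beta={beta:.2f} se={se:.2f} p={p:.2e}")
```

The dataset indicators absorb mean differences in AAO between the integrated cohorts, so that a variant whose frequency differs between datasets is not mistaken for an AAO effect.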
SNP-based heritability estimates
The proportion of phenotypic variance explained by SNPs will be estimated using several methods: Linkage Disequilibrium Score Regression (LDSC), High-Definition Likelihood (HDL) and the Genome-based Restricted Maximum Likelihood (GREML) approach implemented in the Genome-wide Complex Trait Analysis (GCTA) tool, adjusting for sex, dataset and the top 10 principal components.
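Of the three methods, LDSC is the simplest to illustrate. Under its model the expected association statistic of SNP j is E[χ²_j] = N h² ℓ_j / M + N a + 1, where ℓ_j is the LD score of SNP j, N the sample size, M the number of SNPs and a a confounding term, so regressing χ² on LD scores gives h² = slope × M / N. The sketch below is an unweighted regression on simulated statistics, assuming NumPy; the LDSC software itself additionally applies regression weights and block-jackknife standard errors, and HDL and GREML are not shown.

```python
import numpy as np


def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Unweighted LD score regression (simplified sketch).

    Fits chi2_j = intercept + slope * l_j by ordinary least squares and
    converts the slope to SNP heritability via h2 = slope * M / N.
    Returns (h2, intercept); an intercept near 1 suggests little confounding.
    """
    slope, intercept = np.polyfit(ld_scores, chi2, deg=1)
    return slope * n_snps / n_samples, intercept


# Simulated summary statistics that follow the LDSC expectation with a = 0.
rng = np.random.default_rng(7)
M, N, true_h2 = 50_000, 10_000, 0.2
ld = rng.uniform(1.0, 200.0, size=M)
z = rng.normal(0.0, np.sqrt(1.0 + N * true_h2 * ld / M))
chi2 = z ** 2

h2, intercept = ldsc_h2(chi2, ld, N, M)
print(f"h2 = {h2:.3f}, intercept = {intercept:.2f}")
```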
Polygenic risk scores:
GWAS summary statistics for schizophrenia will be obtained from the latest schizophrenia GWAS published to date (Pardiñas et al., 2018). SCZ PRS will be calculated from the discovery summary statistics after excluding rare SNPs (MAF < 5%), poorly imputed variants (INFO score < 0.9), indels, strand-ambiguous markers (A/T and C/G) and SNPs in the extended major histocompatibility complex (MHC) region (chromosome 6: 25-34 Mbp). The remaining variants will be clumped in 500 kbp windows, discarding variants in linkage disequilibrium (r² > 0.1) with a more significant marker. Scores will be calculated at p-value thresholds ranging from p < 5 × 10⁻⁸ to p < 1, and the resulting polygenic risk scores will be standardized.
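The PRS steps (variant filtering, clumping, thresholding, scoring and standardization) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the pipeline that will be used: the `Variant` record, the `r2` lookup standing in for LD estimated from a reference panel, the toy summary statistics and genotypes, the intermediate p-value thresholds, and the reading of the 500 kbp window as the maximum distance from an index variant are all hypothetical. Dosages are assumed to count the GWAS effect allele, and effects are assumed to be log odds ratios.

```python
import statistics
from dataclasses import dataclass

AMBIGUOUS = {frozenset("AT"), frozenset("CG")}
MHC_CHROM, MHC_START, MHC_END = "6", 25_000_000, 34_000_000  # extended MHC


@dataclass
class Variant:            # hypothetical summary-statistics record
    snp: str
    chrom: str
    pos: int
    a1: str               # effect allele (dosages count this allele)
    a2: str
    maf: float
    info: float
    beta: float           # log odds ratio from the discovery GWAS
    p: float


def passes_qc(v):
    """Drop rare, poorly imputed, indel, strand-ambiguous and MHC variants."""
    if v.maf < 0.05 or v.info < 0.9:
        return False
    if len(v.a1) != 1 or len(v.a2) != 1:          # indels
        return False
    if frozenset(v.a1 + v.a2) in AMBIGUOUS:       # A/T and C/G
        return False
    return not (v.chrom == MHC_CHROM and MHC_START <= v.pos <= MHC_END)


def clump(variants, r2, window_bp=500_000, r2_max=0.1):
    """Greedy clumping: from most to least significant, keep a variant only if
    it is not in LD (r2 > r2_max) with an already kept variant nearby."""
    kept = []
    for v in sorted(variants, key=lambda v: v.p):
        if not any(k.chrom == v.chrom and abs(k.pos - v.pos) <= window_bp
                   and r2(k.snp, v.snp) > r2_max for k in kept):
            kept.append(v)
    return kept


def standardized_prs(genotypes, variants, p_threshold):
    """Sum beta * dosage over variants with p < p_threshold, then z-score."""
    selected = [v for v in variants if v.p < p_threshold]
    raw = {s: sum(v.beta * g[v.snp] for v in selected) for s, g in genotypes.items()}
    mean = statistics.fmean(list(raw.values()))
    sd = statistics.pstdev(list(raw.values()))
    return {s: (x - mean) / sd if sd > 0 else 0.0 for s, x in raw.items()}


# Toy discovery statistics (hypothetical values).
summary = [
    Variant("rs1", "1", 1_000_000, "A", "G", 0.30, 0.98, 0.10, 1e-9),
    Variant("rs2", "1", 1_200_000, "C", "T", 0.20, 0.95, 0.05, 1e-5),
    Variant("rs3", "2", 5_000_000, "A", "T", 0.25, 0.99, 0.07, 1e-6),    # ambiguous
    Variant("rs4", "6", 30_000_000, "A", "G", 0.35, 0.99, 0.09, 1e-12),  # MHC
    Variant("rs5", "3", 2_000_000, "AG", "A", 0.40, 0.97, 0.02, 1e-3),   # indel
    Variant("rs6", "4", 3_000_000, "G", "T", 0.02, 0.96, 0.20, 1e-4),    # rare
    Variant("rs7", "5", 4_000_000, "C", "A", 0.40, 0.92, -0.04, 3e-3),
    Variant("rs8", "5", 9_000_000, "G", "A", 0.30, 0.80, 0.03, 1e-2),    # low INFO
]
ld = {frozenset({"rs1", "rs2"}): 0.5}  # hypothetical pairwise r2
kept = clump([v for v in summary if passes_qc(v)],
             lambda a, b: ld.get(frozenset({a, b}), 0.0))
genotypes = {"s1": {"rs1": 2, "rs7": 0}, "s2": {"rs1": 1, "rs7": 1},
             "s3": {"rs1": 0, "rs7": 2}}
thresholds = [5e-8, 1e-6, 1e-4, 1e-3, 0.01, 0.05, 0.1, 0.5, 1.0]
scores = {t: standardized_prs(genotypes, kept, t) for t in thresholds}
print([v.snp for v in kept])  # ['rs1', 'rs7']
```

In practice these steps are usually run with established tools such as PLINK (`--clump`, `--score`) or PRSice; the greedy loop mirrors the usual clumping logic, in which each retained index variant removes weaker variants in LD with it.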
Copy Number Variants:
CNVs will be called from the fluorescence signal intensities in the raw genotyping data using bioinformatic tools such as PennCNV (https://github.com/WGLab/PennCNV).
At each time point, models adjusting for sex, ancestry components and recruitment site will be built to determine the association of PRS, or of single genetic variants of special interest, with age at onset.
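A minimal sketch of such a covariate-adjusted model, assuming a linear regression of AAO on a standardized PRS and using NumPy on simulated data. It reports the PRS effect in years per standard deviation and the incremental R² over a covariates-only model, a common way to summarize PRS associations. The covariate coding (sex, 10 ancestry principal components, recruitment-site indicators), the simulated data and the function names are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """OLS fit of y on X; returns (R^2, coefficients)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2), beta


def prs_association(prs, aao, covariates):
    """Compare AAO ~ covariates (null) with AAO ~ covariates + PRS (full).

    Returns (beta_prs, delta_r2): the PRS effect in years per SD of the
    standardized score and the variance it explains beyond the covariates.
    """
    n = aao.shape[0]
    null = np.column_stack([np.ones(n), covariates])
    full = np.column_stack([null, prs])
    r2_null, _ = r_squared(null, aao)
    r2_full, beta = r_squared(full, aao)
    return float(beta[-1]), float(r2_full - r2_null)


# Simulated data (hypothetical): higher PRS -> earlier onset.
rng = np.random.default_rng(11)
n = 1500
sex = rng.integers(0, 2, size=n).astype(float)
pcs = rng.normal(size=(n, 10))                 # ancestry principal components
site = rng.integers(0, 3, size=n)              # hypothetical site codes
site_dummies = (site[:, None] == np.arange(1, 3)).astype(float)
covariates = np.column_stack([sex, pcs, site_dummies])
prs = rng.normal(size=n)                       # already standardized
aao = 25.0 + 1.5 * sex - 0.8 * prs + rng.normal(0.0, 6.0, size=n)

beta_prs, delta_r2 = prs_association(prs, aao, covariates)
print(f"beta = {beta_prs:.2f} years/SD, delta R2 = {delta_r2:.4f}")
```

Repeating the call for the score computed at each p-value threshold gives one estimate per threshold; a single variant of interest can be tested the same way by passing its dosage in place of `prs`.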
GSA .idat files for the PsyCourse samples with a schizophrenia or schizoaffective diagnosis.
Sociodemographic variables and age at onset information. Genetic data (preferably imputed).