2022-02-04

**049_ Replication study: Kernel Machine Regression Analysis of Networks considering the longitudinal course of phenotypes**

**Research Question and Aims**

Pathway analysis simultaneously tests a set of genes, or a pathway, for association with a phenotype of interest. We perform pathway analysis by applying kernel machine regression (KMR) with a specific kernel, the network kernel (Freytag et al., 2013), and, for comparison, the commonly used linear kernel (for more details on KMR and kernels, see below). KMR was first developed for cross-sectional studies, but we and others have extended it to longitudinal data. We implemented the longitudinal KMR as a new R package, "KMRLoPa", which we aim to evaluate on simulated longitudinal data as well as on PsyCourse data. For the latter, we will build on our previous investigation of the longitudinal course of executive functions (EFs) (Wendel et al. 2021), in which we identified an LD block of nine SNPs associated with the change over time in set-shifting. The pathway analysis using the FUMA pipeline (MAGMA), which relies on GWAS summary statistics, was unsuccessful. Here we will investigate whether we gain more information for pathway analysis by directly analyzing all available measurement points and using the raw genotype data to test whether specific gene sets/pathways are associated with longitudinal executive performance.

**Analytic Plan**

First, we will explain the longitudinal kernel machine regression (KMR) and the network kernel. Then we will present our plan for evaluating the longitudinal KMR, comprising simulation studies as well as the application to PsyCourse data.

A kernel machine regression is a semi-parametric regression that combines a covariate matrix of fixed effects (e.g. age, gender, principal components) with a non-parametric function that incorporates the genetic data (Schaid, 2010). The non-parametric part can be interpreted as a random effect, because estimation in KMR is equivalent to estimation in linear mixed models (LMMs). We use this equivalence to extend KMR to longitudinal data by adding further random effects to the regression, which correct for the dependence between phenotype measurements at different time points.

The non-parametric function of the genetic data is usually unknown or computationally expensive to evaluate. Instead of computing this function explicitly, we calculate a kernel matrix, i.e. a matrix of pairwise similarities. For each pair of individuals, it contains a scalar that describes how similar the two individuals are with respect to their genotypes (SNPs); with N individuals, the kernel matrix is therefore N×N. In this way, high-dimensional genotype data are reduced to a single similarity value per pair. The similarity measure can be chosen flexibly, as long as the resulting kernel matrix is symmetric and positive semi-definite.

We use two kernels: the commonly applied linear kernel, in which the genotype matrix is multiplied by its transpose, and the network kernel (Freytag et al., 2013). The network kernel, developed by our group, is more complex, as it combines the genotype data of the individuals with additional information on the pathway being analyzed. This information is obtained from pathway databases; here we use the Reactome database (https://reactome.org/). It enters the kernel in the form of two matrices: the annotation matrix, which assigns SNPs to genes, and the adjacency matrix, which indicates whether two genes of the pathway interact with each other (entry = 1) or not (entry = 0). We multiply these matrices to obtain the final kernel matrix, which is then tested for association by applying a variance-component test.
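
To make the two kernels concrete, the following minimal Python sketch builds both on toy data. It is an illustration only, not the KMRLoPa implementation: the product form K = Z A N A^T Z^T for the network kernel, the function names, the toy matrices, and the choice of 1 on the diagonal of the gene-gene matrix are all assumptions made for this example. In practice the gene-gene matrix may need to be adjusted so that the resulting kernel is positive semi-definite.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def linear_kernel(z):
    """Linear kernel K = Z Z^T for an N x p genotype matrix Z."""
    return matmul(z, transpose(z))


def network_kernel(z, a, n):
    """Network-style kernel K = Z A N A^T Z^T (form assumed for this sketch)."""
    za = matmul(z, a)  # N x g matrix of gene-level genotype sums
    return matmul(matmul(za, n), transpose(za))


# Toy data (hypothetical): 3 individuals, 4 SNPs coded as allele counts 0/1/2.
Z = [[0, 1, 2, 0],
     [1, 1, 0, 2],
     [2, 0, 1, 1]]

# Annotation matrix (4 SNPs x 2 genes): SNPs 1-2 belong to gene 1,
# SNPs 3-4 belong to gene 2.
A = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]

# Gene-gene matrix: the off-diagonal 1 marks an interaction between the
# two genes; the diagonal value of 1 is an assumption of this sketch.
N = [[1, 1],
     [1, 1]]

K_lin = linear_kernel(Z)
K_net = network_kernel(Z, A, N)

print("linear kernel: ", K_lin)
print("network kernel:", K_net)
```

Both results are symmetric N×N matrices (here 3×3), one similarity value per pair of individuals, regardless of how many SNPs enter the calculation.
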
We will study the properties (type-1 error, power) of the longitudinal KMR with the network kernel and the linear kernel via simulations. As an exemplary pathway, we selected the "Signaling by ERBB4" pathway from the Reactome database, as this pathway is highly connected and contains key genes. We will use the original topology as well as artificially modified topologies, e.g. to investigate less connected pathways. We expect the network kernel to gain over the linear kernel as pathway connectivity increases.
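
As a sketch of how such simulation results are typically summarized: the empirical type-1 error is the share of replicates simulated under the null hypothesis in which the test rejects, and the empirical power is the same share for replicates simulated under an alternative. The function name, the significance level of 0.05, and the uniformly distributed placeholder p-values below are assumptions for illustration; in the actual study the p-values would come from the variance-component test applied to each simulated data set.

```python
import random


def rejection_rate(p_values, alpha=0.05):
    """Share of simulation replicates whose p-value falls below alpha."""
    if not p_values:
        raise ValueError("need at least one p-value")
    return sum(p < alpha for p in p_values) / len(p_values)


# Placeholder for p-values from replicates simulated under the null:
# a well-calibrated test yields approximately uniform p-values, so
# uniform random numbers serve as a stand-in here.
rng = random.Random(49)
null_p_values = [rng.random() for _ in range(2000)]

null_rate = rejection_rate(null_p_values)
print("empirical type-1 error:", null_rate)
```

Applied to p-values from replicates simulated with a true pathway effect, the same function gives the empirical power, which can then be compared between the network kernel and the linear kernel for each pathway topology.
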

Furthermore, we want to perform a pathway analysis to test different pathways for association with core executive functions (EFs). As our previous GWAS (Wendel et al. 2021) yielded promising results for the Trail-Making-Test, Part B (TMT-B) in PsyCourse, we will focus on this phenotype. Only if these results prove insufficient for illustrating our new approach will we additionally consider the Verbal Digit Span Backwards (VDS-B). We selected roughly 20 pathways from the Reactome database based on the following keywords: serotonin, dopamine, GABA, glutamate, NMDA receptor, prefrontal cortex, synapse, plasticity and voltage-gated potassium channels.
We will use the latest version of the PsyCourse sample and genotype data. We include all genotyped patients and healthy control individuals for whom the TMT-B phenotype was assessed at least once. Our model is similar to the LMM used in the GWAS (Wendel et al. 2021): we use log TMT-B or VDS-B as outcomes and add random intercepts and slopes to model the subject-specific time courses. As before, we expect to include age, sex, time, DSM-IV diagnoses and the top five ancestry principal components as fixed effects. The SNPs enter the model via the kernel matrix, using the network kernel (Freytag et al. 2013) and, for comparison, the linear kernel.
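
The model described above can be written down schematically as follows. This is a sketch under stated assumptions, not the exact specification implemented in KMRLoPa: the notation is introduced here for illustration, with y_{ij} the outcome (e.g. TMT-B) of individual i at visit j, x_{ij} the fixed-effect covariates (age, sex, time, diagnosis, ancestry principal components), t_{ij} the time of the visit, z_i the SNP genotypes of individual i in the pathway, and K the kernel matrix. Only a genetic main effect is shown; whether the genetic effect also enters through the time course is not specified in this plan.

```latex
\begin{aligned}
\log(y_{ij}) &= x_{ij}^{\top}\beta + h(z_i) + b_{0i} + b_{1i}\,t_{ij} + \varepsilon_{ij},\\
\bigl(h(z_1),\dots,h(z_N)\bigr)^{\top} &\sim \mathcal{N}(0,\ \tau K),\\
(b_{0i},\ b_{1i})^{\top} &\sim \mathcal{N}(0,\ D),\\
\varepsilon_{ij} &\sim \mathcal{N}(0,\ \sigma^{2}),\\
H_0 &:\ \tau = 0 .
\end{aligned}
```

In this formulation, the random intercept b_{0i} and random slope b_{1i} capture the subject-specific time courses, and the variance-component test of the pathway corresponds to testing whether the variance parameter \tau of the genetic random effect is zero.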

**Resources needed**

**Recruitment data:**

Participant identity column v1/v2/v3/v4_id

Clinical/Control Status v1_stat

Date of interview v1/v2/v3/v4_interv_date

Recruitment center v1_center

**Demographic information:**

Sex v1_sex

Age (at first interview) v1/v2/v3/v4_age

Marital status v1/v2/v3/v4_martial_stat

Relationship status v1/v2/v3/v4_partner

Children v1_no_bio_chld

v1_no_adpt_chld

v1_stp_chld

Siblings v1_brothers

v1_sisters

v1_hlf_brthrs

v1_hlf_sstrs

v1_stp_brthrs

v1_stp_sstrs

Living alone v1/v2/v3_liv_aln

Education v1_ed_status

Employment v1_curr_paid_empl

**Psychiatric history:**

Current psychiatric treatment v1/v2/v3/v4_cur_psy_trm

Times treated as day- or inpatient v1_cat_daypat_outpat_trm

**Medication:**

Clinical participants v1/v2/v3/v4_Antidepressants

v1/v2/v3/v4_Antipsychotics

v1/v2/v3/v4_Mood_stabilizers

v1/v2/v3/v4_Tranquilizers

v1/v2/v3/v4_Other_psychiatric

Control participants v1/v2/v3/v4_Antidepressants

v1/v2/v3/v4_Antipsychotics

v1/v2/v3/v4_Mood_stabilizers

v1/v2/v3/v4_Tranquilizers

v1/v2/v3/v4_Other_psychiatric

Family history of psychiatric illness v1_fam_hist

**Substance abuse:**

Tobacco v1/v2/v3/v4_no_cog

Alcohol v1/v2/v3/v4_lftm_alc_dep

Illicit drugs v1/v2/v3/v4_evr_ill_drg

DSM-IV Diagnosis: schizophrenia (295.1/.2/.3/.6/.9)

schizophreniform disorder (295.4)

brief psychotic disorder (298.8)

schizoaffective disorder (295.7)

bipolar disorder (296.X [bipolar disorders incl. manic episode]) v1_scid_dsm_dx, v1_scid_dsm_dx_cat

**Symptom rating scales:**

PANSS Positive sum score v1/v2/v3/v4_panss_sum_pos

PANSS Negative sum score v1/v2/v3/v4_panss_sum_neg

PANSS Total score v1/v2/v3/v4_panss_sum_tot

IDS-C30 Total score v1/v2/v3/v4_idsc_sum

YMRS v1/v2/v3/v4_ymrs_sum

CGI v1/v2/v3/v4_cgi_s

GAF v1/v2/v3/v4_gaf

**Neuropsychology (cognitive tests):**

Trail-Making-Test v1/v2/v3/v4_nrpsy_TMT_A_rt

v1/v2/v3/v4_nrpsy_TMT_A_err

v1/v2/v3/v4_nrpsy_TMT_B_rt

v1/v2/v3/v4_nrpsy_TMT_B_err

Verbal Digit span v1/v2/v3/v4_nrpsy_dgt_sp_frw

v1/v2/v3/v4_nrpsy_dgt_sp_bck

**GSA Chip analysis IDs gsa_id**

Imputed GSA Chip analysis IDs gsa_imp_id

**Imputed data:**

Trail-Making-Test nrpsy_TMT_A_rt

nrpsy_TMT_A_err

nrpsy_TMT_B_rt

nrpsy_TMT_B_err (long format)

Verbal Digit span nrpsy_dgt_sp_frw

nrpsy_dgt_sp_bck (long format)

**Genotype Data:**

GSA Chip data