Main

Approximately 10–15% of patients with Parkinson’s disease (PD) are classified as ‘familial cases’, a designation conventionally restricted to patients known to have a first degree relative also affected by the disorder8,9. The identification of causal rare variants in familial PD has contributed enormously to our current understanding of the disease, ultimately yielding drug targets, biomarkers and key insights into disease mechanisms1,10,11,12. Currently, seven genes have been classified as definitely associated with familial PD2,13. The discovery of these familial PD genes has been achieved through various family-based study designs, including linkage analysis, homozygosity mapping and segregation filtering1,10,12. However, in many cases, conventional family-based study designs are insufficient to identify causal variants due to a combination of genetic heterogeneity across families, reduced penetrance and limited sample size within families. Rare variant association testing methods, such as gene burden analysis, provide an alternative strategy for discovering rare genetic risk factors14. Instead of leveraging familial structure, these methods leverage case–control differences in cumulative rare variant frequencies. This approach has recently been used to study rare genetic variation in idiopathic PD15, but thus far, targeted analyses of familial PD cases have not been performed. In previous work, we demonstrated substantial power gains for disease gene discovery in amyotrophic lateral sclerosis by restricting rare variant association testing to familial cases and controls16,17,18. The rationale of selecting familial cases is to increase sensitivity by enriching for patients carrying rare variants of large effect. In this Letter, we apply this strategy to familial PD, conducting to our knowledge the largest genetic analysis of familial PD so far.

We combined sequencing data from 16 cohorts, including both whole-exome and whole-genome sequencing (WGS) data, comprising a total of 2,824 individuals with familial PD and 78,683 controls. We uniformly processed all sequence data, including alignment to the Genome Reference Consortium human build 38 (GRCh38) reference genome and joint variant calling19. We retained individuals of predominantly European ancestry who met quality control criteria (2,756 cases and 73,879 controls) and then retained unrelated individuals, resulting in an association cohort consisting of 2,184 index familial PD cases (one affected individual per family) and 69,775 controls (Extended Data Figs. 13). Within this cohort, we identified 5,044,315 variants that passed strict quality control, out of which 2,114,963 were predicted to be nonsynonymous (Extended Data Fig. 4).

To identify genes associated with familial PD, we performed gene-based analyses for both low-frequency (minor allele frequency (MAF) <0.05) and ultrarare (≤5 carriers) nonsynonymous variants using an omnibus test that combines burden, sequencing kernel association test (SKAT) and the aggregated Cauchy association variant test (ACAT-V) (Methods)20,21,22. No significant disease associations were observed within the ultrarare variant category (Extended Data Fig. 5a,b); however, analyses of low-frequency variants (MAF <0.05) revealed disease associations for LRRK2, GBA and RAB32 at exome-wide significance (P < 2.7 × 10−6; Fig. 1a and Supplementary Data 1). Concurrently, we conducted exome-wide single-variant analyses, identifying four exome-wide significant variants in LRRK2, GBA and RAB32 (Fig. 1b, Extended Data Fig. 5c,d and Supplementary Data 2). Among the four significant variants, three had been previously implicated in PD, including c.1093G > A/p.E365K in GBA; c.6055G > A/p.G2019S and c.4321C > T/p.R1441C in LRRK2 (test statistics for all known PD variants are provided in Supplementary Data 2). The fourth variant, a c.213C > G/p.S71R variant of RAB32 (odds ratio of 65.5, P = 7.8 × 10−16), represented a novel discovery. We observed no evidence of genomic inflation in any of the analyses performed (gene λ1,000 = 0.94, single variant λ1,000 = 0.95; Extended Data Fig. 5). Of note, two additional genes (SNAPC1 and C5) were initially flagged as potential exome-wide significant associations but were then confirmed as technical artifacts in subsequent validation analyses (Methods, Supplementary Tables 1 and 2, Extended Data Fig. 5e,f, and Supplementary Figs. 1 and 2).

Fig. 1: Rare nonsynonymous gene-based and single-variant analyses in 2,184 PD cases and 69,775 controls identify RAB32, LRRK2 and GBA.
figure 1

a, The y axis shows the gene-based associations from the ACAT omnibus test including Firth’s logistic regression, SKAT and ACAT-V (two-tailed −log10(P value)) plotted against genomic coordinates on the x axis (GRCh38). The dashed line indicates the exome-wide significance threshold (P = 2.7 × 10−6). b, Estimated odds ratios (OR) (log transformed, x axis) and 95% confidence intervals (CI) of the exome-wide significant single variants obtained from Firth’s logistic regression. c, Conditional analyses (MAF <0.05) of significant genes using the ACAT omnibus test including Firth’s logistic regression, SKAT and ACAT-V (two-tailed). In red are the unconditioned gene associations and in blue are the conditioned gene associations, adjusted for the significant single variants within the respective gene (p.E365K in GBA, p.G2019S and p.R1441C in LRRK2, and p.S71R in RAB32). d, Single-variant associations in RAB32 estimated using Firth’s logistic regression (y axis; two-tailed −log10(P value)), plotted against coding sequence positions (CDS; x axis).

Both the gene-based and single-variant analyses converged on RAB32 as a novel PD gene. Conditional analysis revealed that the observed RAB32 gene association was due to the p.S71R variant (Pconditional = 0.83; Fig. 1b–d), whereas conditional analyses for LRRK2 and GBA confirmed that known residual associations from additional disease relevant rare variants were detectable within the dataset (Fig. 1c; GBA, Pconditional = 3.1 × 10−6 and LRRK2, Pconditional = 8.7 × 10−3). Among 2,184 index PD cases, we identified one homozygous carrier and 15 heterozygous carriers of the RAB32 p.S71R variant (MAFcase of 0.0037), while only three (heterozygous) variant carriers were identified among the 69,775 controls (MAFctrl of 0.000022). The MAF observed in the control group is similar to that reported in 44,329 non-neurological European ancestry individuals in the gnomAD (v2.1.1) reference database (MAFgnomAD of 0.000023), confirming that the variant is very rare in the general population23. For three PD cases carrying the p.S71R variant, exome sequencing data were available for one of their affected siblings, and the variant segregated with the disease in all three families (Fig. 2 and Extended Data Fig. 6). Thus, in total, we identified 18 PD cases carrying the p.S71R variant, of which 17 were heterozygous carriers and one was a homozygous carrier (no evidence for inbreeding, F = −0.004). Sanger sequencing confirmed the presence of the variant in all 18 PD cases (Supplementary Table 1 and Supplementary Fig. 1). Haplotype analysis indicated that carriers shared a ~300-kb haplotype surrounding the p.S71R variant (Extended Data Fig. 7). Evaluation of other known PD variants revealed the co-occurrence of p.S71R in two carriers with GBA risk variants (c.1093G > A/p.E365K and c.1223C > T/p.T408M) (Extended Data Table 1).

Fig. 2: Pedigrees of families with multiple sequenced individuals.
figure 2

Unaffected family members are indicated by white symbols, affected family members with verified PD are indicated by black symbols and family members with nonverified or reported PD are indicated by gray symbols. For individuals with PD, the age of onset is displayed at the left and age at death or last follow-up is displayed on the right (separated by ‘|’). For nonaffected individuals, the age at death or last follow-up is displayed. Probands are indicated by arrows. Pedigrees for which only the proband was sequenced are presented in Extended Data Fig. 6.

We identified an additional four carriers among 3,380 PD cases (0.12%) in the publicly available Accelerating Medicines Partnership program for PD WGS data (https://www.amp-pd.org/), of which two were familial PD cases (out of 900; 0.22%), one was a sporadic PD case and family history was not available for the fourth case.

Carriers of the RAB32 p.S71R variant had a mean age of onset of 56 years (s.d. 13; range, 39–82 years), which did not significantly differ from noncarriers (Extended Data Fig. 8 and Extended Data Table 1). The variant was observed in 11 males and 7 females, reflecting the overall sex ratio of patients in this study (~60% male). All p.S71R carriers evaluated for L-dopa response were confirmed as L-dopa responsive (15 evaluated and 3 not evaluated) and for the majority (10 out of 18), dyskinesia and/or motor fluctuations were reported. For all but one of the carriers, asymmetry was reported (16 out of 17), and the majority presented with rest tremors (11 out of 18). Moreover, postural instability, freezing of gait and/or falls were common symptoms among carriers reported in 11 out of 18 patients. Detailed case reports were available for nine patients (Supplementary Note). Among them, seven out of nine patients reported nonmotor symptoms, including, among others, hyposmia, autonomic dysfunction, sleep problems and depression. Notably, two patients had a medical history of Hashimoto’s thyroiditis. The case report of the homozygous carrier shows that the clinical presentation did not notably differ from that of heterozygote carriers. He had late-onset (61 years), L-dopa-responsive PD characterized by a right resting tremor and micrographia with motor fluctuations and dyskinesia emerging 9 years after onset. The patient passed away 12 years after the onset, with falls and mild cognitive impairment appearing 10 and 11 years after onset, respectively.

RAB32 belongs to the RAB family of small GTPases, which is composed of 70 members that play key roles in regulating intracellular vesicular transport24. Various members of the RAB family are known to be either substrates or regulators of the key PD protein LRRK2 (refs. 24,25,26). Most pathogenic LRRK2 variants increase kinase activity, in turn resulting in increased phosphorylation of several RAB family members25,27,28. Interestingly, while there is no evidence supporting phosphorylation of RAB32 by LRRK2, RAB32 has been shown to interact with LRRK2 through the N-terminal Armadillo domain of LRRK2, and may regulate its subcellular localization3,4. Moreover, overexpression of RAB29, a member of the RAB32 subfamily, has been shown to activate LRRK2, recruiting it to the trans-Golgi network and increasing its kinase activity26.

Given the established interaction between LRRK2 and RAB32, along with the closely related RAB29 functioning as a LRRK2 activator, we hypothesized that RAB32 S71R results in increased LRRK2 activation. To this end, we performed cotransfections in HEK293 cells, introducing Myc-LRRK2 wild type (WT) alongside either HA-RAB32 WT or HA-RAB32 S71R. Autophosphorylation of S1292 provides a known in vivo readout of LRRK2 kinase activity, as well as a biomarker for the development of parkinsonian phenotypes in patients carrying LRRK2 mutations27,29,30. As anticipated, we observed a significant increase in LRRK2 S1292 phosphorylation levels in cells expressing mutant RAB32 versus WT RAB32 (approximately threefold increase, P = 0.02; Fig. 3a and Extended Data Fig. 9). In keeping with this, we also observed a threefold increase in the interaction between LRRK2 and RAB32 through coimmunoprecipitation and western blotting (P = 0.034; Fig. 3b and Extended Data Fig. 9). Finally, previous phosphoproteomic analysis indicated that Ser71 constitutes the only known phosphorylation site on the RAB32 protein (Fig. 4)31. The S71R variant probably interferes with this phosphorylation event, and this may provide an explanation for why the S71R variant alone was found to exhibit a robust association with PD risk. Taken together, these results support a model where the S71R variant selectively impacts the functioning of RAB32 in a manner that increases its interaction with LRRK2 and, through this, increases LRRK2 activity.

Fig. 3: RAB32 S71R triggers increased LRRK2 kinase activity.
figure 3

Western blot analyses of cotransfections in HEK293 cells, where Myc-LRRK2 WT was introduced alongside either HA-RAB32 WT or HA-RAB32 S71R. The bar plots show mean ± s.d. a, LRRK2 S1292 phosphorylation; n = 3 biological replicates are shown (two-tailed one-sample t-test, P = 0.02; Extended Data Fig. 9). b, LRRK2 and RAB32 coimmunoprecipitation; n = 3 biological replicates are shown (two-tailed one-sample t-test, P = 0.034; Extended Data Fig. 9). *P < 0.05. Full-length blots are provided as source data. Mut, mutant.

Source data

Fig. 4: Three-dimensional protein structure of RAB32 highlighting the phosphorylation site at Ser71.
figure 4

Crystal structure of uncomplexed Rab32 in the active GTP-bound state at 2.13 Å resolution4,34.

In conclusion, we present, to our knowledge, the largest genetic association study of familial PD so far, identifying p.S71R in RAB32 as a novel risk variant and substantiating that the toxicity of this variant is most probably mediated through enhancement of LRRK2 activity. Importantly, substantial evidence points toward increased LRRK2 kinase activity as the primary pathological mechanism in LRRK2 PD and, to some extent, idiopathic PD, and consequently, reduction of LRRK2 activity has emerged as a major therapeutic focus5,6,7. Future research is needed to determine the mechanism through which RAB32 S71R increases LRRK2 activity. Of particular interest is the possibility that RAB32 S71R leads to tetramerization of LRRK2, similar to RAB2932, which was shown to increase LRRK2 autophosphorylation in contrast to WT RAB32. Our study underscores that, as with prior work in amyotrophic lateral sclerosis16,17,18, the combination of familial case ascertainment and a sizable control cohort enables the identification of low-frequency variants that were previously limited to detection by linkage analyses. This is evidenced first and foremost by the discovery of RAB32 p.S71R, but also by the recovery of known ultrarare PD variants that had not previously been detectable in genome-wide case–control analyses (LRRK2 p.R1441C, MAFctrl 0.000014, P = 2.27 × 10−10). Interestingly, our findings are corroborated by a recent study33, which discovered that RAB32 p.S71R segregates with PD in three families, not overlapping with our study. Consistent with our results, this parallel study demonstrates that RAB32 S71R enhances LRRK2 autophosphorylation, as well as RAB10 phosphorylation. This independent study provides complementary evidence that validates our findings, thus firmly establishing RAB32 p.S71R as a causal factor in familial PD. Finally, the identification of RAB32 as a familial PD gene provides new insight into the important mechanistic connections between RAB family biology, LRRK2 and PD risk.

Methods

Cohorts

We received approval for this study from the institutional review boards of the participating centers, and written informed consent was obtained from all patients (consent for research). Informed consent was obtained for the publication of individual case reports. The discovery cohort included 2,824 patients with PD and 78,683 controls, totaling 81,507 study participants, of which 10,270 were subjected to WGS and 71,237 to whole-exome sequencing (WXS). One thousand six hundred familial PD cases were from the PROGENI study, which recruited multiplex PD families primarily through a living affected sibling pair. Subsequently, ascertainment was loosened to include PD probands having a positive family history of PD in a first-degree relative, who was not required to be part of the study. All individuals completed an in-person evaluation that included parts II and III of the Unified Parkinson Disease Rating Scale (Lang and Fahn 1989) and a diagnostic checklist that implemented the UK PD Brain Bank inclusion and exclusion criteria. Responses on the diagnostic checklist were then used to classify study subjects as having verified PD or nonverified PD. A total of 783 familial PD cases were recruited at the Mayo clinic, patients were included who fulfilled the clinical diagnostic criteria of PD35. A total of 441 familial PD cases were from the Parkinson Institute Biobank; diagnosis of PD was made according to the UK Brain Bank criteria35. Control cohorts included 7,323 samples from the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine research program36; 49,981 samples from the UK Biobank37; 2,805 samples from Project MinE38; 342 samples from the New York Genome Center’s amyotrophic lateral sclerosis (ALS) consortium and 18,232 samples from database of Genotypes and Phenotypes (included studies and accession number are listed in the ‘Data availability’ section)39.

Sequencing

All raw sequencing data was aligned to the GRCh38 reference genome using Burrow–Wheeler aligner maximal exact match (BWA–MEM; v.2.2.1)40 according to the pipeline described by Regier et al.41 (implementation can be found at GitHub42 and Zenodo43). Joint genotyping was performed using a uniform pipeline according to the Genome Analysis Toolkit (GATK) best practices (v. 4.2.6.1)19. Genotype calls with a quality score <20 were set to missing, variant calls supported by uninformative reads were excluded and multiallelic variants were split into biallelic variants. Male genotypes in nonpseudoautosomal regions on chromosome X were coded as 0 or 1 (according to 0 or 1 allele copies).

Variant annotation

Variants were annotated using snpEff44, dbscSNV45 and Ensembl Release 105 gene models46. Variants were classified as loss-of-function when predicted by snpEff to have a high impact (including nonsense mutations, splice acceptor/donors and frameshift mutations) or predicted as potentially splice altering by dbscSNV (‘ada’ or ‘rf’ score >0.7). Variants were classified as having moderate impact when predicted as such by snpEff (including missense mutations and inframe deletions). For each gene, the impact of a variant was determined by its most severe consequence across protein-coding transcripts.

Sample quality control

Ancestry was estimated by projecting all samples on a reference ancestry space comprising 1000 Genomes samples using the LASER software (v2.04)47. We retained individuals of predominantly European ancestry (Extended Data Figs. 1 and 2). We then excluded samples with low genotype call rates (<0.9), discordant sex or deviating heterozygosity (F < −0.1 or F > 0.1). These metrics were calculated in a set of autosomal variants meeting the following criteria: call rate >0.9 in each supercohort (WGS, WXSUKB and WXSother), MAF >0.01, and for sex inference, heterozygosity, relatedness and principal component analysis (PCA) variants were additionally filtered on the basis of Hardy–Weinberg equilibrium (P < 0.001) and pruned if in linkage disequilibrium (r2 < 0.5, window size of 50, step 5; additionally, high linkage disequilibrium regions were excluded before PCA48). We then excluded samples based on a high exome-wide number of single-nucleotide variants (SNVs), insertions/deletions (INDELs), singletons, high INDEL/SNV ratio or deviating Ti/Tv ratio (thresholds listed in Extended Data Fig. 2). Sample duplicates and relatives up to and including the second degree were identified using the KING software49. An unrelated sample set was generated by first excluding samples with five or more relations, followed by iteratively excluding individuals with the highest number of relations, resolving ties by prioritizing (in order) PD over controls and WGS over WXS samples. PCA was performed on the unrelated sample set using fastPCA as implemented in PLINK2 (ref. 50). A distinct cluster was identified on the fourth and fifth principal component, consisting of an Amish population, which was excluded since the cluster contained only controls (Extended Data Fig. 2).

Variant quality control

First, GATK variant quality score recalibration (VQSR) was applied to all variants using the training data and annotations as recommended by the GATK best practices19. Variants were excluded if they did not pass variant quality score recalibration, their genotyping rate was <0.9 in any of the supercohorts (WGS, WXSUKB and WXSother) or if they did not pass the Hardy–Weinberg equilibrium test in control subjects (P < 0.0001). We then additionally excluded variants with subpar quality scores, variants located in regions showing signs of batch effects and we retained only SNVs. Potential batch effects were identified by testing whether variant minor allele counts were associated with cohort membership within control subjects. Firth’s logistic regression was used to perform these control–control analyses, adjusting for sex and four principal components. This procedure was repeated for each cohort (that is, number 1 represents individual in respective cohort, number 0 represents otherwise). In total, 16 control versus control cohort comparisons were tested (including all WGS controls versus all WXS controls). The minimum P value across these analyses was used as a metric to identify variants associating with probable batch effects. The stringency of various standard variant quality control filters was then increased to eliminate variants exhibiting batch associated calling bias while maintaining maximal sensitivity for unbiased variant calls (Extended Data Fig. 4). In total 5,044,315 variants passed quality control, including 2,114,963 nonsynonymous variants (Extended Data Fig. 4).

Single-variant analyses

Single-variant analyses were performed for all nonsynonymous variants with MAF <0.05 and a minor allele count of at least five (425,827 variants). For each variant, we tested for an association between PD status and minor allele count using Firth’s logistic regression adjusting for sex, the first ten principal components and the total number of rare synonymous variants in each individual20. All tests were two sided and Bonferroni correction was employed to correct for multiple testing. Significant single-variant associations were screened for potential technical biases arising from different sequencing centers among cases by using the same procedure employed in the control–control analyses. Variants where Pcase–case < Pcase–control were flagged as being potentially driven by technical variation (Extended Data Fig. 5). In addition, we performed Sanger sequencing to validate novel identified variants (see ‘Sanger sequencing’ section). Haplotypes were inferred using Beagle (v.5.4) in single-nucleotide polymorphism (SNP) array data (Infinium Global Diversity Array-8)51.

Gene-based analyses

Gene-based rare variant analyses were performed for 18,242 protein-coding genes, including either low frequency (MAF <0.05) or ultrarare (≤5 carriers) nonsynonymous variants (loss-of-function or moderate impact). For ultrarare variants, we performed burden tests by testing for an association between PD status and the aggregate effect of minor alleles observed per sample per gene. For low-frequency variants (MAF <0.05), we additionally performed SKAT and ACAT-V tests22,52. To account for the unbalanced case–control ratio, burden tests were performed using Firth’s logistic regression20, the robust version of SKAT was employed21 and we adapted ACAT-V so that Firth’s logistic regression is used to perform the burden and single-variant tests that underlie ACAT-V53. Test statistics across the three statistical tests were combined using ACAT, which is designed to combine potentially correlated test statistics22. In each of the tests, we included sex, the first ten principal components and the total number of qualifying synonymous variants in each individual as covariates. Tests were retained when there were at least five variant carriers across the gene.

We performed conditional analyses to assess to what extent gene-based associations were driven by significant variants identified in the single-variant analysis. Conditional gene-based analyses were performed as described above, excluding and adjusting for the significant single variants within the respective gene.

Sanger sequencing

Sanger sequencing of RAB32 exon 1 and SNAPC1 c.1073-2A > T was performed using touchdown PCR (30 cycles with annealing temperature starting at 65 °C and decreasing 0.5 °C per cycle, followed by 15 cycles with an annealing temperature of 65 °C) using primers generated with M13 tails. Amplification was performed using AmpliTaqGold 360 Master mix (Thermo Fisher Scientific, 4398876). Patient DNA samples were provided after extraction from whole blood. PCR products were visualized on a 2% agarose gel using gel electrophoresis and subsequently purified by incubation with Exonuclease I (New England Biolabs, M0293L) and shrimp alkaline phosphatase (Sigma-Aldrich, GEE70092X). Following purification, all PCR products were sequenced at the MGH Massachusetts General Hospital DNA Core Facility. Sequence analysis was performed using the PHRED/PHRAP/Consed software suite (http://www.phrap.org/) and variations in the sequences were identified with the Polyphred v6.15 software54. The PCR and sequencing primers are listed in Supplementary Tables 1 and 2. Primer set 1a in Supplementary Table 1 is for one sample (sample 17 in Extended Data Table 1) that has a primer sitting on top of the common SNP, rs41285869. New primers were designed to avoid this common SNP as a false homozygous variant was visualized as compared with the exome sequencing results.

Immunoprecipitation

Immortalized cell culture

HEK293 cells (ATCC) were maintained at 37 °C with 5% CO2 in Dulbecco’s modified Eagle medium (Invitrogen) supplemented with 10% (vol/vol) heat inactivated fetal bovine serum (MediaTech), 2 mM l-glutamine (Gibco) and 1% (vol/vol) penicillin and streptomycin solution (Gibco).

Plasmids used for this study

The 2xMyc-LRRK2 WT plasmid was acquired from Addgene (number 25361) and confirmed via whole plasmid sequencing at Massachusetts General Hospital. The HA-RAB32 constructs, WT and S71R, were made and sequenced by Genescript. All constructs were prepped with a Maxi prep kit (Qiagen) following the manufacturer’s instructions.

Immunoprecipitation and western blot

HEK293 cells were transfected with Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions using a DNA:Lipo ratio of 3 μg:6 μl per well of a six-well plate. For cotransfections, the DNA amount was divided at a 2:1 ratio for LRRK2 WT:RAB32 constructs. A total of 48 h posttransfection, the lysates were collected in denaturing hypotonic lysis buffer55 and protein concentration determined by Pierce BCA Protein Assay (Thermo Fisher). The 1× PBS + 0.1% Tween 20 washed Myc–antibody conjugated magnetic beads were combined with lysate in a 1 μl:5 μg ratio before rotating the immunoprecipitation overnight at 4 °C. The beads were washed with denaturing wash buffer55 before protein elution by boiling in 2× loading sample buffer (Boston Bioproducts). Approximately 50% of the eluate from each condition was run on a 4–20% gradient gel (Bio-Rad) and transferred to a polyvinylidene difluoride membrane (Bio-Rad) on the high molecular weight setting of a Trans-blot Turbo system (Bio-Rad). The membranes were processed as described previously56. Primary antibodies were used as follows: 1:2,000 for rabbit anti-HA (DSHB, catalog number anti-HA rRb-IgG), 1:1,000 mouse anti-Myc (BioLegend, catalog number 626802) and 1:300 Phos-LRRK2 S1292 (Abcam, catalog number 203181). Blots were visualized with an Odyssey Infrared Imager (LiCor, model 9120).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.