Decoding the Genetic Complexity of Epilepsy: An Open-Access AI LLM-Powered Variant Reclassification Framework Unraveling Epilepsy Pathogenesis

Andy Yan, Ning Zhong

Rationale
Genetic testing in epilepsy frequently reveals variants of uncertain significance (VUS). In community-based practices, which often lack access to specialized neurogenetics expertise and functional testing laboratories, the clinical utility of VUS is limited. This creates interpretive challenges for neurologists and epileptologists, particularly in adult-onset cases where phenotypic variability and non-monogenic inheritance are common. Large language models (LLMs), such as GPT-4o, offer a novel, open-access solution to enhance variant interpretation by integrating genomic, structural, and phenotypic data. We evaluated the utility of an open-access, LLM-assisted framework for systematic VUS reclassification in epilepsy patients, aiming to improve diagnostic yield, elucidate complex inheritance models, and guide clinical decision-making in resource-constrained settings.

Methods
We retrospectively reviewed 61 epilepsy patients who underwent genetic testing (targeted gene panel or whole exome sequencing) at our epilepsy center (2015–2025). After excluding cases with definitive pathogenic findings, 44 patients with 237 reported VUS were reanalyzed using an AI LLM-powered framework (Fig 1). The workflow incorporated splice prediction (SpliceAI), pathogenicity scoring (CADD, REVEL, AlphaMissense), structural modeling (AlphaFold2, Missense3D, HOPE, etc.), phenotype-genotype mapping (OMIM, HPO), and LLM-based literature synthesis. The LLM assisted in variant prioritization, exclusion of non-contributory findings, and ACMG/AMP-guided reclassification, with all output reviewed by a clinical genetics team. For unresolved or non-monogenic cases, additional analyses such as gene co-expression networks and multi-gene interaction modeling were applied.

Results
Of the 237 VUS, 180 were prioritized for in-depth review. Reclassification was achieved in 60% of patients: 24% had variants upgraded to likely pathogenic/pathogenic, and 36% downgraded to likely benign/benign. Among the 44 cases: 17 (39%) were reclassified as monogenic epilepsy; 6 (14%) followed an oligogenic model involving multiple contributory variants; 8 (18%) supported a modifier model, with variants influencing phenotype expressivity and severity; 8 (18%) remained unresolved but were recommended for further testing due to strong phenotypic-genetic alignment; 5 (11%) were determined to have non-genetic etiologies. LLM-assisted reanalysis improved genotype-phenotype correlation, resolved conflicting variant annotations, and directly influenced patient care—including antiseizure medication changes, epilepsy surgery evaluations, and genetic counseling.

Conclusions
This study demonstrates the feasibility and clinical impact of an open-access, LLM-assisted VUS reclassification framework in real-world epilepsy care. By uncovering complex and multifactorial genetic architectures, this approach enhances diagnostic precision and supports equitable implementation of genomic medicine in resource-limited settings. It also represents a pioneering integration of advanced AI tools into neurology workflows, opening new frontiers for elucidating epilepsy pathogenesis.

Introduction:
Epilepsy is one of the most common neurological disorders, affecting over 50 million people worldwide. Approximately one-third of patients continue to experience uncontrolled seizures despite adequate therapeutic trials. The condition is highly heterogeneous in its clinical presentation, underlying etiologies, and treatment responses. Identifying a treatable cause is central to precision medicine, enabling personalized interventions that improve long-term outcomes. Genetic factors are now recognized as major contributors to epilepsy pathogenesis. With the advent of next-generation sequencing (NGS), genetic testing has become an increasingly important component of epilepsy diagnosis and management. To date, close to 3,000 genes have been associated with epilepsy, encompassing both genes directly driving syndromic epilepsies (e.g., SCN1A in Dravet or febrile seizure plus syndromes) and those linked to structural brain lesions predisposing to seizures (e.g., TSC1/TSC2 in tuberous sclerosis, KRIT1 and CCM2 in cerebral cavernous malformations). While monogenic inheritance accounts for a subset of cases, most patients exhibit more complex genetic architectures, reflecting multifactorial and polygenic contributions to epileptogenesis.

Despite advances in sequencing technologies, the majority of genetic tests reveal variants of uncertain significance (VUS) rather than definitively pathogenic variants. This high prevalence of VUS limits clinical actionability, particularly in community settings where dedicated neurogenetics or bioinformatics teams are not available. Functional validation methods, such as RT-PCR, electrophysiological patch-clamp recordings, and in vitro minigene splicing assays, are often impractical outside of specialized laboratories due to cost, time, and infrastructure constraints. Accurate variant classification remains the cornerstone of precision medicine, as it defines the clinical relevance of a genetic finding and informs treatment selection, counseling, and preventive strategies.

Recent advances in open-access computational and machine learning tools—including SpliceAI, AlphaMissense, MutationTaster, VEP, and CADD—alongside structural modeling platforms such as AlphaFold2, Missense3D, DynaMut, HOPE, and PyMOL, have created new opportunities for scalable VUS interpretation. Curated databases such as ClinVar, gnomAD, HGMD, and OMIM further enhance variant annotation and population frequency assessment. However, integrating these heterogeneous resources remains technically complex and time-intensive for frontline clinicians. The emergence of large language models (LLMs), particularly open-access systems such as GPT-4o, provides a transformative solution by enabling natural-language synthesis of genomic evidence.

Most prior research employing AI in epilepsy genetics has been confined to theoretical or simulation-based studies in major academic centers using proprietary research-grade models. Few studies have evaluated how open-access LLMs can be integrated into real-world community settings to assist in reclassification of VUS derived from actual patient genetic reports.

Methods:

Study Cohort – We included adult epilepsy patients evaluated at our tertiary epilepsy center between July 2015 and July 2025 who had undergone genetic testing as part of their epilepsy workup. Only cases with at least one variant of uncertain significance (VUS) reported by a CLIA-certified laboratory were included.

Genetic Data Sources – Genetic test results were obtained from multiple commercial laboratories and included targeted epilepsy panels and whole-exome sequencing (WES). Reported variants were annotated using reference transcripts from the GRCh38/hg38 genome assembly.

AI-Assisted Variant Reanalysis Framework – We developed a multi-tool AI-assisted reclassification framework that integrates open-access LLM reasoning (via GPT-4o) with predictive modeling, structural analysis, and phenotype correlation to reanalyze reported VUS. This LLM-driven workflow enables real-time synthesis of multi-source genomic evidence, contextualizes findings with patient-specific phenotypes, and outputs ACMG/AMP code-based reclassifications (Figure above).
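The triage step that precedes LLM review could be sketched as follows. This is a minimal illustration, not the framework's actual code: the cutoffs, the `VariantScores` container, the `min_flags` rule, and the variant IDs are all assumptions introduced here, and the study does not report the thresholds it used.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative cutoffs only (assumed for this sketch, NOT taken from the study).
CUTOFFS = {"spliceai": 0.5, "cadd": 20.0, "revel": 0.5, "alphamissense": 0.564}


@dataclass
class VariantScores:
    """Hypothetical container for one variant's in silico scores."""
    variant_id: str
    spliceai: Optional[float] = None
    cadd: Optional[float] = None
    revel: Optional[float] = None
    alphamissense: Optional[float] = None


def flagged_tools(v: VariantScores) -> List[str]:
    """Return the names of tools whose score meets or exceeds its cutoff."""
    flags = []
    for name, cutoff in CUTOFFS.items():
        score = getattr(v, name)
        if score is not None and score >= cutoff:  # missing scores never flag
            flags.append(name)
    return flags


def prioritize(variants: List[VariantScores], min_flags: int = 2) -> List[str]:
    """Keep variants flagged by at least `min_flags` tools for in-depth review."""
    return [v.variant_id for v in variants if len(flagged_tools(v)) >= min_flags]


# Two made-up variants: the first is flagged by CADD and REVEL, the second by none.
example = [
    VariantScores("var_A", cadd=25.0, revel=0.7),
    VariantScores("var_B", cadd=12.0, revel=0.1),
]
shortlist = prioritize(example)
print(shortlist)
```

Requiring agreement between several predictors is only one possible rule; in the workflow described here, phenotype match and literature evidence also inform prioritization, which this sketch does not model.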

Variant Classification – Variants were reclassified according to the 2015 ACMG/AMP guidelines, using a point-based approach with adaptation for epilepsy genetics where applicable. Evidence categories included population frequency, computational prediction, functional data, segregation analysis, and literature review. All AI-assisted reclassification results were reviewed and validated by medical genetics specialists. Final classifications were determined by consensus between the epilepsy and medical genetics teams.
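A point-based classification of this kind can be sketched as below. The text does not specify which scoring scheme was used, so this sketch assumes the published Bayesian point adaptation of the ACMG/AMP criteria (Tavtigian et al., 2020): supporting = 1, moderate = 2, strong = 4, very strong = 8 points, with benign evidence counted negatively. The function name and evidence encoding are hypothetical, and stand-alone benign criteria (e.g., BA1) and epilepsy-specific adaptations are not modeled.

```python
from typing import List, Tuple

# Point values per evidence strength (assumed scheme, see the note above).
POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very_strong": 8}


def classify(evidence: List[Tuple[str, str]]) -> str:
    """Sum evidence points and map the total to a five-tier class.

    `evidence` is a list of (direction, strength) pairs, where direction
    is "pathogenic" or "benign" and strength is a key of POINTS.
    """
    total = 0
    for direction, strength in evidence:
        pts = POINTS[strength]
        total += pts if direction == "pathogenic" else -pts

    if total >= 10:
        return "Pathogenic"
    if total >= 6:
        return "Likely pathogenic"
    if total >= 0:
        return "VUS"
    if total >= -6:
        return "Likely benign"
    return "Benign"


# One strong plus one moderate pathogenic criterion: 4 + 2 = 6 points.
upgraded = classify([("pathogenic", "strong"), ("pathogenic", "moderate")])
# One strong benign criterion: -4 points.
downgraded = classify([("benign", "strong")])
print(upgraded, downgraded)
```

Under this scheme a VUS is upgraded when new evidence lifts its total to 6 points or more, and downgraded when the total falls below zero.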

Results:
We reviewed 72 epilepsy patients who underwent genetic testing (targeted epilepsy gene panel or whole exome sequencing). After excluding cases with definitive pathogenic findings, 55 patients with 327 reported VUS were reanalyzed using the AI LLM-powered framework. Following AI-assisted triage using open-access LLMs (primarily GPT-4o) integrated with predictive algorithms (SpliceAI, AlphaMissense, CADD, REVEL), structural modeling (AlphaFold2), and phenotype-matching tools, 241 variants were prioritized for in-depth review.

Reclassification was achieved in 62% of the 241 analyzed variants:
• 17.8% (n=43) were upgraded to likely pathogenic (LP) or pathogenic (P)
• 44.4% (n=107) were downgraded to likely benign (LB) or benign (B)
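The variant-level proportions follow directly from the reported counts, as this short check shows. The counts are taken from the text; the variable and function names are illustrative.

```python
# Counts reported in the Results section.
prioritized = 241   # variants prioritized for in-depth review
upgraded = 43       # reclassified to LP/P
downgraded = 107    # reclassified to LB/B


def pct(n: int, total: int) -> float:
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * n / total, 1)


reclassified = upgraded + downgraded
not_reclassified = prioritized - reclassified

print(f"Upgraded:         {upgraded} ({pct(upgraded, prioritized)}%)")
print(f"Downgraded:       {downgraded} ({pct(downgraded, prioritized)}%)")
print(f"Reclassified:     {reclassified} ({pct(reclassified, prioritized)}%)")
print(f"Not reclassified: {not_reclassified} ({pct(not_reclassified, prioritized)}%)")
```

The denominator is the 241 prioritized variants, not the 327 VUS originally reported.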

Each of the 55 reanalyzed cases was assigned to one of the following genetic architecture categories:
• 19 cases (34.5%) were reclassified as monogenic epilepsy.
• 6 cases (10.9%) followed an oligogenic inheritance model.
• 11 cases (20%) supported a modifier model, where secondary variants influenced phenotype severity or expressivity.
• 11 cases (20%) remained unresolved, though strong genotype–phenotype correlation supported targeted follow-up, including functional assays or expanded family segregation studies.
• 8 cases (14.5%) were determined to have non-genetic epilepsy after reanalysis and clinicopathologic correlation.

Conclusion:

Our study demonstrates that open-access LLMs, when integrated with established bioinformatic resources, can meaningfully enhance variant interpretation and reclassification in epilepsy patients. Drawing on outputs from multiple prediction tools and databases (including SpliceAI, AlphaMissense, CADD, and ClinVar), LLMs such as GPT-4o can summarize genetic findings, align genotype with phenotype through ontological databases (e.g., HPO, OMIM), and reconcile conflicting annotations across diverse data sources. Compared with traditional manual workflows, our AI-assisted pipeline markedly accelerated variant interpretation, reduced inconsistencies across databases, and enabled a level of mechanistic insight that would not have been feasible through manual review alone.

In community practice, where functional assays, in-house bioinformatics infrastructure, and specialized variant review boards are rarely available, this approach offers an immediate, scalable pathway to improve diagnostic yield and inform clinical management. Unlike most prior AI applications in genomics, which rely on research-grade or proprietary models limited to academic institutions, our framework employed a fully accessible, low-cost system based on GPT-4o integrated with open bioinformatic tools, demonstrating direct clinical utility in reanalyzing patient-level data. Beyond reclassification of VUS into pathogenic or benign categories, our analysis revealed complex genetic architectures (including oligogenic inheritance, driver–modifier relationships, and epigenetic modulation), highlighting the potential of AI-assisted methods to elucidate the multilayered mechanisms underlying the heterogeneity of epileptogenesis. Importantly, these findings translated to actionable clinical benefits, such as optimized antiseizure medication selection, surgical candidacy evaluation, and informed family counseling.

Several limitations should be noted. First, genetic data were derived from multiple commercial laboratories using variable epilepsy gene panels (ranging from ~300 to >1,000 genes) or whole-exome sequencing, which may introduce heterogeneity in variant detection. Second, the framework relies on the accuracy and completeness of publicly available databases, potentially biasing interpretation of ultra-rare variants. Third, functional validation was beyond the scope of this study; therefore, classifications should be viewed as probabilistic within the ACMG/AMP framework. Finally, although GPT-4o and comparable LLMs offer robust integrative and reasoning capabilities, expert clinical oversight remains essential to ensure accuracy and avoid overinterpretation due to LLM hallucination.

Monogenic model – a single LP/P variant explains the phenotype.
Oligogenic and Modifier model –
1. Multiple variants in different genes contribute jointly to the phenotype.
2. Driver + Modifier model – one primary pathogenic/likely pathogenic variant with an additional variant(s) influencing severity or phenotype spectrum.
3. Epigenetic modifier model – genetic variants with potential to alter epigenetic regulation or environmental susceptibility.
Unresolved – variants remain VUS and are unlikely to be contributory, although clinical data strongly suggest genetic epilepsy.
Rare etiologies – conditions such as autoinflammatory symptomatic epilepsy that are not typical genetic epilepsy.
Non-genetic epilepsy – genetic analysis does not reveal evidence to support a pathogenic gene hypothesis.