NeuralKG is a Python-based library for diverse representation learning of knowledge graphs, implementing Conventional KGEs, GNN-based KGEs, and Rule-based KGEs. The project provides comprehensive documentation for beginners and an online website to organize an open and shared KG representation learning community.
This article uses this open-source toolkit to train knowledge graph embedding models on a biomolecular dataset. You only need to provide the dataset in the required format and choose a model and its hyperparameters to start training.
See NeuralKG for the installation tutorial.
Data introduction and processing
NeuralKG requires the data to be organized into the following five files:
entities.dict
relations.dict
train.txt
valid.txt
test.txt
In entities.dict and relations.dict, each line has the form ID\tname, where ID is a sequence number starting from 0. In train.txt, valid.txt, and test.txt, each line has the form head entity\trelation\ttail entity.
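Before training, it can help to check that the files really follow this layout. The sketch below parses the two .dict files and the triple files using only the standard library; it is not part of NeuralKG, and the function names `parse_dict` and `parse_triples` are hypothetical. It takes any iterable of lines, so an open file object can be passed in directly.

```python
def parse_dict(lines):
    """Parse `ID<TAB>name` lines (entities.dict / relations.dict) into {name: id}."""
    name2id = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        idx, name = line.split("\t")
        # IDs are expected to be consecutive integers starting from 0.
        if int(idx) != len(name2id):
            raise ValueError(f"expected ID {len(name2id)}, got {idx!r} for {name!r}")
        name2id[name] = int(idx)
    return name2id


def parse_triples(lines, ent2id, rel2id):
    """Map `head<TAB>relation<TAB>tail` lines to (head_id, rel_id, tail_id) tuples."""
    triples = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        head, rel, tail = line.split("\t")
        # A KeyError here means the triple uses a name missing from the .dict files.
        triples.append((ent2id[head], rel2id[rel], ent2id[tail]))
    return triples
```

For example, `with open("entities.dict", encoding="utf-8") as f: ent2id = parse_dict(f)`, and likewise for relations.dict and the three triple files.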
Drug Repurposing Knowledge Graph
Drug Repurposing Knowledge Graph (DRKG) is a comprehensive biological knowledge graph relating genes, compounds, diseases, biological processes, side effects, and symptoms. DRKG integrates information from six existing databases (DrugBank, Hetionet, GNBR, STRING, IntAct, and DGIdb) together with data collected from recent publications, particularly those related to COVID-19. It contains 97,238 entities belonging to 13 entity types and 5,874,261 triplets belonging to 107 edge types.
The DRKG authors also provide molecule embeddings for most small-molecule drugs in DrugBank, computed with pre-trained GNNs. In particular, the paper Strategies for Pre-training Graph Neural Networks develops multiple approaches for pre-training GNN-based molecular representations, combining supervised molecular property prediction with self-supervised learning.
See the DRKG GitHub repository for details, and see the download link for DRKG in triple format. After converting it to the format required by NeuralKG, the experiment can be started.
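To check these statistics against a local copy of the triples, entity and relation types can be counted directly. This sketch assumes DRKG entity names follow a `Type::identifier` pattern (e.g. `Compound::DB00001`); the function names are hypothetical. On the full triple list, `len(entity_type_counts(triples))` would be expected to give the 13 entity types and `relation_type_count(triples)` the 107 edge types quoted above, provided every entity occurs in at least one triple.

```python
from collections import Counter


def entity_type_counts(triples):
    """Count distinct entities per type, assuming DRKG-style `Type::identifier` names."""
    entities = {name for head, _, tail in triples for name in (head, tail)}
    # Split only on the first "::" so identifiers such as "MESH:D003920" stay intact.
    return Counter(name.split("::", 1)[0] for name in entities)


def relation_type_count(triples):
    """Number of distinct relation (edge) types."""
    return len({rel for _, rel, _ in triples})
```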
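The conversion step can be sketched as follows. It assumes the downloaded file is a tab-separated `drkg.tsv` with one head, relation, tail triple per line and no header row. The function names, the 90/5/5 train/valid/test ratio, and the fixed random seed are hypothetical choices, not something DRKG or NeuralKG prescribes.

```python
import random
from pathlib import Path


def build_id_maps(triples):
    """Assign IDs from 0 to entities and relations in first-seen order."""
    ent2id, rel2id = {}, {}
    for head, rel, tail in triples:
        for ent in (head, tail):
            ent2id.setdefault(ent, len(ent2id))
        rel2id.setdefault(rel, len(rel2id))
    return ent2id, rel2id


def split_triples(triples, valid_frac=0.05, test_frac=0.05, seed=0):
    """Shuffle triples with a fixed seed and split into train/valid/test."""
    triples = list(triples)
    random.Random(seed).shuffle(triples)
    n_valid = int(len(triples) * valid_frac)
    n_test = int(len(triples) * test_frac)
    valid = triples[:n_valid]
    test = triples[n_valid:n_valid + n_test]
    train = triples[n_valid + n_test:]
    return train, valid, test


def convert_drkg(tsv_path, out_dir, **split_kwargs):
    """Write entities.dict, relations.dict, train.txt, valid.txt and test.txt."""
    with open(tsv_path, encoding="utf-8") as f:
        triples = [tuple(line.rstrip("\r\n").split("\t")) for line in f if line.strip()]
    ent2id, rel2id = build_id_maps(triples)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # .dict files: one `ID<TAB>name` line per entry, IDs in increasing order.
    for fname, mapping in (("entities.dict", ent2id), ("relations.dict", rel2id)):
        with open(out / fname, "w", encoding="utf-8") as f:
            for name, idx in mapping.items():
                f.write(f"{idx}\t{name}\n")
    # Triple files keep entity and relation names, as NeuralKG's format expects.
    splits = split_triples(triples, **split_kwargs)
    for fname, part in zip(("train.txt", "valid.txt", "test.txt"), splits):
        with open(out / fname, "w", encoding="utf-8") as f:
            f.writelines("\t".join(t) + "\n" for t in part)
```

For example, `convert_drkg("drkg.tsv", "dataset/DRKG")`, where both paths are placeholders. A purely random split can leave some validation or test entities unseen during training; depending on the evaluation you want, you may prefer to filter such triples or use a different splitting strategy.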
Configuration
NeuralKG provides two ways to configure parameters: a YAML configuration file, or command-line arguments passed when training starts. See the parameter description for what each parameter does. Use litmodel_name to choose among Conventional KGEs, GNN-based KGEs, and Rule-based KGEs, and model_name to select a specific model.
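When several runs are launched from Python rather than from a shell script, the command line can be assembled from a dictionary of parameters. This is a hedged sketch: it assumes `main.py` is NeuralKG's entry script, and apart from `litmodel_name` and `model_name`, the parameter names and values (including `KGELitModel` and `dataset/DRKG`) are illustrative assumptions to be checked against the parameter description.

```python
import shlex


def build_command(params, entry="main.py"):
    """Turn a {parameter: value} dict into an argv list for NeuralKG's entry script.

    True becomes a bare flag with no value; False and None are omitted.
    """
    argv = ["python", entry]
    for key, value in params.items():
        if value is None or value is False:
            continue
        argv.append(f"--{key}")
        if value is not True:
            argv.append(str(value))
    return argv


# Illustrative values only; see the lead-in for which names are assumptions.
params = {
    "litmodel_name": "KGELitModel",  # assumed LitModel for conventional KGEs
    "model_name": "TransE",
    "data_path": "dataset/DRKG",     # hypothetical folder holding the five files
    "emb_dim": 200,                  # hypothetical embedding size
}
print(shlex.join(build_command(params)))
# To launch training: subprocess.run(build_command(params), check=True)
```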
A YAML file can be created by modifying one of the examples in configs; then run the model with:
python main.py --load_config --config_path <your-config.yaml>
Alternatively, a shell script can be created by modifying one of the examples in scripts; then run it with:
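Modifying an example config can also be scripted, which is convenient when the same settings are reused for several models. The sketch below assumes the example configs are flat YAML files with one `key: value` pair per line; the function name is hypothetical, values are written with `str()`, and strings that need YAML quoting are not handled.

```python
def override_config(yaml_text, overrides):
    """Return yaml_text with top-level `key: value` lines replaced by `overrides`.

    Indented lines, comments and keys not in `overrides` are kept as-is; override
    keys that do not appear in the file are appended at the end.
    """
    remaining = dict(overrides)
    out = []
    for line in yaml_text.splitlines():
        key = line.split(":", 1)[0].strip()
        is_top_level = ":" in line and not line.startswith((" ", "\t", "#"))
        if is_top_level and key in remaining:
            out.append(f"{key}: {remaining.pop(key)}")
        else:
            out.append(line)
    out.extend(f"{key}: {value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

For example, read an example file from configs with `Path(...).read_text()`, write `override_config(text, {"model_name": "RotatE"})` to a new file, and pass that file to `--config_path`.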
bash <your-script.sh>
Here, we test three conventional KGE models: TransE, ComplEx, and RotatE.
Results
Model | MRR | Hit@1 | Hit@3 | Hit@10 |
---|---|---|---|---|
TransE | 0.235 | 0.087 | 0.334 | 0.461 |
ComplEx | 0.315 | 0.169 | 0.416 | 0.539 |
RotatE | 0.278 | 0.142 | 0.370 | 0.486 |
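MRR and Hits@k in the table are computed from the rank of the correct entity among the candidates for each test query: MRR averages the reciprocal ranks, and Hits@k is the fraction of queries whose correct entity is ranked within the top k. The sketch below shows these standard definitions on a list of 1-based ranks; it does not reproduce NeuralKG's evaluation code, and details such as filtered ranking and tie handling are not covered.

```python
def ranking_metrics(ranks, ks=(1, 3, 10)):
    """Compute MRR and Hits@k from 1-based ranks of the correct entities."""
    if not ranks:
        raise ValueError("need at least one rank")
    n = len(ranks)
    metrics = {"MRR": sum(1.0 / r for r in ranks) / n}
    for k in ks:
        metrics[f"Hits@{k}"] = sum(r <= k for r in ranks) / n
    return metrics


# Four hypothetical queries whose correct entities were ranked 1, 2, 4 and 20.
print(ranking_metrics([1, 2, 4, 20]))
```

Because a rank within the top 1 is also within the top 3 and top 10, Hits@1 ≤ Hits@3 ≤ Hits@10 always holds, as in every row of the table.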