NeuralKG is a Python-based library for diverse representation learning on knowledge graphs, implementing conventional KGEs, GNN-based KGEs, and rule-based KGEs. NeuralKG provides comprehensive documentation and is easy to use.
It is worth mentioning that NeuralKG is not limited to standard benchmarks such as FB15k-237 and WN18RR. We can also construct self-defined knowledge graph datasets and obtain strong baselines for more comprehensive experiments. Below we use an example to illustrate how to apply NeuralKG to a self-defined dataset.
Start Up
For example, "Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning" (UPGAN) is a paper presented at WWW 2020; the official code is at https://github.com/RUCDM/KB4Rec/tree/master/Projects/UPGAN
We will show how to apply NeuralKG to UPGAN's dataset to obtain more meaningful strong baselines. The environment we use is shown as follows:
- Server: Ubuntu 18.04.5 LTS (GNU/Linux 4.15.0-163-generic x86_64)
- Computation Resources: Nvidia GeForce RTX 3090 * 1
- Python: 3.7.11, with torch 1.8.1 and necessary libraries.
We first use git clone to get the source code, then follow the download link provided in the repository to download the datasets needed for our experiments. In this blog post, we only use the Amazon-Book dataset; the other datasets can be processed in the same way. After preparing all the files, the file structure of the project is as follows:
├── Model
│   ├── DistMult.py
│   ├── UGAT.py
│   ├── UGAT_mlp.py
│   ├── base_model.py
│   ├── concat_1layer.py
│   ├── concat_2layer.py
│   ├── dot_2layer.py
│   ├── generator.py
│   ├── generator_concat.py
│   └── layers.py
├── README.md
├── checkpoint
├── data
│   ├── Readme.md
│   └── book
│       ├── kg
│       │   ├── e_map.dat
│       │   ├── r_map.dat
│       │   ├── test.dat
│       │   ├── train.dat
│       │   └── valid.dat
│       ├── neuralkg
│       │   ├── entities.dict
│       │   ├── relations.dict
│       │   ├── test.txt
│       │   ├── train.txt
│       │   └── valid.txt
│       └── rs
│           ├── i2kg_map.tsv
│           ├── i_map.dat
│           ├── ratings.txt
│           └── u_map.dat
├── kg_preprocess.py
├── log_compgcn.txt
├── log_rotate.txt
├── log_transe.txt
├── log_transh.txt
├── main_pretrain.py
├── main_upgan.py
├── model.Jpeg
├── pretrain
│   ├── base_trainer.py
│   ├── __init__.py
│   └── trainer.py
├── run_book.sh
├── run_neuralkg.py
├── train
│   ├── base_trainer.py
│   ├── evaluation.py
│   ├── __init__.py
│   ├── load_data.py
│   └── trainer.py
└── util
    ├── Regularization.py
    ├── dataset.py
    ├── kernel.py
    ├── sage_kernel.py
    ├── triple_kernel.py
    └── utils.py
To run the source code, we can directly use the commands provided in run_book.sh. Below we focus on how to use NeuralKG to enrich the experimental results by training additional baselines on a custom dataset (i.e., the book dataset in this project).
Data Processing
First we need to pre-process the data and convert the dataset shipped with the paper's open-source code into the form we need. The knowledge graph part of the dataset is located under book/kg and contains two mapping tables for entities/relations and three splits for training, validation, and testing. We need to process it into the data format accepted by NeuralKG:
- entities.dict
- relations.dict
- train.txt
- test.txt
- valid.txt
Then we can write a simple script, kg_preprocess.py, to convert the data format (a minimal sketch of such a script is given at the end of this section). After the processing, the file structure under the data directory becomes:
├── Readme.md
└── book
    ├── kg
    │   ├── e_map.dat
    │   ├── r_map.dat
    │   ├── test.dat
    │   ├── train.dat
    │   └── valid.dat
    ├── neuralkg
    │   ├── entities.dict
    │   ├── relations.dict
    │   ├── test.txt
    │   ├── train.txt
    │   └── valid.txt
    └── rs
        ├── i2kg_map.tsv
        ├── i_map.dat
        ├── ratings.txt
        └── u_map.dat
The neuralkg directory contains the processed dataset, and we can now start using NeuralKG to complete more baseline experiments.
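For reference, here is a minimal sketch of what kg_preprocess.py could look like. It is not the script from the repository: the exact column layout of the .dat files and the dictionary/triple format expected by NeuralKG are assumptions stated in the comments, so check them against the real files before running.

# kg_preprocess.py -- a minimal sketch, not the script from the UPGAN repository.
# Assumptions (verify against the actual files before running):
#   - e_map.dat / r_map.dat are tab-separated "id<TAB>name" mappings,
#   - train.dat / valid.dat / test.dat are whitespace-separated (head_id, relation_id, tail_id) triples,
#   - NeuralKG expects entities.dict / relations.dict as "id<TAB>name" lines and
#     train/valid/test.txt as "head<TAB>relation<TAB>tail" lines.
import os

SRC = "data/book/kg"
DST = "data/book/neuralkg"
os.makedirs(DST, exist_ok=True)

def load_map(path):
    """Read a tab-separated "id<TAB>name" file into an id -> name dict."""
    id2name = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            idx, name = line.rstrip("\n").split("\t")[:2]
            id2name[idx] = name
    return id2name

e_map = load_map(os.path.join(SRC, "e_map.dat"))
r_map = load_map(os.path.join(SRC, "r_map.dat"))

# Write the dictionaries in the assumed "id<TAB>name" form.
with open(os.path.join(DST, "entities.dict"), "w", encoding="utf-8") as f:
    for idx, name in e_map.items():
        f.write(f"{idx}\t{name}\n")
with open(os.path.join(DST, "relations.dict"), "w", encoding="utf-8") as f:
    for idx, name in r_map.items():
        f.write(f"{idx}\t{name}\n")

# Convert each split from id triples to name triples.
for src_name, dst_name in [("train.dat", "train.txt"),
                           ("valid.dat", "valid.txt"),
                           ("test.dat", "test.txt")]:
    with open(os.path.join(SRC, src_name), encoding="utf-8") as fin, \
         open(os.path.join(DST, dst_name), "w", encoding="utf-8") as fout:
        for line in fin:
            h, r, t = line.split()[:3]
            fout.write(f"{e_map[h]}\t{r_map[r]}\t{e_map[t]}\n")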
Configuration
By reading the original paper we can see that the authors used TransE, DistMult, ConvE, ConvTransE, R-GCN, KBGAN, CoFM, KTUP, and KGAT as baseline models to compare against their proposed method. With NeuralKG we can add more knowledge graph embedding models to obtain richer results: for example, we can try TransH, RotatE, CompGCN, and other models as additional baselines to make the experimental results more convincing.
NeuralKG provides a configurable way to use its models. We only need to fill in the basic configuration and parameter information of the model in a YAML configuration file to train a knowledge graph embedding model very easily.
Templates for the configuration files can be found in the config directory of the NeuralKG project repository. We only need to copy these files and modify a few important parameters, such as changing data_path to the path where the current dataset is located, as well as model-related settings like the number of training epochs max_epochs, the embedding dimension emb_dim, and the number of negative samples num_neg.
We create three corresponding configuration files, config_transh.yaml, config_rotate.yaml, and config_compgcn.yaml, in the current directory, and set the number of training epochs to 1000 and the embedding dimension to 100, following the experimental settings in the paper. A sketch of how the copied templates can be adjusted programmatically is shown below.
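If you prefer to script this step rather than edit the files by hand, a small helper along the following lines works. It assumes the copied templates are flat YAML mappings that use the parameter names mentioned above (data_path, max_epochs, emb_dim), and the config_templates directory is just a hypothetical location where we copied the templates; verify both against the actual NeuralKG config files.

# tune_configs.py -- a hypothetical helper, not part of NeuralKG or UPGAN.
# Copies the NeuralKG config templates and overrides a few parameters.
# Assumption: each template is a flat mapping exposing data_path / max_epochs /
# emb_dim keys as described above; check the key names in the files you copied.
import yaml  # PyYAML

MODELS = ["transh", "rotate", "compgcn"]
OVERRIDES = {
    "data_path": "./data/book/neuralkg",  # our self-defined dataset
    "max_epochs": 1000,                   # settings following the UPGAN paper
    "emb_dim": 100,
}

for model in MODELS:
    template = f"./config_templates/{model}.yaml"  # copied from NeuralKG's config directory
    with open(template, encoding="utf-8") as f:
        cfg = yaml.safe_load(f)
    cfg.update(OVERRIDES)
    with open(f"./config_{model}.yaml", "w", encoding="utf-8") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)
    print(f"wrote config_{model}.yaml")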
Then we create a code file, run_neuralkg.py, to call NeuralKG and run the experiments; its contents can be copied directly from main.py in the NeuralKG project repository. With this, we have completed the basic setup and are ready to start the experiments.
Experiments
We open a terminal in the current directory and enter the following command:
python run_neuralkg.py --load_config --config_path ./config_transh.yaml
This will train a TransH model and test it. Similarly, we can train RotatE, CompGCN, and other models. Training a model often takes several hours, so we can leave the processes running in the background and wait patiently for the results; one convenient way to script this is sketched below.
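For instance, a small driver script along these lines runs the three configurations one after another and writes each run's output to a log file, similar to the log_*.txt files seen in the project tree. This is just a convenience sketch, not part of NeuralKG or the UPGAN code; the file names run_all.py and config_<model>.yaml follow the naming used above.

# run_all.py -- a hypothetical convenience driver, not part of NeuralKG or UPGAN.
# Runs the three configurations sequentially (we only have a single GPU) and
# redirects each run's output to its own log file. Start it under nohup, tmux,
# or screen so the runs keep going after we disconnect from the server.
import subprocess

MODELS = ["transh", "rotate", "compgcn"]

for model in MODELS:
    with open(f"log_{model}.txt", "w") as log:
        # Same command as shown above, parameterized by the model name.
        result = subprocess.run(
            ["python", "run_neuralkg.py",
             "--load_config", "--config_path", f"./config_{model}.yaml"],
            stdout=log, stderr=subprocess.STDOUT,
        )
    print(f"{model} finished with exit code {result.returncode}")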
The final experimental results we obtained are shown in the following table.
Model | MRR | Hits@1 | Hits@3 | Hits@10
---|---|---|---|---
TransH | 0.285 | 0.225 | 0.312 | 0.400
RotatE | 0.316 | 0.253 | 0.344 | 0.448
CompGCN | 0.192 | 0.159 | 0.203 | 0.258
In this way, we can easily obtain the performance of several classical models on the current dataset, and we can further tune the hyperparameters to obtain even stronger baseline results.
The process is the same for other datasets: when writing papers and running comparison experiments, we can very conveniently use NeuralKG to provide stronger baselines from more classical models.