This repository provides an implementation of node2vec extended with restart probabilities and ensembles.
The extensions were added by Koen Bouwman and Jerry Schonenberg.
node2vec was introduced by Aditya Grover and Jure Leskovec.
To run node2vec on the email-Eu-core dataset, execute the following command from the project home directory:
python src/main.py --input email-Eu-core.edgelist --labels email-Eu-core.labels --output results-email-Eu-core
You can check out the other options available to use with node2vec using:
python src/main.py --help
We have added the following parameters to configure the added functionality:
- To configure the Bayesian optimisation:
  - --train_set to specify the proportion of the dataset used for optimisation
  - --bayesian_opt to enable Bayesian optimisation
  - --iter_bayesian to specify the number of iterations for Bayesian optimisation
  - --scoring to specify how to evaluate each iteration of Bayesian optimisation
  - --cross_validation to specify the size of cross validation
  - --replications to specify the number of replications used to evaluate a hyperparameter configuration
- To configure the restart method:
  - --restarts to toggle the restart functionality
  - --tau to set the $\tau$ parameter
  - --omega to set the $\omega$ parameter
  - --epsilon to set the $\varepsilon$ parameter
  - --s to set the $s$ parameter
- To configure the ensemble method:
  - --partitions to define how many ensembles you want
  - --p now also supports a sequence of floats
  - --q now also supports a sequence of floats
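As an illustration of what "a sequence of floats" means on the command line, here is a minimal sketch using argparse. It is not taken from src/main.py: the function name build_parser, the defaults, and the help strings are assumptions made for this example, and the real parser may be set up differently.

```python
import argparse


def build_parser():
    """Hypothetical parser; only the flag names come from this README."""
    parser = argparse.ArgumentParser(description="Sketch of the ensemble flags")
    # nargs="+" collects one or more values into a list of floats,
    # so "--p 0.5 1 2" becomes [0.5, 1.0, 2.0].
    parser.add_argument("--p", type=float, nargs="+", default=[1.0],
                        help="return parameter(s), one or more floats (assumed default)")
    parser.add_argument("--q", type=float, nargs="+", default=[1.0],
                        help="in-out parameter(s), one or more floats (assumed default)")
    parser.add_argument("--partitions", type=int, default=1,
                        help="number of ensembles (assumed default)")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(
    ["--p", "0.5", "1", "2", "--q", "1", "--partitions", "4"]
)
print(args.p, args.q, args.partitions)
```

With a parser like this, a single value (--q 1) and several values (--p 0.5 1 2) are both accepted and always arrive as a list.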
Post-processing is handled by src/post_process.py.
To run post_process on the email-Eu-core dataset, execute the following command from the project home directory:
python src/post_process.py --dir results-email-Eu-core --partitions 4 --read --write
To learn about the options for post_process, execute the following command from the project home directory:
python src/post_process.py --help
The supported input format is an edgelist:
node1_id_int node2_id_int <weight_float, optional>
The graph is assumed to be undirected and unweighted by default. These options can be changed by setting the appropriate flags.
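To make the edgelist format concrete, the following sketch parses lines of this shape with the standard library only. The function name parse_edgelist and the default weight of 1.0 for edges without a weight column are assumptions for illustration; this is not the loader used by the repository.

```python
def parse_edgelist(lines, default_weight=1.0):
    """Parse 'node1 node2 [weight]' lines into (int, int, float) tuples."""
    edges = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (2, 3):
            raise ValueError(
                f"line {line_no}: expected 2 or 3 fields, got {len(fields)}"
            )
        u, v = int(fields[0]), int(fields[1])
        # The weight column is optional; fall back to the assumed default.
        w = float(fields[2]) if len(fields) == 3 else default_weight
        edges.append((u, v, w))
    return edges


sample = ["0 1", "1 2 0.5", "", "2 0 2"]
print(parse_edgelist(sample))
```

To read a file, pass the open file object, for example parse_edgelist(open("email-Eu-core.edgelist")).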
The output directory contains the following:
- cl_args.json: a file with the settings of all the callable arguments
- The eval directory, which contains the following:
  - directories for each replication with an embeddings.pkl file containing the vector embedding of the input graph for that replication
  - results.csv: a file with the results of the classifier over all replications
  - best_settings.json: a file that contains the best settings for each callable argument
- If the program was called with the --bayesian_opt flag, the following will also be in the output directory:
  - BO_opt*.pdf: a plot of the Bayesian optimisation
  - opt_results.pkl: the scores of each configuration of the Bayesian optimisation run
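The files listed above can be collected with a few lines of standard-library Python. This is a rough sketch under assumptions: the helper name summarise_output is made up for this example, the columns of results.csv are not documented here so rows are returned as raw lists, and files are located by a recursive search instead of a fixed path. The embeddings are only located, not unpickled, because their internal structure is not described in this README.

```python
import csv
import json
from pathlib import Path


def summarise_output(out_dir):
    """Gather the settings, classifier results, and embedding paths of a run."""
    out_dir = Path(out_dir)

    # cl_args.json sits in the output directory itself.
    with open(out_dir / "cl_args.json") as fh:
        cl_args = json.load(fh)

    # Search recursively so the sketch does not depend on the exact subdirectory.
    results_path = next(out_dir.rglob("results.csv"), None)
    rows = []
    if results_path is not None:
        with open(results_path, newline="") as fh:
            rows = list(csv.reader(fh))

    # One embeddings.pkl is expected per replication directory.
    embedding_files = sorted(out_dir.rglob("embeddings.pkl"))

    return {
        "cl_args": cl_args,
        "result_rows": rows,
        "embedding_files": embedding_files,
    }


# Example call (requires an existing output directory):
# summary = summarise_output("results-email-Eu-core")
```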
If you find node2vec useful for your research, please consider citing the following paper:
@inproceedings{node2vec-kdd2016,
author = {Grover, Aditya and Leskovec, Jure},
title = {node2vec: Scalable Feature Learning for Networks},
booktitle = {Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
year = {2016}
}