| 1 | +# graph2vec |
| 2 | + |
| 3 | +This repository contains the TensorFlow implementation of our paper "graph2vec: Learning distributed representations of graphs". |
| 4 | +The paper can be found at: https://arxiv.org/pdf/1707.05005.pdf |
| 5 | + |
| 6 | + |
| 7 | +#### Dependencies |
| 8 | +This code was developed in Python 2.7 and has been run and tested on Ubuntu 16.04. |
| 9 | +It uses the following Python packages: |
| 10 | +1. tensorflow (version == 1.4.0) |
| 11 | +2. networkx (version <= 2.0) |
| 12 | +3. scikit-learn (+scipy, +numpy) |
| 13 | + |
| 14 | +##### The procedure for setting up graph2vec is as follows: |
| 15 | + 1. Clone the repository (command: git clone https://github.com/MLDroid/graph2vec_tf.git) |
| 16 | + 2. Extract the data.tar.gz tarball (command: tar -xzf data.tar.gz) |
| 17 | + |
| 18 | +##### The procedure for obtaining graph embeddings using graph2vec and performing graph classification is as follows: |
| 19 | + 1. Move to the "src" folder (command: cd src). Also make sure that the datasets from the KDD 2015 paper (Deep Graph Kernels) are available in '../data/kdd_datasets/dir_graphs/'. |
| 20 | + 2. Run main.py --corpus <directory of graph files> --class_labels_file_name <file containing the class labels of the graphs, used for graph classification> to: |
| 21 | + * Generate the Weisfeiler-Lehman kernel's rooted subgraphs from all the graphs |
| 22 | + * Train the skip-gram model to learn graph embeddings, which are saved in the ../embeddings/ folder |
| 23 | + * Perform graph classification using the graph embeddings generated in the previous step |
| 24 | + 3. Examples: |
| 25 | + * python main.py --corpus ../data/kdd_datasets/mutag --class_labels_file_name ../data/kdd_datasets/mutag.Labels |
| 26 | + * python main.py --corpus ../data/kdd_datasets/proteins --class_labels_file_name ../data/kdd_datasets/proteins.Labels --batch_size 16 --embedding_size 128 --num_negsample 5 |
| 27 | + |
| 28 | + |
| 29 | +#### Other command-line arguments: |
| 30 | + optional arguments: |
| 31 | + -h, --help show this help message and exit |
| 32 | + -c CORPUS, --corpus CORPUS |
| 33 | + Path to directory containing graph files to be used |
| 34 | + for graph classification or clustering |
| 35 | + -l CLASS_LABELS_FILE_NAME, --class_labels_file_name CLASS_LABELS_FILE_NAME |
| 36 | + File name containing the name of the sample and the |
| 37 | + class labels |
| 38 | + -o OUTPUT_DIR, --output_dir OUTPUT_DIR |
| 39 | + Path to directory for storing output embeddings |
| 40 | + -b BATCH_SIZE, --batch_size BATCH_SIZE |
| 41 | + Number of samples per training batch |
| 42 | + -e EPOCHS, --epochs EPOCHS |
| 43 | + Number of iterations the whole dataset of graphs is |
| 44 | + traversed |
| 45 | + -d EMBEDDING_SIZE, --embedding_size EMBEDDING_SIZE |
| 46 | + Intended graph embedding size to be learnt |
| 47 | + -neg NUM_NEGSAMPLE, --num_negsample NUM_NEGSAMPLE |
| 48 | + Number of negative samples to be used for training |
| 49 | + -lr LEARNING_RATE, --learning_rate LEARNING_RATE |
| 50 | + Learning rate to optimize the loss function |
| 51 | + |
| 52 | + --wlk_h WLK_H Height of WL kernel (i.e., degree of rooted subgraph |
| 53 | + features to be considered for representation learning) |
| 54 | + -lf LABEL_FILED_NAME, --label_filed_name LABEL_FILED_NAME |
| 55 | + Label field to be used for coloring nodes in graphs |
| 56 | + using WL kernel |
| 57 | + |
| 58 | +## Contact ## |
| 59 | +In case of queries, please email: annamala002@e.ntu.edu.sg or XZHANG048@e.ntu.edu.sg |
| 60 | + |
| 61 | +#### Reference |
| 62 | + |
| 63 | + Please consider citing the following paper when you use this code. |
| 64 | + @article{narayanangraph2vec, |
| 65 | + title={graph2vec: Learning distributed representations of graphs}, |
| 66 | + author={Narayanan, Annamalai and Chandramohan, Mahinthan and Venkatesan, Rajasekar and Chen, Lihui and Liu, Yang} |
| 67 | + } |
| 68 | + |
| 69 | + |
| 70 | + |
| 71 | + |
| 72 | + |
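The first thing main.py is described as doing is generating the Weisfeiler-Lehman kernel's rooted subgraphs, up to the height given by --wlk_h. The snippet below is a minimal, standard-library-only sketch of that relabeling idea. It is not the code in this repository: the function names `wl_rooted_subgraphs` and `graph_as_document`, the adjacency-dict input format, and the use of nested strings as subgraph identifiers are all assumptions made for illustration (real implementations usually work on networkx graphs and compress labels to short ids).

```python
def wl_rooted_subgraphs(adjacency, node_labels, height):
    """Return, for every node, the labels of its rooted subgraphs of degree 0..height.

    adjacency   : dict mapping node -> list of neighbouring nodes (hypothetical format)
    node_labels : dict mapping node -> initial label (e.g. an atom type)
    height      : number of WL relabeling iterations
    """
    # A degree-0 rooted subgraph is just the node's own label.
    current = dict((node, str(node_labels[node])) for node in adjacency)
    features = dict((node, [current[node]]) for node in adjacency)

    for _ in range(height):
        updated = {}
        for node, neighbours in adjacency.items():
            # Sort neighbour labels so the result does not depend on neighbour order.
            neighbour_part = ",".join(sorted(current[n] for n in neighbours))
            updated[node] = current[node] + "(" + neighbour_part + ")"
        current = updated
        for node in adjacency:
            features[node].append(current[node])
    return features


def graph_as_document(features):
    """Flatten per-node subgraph labels into one list, the graph's 'words'."""
    return [label for node in sorted(features) for label in features[node]]


# Toy path graph 0 - 1 - 2 with labels C, O, C.
toy_adjacency = {0: [1], 1: [0, 2], 2: [1]}
toy_labels = {0: "C", 1: "O", 2: "C"}
toy_features = wl_rooted_subgraphs(toy_adjacency, toy_labels, height=2)
toy_document = graph_as_document(toy_features)
```

In this sketch each graph becomes a list of subgraph labels, which plays the role that a list of words plays for a document when a skip-gram model is trained on it. The two end nodes of the toy path receive identical labels at every degree, which is the property that lets structurally similar graphs share vocabulary.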