Graph Representation Learning on Heterogeneous Graphs
Extracting contextual knowledge with Metapath2Vec
Graphs are ubiquitous, and fun to work with.
They are backed by a rich body of theory and can represent anything from simple to complex systems in a very compact way.
The thing is, for those of us working day to day with machine and deep learning models, a graph is not the most comfortable data structure to handle or to train models on.
After all, everything that a Random Forest model expects is just a bunch of numbers.
Say you want to infer which group a particular node in a graph belongs to.
Sure, you could take the adjacency matrix, extract some node features (degree, centralities…) and train a supervised model on those, but you would still be treating each row as i.i.d. when the rows are anything but independent. It’s a graph.
A contextual vector that condenses the structural information of a node would be much more informative, and this is why I decided to write a post about this way of representing graphs: the so-called graph representation learning.
One of the most notable algorithms is Node2Vec.
It’s a really nice algorithm and I suggest you read the original paper. Basically, it generates ‘sentences’ of nodes by performing biased random walks over the graph, and then uses Word2Vec to learn contextual vectors.
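To give a taste of the API, here is a minimal sketch using PyTorch Geometric’s Node2Vec implementation (the tiny edge_index is just a placeholder graph; p and q are the return and in-out parameters from the paper that bias the walks):

import torch
from torch_geometric.nn import Node2Vec

# Toy homogeneous graph: row 0 holds source nodes, row 1 holds targets
edge_index = torch.tensor([[0, 1, 1, 2],
                           [1, 0, 2, 1]])

node2vec = Node2Vec(edge_index, embedding_dim=128, walk_length=20,
                    context_size=10, walks_per_node=10,
                    p=1.0, q=1.0,  # return / in-out parameters biasing the walks
                    sparse=True)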
This setting, however, deals only with homogeneous graphs; or rather, it makes no distinction between node types.
A specialized version of it, called Metapath2Vec, uses metapath-based random walks to learn feature representations for nodes of different types within the same graph.
First of all, what is a meta-path? A meta-path P is a path template that defines how a random walker should traverse the different node types.
For example, if we consider the MovieLens dataset and build a graph with User, Movie and Genre node types, the meta-path UMGMU represents a path between users who watched movies of the same genre.
So, from homogeneous walks you switch to heterogeneous ones and then you train a heterogeneous skip-gram model on those. That’s it.
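Concretely, the heterogeneous skip-gram objective from the Metapath2Vec paper maximizes the probability of observing type-specific contexts around each node:

\max_{\theta} \sum_{v \in V} \sum_{t \in T_V} \sum_{c_t \in N_t(v)} \log p(c_t \mid v; \theta),
\qquad
p(c_t \mid v; \theta) = \frac{e^{X_{c_t} \cdot X_v}}{\sum_{u \in V} e^{X_u \cdot X_v}}

where N_t(v) is the set of type-t context nodes of v in the walks and X is the embedding matrix being learned.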
So, in this way, you can adapt graph representation learning to real-world complex graphs.
Learning on heterogeneous graphs: Python example
Let’s move from words to facts.
To validate the representational capabilities of the Metapath2Vec model, I decided to embed a graph that I built from the following dataset: https://grouplens.org/datasets/movielens/100k/
In particular, I considered the User (with the user_id field) and Movie (with the movie_id and genre fields) tables, and created nodes for all users, movies and genres. I then built a graph by connecting users to the movies they rated 3 or higher, and movies to their genres. PyTorch Geometric expects the edges in the following format: a dict whose keys are (source type, relation, target type) tuples and whose values are tensors of source and target node indices.
{('user_id', 'has_liked', 'item_id'):
     tensor([[   1,    2,   12,  ..., 5780, 5851, 5938],
             [1193, 1193, 1193,  ..., 2845, 3607, 2909]]),
 ('item_id', 'has_genres', 'genre'):
     tensor([[   1,    1,    1,  ..., 3951, 3952, 3952],
             [   2,    3,    4,  ...,    7,    7,   15]]),
 ('genre', 'belongs_to', 'item_id'):
     tensor([[   2,    3,    4,  ...,    7,    7,   15],
             [   1,    1,    1,  ..., 3951, 3952, 3952]]),
 ('item_id', 'liked_by', 'user_id'):
     tensor([[1193, 1193, 1193,  ..., 2845, 3607, 2909],
             [   1,    2,   12,  ..., 5780, 5851, 5938]])}
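For completeness, here is a rough sketch of how such a dict could be built with pandas. The file paths and column layout of the raw MovieLens files are assumptions on my part, not the exact code I ran:

import pandas as pd
import torch

def edges(src, dst):
    # Stack two integer columns into a [2, num_edges] index tensor
    return torch.tensor([src.tolist(), dst.tolist()], dtype=torch.long)

# u.data: user_id | item_id | rating | timestamp (tab-separated)
ratings = pd.read_csv('ml-100k/u.data', sep='\t',
                      names=['user_id', 'item_id', 'rating', 'timestamp'])
liked = ratings[ratings.rating >= 3]

# u.item: the last 19 pipe-separated columns are binary genre flags
genre_flags = [f'genre_{i}' for i in range(19)]
items = pd.read_csv('ml-100k/u.item', sep='|', encoding='latin-1', header=None,
                    names=['item_id', 'title', 'release', 'video', 'url'] + genre_flags)
item_genres = items.melt(id_vars='item_id', value_vars=genre_flags,
                         var_name='genre', value_name='flag')
item_genres = item_genres[item_genres.flag == 1]
genre_ids = item_genres.genre.str.split('_').str[1].astype(int)

edge_index_dict = {
    ('user_id', 'has_liked', 'item_id'): edges(liked.user_id, liked.item_id),
    ('item_id', 'liked_by', 'user_id'): edges(liked.item_id, liked.user_id),
    ('item_id', 'has_genres', 'genre'): edges(item_genres.item_id, genre_ids),
    ('genre', 'belongs_to', 'item_id'): edges(genre_ids, item_genres.item_id),
}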
I instantiated the model with default hyperparameters and defined the metapath I wanted for the random walks as a list of tuples, one per relation along the path.
from torch_geometric.nn import MetaPath2Vec

metapath = [
    ('user_id', 'has_liked', 'item_id'),
    ('item_id', 'has_genres', 'genre'),
    ('genre', 'belongs_to', 'item_id'),
    ('item_id', 'liked_by', 'user_id'),
]

model = MetaPath2Vec(edge_index_dict=edge_index_dict,
                     embedding_dim=128,
                     metapath=metapath,
                     walk_length=100,
                     context_size=10,
                     walks_per_node=1,
                     num_negative_samples=5,
                     sparse=True).to('cpu')
To train the model, I needed a data loader and an optimizer (the glorious Adam, in its sparse variant):
loader = model.loader(batch_size=128, shuffle=True, num_workers=8)
optimizer = torch.optim.SparseAdam(list(model.parameters()), lr=0.025)
With the following training function:
from tqdm import tqdm

def train(model, epoch, log_steps=100, eval_steps=2000):
    model.train()
    total_loss = 0
    for i, (pos_rw, neg_rw) in tqdm(enumerate(loader)):
        optimizer.zero_grad()
        # Skip-gram loss over positive and negative random walks
        loss = model.loss(pos_rw.to('cpu'), neg_rw.to('cpu'))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)
And the training loop:
epoch_loss = []
for epoch in range(1, 11):
    loss = train(model, epoch)
    print(f"epoch: {epoch}, loss: {loss:.4f}")
    epoch_loss.append(loss)
This produced the following loss plot:
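If you want to reproduce a plot like this, a minimal matplotlib sketch over the epoch_loss list collected above would look like:

import matplotlib.pyplot as plt

plt.plot(range(1, len(epoch_loss) + 1), epoch_loss, marker='o')
plt.xlabel('epoch')
plt.ylabel('average loss')
plt.title('Metapath2Vec training loss')
plt.show()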
If I project all embeddings to 3D with UMAP, I obtain this nicely separated plot:
As you can see, all node types are well separated. Moreover, there are distinct movie clusters. The same can’t be said for the user nodes, and this may be because I didn’t give users any other specific attribute for relations/metapaths, as I did with genres for the movies.
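For reference, a rough sketch of how such a projection can be produced (assuming umap-learn is installed; in PyTorch Geometric, calling the model with a node type returns the embedding matrix for that type):

import numpy as np
import umap

# Collect the learned embeddings per node type
z_user = model('user_id').detach().cpu().numpy()
z_item = model('item_id').detach().cpu().numpy()
z_genre = model('genre').detach().cpu().numpy()

X = np.concatenate([z_user, z_item, z_genre])
labels = (['user'] * len(z_user) + ['movie'] * len(z_item)
          + ['genre'] * len(z_genre))

# 3D UMAP projection of all node embeddings
X_3d = umap.UMAP(n_components=3).fit_transform(X)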
Another nice thing we get for free is node similarity. Check the cosine similarity between genres:
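A minimal way to compute it, assuming the same model object as above:

import torch.nn.functional as F

z = model('genre').detach()   # [num_genres, 128] genre embeddings
z = F.normalize(z, dim=-1)    # unit-normalize so the dot product is the cosine
genre_sim = z @ z.t()         # [num_genres, num_genres] similarity matrix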
Another way to improve the embeddings could be to define multiple metapaths and merge the embeddings learned from each of them.
Hope you enjoyed the article, and until next time!