Sampled in Pairs and Driven by Text: A New Graph Embedding Framework

Published in WWW 2019, 2018



In graphs with rich texts, incorporating textual information with structural information would benefit constructing expressive graph embeddings. Among various graph embedding models, random walk (RW)-based is one of the most popular and successful groups. However, it is challenged by two issues when applied on graphs with rich texts: (i) sampling efficiency: deriving from the training objective of RW-based models (e.g., DeepWalk and node2vec), we show that RW-based models are likely to generate large amounts of redundant training samples due to three main drawbacks. (ii) text utilization: these models have difficulty in dealing with zero-shot scenarios where graph embedding models have to infer graph structures directly from texts. To solve these problems, we propose a novel framework, namely Text-driven Graph Embedding with Pairs Sampling (TGE-PS). TGE-PS uses Pairs Sampling (PS) to improve the sampling strategy of RW, being able to reduce ~99% training samples while preserving competitive performance. TGE-PS uses Text-driven Graph Embedding (TGE), an inductive graph embedding approach, to generate node embeddings from texts. Since each node contains rich texts, TGE is able to generate high-quality embeddings and provide reasonable predictions on existence of links to unseen nodes. We evaluate TGE-PS on several real-world datasets, and experiment results demonstrate that TGE-PS produces state-of-the-art results on both traditional and zero-shot link prediction tasks.