klinker

klinker overview

Installation

Clone the repo and change into the directory:

git clone https://github.com/dobraczka/klinker.git
cd klinker

For usage with GPU create a micromamba environment:

micromamba env create -n klinker-conda --file=klinker-conda.yaml

Activate it and install the remaining dependencies:

mamba activate klinker-conda
pip install -e .

Alternatively if you don't intend to utilize a GPU you can install it in a virtual environment:

python -m venv klinker-env
source klinker-env/bin/activate
pip install -e .[all]

or via poetry:

poetry install

Usage

Load a dataset:

from sylloge import MovieGraphBenchmark
from klinker.data import KlinkerDataset

ds = KlinkerDataset.from_sylloge(MovieGraphBenchmark(graph_pair="tmdb-tvdb"))

Create blocks and write to parquet:

from klinker.blockers import SimpleRelationalTokenBlocker

blocker = SimpleRelationalTokenBlocker()
blocks = blocker.assign(left=ds.left, right=ds.right, left_rel=ds.left_rel, right_rel=ds.right_rel)
blocks.to_parquet("tmdb-tvdb-tokenblocked")

Read blocks from parquet and evaluate:

from klinker import KlinkerBlockManager
from klinker.eval_metrics import Evaluation

kbm = KlinkerBlockManager.read_parqet("tmdb-tvdb-tokenblocked")
ev = Evaluation.from_dataset(blocks=kbm, dataset=ds)

Reproduce Experiments

The experiment.py has commands for datasets and blockers. You can use python experiment.py --help to show the available commands. Subcommands can also offer help e.g. python experiment.py gcn-blocker --help.

You have to use a dataset command before a blocker command.

For example if you used micromamba for installation:

micromamba run -n klinker-conda python experiment.py movie-graph-benchmark-dataset --graph-pair "tmdb-tvdb" relational-token-blocker

This would be similar to the steps described in the above usage section.