Bases: EmbeddingBlocker
Base class for DeepBlocker strategies.
Args:
frame_encoder: DeepBlocker strategy, given as a DeepBlockerFrameEncoder or a hint that resolves to one (e.g. "autoencoder").
frame_encoder_kwargs: Keyword arguments for initialising the encoder.
embedding_block_builder: Block building class used to create blocks from embeddings.
embedding_block_builder_kwargs: Keyword arguments for initialising the block builder.
save: If true, saves the embeddings before block building.
save_dir: Directory in which to save the embeddings.
force: If true, recalculates the embeddings and overwrites existing ones. Otherwise uses precalculated embeddings if present.
Attributes:
frame_encoder: DeepBlocker encoder used to embed the datasets.
embedding_block_builder: Block builder that creates blocks from the embeddings.
save: If true, saves the embeddings before block building.
save_dir: Directory in which to save the embeddings.
force: If true, recalculates the embeddings and overwrites existing ones. Otherwise uses precalculated embeddings if present.
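The interplay of save, save_dir and force described above can be sketched as a small load-or-compute helper. This is only an illustration of the documented semantics, not klinker's implementation: the function name `load_or_compute`, the cache file name `embeddings.pkl`, and the use of pickle are all assumptions made for the example.

```python
import pathlib
import pickle
import tempfile
from typing import Callable, List


def load_or_compute(
    compute: Callable[[], List[float]],
    save_dir: pathlib.Path,
    save: bool = True,
    force: bool = False,
) -> List[float]:
    """Hypothetical helper mirroring the documented save/save_dir/force flags."""
    cache_file = save_dir / "embeddings.pkl"  # file name is an assumption
    # Reuse precalculated embeddings unless a recalculation is forced.
    if cache_file.exists() and not force:
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    embeddings = compute()
    # Persist the result (overwriting any existing file) before it is used further.
    if save:
        save_dir.mkdir(parents=True, exist_ok=True)
        with cache_file.open("wb") as fh:
            pickle.dump(embeddings, fh)
    return embeddings


calls = []  # records how often the stand-in encoder actually runs


def fake_encoder() -> List[float]:
    calls.append(1)
    return [0.1, 0.2]


with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / "emb"
    first = load_or_compute(fake_encoder, target)               # computes and saves
    second = load_or_compute(fake_encoder, target)              # loads from disk
    third = load_or_compute(fake_encoder, target, force=True)   # recomputes

n_calls = len(calls)
```

In this sketch the encoder runs only on the first call and on the forced call; the second call is served from the saved file.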
Examples:
>>> # doctest: +SKIP
>>> from sylloge import MovieGraphBenchmark
>>> from klinker.data import KlinkerDataset
>>> ds = KlinkerDataset.from_sylloge(MovieGraphBenchmark(), clean=True)
>>> from klinker.blockers import DeepBlocker
>>> blocker = DeepBlocker(frame_encoder="autoencoder")
>>> blocks = blocker.assign(left=ds.left, right=ds.right)
Reference
Thirumuruganathan et al. 'Deep Learning for Blocking in Entity Matching: A Design Space Exploration', VLDB 2021, http://vldb.org/pvldb/vol14/p2459-thirumuruganathan.pdf
Source code in klinker/blockers/embedding/deepblocker.py
class DeepBlocker(EmbeddingBlocker):
    """Base class for DeepBlocker strategies.

    Args:
    ----
        frame_encoder: DeepBlocker strategy, given as a DeepBlockerFrameEncoder or a hint that resolves to one.
        frame_encoder_kwargs: Keyword arguments for initialising the encoder.
        embedding_block_builder: Block building class used to create blocks from embeddings.
        embedding_block_builder_kwargs: Keyword arguments for initialising the block builder.
        save: If true, saves the embeddings before block building.
        save_dir: Directory in which to save the embeddings.
        force: If true, recalculates the embeddings and overwrites existing ones. Otherwise uses precalculated embeddings if present.

    Attributes:
    ----------
        frame_encoder: DeepBlocker encoder used to embed the datasets.
        embedding_block_builder: Block builder that creates blocks from the embeddings.
        save: If true, saves the embeddings before block building.
        save_dir: Directory in which to save the embeddings.
        force: If true, recalculates the embeddings and overwrites existing ones. Otherwise uses precalculated embeddings if present.

    Examples:
    --------
        >>> # doctest: +SKIP
        >>> from sylloge import MovieGraphBenchmark
        >>> from klinker.data import KlinkerDataset
        >>> ds = KlinkerDataset.from_sylloge(MovieGraphBenchmark(), clean=True)
        >>> from klinker.blockers import DeepBlocker
        >>> blocker = DeepBlocker(frame_encoder="autoencoder")
        >>> blocks = blocker.assign(left=ds.left, right=ds.right)

    Quote: Reference
        Thirumuruganathan et al. 'Deep Learning for Blocking in Entity Matching: A Design Space Exploration', VLDB 2021, <http://vldb.org/pvldb/vol14/p2459-thirumuruganathan.pdf>
    """

    def __init__(
        self,
        frame_encoder: HintOrType[DeepBlockerFrameEncoder] = None,
        frame_encoder_kwargs: OptionalKwargs = None,
        embedding_block_builder: HintOrType[EmbeddingBlockBuilder] = None,
        embedding_block_builder_kwargs: OptionalKwargs = None,
        save: bool = True,
        save_dir: Optional[Union[str, pathlib.Path]] = None,
        force: bool = False,
    ):
        # Turn the encoder hint into an encoder instance, applying the given kwargs.
        frame_encoder = deep_blocker_encoder_resolver.make(
            frame_encoder, frame_encoder_kwargs
        )
        # The block builder hint is passed through unresolved to the parent class.
        super().__init__(
            frame_encoder=frame_encoder,
            embedding_block_builder=embedding_block_builder,
            embedding_block_builder_kwargs=embedding_block_builder_kwargs,
            save=save,
            save_dir=save_dir,
            force=force,
        )
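The constructor accepts frame_encoder as a hint and turns it into an encoder instance through a resolver before handing it to the parent class. A minimal, self-contained sketch of that pattern follows. It is not the resolver klinker uses: `SimpleResolver`, the two stand-in encoder classes, the `hidden_dim` parameter, and the name-normalisation rule are all hypothetical and serve only to show how a string, a class, an instance, or None might map to an instance.

```python
from typing import Any, Dict, Optional, Sequence, Type, Union


class FrameEncoder:
    """Stand-in base class; hidden_dim is a made-up parameter."""

    def __init__(self, hidden_dim: int = 32):
        self.hidden_dim = hidden_dim


class AutoEncoderFrameEncoder(FrameEncoder):
    pass


class CrossTupleFrameEncoder(FrameEncoder):
    pass


class SimpleResolver:
    """Hypothetical resolver: maps a hint to an instance of a registered class."""

    suffix = "frameencoder"

    def __init__(self, classes: Sequence[Type[FrameEncoder]], default: Type[FrameEncoder]):
        self.lookup = {self._normalize(c.__name__): c for c in classes}
        self.default = default

    def _normalize(self, name: str) -> str:
        # Lower-case the name and drop the shared class-name suffix, if present.
        name = name.lower().replace("_", "")
        if name.endswith(self.suffix):
            name = name[: -len(self.suffix)]
        return name

    def make(
        self,
        query: Union[None, str, Type[FrameEncoder], FrameEncoder],
        kwargs: Optional[Dict[str, Any]] = None,
    ) -> FrameEncoder:
        if isinstance(query, FrameEncoder):
            return query  # already an instance, nothing to build
        if query is None:
            cls = self.default
        elif isinstance(query, str):
            cls = self.lookup[self._normalize(query)]
        else:
            cls = query  # assumed to be a class
        return cls(**(kwargs or {}))


resolver = SimpleResolver(
    [AutoEncoderFrameEncoder, CrossTupleFrameEncoder],
    default=AutoEncoderFrameEncoder,
)
by_name = resolver.make("autoencoder", {"hidden_dim": 8})
by_default = resolver.make(None)
existing = CrossTupleFrameEncoder()
passthrough = resolver.make(existing)
```

Resolving in the constructor lets callers write `frame_encoder="autoencoder"` as in the example above, while the parent class only ever receives a ready-made encoder.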