
deepblocker

DeepBlocker

Bases: EmbeddingBlocker

Base class for DeepBlocker strategies.

Parameters:

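| Name | Type | Description | Default |
|---|---|---|---|
| frame_encoder | HintOrType[DeepBlockerFrameEncoder] | DeepBlocker strategy, given as a DeepBlockerFrameEncoder instance, class, or name (e.g. "autoencoder"). | None |
| frame_encoder_kwargs | OptionalKwargs | Keyword arguments for initialising the encoder. | None |
| embedding_block_builder | HintOrType[EmbeddingBlockBuilder] | Block building class to create blocks from embeddings. | None |
| embedding_block_builder_kwargs | OptionalKwargs | Keyword arguments for initialising the block builder. | None |
| save | bool | If True, saves the embeddings before block building. | True |
| save_dir | Optional[Union[str, Path]] | Directory in which to save the embeddings. | None |
| force | bool | If True, recalculates the embeddings and overwrites existing ones; otherwise uses precalculated embeddings if present. | False |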
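Together, save, save_dir and force describe a cache for the computed embeddings. The sketch below is a minimal, hypothetical illustration of those semantics, not klinker's implementation: the helper name load_or_compute_embeddings, the pickle file format, the embeddings.pkl file name and treating save_dir=None as "no caching" are all assumptions.

```python
import pathlib
import pickle
from typing import Callable, List, Optional, Union

Embeddings = List[List[float]]


def load_or_compute_embeddings(
    compute: Callable[[], Embeddings],
    save: bool = True,
    save_dir: Optional[Union[str, pathlib.Path]] = None,
    force: bool = False,
    filename: str = "embeddings.pkl",  # hypothetical cache file name
) -> Embeddings:
    """Hypothetical helper mirroring the save/save_dir/force semantics."""
    # Assumption: without a save_dir there is nowhere to cache, so always compute.
    cache = pathlib.Path(save_dir) / filename if save_dir is not None else None

    # Reuse precalculated embeddings unless force=True.
    if cache is not None and cache.exists() and not force:
        with cache.open("rb") as fh:
            return pickle.load(fh)

    embeddings = compute()

    # Persist (overwriting any existing file) only when save=True.
    if save and cache is not None:
        cache.parent.mkdir(parents=True, exist_ok=True)
        with cache.open("wb") as fh:
            pickle.dump(embeddings, fh)
    return embeddings
```

With force=False a second call finds the cached file and skips the (typically expensive) encoder; force=True recomputes and overwrites it.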

Attributes:

| Name | Description |
|---|---|
| frame_encoder | DeepBlocker encoder used to embed the datasets. |
| embedding_block_builder | Block building class to create blocks from embeddings. |
| save | If True, saves the embeddings before block building. |
| save_dir | Directory in which to save the embeddings. |
| force | If True, recalculates the embeddings and overwrites existing ones; otherwise uses precalculated embeddings if present. |
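The embedding_block_builder turns the two embedded datasets into blocks. Which builders klinker ships is not shown on this page, so the sketch below is a hypothetical stand-in: a brute-force top-k nearest-neighbour search by cosine similarity, in the spirit of the nearest-neighbour blocking discussed in the referenced paper. The names cosine and knn_blocks, and the dict-of-vectors input format, are assumptions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_blocks(
    left: Dict[str, Sequence[float]],
    right: Dict[str, Sequence[float]],
    k: int = 2,
) -> Dict[str, List[str]]:
    """Hypothetical block builder: each left entity is blocked with its k
    most similar right entities (brute force, O(|left| * |right|))."""
    blocks: Dict[str, List[str]] = {}
    for left_id, left_vec in left.items():
        # Rank every right entity by similarity to this left entity, best first.
        ranked = sorted(
            right,
            key=lambda right_id: cosine(left_vec, right[right_id]),
            reverse=True,
        )
        blocks[left_id] = ranked[:k]
    return blocks


left = {"l1": [1.0, 0.0], "l2": [0.0, 1.0]}
right = {"r1": [0.9, 0.1], "r2": [0.1, 0.9], "r3": [1.0, 1.0]}
print(knn_blocks(left, right, k=2))  # {'l1': ['r1', 'r3'], 'l2': ['r2', 'r3']}
```

Brute force keeps the sketch dependency-free; for realistic dataset sizes an approximate nearest-neighbour index would normally replace the inner sort.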

Examples:

>>> # doctest: +SKIP
>>> from sylloge import MovieGraphBenchmark
>>> from klinker.data import KlinkerDataset
>>> ds = KlinkerDataset.from_sylloge(MovieGraphBenchmark(), clean=True)
>>> from klinker.blockers import DeepBlocker
>>> blocker = DeepBlocker(frame_encoder="autoencoder")
>>> blocks = blocker.assign(left=ds.left, right=ds.right)

Reference

Thirumuruganathan et al., 'Deep Learning for Blocking in Entity Matching: A Design Space Exploration', VLDB 2021, http://vldb.org/pvldb/vol14/p2459-thirumuruganathan.pdf

Source code in klinker/blockers/embedding/deepblocker.py
class DeepBlocker(EmbeddingBlocker):
    """Base class for DeepBlocker strategies.

    Args:
        frame_encoder: DeepBlocker strategy, given as a DeepBlockerFrameEncoder instance, class, or name (e.g. "autoencoder").
        frame_encoder_kwargs: Keyword arguments for initialising the encoder.
        embedding_block_builder: Block building class to create blocks from embeddings.
        embedding_block_builder_kwargs: Keyword arguments for initialising the block builder.
        save: If True, saves the embeddings before block building.
        save_dir: Directory in which to save the embeddings.
        force: If True, recalculates the embeddings and overwrites existing ones; otherwise uses precalculated embeddings if present.

    Attributes:
        frame_encoder: DeepBlocker encoder used to embed the datasets.
        embedding_block_builder: Block building class to create blocks from embeddings.
        save: If True, saves the embeddings before block building.
        save_dir: Directory in which to save the embeddings.
        force: If True, recalculates the embeddings and overwrites existing ones; otherwise uses precalculated embeddings if present.

    Examples:

        >>> # doctest: +SKIP
        >>> from sylloge import MovieGraphBenchmark
        >>> from klinker.data import KlinkerDataset
        >>> ds = KlinkerDataset.from_sylloge(MovieGraphBenchmark(), clean=True)
        >>> from klinker.blockers import DeepBlocker
        >>> blocker = DeepBlocker(frame_encoder="autoencoder")
        >>> blocks = blocker.assign(left=ds.left, right=ds.right)

    Quote: Reference
        Thirumuruganathan et al., 'Deep Learning for Blocking in Entity Matching: A Design Space Exploration', VLDB 2021, <http://vldb.org/pvldb/vol14/p2459-thirumuruganathan.pdf>
    """

    def __init__(
        self,
        frame_encoder: HintOrType[DeepBlockerFrameEncoder] = None,
        frame_encoder_kwargs: OptionalKwargs = None,
        embedding_block_builder: HintOrType[EmbeddingBlockBuilder] = None,
        embedding_block_builder_kwargs: OptionalKwargs = None,
        save: bool = True,
        save_dir: Optional[Union[str, pathlib.Path]] = None,
        force: bool = False,
    ):
        frame_encoder = deep_blocker_encoder_resolver.make(
            frame_encoder, frame_encoder_kwargs
        )
        super().__init__(
            frame_encoder=frame_encoder,
            embedding_block_builder=embedding_block_builder,
            embedding_block_builder_kwargs=embedding_block_builder_kwargs,
            save=save,
            save_dir=save_dir,
            force=force,
        )
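The constructor delegates to deep_blocker_encoder_resolver.make(frame_encoder, frame_encoder_kwargs), so frame_encoder may be a name, a class, or a ready-made instance. The type names HintOrType and OptionalKwargs appear to come from the class_resolver package, but the following is only a simplified, hypothetical stand-in for that lookup-and-instantiate pattern: FrameEncoder, AutoEncoderFrameEncoder, its dim parameter, the ENCODERS registry and the "autoencoder" default are all assumptions for illustration.

```python
from typing import Any, Dict, Mapping, Optional, Type, Union


class FrameEncoder:
    """Hypothetical stand-in for DeepBlockerFrameEncoder."""


class AutoEncoderFrameEncoder(FrameEncoder):
    """Hypothetical encoder; 'dim' is an illustrative parameter only."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim


# Hypothetical registry mapping lower-case names to encoder classes.
ENCODERS: Dict[str, Type[FrameEncoder]] = {"autoencoder": AutoEncoderFrameEncoder}


def make_encoder(
    query: Union[None, str, Type[FrameEncoder], FrameEncoder],
    kwargs: Optional[Mapping[str, Any]] = None,
    default: str = "autoencoder",  # assumed default strategy
) -> FrameEncoder:
    # An instance is returned unchanged; in this sketch kwargs are then ignored.
    if isinstance(query, FrameEncoder):
        return query
    # None falls back to the (assumed) default name.
    if query is None:
        query = default
    # Names are looked up case-insensitively in the registry.
    if isinstance(query, str):
        query = ENCODERS[query.lower()]
    # A class is instantiated with the supplied keyword arguments.
    return query(**(kwargs or {}))


encoder = make_encoder("autoencoder", {"dim": 32})
print(type(encoder).__name__, encoder.dim)  # AutoEncoderFrameEncoder 32
```

Accepting names as well as classes is what lets the documented example write DeepBlocker(frame_encoder="autoencoder") instead of importing an encoder class.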