Unify Python Client
Version: 0.1
Client
class tamr_unify_client.Client(auth, host='localhost', protocol='http', port=9100, base_path='api/versioned/v1')
Python Client for Unify API. Each client is tied to a single origin (protocol, host, port).
Parameters:
- auth (requests.auth.AuthBase) – Unify-compatible authentication provider. Recommended: use one of the classes described in Authentication.
- host (str) – Host address of the remote Unify instance (e.g. 10.0.10.0). Default: 'localhost'
- protocol (str) – Either 'http' or 'https'. Default: 'http'
- port (int) – Unify instance main port. Default: 9100
- base_path (str) – Base API path. Requests made by this client will be relative to this path. Default: "api/versioned/v1"
Usage:
>>> import tamr_unify_client as api
>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> local = api.Client(auth) # on http://localhost:9100
>>> remote = api.Client(auth, protocol='https', host='10.0.10.0') # on https://10.0.10.0:9100

origin
HTTP origin, i.e. <protocol>://<host>[:<port>]. For additional information, see the MDN web docs.
Type: str

request(method, endpoint, **kwargs)
Sends an authenticated request to the server. The URL for the request will be "<origin>/<base_path>/<endpoint>".
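The URL pattern above can be sketched in a few lines. This is a minimal illustration, not the client's actual implementation: build_origin and build_url are hypothetical helper names, and the slash handling is an assumption.

```python
def build_origin(protocol="http", host="localhost", port=9100):
    """Return '<protocol>://<host>:<port>' for the given parts."""
    return f"{protocol}://{host}:{port}"


def build_url(origin, base_path, endpoint):
    """Join origin, base path, and endpoint with single slashes (assumed behavior)."""
    parts = [origin.rstrip("/"), base_path.strip("/"), endpoint.strip("/")]
    return "/".join(parts)


origin = build_origin(protocol="https", host="10.0.10.0")
url = build_url(origin, "api/versioned/v1", "datasets/1")
print(url)  # https://10.0.10.0:9100/api/versioned/v1/datasets/1
```

With the defaults from the Client signature, the same call would produce a URL under http://localhost:9100.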

projects
Collection of all projects on this Unify instance.
Returns: Collection of all projects.
Return type: ProjectCollection

datasets
Collection of all datasets on this Unify instance.
Returns: Collection of all datasets.
Return type: DatasetCollection

Datasets
class tamr_unify_client.models.dataset.collection.DatasetCollection(client, api_path='datasets')
Collection of Datasets.

by_resource_id(resource_id)
Retrieve a dataset by resource ID.
Parameters: resource_id (str) – The resource ID. E.g. "1"
Returns: The specified dataset.
Return type: Dataset

by_relative_id(relative_id)
Retrieve a dataset by relative ID.
Parameters: relative_id (str) – The relative ID. E.g. "datasets/1"
Returns: The specified dataset.
Return type: Dataset
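Judging only from the two examples above ("1" and "datasets/1"), a relative ID appears to be the collection's api_path followed by the resource ID. The sketch below encodes that assumption; to_relative_id and to_resource_id are hypothetical helpers and are not part of the client.

```python
def to_relative_id(api_path, resource_id):
    """Assumed relationship: '<api_path>/<resource_id>'."""
    return f"{api_path}/{resource_id}"


def to_resource_id(relative_id):
    """Take the last path segment of a relative ID."""
    return relative_id.split("/")[-1]


relative_id = to_relative_id("datasets", "1")
resource_id = to_resource_id("projects/1")
print(relative_id, resource_id)  # datasets/1 1
```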

stream()
Stream datasets in this collection. Implicitly called when iterating over this collection.
Returns: Stream of datasets.
Return type: Python generator yielding Dataset
Usage:
>>> for dataset in collection.stream(): # explicit
...     do_stuff(dataset)
>>> for dataset in collection: # implicit
...     do_stuff(dataset)
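The equivalence between explicit and implicit iteration can be illustrated with a small stand-in class. FakeCollection is hypothetical and holds plain strings; it only shows one way __iter__ can delegate to a generator method, not how the library does it.

```python
class FakeCollection:
    """Hypothetical stand-in for a collection; not the library's class."""

    def __init__(self, items):
        self._items = list(items)

    def stream(self):
        # A generator: items are yielded one at a time, on demand.
        for item in self._items:
            yield item

    def __iter__(self):
        # Iterating over the collection delegates to stream().
        return self.stream()


collection = FakeCollection(["dataset-a", "dataset-b"])
explicit = [name for name in collection.stream()]
implicit = [name for name in collection]
print(explicit == implicit)  # True
```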

class tamr_unify_client.models.dataset.resource.Dataset(client, data, alias=None)
A Unify dataset.

update_records(records)
Send a batch of record creations/updates/deletions to this dataset.
Parameters: records (list[dict]) – Each record should be formatted as specified in the Public Docs for Dataset updates.

refresh(**options)
Brings dataset up-to-date if needed, taking whatever actions are required.
Parameters: **options – Options passed to underlying Operation. See apply_options().

Projects
class tamr_unify_client.models.project.collection.ProjectCollection(client, api_path='projects')
Collection of Projects.

by_resource_id(resource_id)
Retrieve a project by resource ID.
Parameters: resource_id (str) – The resource ID. E.g. "1"
Returns: The specified project.
Return type: Project

by_relative_id(relative_id)
Retrieve a project by relative ID.
Parameters: relative_id (str) – The relative ID. E.g. "projects/1"
Returns: The specified project.
Return type: Project

stream()
Stream projects in this collection. Implicitly called when iterating over this collection.
Returns: Stream of projects.
Return type: Python generator yielding Project
Usage:
>>> for project in collection.stream(): # explicit
...     do_stuff(project)
>>> for project in collection: # implicit
...     do_stuff(project)

class tamr_unify_client.models.project.resource.Project(client, data, alias=None)
A Unify project.

unified_dataset()
Unified dataset for this project.
Returns: Unified dataset for this project.
Return type: Dataset

as_categorization()
Convert this project to a CategorizationProject.
Returns: This project.
Return type: CategorizationProject
Raises: TypeError – If the type of this project is not "CATEGORIZATION"

as_mastering()
Convert this project to a MasteringProject.
Returns: This project.
Return type: MasteringProject
Raises: TypeError – If the type of this project is not "DEDUP"
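Because both conversions raise TypeError on a type mismatch, calling code may want to guard them. The sketch below uses hypothetical stand-in classes (FakeProject, FakeMasteringProject) and an assumed "type" key in the project data; it shows the type-checked conversion pattern, not the library's code.

```python
class FakeProject:
    """Hypothetical stand-in for a project resource."""

    def __init__(self, data):
        self.data = data

    @property
    def type(self):
        # Assumption: the project type is stored under a "type" key.
        return self.data["type"]

    def as_mastering(self):
        if self.type != "DEDUP":
            raise TypeError(f"Cannot convert project of type {self.type!r} to a mastering project.")
        return FakeMasteringProject(self.data)


class FakeMasteringProject(FakeProject):
    """Hypothetical stand-in for a mastering project."""


mastering = FakeProject({"type": "DEDUP"}).as_mastering()

try:
    FakeProject({"type": "CATEGORIZATION"}).as_mastering()
    conversion_failed = False
except TypeError:
    # The wrong project type surfaces as a TypeError the caller can handle.
    conversion_failed = True
```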

class tamr_unify_client.models.project.categorization.CategorizationProject(client, data, alias=None)
A Categorization project in Unify.

model()
Machine learning model for this Categorization project. Learns from verified labels and predicts categorization labels for unlabeled records.
Returns: The machine learning model for categorization.
Return type: MachineLearningModel

as_categorization()
Convert this project to a CategorizationProject.
Returns: This project.
Return type: CategorizationProject
Raises: TypeError – If the type of this project is not "CATEGORIZATION"

as_mastering()
Convert this project to a MasteringProject.
Returns: This project.
Return type: MasteringProject
Raises: TypeError – If the type of this project is not "DEDUP"

class tamr_unify_client.models.project.mastering.MasteringProject(client, data, alias=None)
A Mastering project in Unify.

pairs()
Record pairs generated by Unify’s binning model. Pairs are displayed on the “Pairs” page in the Unify UI.
Call refresh() on this dataset to regenerate pairs according to the latest binning model.
Returns: The record pairs represented as a dataset.
Return type: Dataset

pair_matching_model()
Machine learning model for pair-matching for this Mastering project. Learns from verified labels and predicts labels for unlabeled pairs.
Calling predict() on this model will produce new (unpublished) clusters. These clusters are displayed on the “Clusters” page in the Unify UI.
Returns: The machine learning model for pair-matching.
Return type: MachineLearningModel

high_impact_pairs()
High-impact pairs as a dataset. Unify labels pairs as “high-impact” if labeling these pairs would help it learn most quickly (i.e. “Active learning”).
High-impact pairs are displayed with a ⚡ lightning bolt icon on the “Pairs” page in the Unify UI.
Call refresh() on this dataset to produce new high-impact pairs according to the latest pair-matching model.
Returns: The high-impact pairs represented as a dataset.
Return type: Dataset

published_clusters()
Published record clusters generated by Unify’s pair-matching model.
Call refresh() on this dataset to republish clusters according to the latest clustering.
Returns: The published clusters represented as a dataset.
Return type: Dataset

as_categorization()
Convert this project to a CategorizationProject.
Returns: This project.
Return type: CategorizationProject
Raises: TypeError – If the type of this project is not "CATEGORIZATION"

as_mastering()
Convert this project to a MasteringProject.
Returns: This project.
Return type: MasteringProject
Raises: TypeError – If the type of this project is not "DEDUP"

Machine Learning Models
class tamr_unify_client.models.machine_learning_model.MachineLearningModel(client, data, alias=None)
A Unify Machine Learning model.

train(**options)
Learn from verified labels.
Parameters: **options – Options passed to underlying Operation. See apply_options().

predict(**options)
Suggest labels for unverified records.
Parameters: **options – Options passed to underlying Operation. See apply_options().

Operations
class tamr_unify_client.models.operation.Operation(client, data, alias=None)
A long-running operation performed by Unify. Operations appear on the “Jobs” page of the Unify UI.
By design, client-side operations represent server-side operations at a particular point in time (namely, when the operation was fetched from the server). In other words: Operations will not pick up on server-side changes automatically. To get an up-to-date representation, refetch the operation, e.g. op = op.poll().

apply_options(asynchronous=False, **options)
Applies operation options to this operation.
NOTE: This function should not be called directly. Rather, options should be passed in through a higher-level function, e.g. refresh().
- Synchronous mode: Automatically waits for the operation to resolve before returning the operation.
- Asynchronous mode: Immediately returns the 'PENDING' operation. It is up to the user to coordinate this operation with their code via wait() and/or poll().
Returns: Operation with options applied.

state
Server-side state of this operation.
Operation state can be unresolved (i.e. state is one of: 'PENDING', 'RUNNING'), or resolved (i.e. state is one of: 'CANCELED', 'SUCCEEDED', 'FAILED'). Unless opting into asynchronous mode, all exposed operations should be resolved.
Note: you only need to manually pick up server-side changes if you opted into asynchronous mode when kicking off this operation.
Usage:
>>> op.state # operation is currently 'PENDING'
'PENDING'
>>> op.wait() # continually polls until operation resolves
>>> op.state # incorrect usage; operation object state never changes.
'PENDING'
>>> op = op.poll() # correct usage; use value returned by Operation.poll or Operation.wait
>>> op.state
'SUCCEEDED'
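The snapshot behavior described here can be modeled with two small stand-ins. FakeServer and FakeOperation are hypothetical and exist only for illustration; the point is that poll() hands back a new object while the original keeps the state it was created with.

```python
class FakeServer:
    """Hypothetical server whose operation advances one state per fetch."""

    def __init__(self, states):
        self._states = list(states)
        self._index = 0

    def fetch_state(self):
        state = self._states[self._index]
        # Stay on the final state once it has been reached.
        if self._index < len(self._states) - 1:
            self._index += 1
        return state


class FakeOperation:
    """Hypothetical immutable snapshot of a server-side operation."""

    def __init__(self, server, state):
        self._server = server
        self._state = state

    @property
    def state(self):
        return self._state

    def poll(self):
        # Returns a NEW snapshot; the calling object is left unchanged.
        return FakeOperation(self._server, self._server.fetch_state())


server = FakeServer(["RUNNING", "SUCCEEDED"])
op = FakeOperation(server, "PENDING")
newer = op.poll()
latest = newer.poll()
print(op.state, newer.state, latest.state)  # PENDING RUNNING SUCCEEDED
```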

poll()
Poll this operation for server-side updates.
Does not update the calling Operation object. Instead, returns a new Operation.
Returns: Updated representation of this operation.
Return type: Operation

wait(poll_interval_seconds=3, timeout_seconds=None)
Continuously polls for this operation’s server-side state.
Raises: TimeoutError – If the operation takes longer than timeout_seconds to resolve.
Returns: Resolved operation.
Return type: Operation
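A polling loop with the same parameter names might look like the sketch below. It is an illustration under assumptions, not the library's implementation: wait_until_resolved and its fetch_state callback are hypothetical, and the resolved states are taken from the description of state.

```python
import time

# Resolved states as listed in the description of Operation.state.
RESOLVED_STATES = {"CANCELED", "SUCCEEDED", "FAILED"}


def wait_until_resolved(fetch_state, poll_interval_seconds=3, timeout_seconds=None):
    """Call fetch_state() repeatedly until it returns a resolved state."""
    started = time.monotonic()
    while True:
        state = fetch_state()
        if state in RESOLVED_STATES:
            return state
        elapsed = time.monotonic() - started
        if timeout_seconds is not None and elapsed >= timeout_seconds:
            raise TimeoutError(f"Operation still {state!r} after {timeout_seconds} seconds.")
        time.sleep(poll_interval_seconds)


# Simulated server responses; a short interval keeps the example fast.
states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
result = wait_until_resolved(lambda: next(states), poll_interval_seconds=0.01)
print(result)  # SUCCEEDED
```

Passing timeout_seconds=None, as in the default signature, means the loop keeps polling until the operation resolves.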

Authentication
class tamr_unify_client.auth.UsernamePasswordAuth(username, password)
Provides username/password authentication for Unify. Specifically, sets the Authorization HTTP header with Unify’s custom BasicCreds format.
Usage:
>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> import tamr_unify_client as api
>>> unify = api.Client(auth)