Developer Interface¶

Authentication¶

class tamr_unify_client.auth.UsernamePasswordAuth(username, password)[source]¶

Provides username/password authentication for Unify. Specifically, sets the Authorization HTTP header with Unify’s custom BasicCreds format.

Parameters:

username (str) – Unify username.
password (str) – Unify password.

Usage:

>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> import tamr_unify_client as api
>>> unify = api.Client(auth)
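The BasicCreds header mentioned above can be sketched with the standard library. This is a minimal illustration, assuming (not confirmed by this reference) that BasicCreds follows the HTTP Basic pattern of base64-encoding `username:password`, only under a different scheme name. `basic_creds_header` is a hypothetical helper, not part of tamr_unify_client.

```python
import base64


def basic_creds_header(username: str, password: str) -> dict:
    """Return an Authorization header in an assumed BasicCreds style.

    Assumption: the token is base64("username:password"), as in HTTP Basic
    auth, with the scheme name "BasicCreds" instead of "Basic".
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"BasicCreds {token}"}


print(basic_creds_header("my username", "my password"))
```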

Client¶

class tamr_unify_client.Client(auth, host='localhost', protocol='http', port=9100, base_path='/api/versioned/v1/', session=None)[source]¶

Python client for the Unify API. Each client is bound to a single origin (protocol, host, port).

Parameters:

auth (requests.auth.AuthBase) – Unify-compatible authentication provider. Recommended: use one of the classes described in Authentication.
host (str) – Host address of remote Unify instance (e.g. 10.0.10.0). Default: ‘localhost’
protocol (str) – Either ‘http’ or ‘https’. Default: ‘http’
port (int) – Unify instance main port. Default: 9100
base_path (str) – Base API path. Requests made by this client will be relative to this path. Default: ‘/api/versioned/v1/’
session (requests.Session) – Session to use for API calls. Default: a new requests.Session().

Usage:

>>> import tamr_unify_client as api
>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> local = api.Client(auth) # on http://localhost:9100
>>> remote = api.Client(auth, protocol='https', host='10.0.10.0') # on https://10.0.10.0:9100

origin¶

HTTP origin, i.e. <protocol>://<host>[:<port>]. For additional information, see the MDN web docs.

Type:	str

request(method, endpoint, **kwargs)[source]¶

Sends an authenticated request to the server. The URL for the request will be "<origin>/<base_path>/<endpoint>".

Parameters:	method (str) – The HTTP method for the request to be sent.
	endpoint (str) – API endpoint to call (relative to the Base API path for this client).
Returns:	HTTP response
Return type:	`requests.Response`
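To make the origin and URL rules above concrete, here is a small sketch of how `<protocol>://<host>[:<port>]` and `<origin>/<base_path>/<endpoint>` compose. The helpers `build_origin` and `build_url` are hypothetical; the slash normalization is an assumption added so that the default base path '/api/versioned/v1/' does not produce doubled slashes.

```python
from typing import Optional


def build_origin(protocol: str = "http", host: str = "localhost", port: Optional[int] = 9100) -> str:
    """Return an origin of the form <protocol>://<host>[:<port>]."""
    origin = f"{protocol}://{host}"
    if port is not None:
        origin += f":{port}"
    return origin


def build_url(origin: str, base_path: str, endpoint: str) -> str:
    """Join origin, base path and endpoint with exactly one slash between parts."""
    parts = [origin.rstrip("/"), base_path.strip("/"), endpoint.lstrip("/")]
    return "/".join(part for part in parts if part)


origin = build_origin(protocol="https", host="10.0.10.0")
print(build_url(origin, "/api/versioned/v1/", "datasets"))
# https://10.0.10.0:9100/api/versioned/v1/datasets
```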

get(endpoint, **kwargs)[source]¶

Calls request() with the "GET" method.

post(endpoint, **kwargs)[source]¶

Calls request() with the "POST" method.

put(endpoint, **kwargs)[source]¶

Calls request() with the "PUT" method.

delete(endpoint, **kwargs)[source]¶

Calls request() with the "DELETE" method.
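The four convenience methods differ only in the HTTP method they fix. One way to express that delegation is `functools.partialmethod`. The `RecordingClient` class below is a hypothetical stand-in whose request() records its arguments instead of contacting a server; the library may well define these methods differently.

```python
from functools import partialmethod


class RecordingClient:
    """Hypothetical client whose request() records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def request(self, method, endpoint, **kwargs):
        # A real client would send an authenticated HTTP request here.
        self.calls.append((method, endpoint, kwargs))
        return method, endpoint

    # Each convenience method pins the HTTP method and forwards everything else.
    get = partialmethod(request, "GET")
    post = partialmethod(request, "POST")
    put = partialmethod(request, "PUT")
    delete = partialmethod(request, "DELETE")


client = RecordingClient()
client.post("datasets", json={"name": "example"})
print(client.calls)  # [('POST', 'datasets', {'json': {'name': 'example'}})]
```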

projects¶

Collection of all projects on this Unify instance.

Returns:	Collection of all projects.
Return type:	`ProjectCollection`

datasets¶

Collection of all datasets on this Unify instance.

Returns:	Collection of all datasets.
Return type:	`DatasetCollection`

Dataset¶

class tamr_unify_client.models.dataset.resource.Dataset(client, data, alias=None)[source]¶

A Unify dataset.

name¶

Type:	str

external_id¶

Type:	str

description¶

Type:	str

version¶

Type:	str

tags¶

Type:	list[str]

key_attribute_names¶

Type:	list[str]

attributes¶

Attributes of this dataset.

Returns:	Attributes of this dataset.
Return type:	`AttributeCollection`

update_records(records)[source]¶

Send a batch of record creations/updates/deletions to this dataset.

Parameters:	records (iterable[dict]) – Each record should be formatted as specified in the Public Docs for Dataset updates.
Returns:	JSON response body from server.
Return type:	`dict`
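update_records() accepts any iterable of dicts, so a generator works for large batches. The sketch below assumes the update commands use "action", "recordId" and "record" fields; verify the exact format against the Public Docs for Dataset updates. `create_commands` is a hypothetical helper.

```python
import json


def create_commands(rows, key_attribute):
    """Yield one CREATE command per row, keyed on key_attribute.

    Assumption: commands use "action", "recordId" and "record" fields;
    confirm the exact format in the Public Docs for Dataset updates.
    """
    for row in rows:
        yield {"action": "CREATE", "recordId": str(row[key_attribute]), "record": row}


rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
commands = list(create_commands(rows, "id"))
# Print each command as one line of JSON for inspection.
for command in commands:
    print(json.dumps(command))
```

With a real dataset, the generator could be passed straight through, e.g. dataset.update_records(create_commands(rows, "id")), without building the full list first.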

refresh(**options)[source]¶

Brings the dataset up to date if needed, taking whatever actions are required.

Parameters:	**options – Options passed to underlying `Operation`. See `apply_options()`.

profile()[source]¶

Returns profile information for a dataset.

If profile information has not been generated, call create_profile() first. If the returned profile information is out-of-date, you can call refresh() on the returned object to bring it up-to-date.

Returns:	Dataset Profile information.
Return type:	`DatasetProfile`

create_profile(**options)[source]¶

Create a profile for this dataset.

If a profile already exists, the existing profile will be brought up to date.

Parameters:	**options – Options passed to underlying `Operation`. See `apply_options()`.
Returns:	The operation to create the profile.

records()[source]¶

Stream this dataset’s records as Python dictionaries.

Returns:	Stream of records.
Return type:	Python generator yielding `dict`
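Because records() is a generator, records can be processed one at a time without holding the whole dataset in memory. The sketch below shows that pattern over an in-memory stream, assuming (for illustration only) that records arrive as newline-delimited JSON. `iter_json_lines` is hypothetical.

```python
import io
import json


def iter_json_lines(stream):
    """Yield one dict per non-blank line of a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline
            yield json.loads(line)


# In-memory stand-in for a streamed HTTP response body.
fake_body = io.StringIO('{"id": "1", "name": "Alice"}\n\n{"id": "2", "name": "Bob"}\n')
for record in iter_json_lines(fake_body):
    print(record["name"])
```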

status() → tamr_unify_client.models.dataset_status.DatasetStatus[source]¶

Retrieve this dataset’s streamability status.

Returns:	Dataset streamability status.
Return type:	`DatasetStatus`

from_geo_features(features, geo_attr=None)[source]¶

Upsert this dataset from a geospatial FeatureCollection or iterable of Features.

features can be:

An object that implements __geo_interface__ as a FeatureCollection (see https://gist.github.com/sgillies/2217756)
An iterable of features, where each element is a feature dictionary or an object that implements the __geo_interface__ as a Feature
A map where the “features” key contains an iterable of features

See: geopandas.GeoDataFrame.from_features()

If geo_attr is provided, then the named Unify attribute will be used for the geometry. If geo_attr is not provided, then the first attribute on the dataset with geometry type will be used for the geometry.

Parameters:	features – Geospatial features.
	geo_attr (str) – (optional) Name of the Unify attribute to use for the feature’s geometry.
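The three accepted shapes for features can be normalized into a single stream of Feature dictionaries before upserting. This is a minimal sketch of that normalization, not the library's own code. `iter_features` and `ToyPoint` are hypothetical names, and rejecting a `__geo_interface__` that is not a FeatureCollection is a design choice made for the sketch.

```python
from collections.abc import Mapping


def iter_features(features):
    """Yield Feature dicts from any of the three input shapes listed above."""
    if hasattr(features, "__geo_interface__"):
        # Shape 1: an object exposing __geo_interface__ as a FeatureCollection.
        collection = features.__geo_interface__
        if collection.get("type") != "FeatureCollection":
            raise TypeError("expected __geo_interface__ to describe a FeatureCollection")
        features = collection["features"]
    elif isinstance(features, Mapping) and "features" in features:
        # Shape 3: a mapping whose "features" key holds the features.
        features = features["features"]
    # Shape 2 (and the unwrapped forms above): an iterable of features.
    for feature in features:
        if hasattr(feature, "__geo_interface__"):
            feature = feature.__geo_interface__
        yield feature


class ToyPoint:
    """Toy object exposing __geo_interface__ as a single point Feature."""

    __geo_interface__ = {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
        "properties": {"name": "example"},
    }


print(list(iter_features({"features": [ToyPoint()]})))
```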

itergeofeatures(geo_attr=None)[source]¶

Returns an iterator that yields feature dictionaries that comply with __geo_interface__.

See https://gist.github.com/sgillies/2217756

Parameters:	geo_attr (str) – (optional) name of the Unify attribute to use for the feature’s geometry
Returns:	stream of features
Return type:	Python generator yielding `dict[str, object]`

relative_id¶

Type:	str

resource_id¶

Type:	str

Dataset Profile¶

class tamr_unify_client.models.dataset_profile.DatasetProfile(client, data, alias=None)[source]¶

Profile info of a Unify dataset.

dataset_name¶

The name of the associated dataset.

Type:	str

relative_dataset_id¶

The relative dataset ID of the associated dataset.

Type:	str

is_up_to_date¶

Whether the associated dataset is up to date.

Type:	bool

profiled_data_version¶

The profiled data version.

Type:	str

profiled_at¶

Info about when profile info was generated.

Type:	dict

simple_metrics¶

Simple metrics for the profiled dataset.

Type:	list

attribute_profiles¶

Attribute-level profile information for the profiled dataset.

Type:	list

refresh(**options)[source]¶

Updates the dataset profile if needed.

The dataset profile is updated on the server; you will need to call profile() to retrieve the updated profile.

Parameters:	**options – Options passed to underlying `Operation`. See `apply_options()`.

relative_id¶

Type:	str

resource_id¶

Type:	str

Dataset Status¶

class tamr_unify_client.models.dataset_status.DatasetStatus(client, data, alias=None)[source]¶

Streamability status of a Unify dataset.

dataset_name¶

The name of the associated dataset.

Type:	str

relative_dataset_id¶

The relative dataset ID of the associated dataset.

Type:	str

is_streamable¶

Whether the associated dataset is available to be streamed.

Type:	bool

relative_id¶

Type:	str

resource_id¶

Type:	str

Datasets¶

class tamr_unify_client.models.dataset.collection.DatasetCollection(client, api_path='datasets')[source]¶

Collection of Datasets.

Parameters:	client (`Client`) – Client for API call delegation.
	api_path (str) – API path used to access this collection. E.g. `"projects/1/inputDatasets"`. Default: `"datasets"`.

by_resource_id(resource_id)[source]¶

Retrieve a dataset by resource ID.

Parameters:	resource_id (str) – The resource ID. E.g. `"1"`
Returns:	The specified dataset.
Return type:	`Dataset`

by_relative_id(relative_id)[source]¶

Retrieve a dataset by relative ID.

Parameters:	relative_id (str) – The relative ID. E.g. `"datasets/1"`
Returns:	The specified dataset.
Return type:	`Dataset`

by_external_id(external_id)[source]¶

Retrieve a dataset by external ID.

Parameters:	external_id (str) – The external ID.
Returns:	The specified dataset, if found.
Return type:	`Dataset`
Raises:	KeyError – If no dataset with the specified external_id is found.
	LookupError – If multiple datasets with the specified external_id are found.
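The lookup contract above (KeyError for no match, LookupError for several) can be sketched over plain dicts. `find_by_external_id` is hypothetical, and the "externalId" field name is an assumption.

```python
def find_by_external_id(items, external_id):
    """Return the single item whose "externalId" equals external_id.

    Mirrors the documented contract: KeyError when nothing matches,
    LookupError when more than one item matches.
    """
    matches = [item for item in items if item.get("externalId") == external_id]
    if not matches:
        raise KeyError(f"no item with external_id {external_id!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} items with external_id {external_id!r}")
    return matches[0]


items = [
    {"externalId": "a", "name": "first"},
    {"externalId": "b", "name": "second"},
    {"externalId": "b", "name": "third"},
]
print(find_by_external_id(items, "a")["name"])  # first
```

In Python, KeyError is a subclass of LookupError, so an `except LookupError` clause also catches the not-found case; catch KeyError first when the two cases need different handling.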

stream()[source]¶

Stream datasets in this collection. Implicitly called when iterating over this collection.

Returns:	Stream of datasets.
Return type:	Python generator yielding `Dataset`

Usage:

>>> for dataset in collection.stream(): # explicit
...     do_stuff(dataset)
>>> for dataset in collection: # implicit
...     do_stuff(dataset)
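The explicit and implicit forms above behave the same when `__iter__` simply returns the generator produced by stream(). `ToyCollection` is a hypothetical minimal class illustrating that relationship.

```python
class ToyCollection:
    """Hypothetical collection in which iteration delegates to stream()."""

    def __init__(self, items):
        self._items = list(items)

    def stream(self):
        # A real collection might fetch resources from the server lazily here.
        yield from self._items

    def __iter__(self):
        # Returning the generator makes `for x in collection` call stream().
        return self.stream()


collection = ToyCollection(["dataset-1", "dataset-2"])
for dataset in collection:  # implicit: same as collection.stream()
    print(dataset)
```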

by_name(dataset_name)[source]¶

Look up a specific dataset in this collection by exact match on name.

Parameters:	dataset_name (str) – Name of the desired dataset.
Returns:	Dataset with matching name in this collection.
Return type:	`Dataset`
Raises:	KeyError – If no dataset with specified name was found.

create(creation_spec)[source]¶

Create a Dataset in Unify.

Parameters:	creation_spec (dict[str, str]) – Dataset creation specification, formatted as specified in the Public Docs for Creating a Dataset.
Returns:	The created Dataset.
Return type:	`Dataset`

Attribute¶

class tamr_unify_client.models.attribute.resource.Attribute(client, data, alias=None)[source]¶

A Unify Attribute.

See https://docs.tamr.com/reference#attribute-types

relative_id¶

Type:	str

name¶

Type:	str

description¶

Type:	str

type¶

Type:	`AttributeType`

is_nullable¶

Type:	bool

resource_id¶

Type:	str

Attribute Type¶

class tamr_unify_client.models.attribute.type.AttributeType(client, data, alias=None)[source]¶

relative_id¶

Type:	str

base_type¶

Type:	str

inner_type¶

Type:	`AttributeType`

attributes¶

Type:	`AttributeCollection`

resource_id¶

Type:	str
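Because inner_type and attributes are themselves AttributeType and AttributeCollection values, attribute types nest recursively. The sketch below walks such a structure, assuming (check the attribute-types reference linked above) a JSON shape with "baseType", an optional "innerType" and an optional "attributes" list of {"name", "type"} objects. `describe_type` is hypothetical.

```python
def describe_type(attr_type):
    """Render a nested attribute-type dict as a compact string."""
    base = attr_type["baseType"]
    inner = attr_type.get("innerType")
    if inner:
        # e.g. an ARRAY whose elements have their own type
        return f"{base}<{describe_type(inner)}>"
    fields = attr_type.get("attributes") or []
    if fields:
        # e.g. a RECORD whose named sub-attributes each carry a type
        rendered = ", ".join(
            f"{field['name']}: {describe_type(field['type'])}" for field in fields
        )
        return f"{base}{{{rendered}}}"
    return base


example_type = {
    "baseType": "ARRAY",
    "innerType": {
        "baseType": "RECORD",
        "attributes": [
            {"name": "lat", "type": {"baseType": "DOUBLE"}},
            {"name": "lon", "type": {"baseType": "DOUBLE"}},
        ],
    },
}
print(describe_type(example_type))  # ARRAY<RECORD{lat: DOUBLE, lon: DOUBLE}>
```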

Attributes¶

class tamr_unify_client.models.attribute.collection.AttributeCollection(client, data, api_path)[source]¶

Collection of Attributes.

Parameters:	client (`Client`) – Client for API call delegation.
	data (dict) – JSON data representing this resource.
	api_path (str) – API path used to access this collection. E.g. `"datasets/1/attributes"`.

by_resource_id(resource_id)[source]¶

Retrieve an attribute by resource ID.

Parameters:	resource_id (str) – The resource ID. E.g. `"AttributeName"`
Returns:	The specified attribute.
Return type:	`Attribute`

by_relative_id(relative_id)[source]¶

Retrieve an attribute by relative ID.

Parameters:	relative_id (str) – The relative ID. E.g. `"datasets/1/attributes/AttributeName"`
Returns:	The specified attribute.
Return type:	`Attribute`

by_external_id(external_id)[source]¶

Retrieve an attribute by external ID.

Attributes do not have external IDs, so this method is not supported.

Parameters:	external_id (str) – The external ID.
Raises:	NotImplementedError – Always.

stream()[source]¶

Stream attributes in this collection. Implicitly called when iterating over this collection.

Returns:	Stream of attributes.
Return type:	Python generator yielding `Attribute`

Usage:

>>> for attribute in collection.stream(): # explicit
...     do_stuff(attribute)
>>> for attribute in collection: # implicit
...     do_stuff(attribute)

by_name(attribute_name)[source]¶

Look up a specific attribute in this collection by exact match on name.

Parameters:	attribute_name (str) – Name of the desired attribute.
Returns:	Attribute with matching name in this collection.
Return type:	`Attribute`
Raises:	KeyError – If no attribute with specified name was found.

create(creation_spec)[source]¶

Create an Attribute in this collection.

Parameters:	creation_spec (dict[str, str]) – Attribute creation specification, formatted as specified in the Public Docs for adding an Attribute.
Returns:	The created Attribute.
Return type:	`Attribute`

Machine Learning Models¶

class tamr_unify_client.models.machine_learning_model.MachineLearningModel(client, data, alias=None)[source]¶

A Unify Machine Learning model.

train(**options)[source]¶

Learn from verified labels.

Parameters:	**options – Options passed to underlying `Operation`. See `apply_options()`.

predict(**options)[source]¶

Suggest labels for unverified records.

Parameters:	**options – Options passed to underlying `Operation`. See `apply_options()`.

relative_id¶

Type:	str

resource_id¶

Type:	str

Operations¶

class tamr_unify_client.models.operation.Operation(client, data, alias=None)[source]¶

A long-running operation performed by Unify. Operations appear on the “Jobs” page of the Unify UI.

By design, client-side operations represent server-side operations at a particular point in time (namely, when the operation was fetched from the server). In other words, Operations will not pick up server-side changes automatically. To get an up-to-date representation, refetch the operation, e.g. op = op.poll().

apply_options(asynchronous=False, **options)[source]¶

Applies operation options to this operation.

NOTE: This function should not be called directly. Rather, options should be passed in through a higher-level function, e.g. refresh().

Synchronous mode: Automatically waits for the operation to resolve before returning it.
Asynchronous mode: Immediately returns the 'PENDING' operation. It is up to the user to coordinate this operation with their code via wait() and/or poll().

Parameters:	asynchronous (bool) – Whether or not to run in asynchronous mode. Default: `False`.
	**options – When running in synchronous mode, these options are passed to the underlying `wait()` call.
Returns:	Operation with options applied.
Return type:	`Operation`
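The difference between synchronous and asynchronous mode comes down to one branch. In this hypothetical sketch, `apply_options_sketch` is not the library's method, `wait` is a callable standing in for Operation.wait(), and operations are plain dicts rather than Operation objects.

```python
def apply_options_sketch(operation, wait, asynchronous=False, **options):
    """Return the operation immediately (async) or after waiting (sync)."""
    if asynchronous:
        # Hand back the unresolved operation; the caller polls or waits later.
        return operation
    # Synchronous mode: forward the remaining options to wait() and block.
    return wait(operation, **options)


def fake_wait(operation, **options):
    # Stand-in for Operation.wait(): pretends the operation succeeded.
    return {"state": "SUCCEEDED", "waited_with": options}


pending = {"state": "PENDING"}
print(apply_options_sketch(pending, fake_wait, asynchronous=True))  # {'state': 'PENDING'}
print(apply_options_sketch(pending, fake_wait, timeout_seconds=60))
# {'state': 'SUCCEEDED', 'waited_with': {'timeout_seconds': 60}}
```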

type¶

Type:	str

description¶

Type:	str

state¶

Server-side state of this operation.

Operation state can be unresolved (i.e. state is one of: 'PENDING', 'RUNNING'), or resolved (i.e. state is one of: 'CANCELED', 'SUCCEEDED', 'FAILED'). Unless opting into asynchronous mode, all exposed operations should be resolved.

Note: you only need to pick up server-side changes manually if you opted into asynchronous mode when kicking off this operation.

Usage:

>>> op.state # operation is currently 'PENDING'
'PENDING'
>>> op.wait() # continually polls until operation resolves
>>> op.state # incorrect usage; operation object state never changes.
'PENDING'
>>> op = op.poll() # correct usage; use value returned by Operation.poll or Operation.wait
>>> op.state
'SUCCEEDED'
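The resolved and unresolved states listed above can be captured as two sets, which makes it easy to decide whether polling should continue. `is_resolved` is a hypothetical helper; raising on unknown states is a defensive choice for the sketch, not documented behavior.

```python
UNRESOLVED_STATES = frozenset({"PENDING", "RUNNING"})
RESOLVED_STATES = frozenset({"CANCELED", "SUCCEEDED", "FAILED"})


def is_resolved(state: str) -> bool:
    """Return True for the resolved states, False for the unresolved ones."""
    if state in RESOLVED_STATES:
        return True
    if state in UNRESOLVED_STATES:
        return False
    raise ValueError(f"unknown operation state: {state!r}")


print([is_resolved(s) for s in ("PENDING", "RUNNING", "SUCCEEDED")])  # [False, False, True]
```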

poll()[source]¶

Poll this operation for server-side updates.

Does not update the calling Operation object. Instead, returns a new Operation.

Returns:	Updated representation of this operation.
Return type:	`Operation`
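The snapshot semantics of poll() (a new object each time, the old one untouched) resemble a frozen dataclass updated with `dataclasses.replace`. `OperationSnapshot` is a hypothetical class, and passing a `fetch_state` callable is an assumption that stands in for the real server call.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class OperationSnapshot:
    """Hypothetical, immutable view of an operation at fetch time."""

    resource_id: str
    state: str

    def poll(self, fetch_state: Callable[[str], str]) -> "OperationSnapshot":
        # Return a new snapshot; self is frozen and never changes.
        return replace(self, state=fetch_state(self.resource_id))


server_states = {"42": "SUCCEEDED"}  # stand-in for the server
op = OperationSnapshot("42", "PENDING")
new_op = op.poll(server_states.__getitem__)
print(op.state, new_op.state)  # PENDING SUCCEEDED
```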

wait(poll_interval_seconds=3, timeout_seconds=None)[source]¶

Continuously polls for this operation’s server-side state.

Parameters:	poll_interval_seconds (int) – Time interval (in seconds) between subsequent polls.
	timeout_seconds (int) – Time (in seconds) to wait for operation to resolve.
Raises:	TimeoutError – If operation takes longer than timeout_seconds to resolve.
Returns:	Resolved operation.
Return type:	`Operation`
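A polling loop with an optional timeout, following the wait() contract above, might look like this. `wait_until_resolved` is a hypothetical helper: `poll` is any callable returning a fresh operation dict with a "state" key, and the default interval of 3 seconds mirrors the documented signature.

```python
import time


def wait_until_resolved(poll, poll_interval_seconds=3, timeout_seconds=None):
    """Poll until the operation leaves PENDING/RUNNING, or raise TimeoutError."""
    started = time.monotonic()
    op = poll()
    while op["state"] in ("PENDING", "RUNNING"):
        elapsed = time.monotonic() - started
        if timeout_seconds is not None and elapsed > timeout_seconds:
            raise TimeoutError(f"operation unresolved after {timeout_seconds} seconds")
        time.sleep(poll_interval_seconds)
        op = poll()  # each poll returns a fresh snapshot
    return op


states = iter(["PENDING", "RUNNING", "SUCCEEDED"])
print(wait_until_resolved(lambda: {"state": next(states)}, poll_interval_seconds=0))
# {'state': 'SUCCEEDED'}
```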

succeeded()[source]¶

Convenience method for checking if operation was successful.

Returns:	`True` if operation’s state is `'SUCCEEDED'`, `False` otherwise.
Return type:	`bool`

relative_id¶

Type:	str

resource_id¶

Type:	str

Project¶

class tamr_unify_client.models.project.resource.Project(client, data, alias=None)[source]¶

A Unify project.

name¶

Type:	str

external_id¶

Type:	str

description¶

Type:	str

type¶

One of: "SCHEMA_MAPPING", "SCHEMA_MAPPING_RECOMMENDATIONS", "CATEGORIZATION", "DEDUP"

Type:	str

attributes¶

Attributes of this project.

Returns:	Attributes of this project.
Return type:	`AttributeCollection`

unified_dataset()[source]¶

Unified dataset for this project.

Returns:	Unified dataset for this project.
Return type:	`Dataset`

as_categorization()[source]¶

Convert this project to a CategorizationProject.

Returns:	This project.
Return type:	`CategorizationProject`
Raises:	TypeError – If the `type` of this project is not `"CATEGORIZATION"`

as_mastering()[source]¶

Convert this project to a MasteringProject.

Returns:	This project.
Return type:	`MasteringProject`
Raises:	TypeError – If the `type` of this project is not `"DEDUP"`
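The conversion methods amount to a type check followed by re-wrapping the same project data in a more specific class. The Toy* classes below are hypothetical stand-ins, not the library's Project classes.

```python
class ToyProject:
    """Hypothetical project wrapper holding the raw JSON data."""

    def __init__(self, data):
        self.data = data

    @property
    def type(self):
        return self.data["type"]

    def as_categorization(self):
        return self._convert(ToyCategorizationProject, "CATEGORIZATION")

    def as_mastering(self):
        return self._convert(ToyMasteringProject, "DEDUP")

    def _convert(self, cls, expected_type):
        if self.type != expected_type:
            raise TypeError(f"expected a {expected_type} project, got {self.type}")
        # Same underlying data, wrapped in the more specific class.
        return cls(self.data)


class ToyCategorizationProject(ToyProject):
    pass


class ToyMasteringProject(ToyProject):
    pass


project = ToyProject({"name": "people", "type": "DEDUP"})
print(type(project.as_mastering()).__name__)  # ToyMasteringProject
```

Checking `project.type` before converting, or catching TypeError, avoids failures when iterating over a mixed set of projects.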

add_input_dataset(dataset)[source]¶

Associate a dataset with a project in Unify.

By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project.

Parameters:	dataset (`Dataset`) – The Unify dataset to add as input.
Returns:	HTTP response from the server
Return type:	`requests.Response`

input_datasets()[source]¶

Retrieve a collection of this project’s input datasets.

Returns:	The project’s input datasets.
Return type:	`DatasetCollection`

relative_id¶

Type:	str

resource_id¶

Type:	str

class tamr_unify_client.models.project.categorization.CategorizationProject(client, data, alias=None)[source]¶

A Categorization project in Unify.

model()[source]¶

Machine learning model for this Categorization project. Learns from verified labels and predicts categorization labels for unlabeled records.

Returns:	The machine learning model for categorization.
Return type:	`MachineLearningModel`

create_taxonomy(creation_spec)[source]¶

Creates a Taxonomy for this Categorization project.

This project must not already have an associated taxonomy.

Parameters:	creation_spec (dict) – The creation specification for the taxonomy, which can include name.
Returns:	The new Taxonomy
Return type:	`Taxonomy`

taxonomy()[source]¶

Retrieves the Taxonomy associated with this Categorization project.

If a taxonomy is not already associated with this project, call create_taxonomy() first.

Returns:	The project’s Taxonomy
Return type:	`Taxonomy`

add_input_dataset(dataset)¶

Associate a dataset with a project in Unify.

By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project.

Parameters:	dataset (`Dataset`) – The Unify dataset to add as input.
Returns:	HTTP response from the server
Return type:	`requests.Response`

as_categorization()¶

Convert this project to a CategorizationProject.

Returns:	This project.
Return type:	`CategorizationProject`
Raises:	TypeError – If the `type` of this project is not `"CATEGORIZATION"`

as_mastering()¶

Convert this project to a MasteringProject.

Returns:	This project.
Return type:	`MasteringProject`
Raises:	TypeError – If the `type` of this project is not `"DEDUP"`

attributes¶

Attributes of this project.

Returns:	Attributes of this project.
Return type:	`AttributeCollection`

description¶

Type:	str

external_id¶

Type:	str

input_datasets()¶

Retrieve a collection of this project’s input datasets.

Returns:	The project’s input datasets.
Return type:	`DatasetCollection`

name¶

Type:	str

relative_id¶

Type:	str

resource_id¶

Type:	str

type¶

One of: "SCHEMA_MAPPING", "SCHEMA_MAPPING_RECOMMENDATIONS", "CATEGORIZATION", "DEDUP"

Type:	str

unified_dataset()¶

Unified dataset for this project.

Returns:	Unified dataset for this project.
Return type:	`Dataset`

class tamr_unify_client.models.project.mastering.MasteringProject(client, data, alias=None)[source]¶

A Mastering project in Unify.

pairs()[source]¶

Record pairs generated by Unify’s binning model. Pairs are displayed on the “Pairs” page in the Unify UI.

Call refresh() on this dataset to regenerate pairs according to the latest binning model.

Returns:	The record pairs represented as a dataset.
Return type:	`Dataset`

pair_matching_model()[source]¶

Machine learning model for pair-matching for this Mastering project. Learns from verified labels and predicts labels for unlabeled pairs.

Calling predict() on this model will produce new (unpublished) clusters. These clusters are displayed on the “Clusters” page in the Unify UI.

Returns:	The machine learning model for pair-matching.
Return type:	`MachineLearningModel`

high_impact_pairs()[source]¶

High-impact pairs as a dataset. Unify labels pairs as “high-impact” if labeling these pairs would help it learn most quickly (i.e. “Active learning”).

High-impact pairs are displayed with a ⚡ lightning bolt icon on the “Pairs” page in the Unify UI.

Call refresh() on this dataset to produce new high-impact pairs according to the latest pair-matching model.

Returns:	The high-impact pairs represented as a dataset.
Return type:	`Dataset`

record_clusters()[source]¶

Record clusters as a dataset. Unify clusters labeled pairs using the pair-matching model. These clusters populate the cluster review page and get transient cluster IDs, rather than published cluster IDs (i.e., “Permanent IDs”).

Call refresh() on this dataset to generate clusters based on the latest pair-matching model.

Returns:	The record clusters represented as a dataset.
Return type:	`Dataset`

published_clusters()[source]¶

Published record clusters generated by Unify’s pair-matching model.

Returns:	The published clusters represented as a dataset.
Return type:	`Dataset`

estimate_pairs()[source]¶

Returns pair estimate information for a Mastering project.

Returns:	Pairs estimate information.
Return type:	`EstimatedPairCounts`

record_clusters_with_data()[source]¶

Project’s unified dataset with associated clusters.

Returns:	The record clusters with data represented as a dataset
Return type:	`Dataset`

published_clusters_with_data()[source]¶

Project’s unified dataset with associated published clusters.

Returns:	The published clusters with data represented as a dataset.
Return type:	`Dataset`

binning_model()[source]¶

Binning model for this project.

Returns:	Binning model for this project.
Return type:	`BinningModel`

add_input_dataset(dataset)¶

Associate a dataset with a project in Unify.

By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project.

Parameters:	dataset (`Dataset`) – The Unify dataset to add as input.
Returns:	HTTP response from the server
Return type:	`requests.Response`

as_categorization()¶

Convert this project to a CategorizationProject.

Returns:	This project.
Return type:	`CategorizationProject`
Raises:	TypeError – If the `type` of this project is not `"CATEGORIZATION"`

as_mastering()¶

Convert this project to a MasteringProject.

Returns:	This project.
Return type:	`MasteringProject`
Raises:	TypeError – If the `type` of this project is not `"DEDUP"`

attributes¶

Attributes of this project.

Returns:	Attributes of this project.
Return type:	`AttributeCollection`

description¶

Type:	str

external_id¶

Type:	str

input_datasets()¶

Retrieve a collection of this project’s input datasets.

Returns:	The project’s input datasets.
Return type:	`DatasetCollection`

name¶

Type:	str

relative_id¶

Type:	str

resource_id¶

Type:	str

type¶

One of: "SCHEMA_MAPPING", "SCHEMA_MAPPING_RECOMMENDATIONS", "CATEGORIZATION", "DEDUP"

Type:	str

unified_dataset()¶

Unified dataset for this project.

Returns:	Unified dataset for this project.
Return type:	`Dataset`

class tamr_unify_client.models.project.estimated_pair_counts.EstimatedPairCounts(client, data, alias=None)[source]¶

Estimated pair counts for a Mastering project.

is_up_to_date¶

Whether an estimate pairs job has been run since the last edit to the binning model.

Type:	bool

total_estimate¶

The total number of estimated candidate pairs and generated pairs for the model across all clauses.

Returns:	A dictionary mapping candidatePairCount and generatedPairCount to their estimated counts. For example:

{
    "candidatePairCount": "54321",
    "generatedPairCount": "12345"
}

Return type:	dict[str, str]

clause_estimates¶

The estimated candidate pair count and generated pair count for each clause in the model.

Returns:	A dictionary mapping each clause name to a dictionary containing the corresponding estimated candidate and generated pair counts. For example:

{
    "Clause1": {
        "candidatePairCount": "321",
        "generatedPairCount": "123"
    },
    "Clause2": {
        "candidatePairCount": "654",
        "generatedPairCount": "456"
    }
}

Return type:	dict[str, dict[str, str]]
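The counts in both examples above are strings, so they need converting before any arithmetic or comparison. This sketch uses the clause_estimates example from this reference; `clause_counts_as_ints` is a hypothetical helper. This reference does not say whether per-clause counts add up to total_estimate, so the sketch compares clauses rather than summing them.

```python
def clause_counts_as_ints(clause_estimates):
    """Convert the string counts in a clause_estimates dict to integers."""
    return {
        clause: {name: int(count) for name, count in counts.items()}
        for clause, counts in clause_estimates.items()
    }


example = {
    "Clause1": {"candidatePairCount": "321", "generatedPairCount": "123"},
    "Clause2": {"candidatePairCount": "654", "generatedPairCount": "456"},
}
counts = clause_counts_as_ints(example)
# The clause expected to generate the most pairs.
largest = max(counts, key=lambda clause: counts[clause]["generatedPairCount"])
print(largest)  # Clause2
```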

relative_id¶

Type:	str

resource_id¶

Type:	str

Projects¶

class tamr_unify_client.models.project.collection.ProjectCollection(client, api_path='projects')[source]¶

Collection of Projects.

Parameters:	client (`Client`) – Client for API call delegation.
	api_path (str) – API path used to access this collection. Default: `"projects"`.

by_resource_id(resource_id)[source]¶

Retrieve a project by resource ID.

Parameters:	resource_id (str) – The resource ID. E.g. `"1"`
Returns:	The specified project.
Return type:	`Project`

by_relative_id(relative_id)[source]¶

Retrieve a project by relative ID.

Parameters:	relative_id (str) – The relative ID. E.g. `"projects/1"`
Returns:	The specified project.
Return type:	`Project`

by_external_id(external_id)[source]¶

Retrieve a project by external ID.

Parameters:	external_id (str) – The external ID.
Returns:	The specified project, if found.
Return type:	`Project`
Raises:	KeyError – If no project with the specified external_id is found.
	LookupError – If multiple projects with the specified external_id are found.

stream()[source]¶

Stream projects in this collection. Implicitly called when iterating over this collection.

Returns:	Stream of projects.
Return type:	Python generator yielding `Project`

Usage:

>>> for project in collection.stream(): # explicit
...     do_stuff(project)
>>> for project in collection: # implicit
...     do_stuff(project)

create(creation_spec)[source]¶

Create a Project in Unify.

Parameters:	creation_spec (dict[str, str]) – Project creation specification, formatted as specified in the Public Docs for Creating a Project.
Returns:	The created Project.
Return type:	`Project`