Developer Interface¶
Authentication¶
-
class
tamr_unify_client.auth.
UsernamePasswordAuth
(username, password)[source]¶ Provides username/password authentication for Unify. Specifically, sets the Authorization HTTP header with Unify’s custom BasicCreds format.
Parameters: - Usage:
>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> import tamr_unify_client as api
>>> unify = api.Client(auth)
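The Authorization header described above can be pictured with a small sketch. It assumes the BasicCreds format mirrors standard HTTP Basic authentication (base64 of `username:password`) under a different scheme name. That encoding and the helper name `basic_creds_header` are assumptions for illustration, not something this page confirms.

```python
import base64


def basic_creds_header(username, password):
    """Build an Authorization header value in an ASSUMED BasicCreds format."""
    # Assumption: credentials are joined with ':' and base64-encoded,
    # exactly as in HTTP Basic, but with the scheme name "BasicCreds".
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"BasicCreds {token}"


header = basic_creds_header("my username", "my password")
print(header)
```

In practice the auth object is simply passed to `Client`, as in the usage above, and the header is never built by hand.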
Client¶
-
class
tamr_unify_client.
Client
(auth, host='localhost', protocol='http', port=9100, base_path='/api/versioned/v1/', session=None)[source]¶ Python client for the Unify API. Each client is bound to a single origin (protocol, host, port).
Parameters: - auth (requests.auth.AuthBase) – Unify-compatible authentication provider. Recommended: use one of the classes described in Authentication.
- host (str) – Host address of remote Unify instance (e.g. 10.0.10.0). Default: ‘localhost’
- protocol (str) – Either ‘http’ or ‘https’. Default: ‘http’
- port (int) – Unify instance main port. Default: 9100
- base_path (str) – Base API path. Requests made by this client will be relative to this path. Default: ‘/api/versioned/v1/’
- session (requests.Session) – Session to use for API calls. Default: A new default requests.Session().
- Usage:
>>> import tamr_unify_client as api
>>> from tamr_unify_client.auth import UsernamePasswordAuth
>>> auth = UsernamePasswordAuth('my username', 'my password')
>>> local = api.Client(auth) # on http://localhost:9100
>>> remote = api.Client(auth, protocol='https', host='10.0.10.0') # on https://10.0.10.0:9100
-
origin
¶ HTTP origin, i.e. <protocol>://<host>[:<port>]. For additional information, see MDN web docs.
Type: str
-
request
(method, endpoint, **kwargs)[source]¶ Sends an authenticated request to the server. The URL for the request will be "<origin>/<base_path>/<endpoint>".
Parameters: Returns: HTTP response
Return type:
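As a rough illustration of how the origin and the request URL fit together, here is a minimal sketch. The helper names `build_origin` and `build_url` are hypothetical, and the slash handling is an assumption; the client may normalize paths differently.

```python
def build_origin(protocol="http", host="localhost", port=9100):
    """Compose <protocol>://<host>:<port> from the client settings."""
    return f"{protocol}://{host}:{port}"


def build_url(origin, base_path, endpoint):
    """Join origin, base path and endpoint with one slash between parts."""
    # Assumption: leading/trailing slashes are trimmed before joining.
    parts = [origin.rstrip("/"), base_path.strip("/"), endpoint.lstrip("/")]
    return "/".join(parts)


origin = build_origin(protocol="https", host="10.0.10.0")
url = build_url(origin, "/api/versioned/v1/", "projects/1")
print(url)
```

With the defaults shown in the `Client` signature, an endpoint such as `projects/1` would resolve under `/api/versioned/v1/` on the configured origin.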
-
projects
¶ Collection of all projects on this Unify instance.
Returns: Collection of all projects. Return type: ProjectCollection
-
datasets
¶ Collection of all datasets on this Unify instance.
Returns: Collection of all datasets. Return type: DatasetCollection
Attribute¶
Attribute¶
Attribute Collection¶
-
class
tamr_unify_client.attribute.collection.
AttributeCollection
(client, api_path)[source]¶ Collection of Attributes.
Parameters:
-
by_resource_id
(resource_id)[source]¶ Retrieve an attribute by resource ID.
Parameters: resource_id (str) – The resource ID. E.g. "AttributeName"
Returns: The specified attribute. Return type: Attribute
-
by_relative_id
(relative_id)[source]¶ Retrieve an attribute by relative ID.
Parameters: relative_id (str) – The relative ID. E.g. "datasets/1/attributes/AttributeName"
Returns: The specified attribute. Return type: Attribute
-
by_external_id
(external_id)[source]¶ Retrieve an attribute by external ID.
Since attributes do not have external IDs, this method is not supported and will raise a NotImplementedError.
Parameters: external_id (str) – The external ID.
Returns: The specified attribute, if found.
Return type: Raises: - KeyError – If no attribute with the specified external_id is found
- LookupError – If multiple attributes with the specified external_id are found
-
stream
()[source]¶ Stream attributes in this collection. Implicitly called when iterating over this collection.
Returns: Stream of attributes. Return type: Python generator yielding Attribute
- Usage:
>>> for attribute in collection.stream(): # explicit
...     do_stuff(attribute)
>>> for attribute in collection: # implicit
...     do_stuff(attribute)
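The relationship between `stream()` and plain iteration can be sketched with a toy class. `ToyCollection` is a hypothetical in-memory stand-in, not the library's collection; it only shows how `__iter__` can delegate to a generator method.

```python
class ToyCollection:
    """Hypothetical stand-in for a collection; holds items in memory."""

    def __init__(self, items):
        self._items = list(items)

    def stream(self):
        # A generator: items are produced lazily, one at a time.
        for item in self._items:
            yield item

    def __iter__(self):
        # Iterating over the collection delegates to stream().
        return self.stream()


collection = ToyCollection(["name", "address"])
explicit = [attribute for attribute in collection.stream()]
implicit = [attribute for attribute in collection]
print(explicit == implicit)
```

Because the stream is a generator, each call to `stream()` (or each new `for` loop) starts a fresh pass over the items.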
-
by_name
(attribute_name)[source]¶ Look up a specific attribute in this collection by exact match on name.
Parameters: attribute_name (str) – Name of the desired attribute. Returns: Attribute with matching name in this collection. Return type: Attribute
Raises: KeyError – If no attribute with specified name was found.
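A minimal sketch of an exact-match lookup that raises `KeyError` follows. It works on plain dictionaries rather than real `Attribute` objects, and it assumes that "exact match" is case-sensitive; both are illustrative assumptions.

```python
def by_name(attributes, attribute_name):
    """Return the first attribute whose name equals attribute_name exactly."""
    for attribute in attributes:
        if attribute["name"] == attribute_name:
            return attribute
    raise KeyError(f"No attribute found with name: {attribute_name}")


attributes = [{"name": "first_name"}, {"name": "last_name"}]
found = by_name(attributes, "last_name")

try:
    by_name(attributes, "Last_Name")  # differs only in case
    missing_raised = False
except KeyError:
    missing_raised = True

print(found, missing_raised)
```

Callers that cannot be sure an attribute exists may prefer to wrap the lookup in `try`/`except KeyError`, as above.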
-
Attribute Type¶
-
class
tamr_unify_client.attribute.type.
AttributeType
(data)[source]¶ The type of an Attribute or SubAttribute.
See https://docs.tamr.com/reference#attribute-types
Parameters: data (dict) – JSON data representing this type
-
inner_type
¶ Type: AttributeType
-
attributes
¶ Type: list[SubAttribute]
-
SubAttribute¶
-
class
tamr_unify_client.attribute.subattribute.
SubAttribute
(data)[source]¶ An attribute which is itself a property of another attribute.
See https://docs.tamr.com/reference#attribute-types
Parameters: data (dict) – JSON data representing this attribute
-
type
¶ Type: AttributeType
-
Categorization¶
Categorization Project¶
-
class
tamr_unify_client.categorization.project.
CategorizationProject
(client, data, alias=None)[source]¶ A Categorization project in Unify.
-
model
()[source]¶ Machine learning model for this Categorization project. Learns from verified labels and predicts categorization labels for unlabeled records.
Returns: The machine learning model for categorization. Return type: MachineLearningModel
-
create_taxonomy
(creation_spec)[source]¶ Creates a Taxonomy for this project.
The project must not already have an associated taxonomy.
Parameters: creation_spec (dict) – The creation specification for the taxonomy, which can include name. Returns: The new Taxonomy Return type: Taxonomy
-
taxonomy
()[source]¶ Retrieves the Taxonomy associated with this project. If a taxonomy is not already associated with this project, call create_taxonomy() first.
Returns: The project’s Taxonomy Return type: Taxonomy
-
add_input_dataset
(dataset)¶ Associate a dataset with a project in Unify.
By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project.
Parameters: dataset (Dataset) – The dataset to associate with the project.
Returns: HTTP response from the server Return type: requests.Response
-
as_categorization
()¶ Convert this project to a CategorizationProject.
Returns: This project. Return type: CategorizationProject
Raises: TypeError – If the type of this project is not "CATEGORIZATION"
-
as_mastering
()¶ Convert this project to a MasteringProject.
Returns: This project. Return type: MasteringProject
Raises: TypeError – If the type of this project is not "DEDUP"
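The type check behind these conversions can be sketched as follows. `ToyProject` and `as_mastering_or_none` are hypothetical names for illustration; the toy returns the same object, whereas the library is documented to return a `MasteringProject`.

```python
class ToyProject:
    """Hypothetical stand-in exposing only a project type."""

    def __init__(self, project_type):
        self.type = project_type

    def as_mastering(self):
        # Conversion is only valid for projects whose type is "DEDUP".
        if self.type != "DEDUP":
            raise TypeError(f"Cannot convert {self.type} project to a mastering project")
        return self


def as_mastering_or_none(project):
    """Return the converted project, or None when the type does not match."""
    try:
        return project.as_mastering()
    except TypeError:
        return None


dedup = as_mastering_or_none(ToyProject("DEDUP"))
other = as_mastering_or_none(ToyProject("CATEGORIZATION"))
print(dedup is not None, other is None)
```

Checking `project.type` before converting avoids the exception entirely when projects of mixed types are processed in one loop.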
-
attribute_configurations
()¶ Project’s attribute configurations.
Returns: The configurations of the attributes of a project. Return type: AttributeConfigurationCollection
-
attribute_mappings
()¶ Project’s attribute mappings.
Returns: The attribute mappings of a project. Return type: AttributeMappingCollection
-
attributes
¶ Attributes of this project.
Returns: Attributes of this project. Return type: AttributeCollection
-
input_datasets
()¶ Retrieve a collection of this project’s input datasets.
Returns: The project’s input datasets. Return type: DatasetCollection
-
type
¶ A Unify project type, listed in https://docs.tamr.com/reference#create-a-project.
Type: str
-
Category¶
Category¶
Category Collection¶
-
class
tamr_unify_client.categorization.category.collection.
CategoryCollection
(client, api_path)[source]¶ Collection of Categorys.
Parameters:
-
by_resource_id
(resource_id)[source]¶ Retrieve a category by resource ID.
Parameters: resource_id (str) – The resource ID. E.g. "1"
Returns: The specified category. Return type: Category
-
by_relative_id
(relative_id)[source]¶ Retrieve a category by relative ID.
Parameters: relative_id (str) – The relative ID. E.g. "projects/1/categories/1"
Returns: The specified category. Return type: Category
-
by_external_id
(external_id)[source]¶ Retrieve a category by external ID.
Since categories do not have external IDs, this method is not supported and will raise a NotImplementedError.
Parameters: external_id (str) – The external ID.
Returns: The specified category, if found.
Return type: Raises: - KeyError – If no category with the specified external_id is found
- LookupError – If multiple categories with the specified external_id are found
-
stream
()[source]¶ Stream categories in this collection. Implicitly called when iterating over this collection.
Returns: Stream of categories. Return type: Python generator yielding Category
- Usage:
>>> for category in collection.stream(): # explicit
...     do_stuff(category)
>>> for category in collection: # implicit
...     do_stuff(category)
-
create
(creation_spec)[source]¶ Creates a new category.
Parameters: creation_spec (dict) – Category creation specification, formatted as specified in the Public Docs for Creating a Category. Returns: The newly created category. Return type: Category
-
Dataset¶
Dataset¶
-
class
tamr_unify_client.dataset.resource.
Dataset
(client, data, alias=None)[source]¶ A Unify dataset.
-
attributes
¶ Attributes of this dataset.
Returns: Attributes of this dataset. Return type: AttributeCollection
-
upsert_records
(records, primary_key_name, **json_args)[source]¶ Creates or updates the specified records.
Parameters: - records (iterable[dict]) – The records to update, as dictionaries.
- primary_key_name (str) – The name of the primary key for these records, which must be a key in each record dictionary.
- **json_args – Arguments to pass to the JSON dumps function, as documented here. Some of these, such as indent, may not work with Unify.
Returns: JSON response body from the server.
Return type:
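To make the role of `primary_key_name` and `**json_args` concrete, here is a hedged sketch of how records might be serialized before being sent. The command layout (`action`, `recordId`, `record`), the `"CREATE"` action name, and the newline-delimited format are assumptions for illustration, and `build_upsert_payload` is a hypothetical helper, not part of the client.

```python
import json


def build_upsert_payload(records, primary_key_name, **json_args):
    """Serialize records as newline-delimited JSON update commands (assumed format)."""
    lines = []
    for record in records:
        # The primary key must be a key in each record dictionary.
        if primary_key_name not in record:
            raise KeyError(f"record is missing primary key {primary_key_name!r}")
        command = {
            "action": "CREATE",  # assumed name of the create-or-update action
            "recordId": record[primary_key_name],
            "record": record,
        }
        # Extra keyword arguments are forwarded to json.dumps unchanged.
        lines.append(json.dumps(command, **json_args))
    return "\n".join(lines)


payload = build_upsert_payload(
    [{"id": "1", "name": "Ada"}, {"id": "2", "name": "Grace"}],
    primary_key_name="id",
    sort_keys=True,
)
print(payload)
```

If the wire format is indeed one JSON document per line, that would be one plausible reason an option such as `indent` may not work: it spreads a single command across several lines.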
-
delete_records
(records, primary_key_name)[source]¶ Deletes the specified records.
Parameters: Returns: JSON response body from the server.
Return type:
-
delete_records_by_id
(record_ids)[source]¶ Deletes the specified records.
Parameters: record_ids (iterable) – The IDs of the records to delete. Returns: JSON response body from the server. Return type: dict
-
delete_all_records
()[source]¶ Removes all records from the dataset.
Returns: HTTP response from the server Return type: requests.Response
-
refresh
(**options)[source]¶ Brings dataset up-to-date if needed, taking whatever actions are required.
Parameters: **options – Options passed to underlying Operation. See apply_options().
Returns: The refresh operation. Return type: Operation
-
profile
()[source]¶ Returns profile information for a dataset.
If profile information has not been generated, call create_profile() first. If the returned profile information is out-of-date, you can call refresh() on the returned object to bring it up-to-date.
Returns: Dataset Profile information. Return type: DatasetProfile
-
create_profile
(**options)[source]¶ Create a profile for this dataset.
If a profile already exists, the existing profile will be brought up to date.
Parameters: **options – Options passed to underlying Operation. See apply_options().
Returns: The operation to create the profile. Return type: Operation
-
records
()[source]¶ Stream this dataset’s records as Python dictionaries.
Returns: Stream of records. Return type: Python generator yielding dict
-
status
()[source]¶ Retrieve this dataset’s streamability status.
Returns: Dataset streamability status. Return type: DatasetStatus
-
usage
()[source]¶ Retrieve this dataset’s usage by recipes and downstream datasets.
Returns: The dataset’s usage. Return type: DatasetUsage
-
from_geo_features
(features, geo_attr=None)[source]¶ Upsert this dataset from a geospatial FeatureCollection or iterable of Features.
features can be:
- An object that implements __geo_interface__ as a FeatureCollection (see https://gist.github.com/sgillies/2217756)
- An iterable of features, where each element is a feature dictionary or an object that implements the __geo_interface__ as a Feature
- A map where the “features” key contains an iterable of features
See: geopandas.GeoDataFrame.from_features()
If geo_attr is provided, then the named Unify attribute will be used for the geometry. If geo_attr is not provided, then the first attribute on the dataset with geometry type will be used for the geometry.
Parameters: - features – geospatial features
- geo_attr (str) – (optional) name of the Unify attribute to use for the feature’s geometry
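The three accepted shapes of `features` can be normalized into a single stream of feature dictionaries. The sketch below shows one way to do that; `iter_features` and `ToyPoint` are hypothetical names, and the library's own handling may differ.

```python
def iter_features(features):
    """Yield feature dictionaries from any of the three accepted input shapes."""
    # An object exposing __geo_interface__ is unwrapped first.
    if hasattr(features, "__geo_interface__"):
        features = features.__geo_interface__
    # A mapping with a "features" key (e.g. a FeatureCollection) is unpacked.
    if isinstance(features, dict) and "features" in features:
        features = features["features"]
    for feature in features:
        # Each element may itself expose __geo_interface__ as a Feature.
        if hasattr(feature, "__geo_interface__"):
            feature = feature.__geo_interface__
        yield feature


class ToyPoint:
    """Hypothetical object implementing __geo_interface__ as a Feature."""

    def __init__(self, feature_id, x, y):
        self.feature_id, self.x, self.y = feature_id, x, y

    @property
    def __geo_interface__(self):
        return {
            "type": "Feature",
            "id": self.feature_id,
            "geometry": {"type": "Point", "coordinates": [self.x, self.y]},
            "properties": {},
        }


as_map = {"type": "FeatureCollection", "features": [ToyPoint("a", 0, 0), ToyPoint("b", 1, 2)]}
ids_from_map = [f["id"] for f in iter_features(as_map)]
ids_from_list = [f["id"] for f in iter_features([ToyPoint("c", 3, 4)])]
print(ids_from_map, ids_from_list)
```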
-
upstream_datasets
()[source]¶ The Dataset’s upstream datasets.
API returns the URIs of the upstream datasets, resulting in a list of DatasetURIs, not actual Datasets.
Returns: A list of the Dataset’s upstream datasets. Return type: list[DatasetURI]
-
itergeofeatures
(geo_attr=None)[source]¶ Returns an iterator that yields feature dictionaries that comply with __geo_interface__
See https://gist.github.com/sgillies/2217756
Parameters: geo_attr (str) – (optional) name of the Unify attribute to use for the feature’s geometry Returns: stream of features Return type: Python generator yielding dict[str, object]
-
Dataset Collection¶
-
class
tamr_unify_client.dataset.collection.
DatasetCollection
(client, api_path='datasets')[source]¶ Collection of Datasets.
Parameters:
-
by_resource_id
(resource_id)[source]¶ Retrieve a dataset by resource ID.
Parameters: resource_id (str) – The resource ID. E.g. "1"
Returns: The specified dataset. Return type: Dataset
-
by_relative_id
(relative_id)[source]¶ Retrieve a dataset by relative ID.
Parameters: relative_id (str) – The relative ID. E.g. "datasets/1"
Returns: The specified dataset. Return type: Dataset
-
by_external_id
(external_id)[source]¶ Retrieve a dataset by external ID.
Parameters: external_id (str) – The external ID.
Returns: The specified dataset, if found.
Return type: Raises: - KeyError – If no dataset with the specified external_id is found
- LookupError – If multiple datasets with the specified external_id are found
-
stream
()[source]¶ Stream datasets in this collection. Implicitly called when iterating over this collection.
Returns: Stream of datasets. Return type: Python generator yielding Dataset
- Usage:
>>> for dataset in collection.stream(): # explicit
...     do_stuff(dataset)
>>> for dataset in collection: # implicit
...     do_stuff(dataset)
-
Dataset Profile¶
-
class
tamr_unify_client.dataset.profile.
DatasetProfile
(client, data, alias=None)[source]¶ Profile info of a Unify dataset.
-
refresh
(**options)[source]¶ Updates the dataset profile if needed.
The dataset profile is updated on the server; you will need to call profile() to retrieve the updated profile.
Parameters: **options – Options passed to underlying Operation. See apply_options().
Returns: The refresh operation. Return type: Operation
-
Dataset Status¶
Dataset URI¶
Dataset Usage¶
-
class
tamr_unify_client.dataset.usage.
DatasetUsage
(client, data, alias=None)[source]¶ The usage of a dataset and its downstream dependencies.
See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage
-
usage
¶ Type: DatasetUse
-
dependencies
¶ Type: list[DatasetUse]
-
Dataset Use¶
-
class
tamr_unify_client.dataset.use.
DatasetUse
(client, data)[source]¶ The use of a dataset in project steps. This is not a BaseResource because it has no API path and cannot be directly retrieved or modified.
See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage
Parameters: -
input_to_project_steps
¶ Type: list[ProjectStep]
-
output_from_project_steps
¶ Type: list[ProjectStep]
-
Machine Learning Model¶
-
class
tamr_unify_client.base_model.
MachineLearningModel
(client, data, alias=None)[source]¶ A Unify Machine Learning model.
-
train
(**options)[source]¶ Learn from verified labels.
Parameters: **options – Options passed to underlying Operation. See apply_options().
Returns: The resultant operation. Return type: Operation
-
predict
(**options)[source]¶ Suggest labels for unverified records.
Parameters: **options – Options passed to underlying Operation. See apply_options().
Returns: The resultant operation. Return type: Operation
-
Mastering¶
Binning Model¶
-
class
tamr_unify_client.mastering.binning_model.
BinningModel
(client, data, alias=None)[source]¶ A binning model object.
-
records
()[source]¶ Stream this object’s records as Python dictionaries.
Returns: Stream of records. Return type: Python generator yielding dict
-
update_records
(records)[source]¶ Send a batch of record creations/updates/deletions to this dataset.
Parameters: records (iterable[dict]) – Each record should be formatted as specified in the Public Docs for Dataset updates. Returns: JSON response body from server. Return type: dict
-
Estimated Pair Counts¶
-
class
tamr_unify_client.mastering.estimated_pair_counts.
EstimatedPairCounts
(client, data, alias=None)[source]¶ Estimated Pair Counts info for Mastering Project
-
is_up_to_date
¶ Whether an estimate pairs job has been run since the last edit to the binning model.
Return type: bool
-
total_estimate
¶ The total number of estimated candidate pairs and generated pairs for the model across all clauses.
Returns: A dictionary containing candidate pairs and generated pairs mapped to their corresponding estimated counts. For example:
{
    "candidatePairCount": "54321",
    "generatedPairCount": "12345"
}
Return type: dict[str, str]
-
clause_estimates
¶ The estimated candidate pair count and generated pair count for each clause in the model.
Returns: A dictionary containing each clause name mapped to a dictionary containing the corresponding estimated candidate and generated pair counts. For example:
{
    "Clause1": {
        "candidatePairCount": "321",
        "generatedPairCount": "123"
    },
    "Clause2": {
        "candidatePairCount": "654",
        "generatedPairCount": "456"
    }
}
Return type: dict[str, dict[str, str]]
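Since the counts are returned as strings, numeric work needs an explicit conversion. The sketch below uses the example values shown above; `to_int_counts` is a hypothetical helper, and whether per-clause sums equal `total_estimate` is not stated here, so the sum is shown only as arithmetic on the sample data.

```python
clause_estimates = {
    "Clause1": {"candidatePairCount": "321", "generatedPairCount": "123"},
    "Clause2": {"candidatePairCount": "654", "generatedPairCount": "456"},
}


def to_int_counts(estimates):
    """Convert every string count in a clause-estimate mapping to an int."""
    return {
        clause: {name: int(value) for name, value in counts.items()}
        for clause, counts in estimates.items()
    }


counts = to_int_counts(clause_estimates)
# Sum of the generated pair counts across the sample clauses.
generated_sum = sum(c["generatedPairCount"] for c in counts.values())
print(counts["Clause1"]["candidatePairCount"], generated_sum)
```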
-
refresh
(**options)[source]¶ Updates the estimated pair counts if needed.
The pair count estimates are updated on the server; you will need to call estimate_pairs() to retrieve the updated estimate.
Parameters: **options – Options passed to underlying Operation. See apply_options().
Returns: The refresh operation. Return type: Operation
-
Mastering Project¶
-
class
tamr_unify_client.mastering.project.
MasteringProject
(client, data, alias=None)[source]¶ A Mastering project in Unify.
-
pairs
()[source]¶ Record pairs generated by Unify’s binning model. Pairs are displayed on the “Pairs” page in the Unify UI.
Call refresh() from this dataset to regenerate pairs according to the latest binning model.
Returns: The record pairs represented as a dataset. Return type: Dataset
-
pair_matching_model
()[source]¶ Machine learning model for pair-matching for this Mastering project. Learns from verified labels and predicts categorization labels for unlabeled pairs.
Calling predict() on this model will produce new (unpublished) clusters. These clusters are displayed on the “Clusters” page in the Unify UI.
Returns: The machine learning model for pair-matching. Return type: MachineLearningModel
-
high_impact_pairs
()[source]¶ High-impact pairs as a dataset. Unify labels pairs as “high-impact” if labeling these pairs would help it learn most quickly (i.e. “Active learning”).
High-impact pairs are displayed with a ⚡ lightning bolt icon on the “Pairs” page in the Unify UI.
Call refresh() from this dataset to produce new high-impact pairs according to the latest pair-matching model.
Returns: The high-impact pairs represented as a dataset. Return type: Dataset
-
record_clusters
()[source]¶ Record clusters as a dataset. Unify clusters labeled pairs using the pair-matching model. These clusters populate the cluster review page and get transient cluster IDs, rather than published cluster IDs (i.e., “Permanent IDs”).
Call refresh() from this dataset to generate clusters based on the latest pair-matching model.
Returns: The record clusters represented as a dataset. Return type: Dataset
-
published_clusters
()[source]¶ Published record clusters generated by Unify’s pair-matching model.
Returns: The published clusters represented as a dataset. Return type: Dataset
-
published_clusters_configuration
()[source]¶ Retrieves published clusters configuration for this project.
Returns: The published clusters configuration Return type: PublishedClustersConfiguration
-
published_cluster_ids
()[source]¶ Retrieves published cluster IDs for this project.
Returns: The published cluster ID dataset. Return type: Dataset
-
published_cluster_stats
()[source]¶ Retrieves published cluster stats for this project.
Returns: The published cluster stats dataset. Return type: Dataset
-
published_cluster_versions
(cluster_ids)[source]¶ Retrieves version information for the specified published clusters. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.
Parameters: cluster_ids (iterable[str]) – The persistent IDs of the clusters to get version information for. Returns: A stream of the published clusters. Return type: Python generator yielding PublishedCluster
-
record_published_cluster_versions
(record_ids)[source]¶ Retrieves version information for the published clusters of the given records. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.
Parameters: record_ids (iterable[str]) – The Tamr IDs of the records to get cluster version information for. Returns: A stream of the relevant published clusters. Return type: Python generator yielding RecordPublishedCluster
-
estimate_pairs
()[source]¶ Returns pair estimate information for a mastering project
Returns: Pairs Estimate information. Return type: EstimatedPairCounts
-
record_clusters_with_data
()[source]¶ Project’s unified dataset with associated clusters.
Returns: The record clusters with data represented as a dataset Return type: Dataset
-
published_clusters_with_data
()[source]¶ Project’s unified dataset with associated clusters.
Returns: The published clusters with data represented as a dataset Return type: Dataset
-
binning_model
()[source]¶ Binning model for this project.
Returns: Binning model for this project. Return type: BinningModel
-
add_input_dataset
(dataset)¶ Associate a dataset with a project in Unify.
By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project.
Parameters: dataset (Dataset) – The dataset to associate with the project.
Returns: HTTP response from the server Return type: requests.Response
-
as_categorization
()¶ Convert this project to a CategorizationProject.
Returns: This project. Return type: CategorizationProject
Raises: TypeError – If the type of this project is not "CATEGORIZATION"
-
as_mastering
()¶ Convert this project to a MasteringProject.
Returns: This project. Return type: MasteringProject
Raises: TypeError – If the type of this project is not "DEDUP"
-
attribute_configurations
()¶ Project’s attribute configurations.
Returns: The configurations of the attributes of a project. Return type: AttributeConfigurationCollection
-
attribute_mappings
()¶ Project’s attribute mappings.
Returns: The attribute mappings of a project. Return type: AttributeMappingCollection
-
attributes
¶ Attributes of this project.
Returns: Attributes of this project. Return type: AttributeCollection
-
input_datasets
()¶ Retrieve a collection of this project’s input datasets.
Returns: The project’s input datasets. Return type: DatasetCollection
-
type
¶ A Unify project type, listed in https://docs.tamr.com/reference#create-a-project.
Type: str
-
Published Cluster¶
Metric¶
Published Cluster¶
-
class
tamr_unify_client.mastering.published_cluster.resource.
PublishedCluster
(data)[source]¶ A representation of a published cluster in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.
This is not a BaseResource because it does not have its own API endpoint.
Parameters: data – The JSON entity representing this PublishedCluster.
-
versions
¶ Type: list[PublishedClusterVersion]
-
Published Cluster Configuration¶
-
class
tamr_unify_client.mastering.published_cluster.configuration.
PublishedClustersConfiguration
(client, data, alias=None)[source]¶ The configuration of published clusters in a project.
See https://docs.tamr.com/reference#the-published-clusters-configuration-object
Published Cluster Version¶
Record Published Cluster¶
-
class
tamr_unify_client.mastering.published_cluster.record.
RecordPublishedCluster
(data)[source]¶ A representation of a published cluster of a record in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.
This is not a BaseResource because it does not have its own API endpoint.
Parameters: data – The JSON entity representing this RecordPublishedCluster.
-
versions
¶ Type: list[RecordPublishedClusterVersion]
-
Record Published Cluster Version¶
-
class
tamr_unify_client.mastering.published_cluster.record_version.
RecordPublishedClusterVersion
(data)[source]¶ A version of a published cluster in a mastering project.
This is not a BaseResource because it does not have its own API endpoint.
Parameters: data – The JSON entity representing this version.
Operation¶
-
class
tamr_unify_client.operation.
Operation
(client, data, alias=None)[source]¶ A long-running operation performed by Unify. Operations appear on the “Jobs” page of the Unify UI.
By design, client-side operations represent server-side operations at a particular point in time (namely, when the operation was fetched from the server). In other words: Operations will not pick up on server-side changes automatically. To get an up-to-date representation, refetch the operation, e.g. op = op.poll().
-
apply_options
(asynchronous=False, **options)[source]¶ Applies operation options to this operation.
NOTE: This function should not be called directly. Rather, options should be passed in through a higher-level function, e.g. refresh().
- Synchronous mode: Automatically waits for the operation to resolve before returning the operation.
- Asynchronous mode: Immediately returns the 'PENDING' operation. It is up to the user to coordinate this operation with their code via wait() and/or poll().
Parameters: Returns: Operation with options applied.
Return type:
-
state
¶ Server-side state of this operation.
Operation state can be unresolved (i.e. state is one of: 'PENDING', 'RUNNING'), or resolved (i.e. state is one of: 'CANCELED', 'SUCCEEDED', 'FAILED'). Unless opting into asynchronous mode, all exposed operations should be resolved.
Note: you only need to manually pick up server-side changes when opting into asynchronous mode when kicking off this operation.
- Usage:
>>> op.state # operation is currently 'PENDING'
'PENDING'
>>> op.wait() # continually polls until operation resolves
>>> op.state # incorrect usage; operation object state never changes.
'PENDING'
>>> op = op.poll() # correct usage; use value returned by Operation.poll or Operation.wait
>>> op.state
'SUCCEEDED'
-
poll
()[source]¶ Poll this operation for server-side updates.
Does not update the calling Operation object. Instead, returns a new Operation.
Returns: Updated representation of this operation. Return type: Operation
-
wait
(poll_interval_seconds=3, timeout_seconds=None)[source]¶ Continuously polls for this operation’s server-side state.
Parameters: Raises: TimeoutError – If operation takes longer than timeout_seconds to resolve.
Returns: Resolved operation.
Return type:
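The snapshot semantics of `poll()` and `wait()` can be illustrated with a toy class. `ToyOperation` is hypothetical and simulates the server with a fixed list of states; it is a sketch of the pattern, not the library's implementation.

```python
import time

UNRESOLVED = ("PENDING", "RUNNING")


class ToyOperation:
    """Hypothetical immutable snapshot of a server-side operation."""

    def __init__(self, server_states, index=0):
        self._server_states = server_states  # simulated server-side history
        self._index = index
        self.state = server_states[index]

    def poll(self):
        # Return a NEW snapshot; the calling object is left unchanged.
        next_index = min(self._index + 1, len(self._server_states) - 1)
        return ToyOperation(self._server_states, next_index)

    def wait(self, poll_interval_seconds=3, timeout_seconds=None):
        started = time.monotonic()
        op = self
        while op.state in UNRESOLVED:
            elapsed = time.monotonic() - started
            if timeout_seconds is not None and elapsed > timeout_seconds:
                raise TimeoutError("operation did not resolve in time")
            time.sleep(poll_interval_seconds)
            op = op.poll()
        return op


op = ToyOperation(["PENDING", "RUNNING", "SUCCEEDED"])
resolved = op.wait(poll_interval_seconds=0)
print(op.state, resolved.state)
```

The original `op` still reports its fetch-time state after `wait()` returns, which is why the returned value should always be rebound, as in `op = op.wait()`.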
-
Project¶
Attribute Configuration¶
Attribute Configuration¶
-
class
tamr_unify_client.project.attribute_configuration.resource.
AttributeConfiguration
(client, data, alias=None)[source]¶ The configurations of Unify Attributes.
See https://docs.tamr.com/reference#the-attribute-configuration-object
Attribute Configuration Collection¶
-
class
tamr_unify_client.project.attribute_configuration.collection.
AttributeConfigurationCollection
(client, api_path)[source]¶ Collection of AttributeConfigurations.
Parameters:
-
by_resource_id
(resource_id)[source]¶ Retrieve an attribute configuration by resource ID.
Parameters: resource_id (str) – The resource ID. Returns: The specified attribute configuration. Return type: AttributeConfiguration
-
by_relative_id
(relative_id)[source]¶ Retrieve an attribute configuration by relative ID.
Parameters: relative_id (str) – The relative ID. Returns: The specified attribute configuration. Return type: AttributeConfiguration
-
by_external_id
(external_id)[source]¶ Retrieve an attribute configuration by external ID.
Since attributes do not have external IDs, this method is not supported and will raise a NotImplementedError.
Parameters: external_id (str) – The external ID.
Returns: The specified attribute, if found.
Return type: Raises: - KeyError – If no attribute with the specified external_id is found
- LookupError – If multiple attributes with the specified external_id are found
- NotImplementedError – AttributeConfiguration does not support external_id
-
stream
()[source]¶ Stream attribute configurations in this collection. Implicitly called when iterating over this collection.
Returns: Stream of attribute configurations. Return type: Python generator yielding AttributeConfiguration
- Usage:
>>> for attributeConfiguration in collection.stream(): # explicit
...     do_stuff(attributeConfiguration)
>>> for attributeConfiguration in collection: # implicit
...     do_stuff(attributeConfiguration)
-
create
(creation_spec)[source]¶ Create an Attribute configuration in this collection
Parameters: creation_spec (dict[str, str]) – Attribute configuration creation specification should be formatted as specified in the Public Docs for adding an AttributeConfiguration. Returns: The created Attribute configuration Return type: AttributeConfiguration
-
Attribute Mapping¶
Attribute Mapping¶
-
class
tamr_unify_client.project.attribute_mapping.resource.
AttributeMapping
(data)[source]¶ See https://docs.tamr.com/reference#retrieve-projects-mappings
AttributeMapping and AttributeMappingCollection do not inherit from BaseResource and BaseCollection. Those base classes require a specific URL for each individual attribute mapping (e.g. /projects/1/attributeMappings/1), but such URLs do not exist for attribute mappings.
Attribute Mapping Collection¶
-
class
tamr_unify_client.project.attribute_mapping.collection.
AttributeMappingCollection
(client, api_path)[source]¶ Collection of AttributeMappings.
Parameters: - client (Client) – Client for API call delegation.
- api_path (str) – API path used to access this collection.
-
by_resource_id
(resource_id)[source]¶ Retrieve an item in this collection by resource ID.
Parameters: resource_id (str) – The resource ID. Returns: The specified attribute mapping. Return type: AttributeMapping
-
by_relative_id
(relative_id)[source]¶ Retrieve an item in this collection by relative ID.
Parameters: relative_id (str) – The relative ID. Returns: The specified attribute mapping. Return type: AttributeMapping
-
create
(creation_spec)[source]¶ Create an attribute mapping in this collection.
Parameters: creation_spec (dict[str, str]) – Attribute mapping creation specification, formatted as specified in the Public Docs for adding an AttributeMapping. Returns: The created attribute mapping. Return type: AttributeMapping
-
Project¶
-
class
tamr_unify_client.project.resource.
Project
(client, data, alias=None)[source]¶ A Unify project.
-
type
¶ A Unify project type, listed in https://docs.tamr.com/reference#create-a-project.
Type: str
-
attributes
¶ Attributes of this project.
Returns: Attributes of this project. Return type: AttributeCollection
-
unified_dataset
()[source]¶ Unified dataset for this project.
Returns: Unified dataset for this project. Return type: Dataset
-
as_categorization
()[source]¶ Convert this project to a CategorizationProject.
Returns: This project. Return type: CategorizationProject
Raises: TypeError – If the type of this project is not "CATEGORIZATION"
-
as_mastering
()[source]¶ Convert this project to a MasteringProject.
Returns: This project. Return type: MasteringProject
Raises: TypeError – If the type of this project is not "DEDUP"
-
add_input_dataset
(dataset)[source]¶ Associate a dataset with a project in Unify.
By default, datasets are not associated with any projects. They need to be added as input to a project before they can be used as part of that project.
Parameters: dataset (Dataset) – The dataset to associate with the project.
Returns: HTTP response from the server Return type: requests.Response
-
input_datasets
()[source]¶ Retrieve a collection of this project’s input datasets.
Returns: The project’s input datasets. Return type: DatasetCollection
-
attribute_configurations
()[source]¶ Project’s attribute configurations.
Returns: The configurations of the attributes of a project. Return type: AttributeConfigurationCollection
-
attribute_mappings
()[source]¶ Project’s attribute mappings.
Returns: The attribute mappings of a project. Return type: AttributeMappingCollection
-
Project Collection¶
-
class
tamr_unify_client.project.collection.
ProjectCollection
(client, api_path='projects')[source]¶ Collection of Projects.
Parameters:
-
by_resource_id
(resource_id)[source]¶ Retrieve a project by resource ID.
Parameters: resource_id (str) – The resource ID. E.g. "1"
Returns: The specified project. Return type: Project
-
by_relative_id
(relative_id)[source]¶ Retrieve a project by relative ID.
Parameters: relative_id (str) – The relative ID. E.g. "projects/1"
Returns: The specified project. Return type: Project
-
by_external_id
(external_id)[source]¶ Retrieve a project by external ID.
Parameters: external_id (str) – The external ID.
Returns: The specified project, if found.
Return type: Raises: - KeyError – If no project with the specified external_id is found
- LookupError – If multiple projects with the specified external_id are found
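The two failure modes of an external-ID lookup can be sketched on plain dictionaries. The `externalId` field name and the helper `by_external_id` below are assumptions for illustration only.

```python
def by_external_id(resources, external_id):
    """Return the single resource with the given external ID (toy version)."""
    # "externalId" is an assumed field name for this sketch.
    matches = [r for r in resources if r.get("externalId") == external_id]
    if not matches:
        raise KeyError(f"No resource found with external_id={external_id}")
    if len(matches) > 1:
        raise LookupError(f"Multiple resources found with external_id={external_id}")
    return matches[0]


projects = [
    {"name": "Customers", "externalId": "crm-1"},
    {"name": "Suppliers", "externalId": "dup"},
    {"name": "Vendors", "externalId": "dup"},
]


def describe(external_id):
    """Classify the lookup outcome; KeyError must be caught before LookupError."""
    try:
        return by_external_id(projects, external_id)["name"]
    except KeyError:
        return "missing"
    except LookupError:
        return "ambiguous"


print(describe("crm-1"), describe("nope"), describe("dup"))
```

In Python, `KeyError` is a subclass of `LookupError`, so a handler for `LookupError` alone would also catch the not-found case; the order of the `except` clauses above keeps the two outcomes distinct.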
-
stream
()[source]¶ Stream projects in this collection. Implicitly called when iterating over this collection.
Returns: Stream of projects. Return type: Python generator yielding Project
- Usage:
>>> for project in collection.stream(): # explicit
...     do_stuff(project)
>>> for project in collection: # implicit
...     do_stuff(project)
-
Project Step¶
-
class
tamr_unify_client.project.step.
ProjectStep
(client, data)[source]¶ A step of a Unify project. This is not a BaseResource because it has no API path and cannot be directly retrieved or modified.
See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage
Parameters: -
type
¶ A Unify project type, listed in https://docs.tamr.com/reference#create-a-project.
Type: str
-