Mastering¶
Binning Model¶
-
class
tamr_unify_client.mastering.binning_model.
BinningModel
(client, data, alias=None)[source]¶ A binning model object.
-
records
()[source]¶ Stream this object’s records as Python dictionaries.
- Returns
Stream of records.
- Return type
Python generator yielding
dict
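Because `records()` returns a generator, a few records can be inspected without reading the whole stream. A minimal sketch, assuming `binning_model` is any object exposing the `records()` method described above; the helper name `peek_records` is hypothetical and not part of the client:

```python
from itertools import islice


def peek_records(binning_model, limit=5):
    """Return at most `limit` records from a streaming `records()` call."""
    # islice stops pulling from the generator after `limit` items,
    # so the remainder of the stream is not consumed by this helper.
    return list(islice(binning_model.records(), limit))
```

Each returned item is a plain `dict`, so the result can be printed or filtered like any other list of dictionaries.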
-
update_records
(records)[source]¶ Send a batch of record creations/updates/deletions to this dataset.
- Parameters
records (iterable[dict]) – Each record should be formatted as specified in the Public Docs for Dataset updates.
- Returns
JSON response body from server.
- Return type
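The record format itself is defined in the Public Docs rather than here. As a rough sketch only, assuming the update envelope uses the keys `action`, `recordId`, and `record` with the actions `CREATE` and `DELETE` (an assumption to verify against those docs), the iterable passed to `update_records` could be built as follows. The helper names and the `id` key field are hypothetical:

```python
def upsert_commands(records, key="id"):
    """Yield one create-or-update command per record dict."""
    for record in records:
        # ASSUMPTION: a "CREATE" action with an existing recordId overwrites it.
        yield {"action": "CREATE", "recordId": record[key], "record": record}


def delete_commands(record_ids):
    """Yield one delete command per record ID."""
    for record_id in record_ids:
        # ASSUMPTION: deletions carry only the action and the record ID.
        yield {"action": "DELETE", "recordId": record_id}
```

Under these assumptions, a call might look like `binning_model.update_records(upsert_commands(rows))`, where `rows` is a list of record dictionaries.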
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Estimated Pair Counts¶
-
class
tamr_unify_client.mastering.estimated_pair_counts.
EstimatedPairCounts
(client, data, alias=None)[source]¶ Estimated pair counts for a Mastering project.
-
property
is_up_to_date
¶ Whether an estimate pairs job has been run since the last edit to the binning model.
- Return type
-
property
total_estimate
¶ The total number of estimated candidate pairs and generated pairs for the model across all clauses.
-
property
clause_estimates
¶ The estimated candidate pair count and generated pair count for each clause in the model.
- Returns
A dictionary containing each clause name mapped to a dictionary containing the corresponding estimated candidate and generated pair counts. For example:
{
    "Clause1": {
        "candidatePairCount": "321",
        "generatedPairCount": "123"
    },
    "Clause2": {
        "candidatePairCount": "654",
        "generatedPairCount": "456"
    }
}
- Return type
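The counts in the example above are strings, so they need converting before any arithmetic. A small sketch that sums both counts across clauses; the helper name `total_pair_counts` is hypothetical, and whether this sum equals `total_estimate` is not stated here:

```python
def total_pair_counts(clause_estimates):
    """Sum candidate and generated pair counts over all clauses."""
    totals = {"candidatePairCount": 0, "generatedPairCount": 0}
    for counts in clause_estimates.values():
        for field in totals:
            # Counts are strings in the example payload, so convert to int.
            totals[field] += int(counts[field])
    return totals


example = {
    "Clause1": {"candidatePairCount": "321", "generatedPairCount": "123"},
    "Clause2": {"candidatePairCount": "654", "generatedPairCount": "456"},
}
print(total_pair_counts(example))
# → {'candidatePairCount': 975, 'generatedPairCount': 579}
```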
-
refresh
(**options)[source]¶ Updates the estimated pair counts if needed.
The pair count estimates are updated on the server; call estimate_pairs() again to retrieve the updated estimate.
- Parameters
**options – Options passed to the underlying Operation. See apply_options().
- Returns
The refresh operation.
- Return type
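Since `refresh()` only updates the estimate on the server, the local object has to be fetched again afterwards. A minimal sketch of that pattern, assuming `project` is a `MasteringProject` and that the refresh operation has finished by the time `refresh()` returns (an assumption about the options in use); the helper name `current_pair_estimate` is hypothetical:

```python
def current_pair_estimate(project, **options):
    """Return an up-to-date pair estimate, refreshing on the server if stale."""
    estimate = project.estimate_pairs()
    if not estimate.is_up_to_date:
        # refresh() updates the server-side counts only; `estimate` still
        # holds the old values, so it is fetched again below.
        estimate.refresh(**options)
        estimate = project.estimate_pairs()
    return estimate
```

When the estimate is already current, the helper returns it without triggering a refresh.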
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Mastering Project¶
-
class
tamr_unify_client.mastering.project.
MasteringProject
(client, data, alias=None)[source]¶ A Mastering project in Tamr.
-
pairs
()[source]¶ Record pairs generated by Tamr’s binning model. Pairs are displayed on the “Pairs” page in the Tamr UI.
Call refresh() from this dataset to regenerate pairs according to the latest binning model.
- Returns
The record pairs represented as a dataset.
- Return type
-
pair_matching_model
()[source]¶ Machine learning model for pair-matching for this Mastering project. Learns from verified labels and predicts categorization labels for unlabeled pairs.
Calling predict() on this model will produce new (unpublished) clusters. These clusters are displayed on the “Clusters” page in the Tamr UI.
- Returns
The machine learning model for pair-matching.
- Return type
-
high_impact_pairs
()[source]¶ High-impact pairs as a dataset. Tamr labels pairs as “high-impact” if labeling these pairs would help it learn most quickly (i.e. “Active learning”).
High-impact pairs are displayed with a ⚡ lightning bolt icon on the “Pairs” page in the Tamr UI.
Call refresh() from this dataset to produce new high-impact pairs according to the latest pair-matching model.
- Returns
The high-impact pairs represented as a dataset.
- Return type
-
record_clusters
()[source]¶ Record clusters as a dataset. Tamr clusters labeled pairs using the pair-matching model. These clusters populate the cluster review page and receive transient cluster IDs rather than published cluster IDs (i.e., “Permanent IDs”).
Call refresh() from this dataset to generate clusters based on the latest pair-matching model.
- Returns
The record clusters represented as a dataset.
- Return type
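The methods above depend on one another: pairs follow the binning model, and high-impact pairs and record clusters follow the pair-matching model. One plausible ordering of the documented calls is sketched below, assuming `project` is a `MasteringProject`. The helper name `regenerate_clusters` is hypothetical, and steps not described above (such as training the model or publishing clusters) are deliberately left out:

```python
def regenerate_clusters(project):
    """Run the documented refresh/predict calls in order; return each result."""
    steps = [
        # Pairs are regenerated from the latest binning model.
        ("pairs", lambda: project.pairs().refresh()),
        # The pair-matching model predicts on the regenerated pairs.
        ("predict", lambda: project.pair_matching_model().predict()),
        # High-impact pairs follow the latest pair-matching model.
        ("high_impact_pairs", lambda: project.high_impact_pairs().refresh()),
        # Record clusters are rebuilt from the latest pair-matching model.
        ("record_clusters", lambda: project.record_clusters().refresh()),
    ]
    results = {}
    for name, run in steps:
        results[name] = run()
    return results
```

The returned dictionary maps each step name to whatever the underlying call returned, in the order the steps ran.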
-
published_clusters
()[source]¶ Published record clusters generated by Tamr’s pair-matching model.
- Returns
The published clusters represented as a dataset.
- Return type
-
published_clusters_configuration
()[source]¶ Retrieves published clusters configuration for this project.
- Returns
The published clusters configuration
- Return type
-
published_cluster_ids
()[source]¶ Retrieves published cluster IDs for this project.
- Returns
The published cluster ID dataset.
- Return type
-
published_cluster_stats
()[source]¶ Retrieves published cluster stats for this project.
- Returns
The published cluster stats dataset.
- Return type
-
published_cluster_versions
(cluster_ids)[source]¶ Retrieves version information for the specified published clusters. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.
- Parameters
cluster_ids (iterable[str]) – The persistent IDs of the clusters to get version information for.
- Returns
A stream of the published clusters.
- Return type
Python generator yielding
PublishedCluster
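As the result is a generator of `PublishedCluster` objects, it can be consumed lazily. A short sketch that counts the versions of each requested cluster, relying only on the `versions` property documented for `PublishedCluster`; the helper name `count_cluster_versions` is hypothetical:

```python
def count_cluster_versions(project, cluster_ids):
    """Return the number of versions for each cluster, in stream order."""
    return [
        len(cluster.versions)
        for cluster in project.published_cluster_versions(cluster_ids)
    ]
```

The counts come back in the order the server streams the clusters, which is not necessarily the order of `cluster_ids`.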
-
record_published_cluster_versions
(record_ids)[source]¶ Retrieves version information for the published clusters of the given records. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.
- Parameters
record_ids (iterable[str]) – The Tamr IDs of the records to get cluster version information for.
- Returns
A stream of the relevant published clusters.
- Return type
Python generator yielding
RecordPublishedCluster
-
estimate_pairs
()[source]¶ Returns pair estimate information for this Mastering project.
- Returns
Pairs Estimate information.
- Return type
-
record_clusters_with_data
()[source]¶ Project’s unified dataset with associated record clusters.
- Returns
The record clusters with data represented as a dataset
- Return type
-
published_clusters_with_data
()[source]¶ Project’s unified dataset with associated published clusters.
- Returns
The published clusters with data represented as a dataset
- Return type
-
binning_model
()[source]¶ Binning model for this project.
- Returns
Binning model for this project.
- Return type
-
add_input_dataset
(dataset)¶ Associate a dataset with a project in Tamr.
By default, datasets are not associated with any project. A dataset must be added as an input to a project before it can be used in that project.
- Parameters
dataset (Dataset) – The dataset to associate with the project.
- Returns
HTTP response from the server
- Return type
-
as_categorization
()¶ Convert this project to a
CategorizationProject
-
as_mastering
()¶ Convert this project to a
MasteringProject
- Returns
This project.
- Return type
- Raises
-
attribute_configurations
()¶ Project’s attribute configurations.
- Returns
The configurations of the attributes of a project.
- Return type
-
attribute_mappings
()¶ Project’s attribute mappings.
- Returns
The attribute mappings of a project.
- Return type
-
property
attributes
¶ Attributes of this project.
- Returns
Attributes of this project.
- Return type
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
input_datasets
()¶ Retrieve a collection of this project’s input datasets.
- Returns
The project’s input datasets.
- Return type
-
remove_input_dataset
(dataset)¶ Remove a dataset from a project.
- Parameters
dataset (Dataset) – The dataset to be removed from this project.
- Returns
HTTP response from the server
- Return type
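`add_input_dataset` and `remove_input_dataset` can be combined to replace one input with another. A minimal sketch, assuming `project` exposes both methods as documented and that the two arguments are `Dataset` objects; the helper name `swap_input_dataset` is hypothetical, and no inspection of the HTTP responses is attempted:

```python
def swap_input_dataset(project, old_dataset, new_dataset):
    """Replace one input dataset with another; return both server responses."""
    # Add the replacement first so the project keeps at least one of the
    # two datasets as an input if the second call fails.
    added = project.add_input_dataset(new_dataset)
    removed = project.remove_input_dataset(old_dataset)
    return added, removed
```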
-
spec
()¶ Returns this project’s spec.
- Returns
The spec for the project.
- Return type
-
property
type
¶ A Tamr project type, listed in https://docs.tamr.com/reference#create-a-project.
- Type
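The `type` property can be checked before converting a generic project with `as_mastering()`. A hedged sketch, assuming Mastering projects report the type string `"DEDUP"` (an assumption to confirm against the project types listed at the URL above); the constant and the helper name `mastering_or_none` are hypothetical:

```python
# ASSUMPTION: the type string reported by Mastering projects.
MASTERING_TYPE = "DEDUP"


def mastering_or_none(project, expected_type=MASTERING_TYPE):
    """Convert `project` with as_mastering() only if its type matches."""
    if project.type != expected_type:
        # Skip the conversion instead of letting as_mastering() raise.
        return None
    return project.as_mastering()
```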
-
Published Clusters¶
Metric¶
Published Cluster¶
-
class
tamr_unify_client.mastering.published_cluster.resource.
PublishedCluster
(data)[source]¶ A representation of a published cluster in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-cluster-ids.
This is not a BaseResource because it does not have its own API endpoint.
- Parameters
data – The JSON entity representing this PublishedCluster.
-
property
versions
¶
- Type
list[PublishedClusterVersion]
Published Cluster Configuration¶
-
class
tamr_unify_client.mastering.published_cluster.configuration.
PublishedClustersConfiguration
(client, data, alias=None)[source]¶ The configuration of published clusters in a project.
See https://docs.tamr.com/reference#the-published-clusters-configuration-object
-
spec
()[source]¶ Returns a spec representation of this published cluster configuration.
- Returns
The published cluster configuration spec.
- Return type
PublishedClustersConfigurationSpec
-
delete
()¶ Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
-
Published Cluster Version¶
Record Published Cluster¶
-
class
tamr_unify_client.mastering.published_cluster.record.
RecordPublishedCluster
(data)[source]¶ A representation of a published cluster of a record in a mastering project with version information. See https://docs.tamr.com/reference#retrieve-published-clusters-given-record-ids.
This is not a BaseResource because it does not have its own API endpoint.
- Parameters
data – The JSON entity representing this RecordPublishedCluster.
-
property
versions
¶ - Type
Record Published Cluster Version¶
-
class
tamr_unify_client.mastering.published_cluster.record_version.
RecordPublishedClusterVersion
(data)[source]¶ A version of a published cluster in a mastering project.
This is not a BaseResource because it does not have its own API endpoint.
- Parameters
data – The JSON entity representing this version.