Datasets
Dataset
-
class tamr_unify_client.dataset.resource.Dataset(client, data, alias=None)
A Tamr dataset.
-
property attributes
Attributes of this dataset.
- Returns
Attributes of this dataset.
- Return type
-
upsert_from_dataframe(df, *, primary_key_name, ignore_nan=True)
Upserts a record for each row of df with attributes for each column in df.
- Parameters
df (DataFrame) – The data to upsert records from.
primary_key_name (str) – The name of the primary key of the dataset. Must be a column of df.
ignore_nan (bool) – Whether to convert NaN values to null before upserting records to Tamr. If False and NaN is in df, this function will fail. Optional, default is True.
- Return type
- Returns
JSON response body from the server.
- Raises
KeyError – If primary_key_name is not a column in df.
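To illustrate why ignore_nan matters, here is a minimal sketch using only the Python standard library. It is not the client's implementation, and the helper name nan_to_none is hypothetical. Python's json module serializes NaN as the bare token NaN, which is not valid JSON, whereas None is serialized as null.

```python
import json
import math


def nan_to_none(record):
    """Return a copy of record with float NaN values replaced by None.

    Hypothetical helper for illustration; not part of tamr_unify_client.
    """
    return {
        key: None if isinstance(value, float) and math.isnan(value) else value
        for key, value in record.items()
    }


row = {"id": "1", "price": float("nan"), "name": "widget"}

# By default json.dumps emits the non-standard token NaN.
raw_json = json.dumps(row)

# After conversion the missing value is serialized as a JSON null.
clean_json = json.dumps(nan_to_none(row))
```

Passing allow_nan=False to json.dumps makes it raise ValueError on NaN instead, which is one way to surface such values early.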
-
upsert_records(records, primary_key_name, **json_args)
Creates or updates the specified records.
- Parameters
records (iterable[dict]) – The records to update, as dictionaries.
primary_key_name (str) – The name of the primary key for these records, which must be a key in each record dictionary.
**json_args – Keyword arguments to pass to Python's json.dumps function. Some of these, such as indent, may not work with Tamr.
- Returns
JSON response body from the server.
- Return type
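Because every record must contain the primary key, it can help to validate records before sending them. The following is a minimal sketch, assuming dataset is a Dataset as documented above; the helper upsert_checked is hypothetical and not part of the client.

```python
def upsert_checked(dataset, records, primary_key_name):
    """Check that every record carries the primary key, then upsert.

    Hypothetical wrapper; `dataset` is assumed to expose upsert_records()
    with the signature documented above.
    """
    records = list(records)  # allow generators to be inspected and re-sent
    missing = [r for r in records if primary_key_name not in r]
    if missing:
        raise KeyError(
            f"{len(missing)} record(s) lack the key {primary_key_name!r}"
        )
    return dataset.upsert_records(records, primary_key_name)
```

A call might look like upsert_checked(dataset, [{"id": "1", "name": "widget"}], "id"), where the attribute names are illustrative.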
-
delete_records_by_id(record_ids)
Deletes the specified records.
- Parameters
record_ids (iterable) – The IDs of the records to delete.
- Returns
JSON response body from the server.
- Return type
-
delete_all_records()
Removes all records from the dataset.
- Returns
HTTP response from the server
- Return type
-
refresh(**options)
Brings dataset up-to-date if needed, taking whatever actions are required.
- Parameters
**options – Options passed to underlying Operation. See apply_options().
- Returns
The refresh operation.
- Return type
-
profile()
Returns profile information for a dataset.
If profile information has not been generated, call create_profile() first. If the returned profile information is out-of-date, you can call refresh() on the returned object to bring it up-to-date.
- Returns
Dataset Profile information.
- Return type
-
create_profile(**options)
Create a profile for this dataset.
If a profile already exists, the existing profile will be brought up to date.
- Parameters
**options – Options passed to underlying Operation. See apply_options().
- Returns
The operation to create the profile.
- Return type
-
records()
Stream this dataset’s records as Python dictionaries.
- Returns
Stream of records.
- Return type
Python generator yielding dict
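Since records() returns a generator, a caller can inspect a few records without pulling the whole dataset. A minimal sketch, assuming dataset is a Dataset as documented above; preview_records is a hypothetical helper.

```python
from itertools import islice


def preview_records(dataset, limit=5):
    """Return at most `limit` records without consuming the whole stream.

    Hypothetical helper; relies only on records() yielding dictionaries.
    """
    return list(islice(dataset.records(), limit))
```

islice stops requesting items once the limit is reached, so the rest of the stream is never read.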
-
status()
Retrieve this dataset’s streamability status.
- Returns
Dataset streamability status.
- Return type
-
usage()
Retrieve this dataset’s usage by recipes and downstream datasets.
- Returns
The dataset’s usage.
- Return type
-
from_geo_features(features, geo_attr=None)
Upsert this dataset from a geospatial FeatureCollection or iterable of Features.
features can be:
An object that implements __geo_interface__ as a FeatureCollection (see https://gist.github.com/sgillies/2217756)
An iterable of features, where each element is a feature dictionary or an object that implements the __geo_interface__ as a Feature
A map where the “features” key contains an iterable of features
See: geopandas.GeoDataFrame.from_features()
If geo_attr is provided, then the named Tamr attribute will be used for the geometry. If geo_attr is not provided, then the first attribute on the dataset with geometry type will be used for the geometry.
- Parameters
features – geospatial features
geo_attr (str) – (optional) name of the Tamr attribute to use for the feature’s geometry
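The accepted input shapes described above can be sketched in plain Python. The Site class, its field names, and the "location" attribute in the final comment are hypothetical and chosen only for illustration.

```python
class Site:
    """Hypothetical domain object exposing the __geo_interface__ protocol."""

    def __init__(self, site_id, lon, lat, name):
        self.site_id = site_id
        self.lon = lon
        self.lat = lat
        self.name = name

    @property
    def __geo_interface__(self):
        # A Feature: geometry plus free-form properties.
        return {
            "type": "Feature",
            "id": self.site_id,
            "geometry": {"type": "Point", "coordinates": [self.lon, self.lat]},
            "properties": {"name": self.name},
        }


sites = [Site("a", -71.06, 42.36, "Boston"), Site("b", -0.13, 51.51, "London")]

# An iterable of objects that implement __geo_interface__ as a Feature.
as_objects = sites

# An iterable of plain feature dictionaries.
as_dicts = [site.__geo_interface__ for site in sites]

# A map whose "features" key holds an iterable of features.
as_collection = {"type": "FeatureCollection", "features": as_dicts}

# Any of these could then be passed along, for example:
# dataset.from_geo_features(as_collection, geo_attr="location")
```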
-
upstream_datasets()
The Dataset’s upstream datasets.
The API returns the URIs of the upstream datasets, so the result is a list of DatasetURIs, not actual Datasets.
- Returns
A list of the Dataset’s upstream datasets.
- Return type
list[DatasetURI]
-
delete(cascade=False)
Deletes this dataset, optionally deleting all derived datasets as well.
- Parameters
cascade (bool) – Whether to delete all datasets derived from this one. Optional, default is False. Do not use this option unless you are certain you need it, as it can have unintended consequences.
- Returns
HTTP response from the server
- Return type
-
itergeofeatures(geo_attr=None)
Returns an iterator that yields feature dictionaries that comply with __geo_interface__.
See https://gist.github.com/sgillies/2217756
- Parameters
geo_attr (str) – (optional) name of the Tamr attribute to use for the feature’s geometry
- Returns
stream of features
- Return type
Python generator yielding dict[str, object]
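The yielded feature dictionaries can be gathered into a single FeatureCollection mapping. A minimal sketch, assuming dataset is a Dataset as documented above; to_feature_collection is a hypothetical helper.

```python
def to_feature_collection(dataset, geo_attr=None):
    """Collect a dataset's features into a FeatureCollection mapping.

    Hypothetical helper; it materializes the whole stream in memory,
    so it is only suitable for datasets of modest size.
    """
    return {
        "type": "FeatureCollection",
        "features": list(dataset.itergeofeatures(geo_attr=geo_attr)),
    }
```

The result has the same "features" key layout that from_geo_features() accepts as input.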
Dataset Spec
-
class tamr_unify_client.dataset.resource.DatasetSpec(client, data, api_path)
A representation of the server view of a dataset.
-
static of(resource)
Creates a dataset spec from a dataset.
- Parameters
resource (Dataset) – The existing dataset.
- Returns
The corresponding dataset spec.
- Return type
-
static new()
Creates a blank spec that could be used to construct a new dataset.
- Returns
The empty spec.
- Return type
-
from_data(data)
Creates a spec with the same client and API path as this one, but new data.
- Parameters
data (dict) – The data for the new spec.
- Returns
The new spec.
- Return type
-
to_dict()
Returns a version of this spec that conforms to the API representation.
- Returns
The spec’s dict.
- Return type
-
with_name(new_name)
Creates a new spec with the same properties, updating name.
- Parameters
new_name (str) – The new name.
- Returns
A new spec.
- Return type
-
with_external_id(new_external_id)
Creates a new spec with the same properties, updating external ID.
- Parameters
new_external_id (str) – The new external ID.
- Returns
A new spec.
- Return type
-
with_description(new_description)
Creates a new spec with the same properties, updating description.
- Parameters
new_description (str) – The new description.
- Returns
A new spec.
- Return type
-
with_key_attribute_names(new_key_attribute_names)
Creates a new spec with the same properties, updating key attribute names.
Creates a new spec with the same properties, updating tags.
- Parameters
- Returns
A new spec.
- Return type
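Each with_* method returns a new spec rather than modifying the one it is called on, so calls can be chained. The stand-in class below is a hypothetical sketch of that copy-on-update pattern, not the library's DatasetSpec; with the real class, a chain would start from DatasetSpec.new() or DatasetSpec.of(dataset).

```python
class SpecSketch:
    """Hypothetical stand-in illustrating the with_* pattern."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def from_data(self, data):
        # Build a new object from new data; the original is left untouched.
        return SpecSketch(data)

    def with_name(self, new_name):
        return self.from_data({**self._data, "name": new_name})

    def with_description(self, new_description):
        return self.from_data({**self._data, "description": new_description})

    def to_dict(self):
        # Return a copy so callers cannot mutate the spec's internal state.
        return dict(self._data)


blank = SpecSketch()
named = blank.with_name("customers")
described = named.with_description("Customer master records")
```

Because earlier specs are never mutated, blank and named remain usable as starting points for other variants.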
Dataset Collection
-
class tamr_unify_client.dataset.collection.DatasetCollection(client, api_path='datasets')
Collection of Datasets.
- Parameters
-
by_external_id(external_id)
Retrieve a dataset by external ID.
- Parameters
external_id (str) – The external ID.
- Returns
The specified dataset, if found.
- Return type
- Raises
KeyError – If no dataset with the specified external_id is found
LookupError – If multiple datasets with the specified external_id are found
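In Python, KeyError is a subclass of LookupError, so code that wants to tell the two documented failure cases apart has to handle KeyError first. A minimal sketch, assuming collection is a DatasetCollection as documented above; find_dataset and its choice to re-raise as ValueError are hypothetical.

```python
def find_dataset(collection, external_id):
    """Return the dataset with this external ID, or None if there is none.

    Hypothetical helper. The KeyError handler must precede the
    LookupError handler because KeyError is a subclass of LookupError.
    """
    try:
        return collection.by_external_id(external_id)
    except KeyError:
        # No dataset has this external ID.
        return None
    except LookupError as exc:
        # More than one dataset matched; report it as a data problem.
        raise ValueError(f"external ID {external_id!r} is ambiguous") from exc
```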
-
stream()
Stream datasets in this collection. Implicitly called when iterating over this collection.
- Returns
Stream of datasets.
- Return type
Python generator yielding Dataset
- Usage:
>>> for dataset in collection.stream(): # explicit
>>>     do_stuff(dataset)
>>> for dataset in collection: # implicit
>>>     do_stuff(dataset)
-
delete_by_resource_id(resource_id, cascade=False)
Deletes a dataset from this collection by resource_id. Optionally deletes all derived datasets as well.
- Parameters
- Returns
HTTP response from the server.
- Return type
-
create(creation_spec)
Create a Dataset in Tamr.
- Parameters
creation_spec (dict[str, str]) – The dataset creation specification, formatted as described in the Public Docs for Creating a Dataset.
- Returns
The created Dataset
- Return type
-
create_from_dataframe(df, primary_key_name, dataset_name, ignore_nan=True)
Creates a dataset in this collection with the given name, creates an attribute for each column in the df (with primary_key_name as the key attribute), and upserts a record for each row of df.
Each attribute has the default type ARRAY[STRING], except the key attribute, which has type STRING.
This function attempts to ensure atomicity, but it is not guaranteed. If an error occurs while creating attributes or records, an attempt will be made to delete the dataset that was created. However, if that deletion request fails, it is not retried.
- Parameters
df (pandas.DataFrame) – The data to create the dataset with.
primary_key_name (str) – The name of the primary key of the dataset. Must be a column of df.
dataset_name (str) – What to name the dataset in Tamr. There cannot already be a dataset with this name.
ignore_nan (bool) – Whether to convert NaN values to null before upserting records to Tamr. If False and NaN is in df, this function will fail. Optional, default is True.
- Returns
The newly created dataset.
- Return type
- Raises
KeyError – If primary_key_name is not a column in df.
CreationError – If a step in creating the dataset fails.
-
class tamr_unify_client.dataset.collection.CreationError(error_message)
An error from create_from_dataframe().
-
with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
-
Dataset Profile
-
class tamr_unify_client.dataset.profile.DatasetProfile(client, data, alias=None)
Profile info of a Tamr dataset.
-
property relative_dataset_id
The relative dataset ID of the associated dataset.
-
refresh(**options)
Updates the dataset profile if needed.
The dataset profile is updated on the server; you will need to call profile() to retrieve the updated profile.
- Parameters
**options – Options passed to underlying Operation. See apply_options().
- Returns
The refresh operation.
- Return type
-
delete()
Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
Dataset Status
-
class tamr_unify_client.dataset.status.DatasetStatus(client, data, alias=None)
Streamability status of a Tamr dataset.
-
property relative_dataset_id
The relative dataset ID of the associated dataset.
-
property is_streamable
Whether the associated dataset is available to be streamed.
-
delete()
Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
Dataset URI
Dataset Usage
-
class tamr_unify_client.dataset.usage.DatasetUsage(client, data, alias=None)
The usage of a dataset and its downstream dependencies.
See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage
-
property usage
- Type
-
property dependencies
- Type
list[DatasetUse]
-
delete()
Deletes this resource. Some resources do not support deletion, and will raise a 405 error if this is called.
- Returns
HTTP response from the server
- Return type
Dataset Use
-
class tamr_unify_client.dataset.use.DatasetUse(client, data)
The use of a dataset in project steps. This is not a BaseResource because it has no API path and cannot be directly retrieved or modified.
See https://docs.tamr.com/reference#retrieve-downstream-dataset-usage
- Parameters
-
property input_to_project_steps
- Type
list[ProjectStep]
-
property output_from_project_steps
- Type
list[ProjectStep]