hyperion.adapters.storage.s3

S3 :class:`StoragePort` adapter (requires boto3 / aioboto3 -- future [aws]).

Self-contained: it does not import the legacy hyperion.infrastructure.aws S3Client (kept untouched for Catalog until S5/S6). The bucket -- and an optional key prefix -- are fixed at construction, so the port surface stays key-only. iter_keys strips the storage-side prefix back off so this adapter shares one key namespace with the memory/filesystem adapters.
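
The prefix handling described above can be sketched as a pair of pure functions. This is a minimal illustration, not the adapter's actual code: `full_key` and `strip_prefix` are hypothetical names, and the sketch assumes the prefix is concatenated verbatim, so a caller who wants a directory-like layout passes a prefix ending in `/`.

```python
def full_key(prefix: str, key: str) -> str:
    """Map a port-level key to the storage-side S3 key.

    Assumption: the prefix is prepended as-is, with no separator added.
    """
    return f"{prefix}{key}"


def strip_prefix(prefix: str, storage_key: str) -> str:
    """Map a storage-side S3 key back to the port-level key.

    Listing under a prefix should only ever yield keys that start with it,
    so anything else is treated as a programming error.
    """
    if not storage_key.startswith(prefix):
        raise ValueError(f"{storage_key!r} is outside prefix {prefix!r}")
    return storage_key[len(prefix):]


# Round trip: the caller only ever sees the key it originally wrote.
stored = full_key("tenant-a/", "reports/2024.csv")
visible = strip_prefix("tenant-a/", stored)
```

With an empty prefix both functions are the identity, which is what lets a prefixed S3 adapter and an unprefixed in-memory adapter expose the same keys.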

S3Storage

S3Storage(bucket, prefix='')

A :class:`StoragePort` backed by a single S3 bucket (optionally prefixed).

Source code in hyperion/adapters/storage/s3.py
def __init__(self, bucket: str, prefix: str = "") -> None:
    self._bucket = bucket
    self._prefix = prefix
    self._client = boto3.client("s3")
    self._aio_session = aioboto3.Session()

get_attributes

get_attributes(key)

Return metadata for *key* via ``head_object``.

Note: For multipart uploads, ``head_object`` returns a composite ETag (``"<md5>-<part_count>"``) that is **not** a plain content MD5 and differs from the value ``get_object_attributes`` would return. :meth:`put` / :meth:`put_async` go through boto3's managed ``upload_fileobj``, which switches to a multipart upload for payloads above ``TransferConfig.multipart_threshold`` (8 MiB by default), so larger objects do get a composite ETag here. Callers must not treat ``etag`` as a portable content MD5 across backends or upload strategies.

Source code in hyperion/adapters/storage/s3.py
def get_attributes(self, key: str) -> ObjectAttributes:
    """Return metadata for *key* via ``head_object``.

    .. note::
        For multipart uploads, ``head_object`` returns a composite ETag
        (``"<md5>-<part_count>"``) that is **not** a plain content MD5
        and differs from the value ``get_object_attributes`` would return.
        :meth:`put` / :meth:`put_async` go through boto3's managed
        ``upload_fileobj``, which switches to a multipart upload for
        payloads above ``TransferConfig.multipart_threshold`` (8 MiB by
        default), so larger objects do get a composite ETag here. Callers
        must not treat ``etag`` as a portable content MD5 across backends
        or upload strategies.
    """
    try:
        response = self._client.head_object(Bucket=self._bucket, Key=self._full_key(key))
    except botocore.exceptions.ClientError as error:
        if _is_not_found(error):
            raise ObjectNotFoundError(key) from error
        raise
    return ObjectAttributes(
        etag=response["ETag"].strip('"'),
        size=int(response["ContentLength"]),
        last_modified=response["LastModified"],
    )
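
The ETag caveat in the note above can be made concrete with a small caller-side guard. This is a hedged sketch rather than part of the adapter: `multipart_part_count` and `etag_matches_md5` are hypothetical helpers, and the regular expression assumes the composite form is 32 hex digits, a hyphen, and a part count.

```python
import hashlib
import re
from typing import Optional

# Assumed shape of a composite (multipart) ETag: "<32 hex digits>-<part count>".
_COMPOSITE_ETAG = re.compile(r"[0-9a-f]{32}-(\d+)")


def multipart_part_count(etag: str) -> Optional[int]:
    """Return the part count for a composite ETag, or None for a plain one."""
    match = _COMPOSITE_ETAG.fullmatch(etag.strip('"'))
    return int(match.group(1)) if match else None


def etag_matches_md5(etag: str, payload: bytes) -> Optional[bool]:
    """Compare an ETag with the payload's MD5 where that is meaningful.

    Returns None for composite ETags, which cannot be compared this way.
    """
    if multipart_part_count(etag) is not None:
        return None
    return etag.strip('"') == hashlib.md5(payload).hexdigest()


payload = b"hello"
plain = hashlib.md5(payload).hexdigest()
composite = f"{plain}-3"
```

Even a plain-looking ETag is not guaranteed to be a content MD5 (for example, objects encrypted with SSE-KMS or SSE-C), so a `True` result here is best read as a hint rather than an integrity proof.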