BRB - Manually Archiving Data to AWS S3
Configuring RClone
Before using rclone to copy files to S3, you will need to configure a “remote” for your bucket. You can do so by running “rclone config” and answering the many prompts, as shown here for AWS (and here for Wasabi). Doing this will create the file ~/.config/rclone/rclone.conf under the current user’s home directory (if it does not already exist) and add the remote to that file.
Please note that the Wasabi instructions are slightly older; we have made edits below to the rclone config for Wasabi.
A few of the “rclone config” prompts may be a bit confusing, as follows:
There are multiple providers of S3-like storage, so you will be asked which provider to use. Select “AWS” unless you are using a different provider.
You will probably want to have your credentials stored in the config file rather than in environment variables, so agree to enter them through the config command.
After supplying the region, you can probably accept the default for the “endpoint”.
You can choose the default value for most of the other options, but may want to select different values for these options:
Some S3 buckets are configured to use KMS server-side encryption, so you may need to select “aws:kms” for that option.
If you would like to copy data directly to GDA, select “DEEP_ARCHIVE” as the storage class.
You can usually skip the “advanced” part of the configuration when it is offered.
If you don’t get these options right when you first run config, you can always run it again to edit the remote, or hand-edit the file to add or fix options. In fact, you will probably need to edit the config file to add one more option that isn’t offered through the basic config:
If your account does not have bucket creation permissions (this is likely the case), add the line “no_check_bucket = true” (see this discussion for more information).
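For reference, an AWS remote that writes objects directly to Glacier Deep Archive with KMS encryption and the no_check_bucket option might end up looking something like this in ~/.config/rclone/rclone.conf (the remote name, keys, and region are placeholders; your own prompts may produce slightly different values):
[my-s3]
type = s3
provider = AWS
env_auth = false
access_key_id = YOURACCESSKEY
secret_access_key = YOURSECRETACCESSKEY
region = us-east-1
server_side_encryption = aws:kms
storage_class = DEEP_ARCHIVE
no_check_bucket = true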
You can run the config command again to add more remotes, including for different cloud providers and other computers on your network. Each remote must have a unique name, which you will use in the rclone commands described below.
Wasabi
Wasabi is a cloud-based object storage service for a broad range of applications and use cases. Wasabi is designed for individuals and organizations that require a high-performance, reliable, and secure data storage infrastructure at minimal cost.
Wasabi provides an S3 interface which can be configured for use with rclone like this.
No remotes found, make a new one?
n) New remote
s) Set configuration password
n/s> n
name> wasabi
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Amazon S3 (also Dreamhost, Ceph, ChinaMobile, ArvanCloud, Minio)
\ "s3"
[snip]
Storage> s3
Choose your S3 provider.
[snip]
provider> Wasabi
Get AWS credentials from runtime (environment variables or EC2/ECS meta data if no env vars). Only applies if access_key_id and secret_access_key is blank.
Choose a number from below, or type in your own value
1 / Enter AWS credentials in the next step
\ "false"
env_auth> 1
AWS Access Key ID - leave blank for anonymous access or runtime credentials.
access_key_id> YOURACCESSKEY
AWS Secret Access Key (password) - leave blank for anonymous access or runtime credentials.
secret_access_key> YOURSECRETACCESSKEY
Region to connect to.
[snip] 1 / Use this if unsure. Will use v4 signatures and an empty region.
\ ""
region> Leave Blank
Endpoint for S3 API.
Leave blank if using AWS to use the default endpoint for the region.
Specify if using an S3 clone such as Ceph.
endpoint> s3.wasabisys.com
Location constraint - must be set to match the Region. Used when creating buckets only.
Choose a number from below, or type in your own value
1 / Empty for US Region, Northern Virginia, or Pacific Northwest.
\ ""
[snip]
location_constraint>
Canned ACL used when creating buckets and/or storing objects in S3.
For more info visit https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl
Choose a number from below, or type in your own value
1 / Owner gets FULL_CONTROL. No one else has access rights (default).
\ "private"
[snip]
acl>
The server-side encryption algorithm used when storing this object in S3.
Choose a number from below, or type in your own value
1 / None
\ ""
2 / AES256
\ "AES256"
server_side_encryption>
The storage class to use when storing objects in S3.
Choose a number from below, or type in your own value
1 / Default
\ ""
2 / Standard storage class
\ "STANDARD"
3 / Reduced redundancy storage class
\ "REDUCED_REDUNDANCY"
4 / Standard Infrequent Access storage class
\ "STANDARD_IA"
storage_class>
Remote config
--------------------
[wasabi] -> name of remote
env_auth = false
provider = Wasabi
access_key_id = YOURACCESSKEY
secret_access_key = YOURSECRETACCESSKEY
region =
endpoint = s3.wasabisys.com
location_constraint =
acl =
server_side_encryption =
storage_class =
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
This will leave the config file looking like this.
[wasabi]
type = s3
provider = Wasabi
access_key_id = YOURACCESSKEY
secret_access_key = YOURSECRETACCESSKEY
endpoint = s3.wasabisys.com
RClone Command Examples
For your archiving work you may choose to use only the AWS CLI or only Rclone, or whatever combination of the two best meets your needs.
These examples presume that you have a file called foo.txt in the folder where you are issuing the commands and that you are copying that file to the path “/test” on S3. They also assume that the commands are run in sequence to create a file, list it out, then delete it (after which you can list again and see that it is missing).
All command examples use “my-s3” as the configured “remote” and “my-bucket” as the bucket. The RClone syntax to specify an S3 source or destination is <remote>:<bucket>/<path>.
Links are provided to the documentation for each RClone command. Some of the options that are mentioned for a command can be found in its documentation, but others are part of the global options that apply to all RClone commands. Many of those global options relate to RClone’s ability to synchronize data across two locations, which is not addressed in this document.
$ rclone --version
rclone v1.59.0
Copy
By default, the rclone copy command will copy folders of data, as opposed to the AWS CLI which copies a single file by default. This means that a source folder will automatically be copied recursively, and that a destination path will be treated as a folder into which source contents are copied. This example copies the local file “foo.txt” to a destination “folder” with the path “/test”:
rclone copy foo.txt my-s3:my-bucket/test
This example recursively copies the contents of folder “test” on S3 back to a folder named “test2” in the current local directory. If instead the destination was “.”, the contents of the S3 folder “test” would be copied to the current working directory (i.e., no folder named “test” would be created).
rclone copy my-s3:my-bucket/test test2
Some options that can be helpful when copying data are:
“--checksum” causes RClone to calculate a checksum of the source before copying and then confirm that a checksum calculated on the destination matches it. Typically, this sort of check will be required when you are doing archiving. Rclone doesn’t save or print the checksum, but you can find it in the output from the “long” S3 listings (see the AWS CLI section above).
If you are logging output from these copy operations, or just want to see progress as the command executes, you can have rclone print progress output with the “--progress” option. You can also use the “--stats <duration>” option to control how often progress is reported; “--stats 30s”, for example, prints a progress summary every 30 seconds.
The “--transfers <num>” option specifies how many individual files to copy in parallel (the default number is four). This is more useful when you are copying a folder full of many files than it is when you are copying one large archive file at a time.
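As an illustration, a copy that verifies checksums, reports progress every 30 seconds, and runs eight parallel transfers might look like this (the source folder name and transfer count are just examples):
rclone copy --checksum --progress --stats 30s --transfers 8 my-data my-s3:my-bucket/test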
List
Rclone offers several listing commands, but rclone lsf is perhaps the most useful, as its output is designed to be both human and machine readable. See its documentation for information about other list commands.
rclone lsf my-s3:my-bucket/test
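If the only object under “/test” is the foo.txt file copied earlier, the output would simply be:
foo.txt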
Remove
The rclone delete command behaves differently on cloud storage than it does on a Linux file system. For cloud storage like S3, it removes all objects (files) that match the provided path. On a regular file system, it removes the matching files but keeps the directories. To recursively remove a file system directory (including all of its folders), use the rclone purge command.
rclone delete my-s3:my-bucket/test
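For example, to recursively remove the local “test2” folder created by the copy example above, rclone purge would look like this:
rclone purge test2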
This option may be useful when removing objects from S3:
The “--dry-run” option displays the objects that would be deleted without actually deleting them, so you can check before running the command “for real”.
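For example, to preview which objects under “/test” a delete would remove:
rclone delete --dry-run my-s3:my-bucket/test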