About rclone
rclone is a command-line tool for transferring data to and from cloud storage providers, often described as 'rsync for cloud storage'. This guide will show you how to use it to transfer data to Amazon S3 Glacier.
Requirements
You need an AWS account to use Amazon cloud storage. At this time, these accounts are not provided by the SCU.
Warning: Using AWS will incur costs separate from any existing agreement with the SCU or other WCM groups. Use the AWS Cost Explorer to estimate the costs for this service.
Once you have an AWS account, create an IAM user with permissions to use S3 and Glacier. Please refer to the Amazon documentation on how to do that.
Using other cloud providers
See the list of rclone-supported cloud providers for notes on how to set up providers other than AWS. In general, this is similar to the AWS setup shown below: simply follow the prompts of the configuration tool.
Configure rclone to work with AWS
Rclone is available on the SCU nodes. The following example is on pascal.med.cornell.edu.
...
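The interactive `rclone config` wizard walks through the setup; the same remote can also be created non-interactively with `rclone config create`. The remote name `amazon_store`, the region, and the placeholder credentials below are assumptions — substitute the values for your own IAM user:

```shell
# Create an S3 remote named "amazon_store" without the interactive wizard.
# Replace the placeholder access keys with your IAM user's credentials.
rclone config create amazon_store s3 \
    provider AWS \
    access_key_id AKIAXXXXXXXXXXXXXXXX \
    secret_access_key YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY \
    region us-east-1

# Confirm the new remote is registered
rclone listremotes
```

The credentials are stored in rclone's configuration file (`~/.config/rclone/rclone.conf` by default), so this only has to be done once per machine.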
```
$ rclone copy -P amazon_store:rclone-tutorial ~/test-download
Transferred:   192M / 192 MBytes, 100%, 77.531 MBytes/s, ETA 0s
Errors:        0
Checks:        0 / 0, -
Transferred:   2 / 2, 100%
Elapsed time:  2.4s
$ ls ~/test-download/
test1.file  test2.file
```
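The files in the download example above were uploaded to the bucket beforehand. An upload (local to remote) simply reverses the source and destination arguments; the bucket name `rclone-tutorial` and the local path `~/test-upload` here are assumptions for illustration:

```shell
# Create the destination bucket on the remote (a no-op if it already exists)
rclone mkdir amazon_store:rclone-tutorial

# Copy local files to the bucket, showing progress with -P
rclone copy -P ~/test-upload amazon_store:rclone-tutorial

# List the uploaded files to confirm the transfer
rclone ls amazon_store:rclone-tutorial
```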
Tuning rclone performance
By default, rclone is not optimized for our infrastructure. Increasing the maximum number of parallel transfers and the buffer size can increase transfer speed. This will, however, consume more bandwidth and RAM, so results will vary depending on which node the transfer runs on. The following flags should be used:
```
--bwlimit=0         # Do not limit bandwidth
--buffer-size=128M  # Buffer size for each transfer
--checkers=32       # Run 32 checksum checkers in parallel
--transfers=32      # Run 32 transfers in parallel
```
Please be advised that the actual performance gain depends on both the source and destination systems, as well as their current load. Results will also vary with the type of data transferred (many small files versus a few large files). Use these parameters as a starting point for your own fine-tuning.
Use these parameters as follows:
```
$ rclone --bwlimit=0 --buffer-size=128M --checkers=32 --transfers=32 copy -P ~/local/source amazon_store:bucket-name
```
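Since this guide targets Glacier, uploads can also set the S3 storage class directly, so objects land in Glacier without a separate lifecycle rule. The source path and bucket name below are assumptions; `--s3-storage-class` is the rclone S3 backend flag for this:

```shell
# Upload directly to the GLACIER storage class
# (DEEP_ARCHIVE is also available for colder, cheaper storage)
rclone --s3-storage-class GLACIER \
    --bwlimit=0 --buffer-size=128M --checkers=32 --transfers=32 \
    copy -P ~/local/source amazon_store:bucket-name
```

Note that objects stored in Glacier classes must be restored before they can be downloaded again.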
rclone browser - Graphical user interface
...
Please note: rclone browser is not suitable for moving large amounts of data. Use it only for smaller transfers or for managing your inventory.