# AWS S3 scripts, tips and tricks
The following scripts may be useful for quickly uploading, viewing, and deleting long-term storage files on AWS S3.
Important:

- These commands require AWS CLI access, so you must make sure that is configured before using them.
- They work from within a folder called `git` into which all lab git repos are cloned.
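If you are not sure whether your AWS CLI access is already configured, the standard AWS CLI commands below (not part of the lab scripts) give a quick check; this is only a minimal sketch and assumes credentials have already been issued to you.

```bash
# Check that the AWS CLI is installed and that credentials are configured.
aws --version
aws configure list            # shows the active profile, access key and region
aws sts get-caller-identity   # confirms the configured credentials actually work
```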
## Place scripts and update `$PATH`
To make these scripts available like any other bash command, they must be placed in a location that is present in your `$PATH`. This can be done in one of the following ways.
### OPTION 1. Clone the `protocols_tutorials` repository to `rhino`.

- In your `git` folder on `scratch`, or another suitable location, clone this `protocols_tutorials` repository (a sketch of the commands for these steps follows this list).
- In the same folder, update your `.bash_profile` with a modified `$PATH`.
- Reload your profile (or restart the terminal) to make the scripts available on the command line.
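The exact commands belong to the `protocols_tutorials` repository itself; the sketch below only illustrates the pattern. The repository URL and the `scripts` subfolder name are placeholders rather than the real values, so substitute the lab's actual remote and the folder that holds `cps3`, `lss3`, and `rms3`.

```bash
# Clone the repository (placeholder URL; use the lab's actual remote).
git clone https://github.com/YOUR_ORG/protocols_tutorials.git

# Append the folder holding the scripts to $PATH in ~/.bash_profile.
# "scripts" is a placeholder for whichever subfolder contains cps3, lss3 and rms3.
echo "export PATH=\$PATH:$(pwd)/protocols_tutorials/scripts" >> ~/.bash_profile

# Reload the profile so the scripts are available in the current session.
source ~/.bash_profile
```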
Now the scripts can be run as the following commands: `cps3`, `lss3`, and `rms3`.
### OPTION 2. Choose another install location.
- In your home folder on `rhino`, create a new folder called `opt` by running `mkdir opt`. You can choose a different location or folder name, but you must have write access to the parent folder.
- `cd` into the new `opt` folder and copy the scripts below into new files.
- Name each file with a logical command name (I called them `cps3`, `lss3`, and `rms3`). Do not add any extension to the filenames.
- Make sure that each script is executable by running `chmod +x cps3` (same for `lss3` and `rms3`).
- Add the following line to your `.bash_profile` (replacing `USERNAME` with your username): `export PATH=$PATH:/home/USERNAME/opt`
- Restart the terminal or run `source ~/.bash_profile` to make the scripts available as commands.
- Now the scripts can be run as commands equivalent to their names, e.g. `cps3`, `lss3`, and `rms3`.
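Put together, the Option 2 steps look roughly like the session below; it assumes the default `~/opt` location and your own username in place of `USERNAME`.

```bash
# Create the install folder and move into it.
mkdir ~/opt && cd ~/opt

# ...create cps3, lss3 and rms3 here with the contents shown in the Scripts section...

# Make the scripts executable.
chmod +x cps3 lss3 rms3

# Put the folder on $PATH and reload the profile.
echo 'export PATH=$PATH:/home/USERNAME/opt' >> ~/.bash_profile
source ~/.bash_profile
```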
## Scripts
### 1. Copy files to S3 (`cps3`)
```bash
#!/bin/bash
# cps3: copy every plain file in the current directory to the matching
# prefix under git_repos/ in the lab's S3 bucket.
PATH_START=s3://fh-pi-subramaniam-a-eco/git_repos/
# Part of the working directory that follows "git/", used as the S3 prefix.
PATH_END=$(pwd | grep -o -P "(?<=git/).+$")/
for FILE in *
do
    # Skip anything that is not a regular file (e.g. subdirectories).
    [ -f "$FILE" ] || continue
    aws s3 cp "$FILE" "$PATH_START$PATH_END"
done
```
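For reference, the destination prefix is simply the part of the working directory that follows `git/`. A hypothetical example, using a placeholder repository called `my_repo` and a placeholder home path:

```bash
# Hypothetical example of how PATH_END is derived from the working directory.
$ pwd
/home/USERNAME/scratch/git/my_repo/fastq
$ pwd | grep -o -P "(?<=git/).+$"
my_repo/fastq
# Files therefore land under s3://fh-pi-subramaniam-a-eco/git_repos/my_repo/fastq/
```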
### 2. Recursively view the S3 contents for the current working directory (`lss3`)
```bash
#!/bin/bash
# lss3: recursively list the S3 objects that mirror the current directory.
PATH_START=s3://fh-pi-subramaniam-a-eco/git_repos/
PATH_END=$(pwd | grep -o -P "(?<=git/).+$")/
aws s3 ls "$PATH_START$PATH_END" --recursive --human-readable
```
### 3. Remove a specific file from S3 (`rms3`)
```bash
#!/bin/bash
# rms3: remove the file named by the first argument from the matching S3 prefix.
PATH_START=s3://fh-pi-subramaniam-a-eco/git_repos/
PATH_END=$(pwd | grep -o -P "(?<=git/).+$")/
FILE=$1
aws s3 rm "$PATH_START$PATH_END$FILE"
```
## Upload data
- Place raw data where they are used in your analysis workflow (e.g. a `fastq` folder). Symbolic links may also be used.
- Run `cps3` from this folder to transfer all files to an analogously named location under `git_repos` in the lab's S3 bucket. The `cps3` command will create the destination location if it doesn't already exist.
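A hypothetical end-to-end example, using a placeholder repository called `my_repo` and a symlinked fastq file (both names and paths are placeholders):

```bash
# Stage raw data (here via a symbolic link) inside the analysis repo, then upload.
cd ~/scratch/git/my_repo
mkdir -p fastq && cd fastq
ln -s /path/to/sequencing/run/sample1.fastq.gz .   # placeholder source path
cps3   # uploads every file in this folder to the matching S3 prefix
```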
## View data
- From any folder, run `lss3` to recursively view its contents in the analogous location(s) on S3.
## Delete data

- From a folder containing data that you want to remove from the analogous location on S3 (e.g. if you accidentally uploaded something that doesn't belong there), run `rms3 foo.bar` (where `foo.bar` is the file to be removed).
- To delete data recursively, use a one-line `for` loop like the one sketched below.
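A minimal sketch, assuming that every regular file below the current folder should be removed and that filenames contain no spaces:

```bash
# Remove every regular file under the current directory from the matching S3 prefix.
for FILE in $(find . -type f); do rms3 "${FILE#./}"; done
```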