# AWS S3 scripts, tips and tricks
The following scripts may be useful for quickly uploading, viewing, and deleting long-term storage files on AWS S3.
Important:

- These commands require AWS CLI access, so make sure the AWS CLI is configured before using them.
- They work from within a folder called `git` into which all lab git repos are cloned.
## Place scripts and update $PATH
To make these scripts available like any other bash command, place them in a location that is on your `$PATH`. This can be done in one of the following ways.
### OPTION 1. Clone the `protocols_tutorials` repository to `rhino`.
- In your `git` folder on `scratch` (or another suitable location), clone this `protocols_tutorials` repository with the following command.
- In the same folder, run the following to update your `.bash_profile` with a modified `$PATH`.
- Run the following command to make the scripts available on the command line (or restart the terminal).
Now the scripts can be run as the following commands: `cps3`, `lss3`, and `rms3`.
### OPTION 2. Choose another install location.
- In your home folder on `rhino`, create a new folder called `opt` by running `mkdir opt`. You can choose a different location or folder name, but you must have write access to the parent folder.
- `cd` into the new `opt` folder and copy the scripts below into new files.
- Name each file with a logical command name (I called them `cps3`, `lss3`, and `rms3`). Do not add any extension to the filenames.
- Make sure that each script is executable by running `chmod +x cps3` (same for `lss3` and `rms3`).
- Add the following line to your `.bash_profile`: `export PATH=$PATH:/home/USERNAME/opt`
- Restart the terminal or run `source ~/.bash_profile` to make the scripts available as commands.
- Now the scripts can be run as commands equivalent to their names, e.g. `cps3`, `lss3`, and `rms3`.
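After either option, a quick sanity check confirms the `$PATH` update took effect. This is a sketch assuming the OPTION 2 layout, with the scripts in `$HOME/opt` (i.e. `/home/USERNAME/opt`):

```shell
# Sanity check: the install folder should appear as one entry in $PATH.
# (Assumes the scripts were placed in $HOME/opt, as in OPTION 2.)
export PATH=$PATH:$HOME/opt
if echo "$PATH" | tr ':' '\n' | grep -Fxq "$HOME/opt"; then
    echo "opt is on PATH"
fi
# → opt is on PATH
```

If the folder is missing from the output, re-check the `export` line in `.bash_profile` and re-source it.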
## Scripts
### 1. Copy files to S3 (`cps3`)
```bash
#!/bin/bash
# cps3: copy every file in the current folder to the analogous S3 prefix.
PATH_START=s3://fh-pi-subramaniam-a-eco/git_repos/
# Everything after "git/" in the working directory becomes the S3 key prefix.
PATH_END=$(pwd | grep -o -P "(?<=git/).+$")/
for FILE in *
do
    aws s3 cp "$FILE" "$PATH_START$PATH_END"
done
```
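To make the path logic concrete: everything after `git/` in the working directory becomes the key prefix under `git_repos/` on S3. A sketch with a hypothetical working directory (`/home/user/git/myproject/fastq` is made up for illustration):

```shell
# Hypothetical working directory inside the lab's git folder.
DIR=/home/user/git/myproject/fastq
PATH_START=s3://fh-pi-subramaniam-a-eco/git_repos/
# Same extraction cps3 performs on $(pwd): keep everything after "git/".
PATH_END=$(echo "$DIR" | grep -o -P "(?<=git/).+$")/
echo "$PATH_START$PATH_END"
# → s3://fh-pi-subramaniam-a-eco/git_repos/myproject/fastq/
```

So a file uploaded from that folder lands under the matching `myproject/fastq/` prefix on S3.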
### 2. Recursively view contents of S3 in the current working directory (`lss3`)
```bash
#!/bin/bash
# lss3: recursively list the analogous S3 prefix for the current folder.
PATH_START=s3://fh-pi-subramaniam-a-eco/git_repos/
PATH_END=$(pwd | grep -o -P "(?<=git/).+$")/
aws s3 ls "$PATH_START$PATH_END" --recursive --human-readable
```
### 3. Remove a specific file from S3 (`rms3`)
```bash
#!/bin/bash
# rms3: remove the file named as the first argument from the analogous S3 prefix.
PATH_START=s3://fh-pi-subramaniam-a-eco/git_repos/
PATH_END=$(pwd | grep -o -P "(?<=git/).+$")/
aws s3 rm "$PATH_START$PATH_END$1"
```
## Upload data
- Place raw data where they are used in your analysis workflow (e.g. a `fastq` folder). Symbolic links may also be used.
- Run `cps3` from this folder to transfer all files to an analogously named prefix under `git_repos` on S3. The `cps3` command will create the prefix if it doesn't already exist.
## View data
- From any folder, run `lss3` to recursively view its contents in the analogous prefix on S3.
## Delete data
- From a folder containing data that you want to remove from the analogous prefix on S3 (e.g. if you accidentally uploaded something that doesn't belong there), run `rms3 foo.bar` (where `foo.bar` is the file to be removed).
- To delete data recursively, use a one-line `for` loop like this one:
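A minimal sketch of such a loop, calling `rms3` on every regular file in the current folder. The first line is only a dry-run guard for trying the loop on a machine where `rms3` is not installed; omit it on `rhino`:

```shell
# Dry-run guard: fall back to a stub if rms3 is not installed here.
if ! command -v rms3 >/dev/null; then rms3() { echo "would remove: $1"; }; fi
# The one-line loop: remove every regular file here from the analogous S3 prefix.
for FILE in *; do if [ -f "$FILE" ]; then rms3 "$FILE"; fi; done
```

Double-check with `lss3` first: this removes every file in the current folder from S3, not just the ones you uploaded by mistake.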