
How to copy data to AWS

AWS S3 scripts, tips and tricks

The following scripts may be useful for quickly uploading, viewing, and deleting long-term storage files on AWS S3.

Important:

  • These commands require the AWS CLI, so make sure it is installed and configured before using them (see the quick check after this list).
  • They work from within a folder called git into which all lab git repos are cloned; the scripts build the S3 path from everything after git/ in your current working directory.
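One way to confirm the AWS CLI is ready is a quick check like the following (output will vary by setup):

aws --version
aws configure list
# End-to-end check: list the top level of the lab bucket
aws s3 ls s3://fh-pi-subramaniam-a-eco/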

Place scripts and update $PATH

To make these scripts available like any other bash command, they must be placed in a location that is present in your $PATH. This can be done in one of the following ways.
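For example, you can inspect which directories your shell currently searches for commands:

echo $PATH
# e.g. /usr/local/bin:/usr/bin:/bin (your output will differ)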

OPTION 1. Clone the protocols_tutorials repository to rhino.

  1. In your git folder on scratch or another suitable location, clone this protocols_tutorials repository with the following command.
git clone git@github.com:rasilab/protocols_tutorials
  2. In the same folder, run the following to update your .bash_profile with a modified $PATH.
echo "export PATH=\$PATH:$(pwd)/protocols_tutorials/docs/scripts" >> ~/.bash_profile
  3. Run the following command to make the scripts available on the command line (or restart the terminal).
source ~/.bash_profile

Now the scripts can be run as the following commands: cps3, lss3, and rms3.
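To confirm the scripts are on your $PATH, you can ask the shell to locate them; this should print the full path to each script:

command -v cps3 lss3 rms3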

OPTION 2. Choose another install location.

  1. In your home folder on rhino, create a new folder called opt by running mkdir opt. You can choose a different location or folder name, but you must have write access to the parent folder.
  2. cd into the new opt folder and copy the scripts below into new files.
  3. Name each file with a logical command name (I called them cps3, lss3, and rms3). Do not add any extension to the filenames.
  4. Make sure that each script is executable by running chmod +x cps3 (same for lss3 and rms3).
  5. Add the following line to your .bash_profile: export PATH=$PATH:/home/USERNAME/opt
  6. Restart the terminal or run source ~/.bash_profile to make the scripts available as commands.
  7. Now the scripts can be run as commands matching their filenames, e.g. cps3, lss3, and rms3. (These steps are consolidated in the sketch below.)
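For convenience, here are the same steps as a single sequence, assuming the ~/opt location; the script contents come from the Scripts section below:

mkdir -p ~/opt && cd ~/opt
# create cps3, lss3, and rms3 here with the contents shown under Scripts
chmod +x cps3 lss3 rms3
echo 'export PATH=$PATH:$HOME/opt' >> ~/.bash_profile
source ~/.bash_profile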

Scripts

1. Copy files to S3 (cps3)

#! /bin/bash

# Upload every regular file in the current directory to the matching
# path under git_repos/ on S3.
PATH_START=s3://fh-pi-subramaniam-a-eco/git_repos/

# Everything after "git/" in the current working directory becomes the S3 path.
PATH_END=$(pwd | grep -o -P "(?<=git/).+$")/

for FILE in *
        do
                # Skip subdirectories; cps3 uploads regular files (and symlinks to them).
                [ -f "$FILE" ] && aws s3 cp "$FILE" "$PATH_START$PATH_END"
        done
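For example, with a hypothetical repo layout under git, cps3 maps the local path to S3 like this:

# Hypothetical repo and folder names
cd ~/git/my_repo/analysis/fastq
cps3
# each file is copied to s3://fh-pi-subramaniam-a-eco/git_repos/my_repo/analysis/fastq/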

2. Recursively view the contents of the analogous S3 path for the current working directory (lss3)

#! /bin/bash

# Recursively list the S3 contents corresponding to the current working directory.
PATH_START=s3://fh-pi-subramaniam-a-eco/git_repos/

# Everything after "git/" in the current working directory becomes the S3 path.
PATH_END=$(pwd | grep -o -P "(?<=git/).+$")/

aws s3 ls "$PATH_START$PATH_END" --recursive --human-readable
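A hypothetical run from the same example folder; each output line shows a timestamp, a human-readable size, and the object key:

cd ~/git/my_repo/analysis/fastq
lss3
# 2024-01-15 10:32:01    1.2 GiB my_repo/analysis/fastq/sample1.fastq   (illustrative)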

3. Remove a specific file from S3 (rms3)

#! /bin/bash

# Remove the file named by the first argument from the matching S3 path.
PATH_START=s3://fh-pi-subramaniam-a-eco/git_repos/

# Everything after "git/" in the current working directory becomes the S3 path.
PATH_END=$(pwd | grep -o -P "(?<=git/).+$")/

FILE=$1

aws s3 rm "$PATH_START$PATH_END$FILE"
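For instance, with the same hypothetical layout:

cd ~/git/my_repo/analysis/fastq
rms3 sample1.fastq
# removes s3://fh-pi-subramaniam-a-eco/git_repos/my_repo/analysis/fastq/sample1.fastq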

Upload data

  1. Place raw data where they are used in your analysis workflow (e.g. a fastq folder). Symbolic links may also be used.
  2. Run cps3 from this folder to transfer all files to the analogously named path under git_repos on S3. The destination path is created automatically if it doesn't already exist. (See the sketch below for a worked example.)
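A hypothetical upload using a symbolic link, so the raw data stays in shared storage:

# Paths below are hypothetical
cd ~/git/my_repo/analysis/fastq
ln -s /shared/ngs/run123/sample1.fastq .
cps3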

View data

  1. From any folder under git, run lss3 to recursively view the contents of the analogous path on S3.

Delete data

  1. From a folder containing data that you want to remove from the analogous bucket on S3 (e.g. if you accidentally uploaded something that doesn't belong there), run rms3 foo.bar (where foo.bar is the file to be removed).
  2. To delete multiple files at once, use a one-line for loop like this one:
for file in *.fastq; do rms3 "$file"; done
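Alternatively, aws s3 rm supports recursive deletion natively; a sketch with a hypothetical path:

aws s3 rm s3://fh-pi-subramaniam-a-eco/git_repos/my_repo/analysis/fastq/ --recursive --exclude "*" --include "*.fastq"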