Notice: The Wilke Lab page has moved to http://wilkelab.org.
The page you are looking at is kept for archival purposes and will not be further updated.
THE WILKE LAB



Using the lab cluster

General

The lab cluster is phylocluster.ccbb.utexas.edu. You will need an account on the cluster to be able to use it.

The operating system on the cluster is Linux. If you have no experience with the Linux shell and command line, you will have to learn about that first. This is a good tutorial for beginners: [1]

The cluster uses the Sun Grid Engine (SGE) to distribute computing tasks over the various compute nodes that are available. You will have to start all jobs that you want to run using SGE. This is different from how you run jobs on your computer at home, so even if you have plenty of experience with Linux on your own computer you may not know your way around SGE.

A brief introduction to using SGE is given here: [2]. There's a lot of material in this document. The most important part is that every job on the cluster should use the provided bare-bones script as a starting point: [3]. If you have many independent runs that can run in parallel and don't need to talk to each other, you should also read this page: [4].
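
For many independent runs, SGE array jobs (submitted with qsub -t 1-N) start N copies of the same script and give each copy its index in the environment variable SGE_TASK_ID. The sketch below shows one way a job script might turn that index into run-specific arguments. It is only an illustration and does not replace the bare-bones script referenced above; the function name params_for_task and the parameter file are assumptions made for this example.

```shell
# Sketch only: pick the arguments for one task of an SGE array job.
# params_for_task is a hypothetical helper, not part of SGE.

# params_for_task FILE [TASK_ID]
#   Prints line number TASK_ID of FILE (one line of arguments per run).
#   TASK_ID defaults to SGE_TASK_ID, which SGE sets for array-job tasks.
params_for_task() {
    local file="$1"
    local id="${2:-${SGE_TASK_ID:-}}"
    # Refuse anything that is not a plain positive-looking integer,
    # e.g. an empty value when the script is run outside an array job.
    case "$id" in
        ''|*[!0-9]*) echo "params_for_task: no numeric task id" >&2; return 1 ;;
    esac
    # Print only the line whose number equals the task id.
    sed -n "${id}p" "$file"
}
```

Inside a job script submitted with qsub -t 1-3, a call such as params_for_task params.txt would then print the first, second, or third line of the (hypothetical) file params.txt, depending on which task is running.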

Finally, it is good practice to copy your jobs to /state/partition1 on the compute node before running them. /state/partition1 refers to the local hard drive on the compute node. We will soon provide a tutorial on how to do this exactly.
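
As a rough illustration of the copy, run, copy-back pattern described above, the following sketch stages a job directory on the local disk, runs a command there, and copies the results back. It is not the lab's official procedure; the function name run_in_scratch and the variable SCRATCH_ROOT are made up for this example.

```shell
# Sketch only: run a job from the node-local disk instead of network storage.

# SCRATCH_ROOT defaults to the local disk named in the text above;
# override it to try the function on a machine without that path.
SCRATCH_ROOT="${SCRATCH_ROOT:-/state/partition1}"

# run_in_scratch SRC_DIR COMMAND [ARGS...]
#   SRC_DIR  directory (on network storage) holding the job's files
#   COMMAND  program to run from inside the local copy
run_in_scratch() {
    local src="$1"; shift
    local work status
    # Private, uniquely named working directory on the local disk.
    work="$(mktemp -d "$SCRATCH_ROOT/${USER:-job}.XXXXXX")" || return 1
    # Copy the job's files (including dotfiles) to the local disk.
    cp -R "$src"/. "$work"/ || { rm -rf "$work"; return 1; }
    # Run the command inside the local copy and remember its exit status.
    ( cd "$work" && "$@" )
    status=$?
    # Copy everything back, then free the local disk for other jobs.
    cp -R "$work"/. "$src"/
    rm -rf "$work"
    return "$status"
}
```

A job script could end with a call such as run_in_scratch "$PWD" ./my_program, where my_program stands for whatever executable the job runs. Removing the working directory at the end matters because the local disk is shared by all jobs on that node.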

Storage

Storage locations

There are several locations for data storage. Which one you should use depends on the nature and size of the data you have to store.

Important: Never store student grades, other FERPA protected data, social security numbers, etc. on the cluster. These are Category I data and require special security protocols that the cluster doesn't satisfy.


Useful unix commands related to storage

It is a good idea to check regularly how much storage you are using and how much is available. To find out the total amount of storage used in a directory, use the command du -sh <directory name>. For example:

> du -sh projects
5.2G    projects
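
When a directory turns out to be large, it helps to see which of its entries take up the space. The sketch below combines du with sort and head to list the biggest entries first; the function name largest_entries is an assumption for this example, and sizes are printed in kilobytes so that they sort numerically.

```shell
# Sketch only: list the largest entries directly inside a directory.

# largest_entries [DIR] [N]
#   DIR  directory to inspect (default: current directory)
#   N    number of entries to show (default: 10)
largest_entries() {
    local dir="${1:-.}"
    local n="${2:-10}"
    # du -sk prints one total (in kilobytes) per entry; sort -rn puts the
    # largest first; head keeps only the top N lines.
    du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}
```

For example, largest_entries projects 5 would show the five largest files or subdirectories inside projects.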

To find out how much storage is available and how much is used, use the df command. For example:

> df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sda1              5952252   4850948    794064  86% /
/dev/sda3             27948600     77912  26450944   1% /state/partition1
hydra-172.local:/export/WilkeLab
                     2857947040 1520440096 1337506944  54% /share/WilkeLab

Each entry corresponds to one separate storage location. In this example, the first entry corresponds to the root directory of the local disk. This is where the operating system is stored. The second entry represents local temporary storage on the compute node. The third entry, which df has wrapped over two lines because the filesystem name is long, represents the network storage where /share/WilkeLab resides.
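
If only one storage location is of interest, df accepts a path and reports just the filesystem that holds it. The sketch below also passes -P, which asks df for the POSIX output format with one line per filesystem, so long filesystem names are not wrapped as in the example above. The function name fs_usage is an assumption for this example.

```shell
# Sketch only: report how full the filesystem holding a given path is.

# fs_usage [PATH]
#   Prints the usage percentage and the mount point, e.g. "54% /share/WilkeLab".
fs_usage() {
    # -P: one line per filesystem; -k: sizes in kilobytes.
    # Line 1 is the header, line 2 the filesystem that holds PATH;
    # column 5 is the usage percentage, column 6 the mount point.
    df -P -k "${1:-.}" | awk 'NR == 2 { print $5, $6 }'
}
```

For example, fs_usage /share/WilkeLab would report the usage of the network storage. The column positions assume that the filesystem name contains no spaces.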

Compressing files

Compress large data files. bzip2 usually gives a better compression ratio than gzip and should be used for very large files (about 1G or bigger uncompressed). gzip is more convenient (better integration with other tools, e.g. zless) and should be used for smaller files. You can also use zip/unzip as a replacement for gzip/bzip2 + tar. Don't use proprietary software to compress files.
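
The size rule above (bzip2 for files of about 1G or more, gzip otherwise) can be written down as a small helper. This is only a sketch of that rule of thumb; the function name choose_compressor and the exact threshold of 1073741824 bytes are assumptions for this example.

```shell
# Sketch only: suggest a compressor based on the uncompressed file size.

# choose_compressor FILE [THRESHOLD_BYTES]
#   Prints "bzip2" if FILE is at least THRESHOLD_BYTES long, else "gzip".
choose_compressor() {
    local file="$1"
    local threshold="${2:-1073741824}"   # assumed cutoff: 1G in bytes
    local size
    [ -f "$file" ] || { echo "choose_compressor: no such file: $file" >&2; return 1; }
    # wc -c counts bytes; the arithmetic expansion strips any padding
    # that some wc implementations add around the number.
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -ge "$threshold" ]; then
        echo bzip2
    else
        echo gzip
    fi
}
```

The printed name can be used directly as the command, for example "$(choose_compressor data.txt)" data.txt.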

As always, you can use the unix command man to find out how to use any of these programs. For example, enter man gzip to learn about how to use gzip. A few common use cases follow below.

Compress file data.txt using bzip2:

> bzip2 data.txt

Uncompress the resulting file data.txt.bz2:

> bunzip2 data.txt.bz2

Create a compressed tar archive (using bzip2) from the directory data. The resulting file will be called data.tbz2:

> tar cvfj data.tbz2 data

Extract data from compressed tar archive (compressed using bzip2):

> tar xvfj data.tbz2

List contents of a tar file (compressed using bzip2) without actually extracting the files:

> tar tvfj data.tbz2

To use gzip instead of bzip2 from tar, replace the j with a z in the above tar commands. For example, to list the contents of a tar file compressed using gzip, you would enter:

> tar tvfz data.tgz
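
Before deleting a directory that has just been archived, it is worth confirming that the archive can be read back. The sketch below creates a gzip-compressed archive and then lists it, discarding the listing and keeping only the exit status. The function name archive_and_verify is an assumption for this example; to use bzip2 instead, replace each z with a j and the .tgz suffix with .tbz2, as described above.

```shell
# Sketch only: archive a directory and check that the archive is readable.

# archive_and_verify DIR
#   Creates DIR.tgz next to DIR and returns success only if tar can list it.
#   DIR itself is left in place; removing it afterwards is up to the caller.
archive_and_verify() {
    local dir="${1%/}"   # drop a trailing slash, if any
    local parent name
    parent="$(dirname "$dir")"
    name="$(basename "$dir")"
    # -C changes into the parent first, so paths inside the archive
    # start with the directory name rather than the full path.
    tar czf "$dir.tgz" -C "$parent" "$name" || return 1
    # Listing forces tar to read and decompress the whole archive.
    tar tzf "$dir.tgz" > /dev/null
}
```

For example, archive_and_verify data && rm -r data would remove the original directory only after data.tgz has been written and read back successfully.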
