A Visual Guide to Version Control
Eric's Source Control HOWTO
Version Control with Subversion
26.6.09
19.6.09
End of Moore's Law?
In a 1965 paper, Intel co-founder Gordon Moore described the observed trend that the density of transistors on a computer chip had been doubling every year, a rate he revised in 1975 to a doubling every two years. Since then, the notion that technological advances will double the density of transistors on a chip every two years has become known as Moore's Law. It was always acknowledged that quantum mechanics would eventually limit these technological advances, but as it turns out, the cost of manufacturing may bring a halt to Moore's Law before the technical limitations do. Michael Feldman, over at HPCWire, predicts that we may see the effective end of Moore's Law in the next five years.
Moore observed that the cost per transistor decreased in concert with the shrinking geometries.
... it has been apparent for some time that the Moore's Law curve is running counter to the escalating costs of semiconductor manufacturing, which are rising exponentially as process technology shrinks. This is the result of the increased cost of R&D, testing, and the construction of semiconductor fabrication facilities.
And it is really this aspect of the model that is breaking. Eventually you will be unable to sell enough chips to recoup even the capital expenditures.
7.5.09
Unix redirection
The shell and many Unix commands take their input from standard input (stdin), write output to standard output (stdout), and write error output to standard error (stderr). By default, standard input is connected to the terminal keyboard and standard output and error to the terminal screen.
The redirection of I/O, for example to a file, is accomplished by specifying the destination on the command line using a redirection metacharacter followed by the desired destination.
Character | Action |
---|---|
> | Redirect standard output |
>& | Redirect standard output and standard error |
< | Redirect standard input |
>! | Redirect standard output; overwrite file if it exists |
>&! | Redirect standard output and standard error; overwrite file if it exists |
\| | Redirect standard output to another command (pipe) |
>> | Append standard output |
>>& | Append standard output and standard error |
The general form of a command with standard input and output redirection is:
% command -[options] [arguments] < input file > output file
If you are using CSH/TCSH and do not have the noclobber shell variable set, using > and >& to redirect output will overwrite any existing file of that name. Setting noclobber prevents this. Using >! and >&! always forces the file to be overwritten. Use >> and >>& to append output to existing files.
Redirection may fail under some circumstances: 1) if you have the variable noclobber set and you attempt to redirect output to an existing file without forcing an overwrite, 2) if you redirect output to a file you don't have write access to, and 3) if you redirect output to a directory.
Lastly, have you ever wanted to capture the output of a command to a file, but also send it to the screen? The command tee can do just that.
./compile |& tee filename
redirects both stdout and stderr to the file filename while still showing them on the screen. It's further discussed in this post.
Additional Examples:
% who > names
Redirects standard output to a file named names.
% (pwd; ls -l) > out
Redirects output of both commands to a file named out.
% pwd; ls -l > out
Redirects output of the ls command only to a file named out.
Input redirection can be useful, for example, if you have written a FORTRAN program which expects input from the terminal but you want it to read from a file. In the following example, myprog, which was written to read standard input and write standard output, is redirected to read myin and write myout:
% myprog < myin > myout
You can suppress redirected output and/or errors by sending it to the null device, /dev/null. The example shows redirection of both output and errors:
% who >& /dev/null
To redirect standard error and output to different files, you can use grouping:
% (cat myfile > myout) >& myerror
For the original article on redirection, on which this article is heavily based, and more information about how it differs for the Bourne Shell Family, see this link.
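For comparison, here is a minimal sketch of the same redirections in Bourne-family syntax (sh, bash), which spells several of them differently from csh/tcsh. The helper function emit and the file names are hypothetical, chosen only for illustration, and the scratch directory assumes that mktemp is available.

```shell
# Sketch only: Bourne-family (sh/bash) counterparts of the csh examples above.
# emit is a made-up helper that writes one line to stdout and one to stderr.
emit() { echo "to stdout"; echo "to stderr" >&2; }

tmp=$(mktemp -d)                   # scratch directory for the output files

emit > "$tmp/out" 2> "$tmp/err"    # stdout and stderr to different files
emit > "$tmp/both" 2>&1            # both streams to one file (csh: >&)
emit >> "$tmp/both" 2>&1           # append both streams (csh: >>&)
emit > /dev/null 2>&1              # discard everything
emit 2>&1 | tee "$tmp/log"         # both streams to the screen and a file (csh: |&)

# Collect what ended up in each file before cleaning up.
out=$(cat "$tmp/out")
err=$(cat "$tmp/err")
both_lines=$(wc -l < "$tmp/both" | tr -d ' ')
log_lines=$(wc -l < "$tmp/log" | tr -d ' ')
echo "out=$out err=$err both=$both_lines log=$log_lines"
rm -rf "$tmp"
```

Note that in the Bourne family the order matters: 2>&1 must come after the > redirection, because it means "send stderr wherever stdout currently points".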
Labels:
unix io,
unix redirection,
unix tip
3.5.09
Using tee for redirection
Have you ever wanted to capture the output of a command to a file, but also send it to the screen? The command tee can do just that. Suppose you have a script called compile that builds your application and sends pages of output to your screen, so you want to capture it to a file so that you can employ UNIX's ability to search for patterns. You can achieve this using the UNIX command
./compile >& filename
where the redirection symbols >& direct both the standard output (stdout) and the standard error (stderr) to the file named filename. Sometimes you also want to see the output on the screen at the same time. This is where the Unix command tee comes in. For example, the Unix command
./compile |& tee filename
redirects both stdout and stderr to the file filename while also displaying them on the screen.
10.4.09
R Statistical software
R is a statistical computing scripting language not dissimilar to Matlab or Python.
R is a GNU project implementation of the S programming language with lexical scoping semantics inspired by Scheme. The R language has become a de facto standard among statisticians for the development of statistical software.
Resources:
- The R project website.
- R by example - a quick guide to doing things in R.
- Programming in R.
- A series of R tutorials on YouTube
- Statistics with R are some personal notes on using R.
- R packages/libraries archive.
- R Tutorial for Epidemiology.
- Some R resources and examples.
- Using R for psychological research.
- Graphics in R
3.4.09
Mother May I?
Unix uses the concept of permissions and ownership to determine who can access a file or directory. Each file or folder is considered to be owned by a user and a group (typically a group to which the user belongs). A set of permissions associated with each file or folder determines what actions on that file or folder are allowed for a particular user. There are three types of permissions:
- r - read, allows you to read the contents of a file
- w - write, allows you to write to a file or delete it
- x - execute, allows you to run a file as a script, or cd to a directory.
These permissions are set separately for three classes of users:
- User or Owner is the person who owns the file, typically the one who created it
- Group is the group that owns the file, typically one the owner belongs to
- Other is everyone else in the world.
To see the ownership and permissions of a file, use ls -l. For example,
ls -l readme
might give the results:
-rw-r--r-- 1 bob staff 0 Jun 17 23:30 readme
This shows that the file readme is owned by user bob and group staff. The permissions are shown by the sequence of ten characters at the start of that line: -rw-r--r--. The first character is a - for a file. Alternatively, if it were a d, it would indicate a directory. The next three characters (rw-) indicate the permissions for the owner of the file (bob). The next three characters (r--) indicate the permissions for a user who is not the owner of the file, but who is in the group (staff) that owns the file. The last three characters (r--) indicate the permissions for everyone else.
You can remove a read-only file (e.g. one with permissions -rw-r--r-- that you do not own) from a directory only if you have write permission for that folder. Typically directories must also have execute permission if you want people to be able to read their contents.
Changing permissions
To change the permissions on a file or folder, use the chmod command. For details, see man chmod. The easiest way to change permissions is to use the symbolic modes where the permission changes are specified by add +, remove -, or set = permissions. Use u to indicate that the change applies to the user permissions, g to indicate that it applies to the group permissions, and o for anyone else.
To change the permissions on the file readme to allow the world to have write permission:
chmod o+w readme
If the file isn't owned by the user, preface the command with sudo and supply a superuser password:
sudo chmod o+w readme
To add executable permission to the file (e.g. because it is a script) for the owner of the file and for users in the group that owns the file use:
chmod ug+x readme
To remove the read and write permissions for the group and the world:
chmod go-rw readme
Use the set = operation to set the permissions to an exact configuration, without regard to what the permissions are currently. To set the file to be readable and writable by the owner, but only readable by group and "other":
chmod u=rw,go=r readme
Things work similarly for directory permissions. Often you want to change the permissions of a directory and all its contents. Do this with the -R option to chmod.
To change a directory and all its contents to be writable by the owner and group use:
chmod -R ug+w Folder
Likewise, the flag X (uppercase) can be combined with the -R option to ensure that a folder and all of its sub-folders have execute permission, but not the files. Typically execute permission is not desirable for files unless they are applications or scripts.
chmod -R ugo+X Folder
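The symbolic modes above can be tried safely on a scratch file. This is a small sketch, assuming a POSIX shell and a temporary directory from mktemp; the helper name perms is made up here and simply extracts the ten-character permission string from ls -l.

```shell
# Sketch only: exercise chmod's symbolic modes on a throwaway file.
tmp=$(mktemp -d)
touch "$tmp/readme"

# perms (hypothetical helper): print the 10-character mode string of a file.
perms() { ls -ld "$1" | cut -c1-10; }

chmod u=rw,go=r "$tmp/readme"   # set exactly: owner read/write, others read
p1=$(perms "$tmp/readme")

chmod ug+x "$tmp/readme"        # add execute for owner and group
p2=$(perms "$tmp/readme")

chmod go-rw "$tmp/readme"       # remove read and write for group and other
p3=$(perms "$tmp/readme")

echo "$p1 -> $p2 -> $p3"        # prints: -rw-r--r-- -> -rwxr-xr-- -> -rwx--x---
rm -rf "$tmp"
```

Starting from the = form makes the later + and - steps independent of whatever permissions the file was created with.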
2.4.09
Fundamental Unix Commands
Overview of the Apple OS X BSD UNIX implementation
Opening a terminal
To access the UNIX subsystems in OS X you need to open the terminal application. The terminal is located in the Utilities folder under applications.
Navigating in the terminal
- The horizontal arrow ←→ keys move the cursor left and right
- The vertical arrow ↑↓ keys page through the command history.
- Use the mouse to select text (by highlighting) and ⌘c and ⌘v to copy and paste text
- Delete deletes text behind the cursor, and the ⌦ key deletes text after the cursor.
There are numerous concise ways to specify directory information.
- . indicates your current directory
- .. indicates the directory right above you on the tree, or parent to the current directory.
- ~ indicates the user's own home directory
- ~smith indicates the home directory of the user smith
- / the root directory, or top of the tree.
Basic Unix Commands
CP: cp copies a file from a source name to a target name.
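cp [modifiers] source target
Some common modifiers for copy are:
-r for recursive (used to copy whole directories)
-f force (don’t ask me if I want to do it just do it)
-p preserve permissions and timestamps (note the lowercase p; uppercase -P instead controls how symbolic links are handled)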
For example
cp /usr/share/tcsh/examples/login ~/login
copies the default login file for the tcsh shell located in the /usr/share/tcsh/examples to your home directory. See the article on shells for more information on the login file.
MV: mv moves a file from one directory to another, or from one name to another, or a combination of both. It often confuses new users that renaming in Unix is the same as moving.
mv [modifiers] source target
Some common modifiers for move are:
-f force (don’t ask me if I want to do it just do it)
-i interactive (ask for confirmation before overwriting an existing file)
For example
mv login .login
just renames the file login to .login.
RM: rm removes a file or, with the -r modifier, a directory.
rm [modifiers] file
Some common modifiers for rm are:
-r for recursive (used to delete whole directories)
-f suppresses confirmation prompts asking if you really want to delete read-only files.
For example
rm -rf /usr/local/bin
removes the directory bin, in /usr/local, if you have permission to do so. Note this is NOT something that is a good idea to try.
CD: cd changes the current working directory.
cd directory
PWD: pwd displays the current working directory.
MKDIR: mkdir creates a new directory. Some common modifiers for mkdir are:
-p used to create a whole hierarchy of folders in one step.
For example
mkdir -p work/version1/code
creates the directory code inside of the directory version1, which is inside a directory work. And work is created in the current working directory.
LS: ls displays the contents of a specified directory. By default it displays the current directory.
ls [modifiers]
or
ls [modifiers] directory
Some common modifiers for ls are:
-l provides a long listing of the file information.
-lt provides a long listing sorted by modification time, newest first.
-a lists all files, including hidden dot files.
-d lists the directory itself (the default behavior lists the contents of the directory instead).
For example
ls -t /usr/share/tcsh/examples
lists the contents of the directory /usr/share/tcsh/examples sorted by the time they were last modified, newest first.
Last of all is the command that helps you find out more about any command. This is the man command.
MAN: man accesses the built in manual pages. For example to find all the modifier options for ls, type:
man ls
To find the name of a command which does something specific, use the -k modifier:
man -k password
lists all the commands which deal with passwords. Alternatively, use the command apropos, which searches through the header lines of the man pages for whatever keyword you supply, and lists the man pages containing it. For example,
apropos copy
produces a list of all the man pages that contain copy in their header lines.
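To tie the basic commands in this article together, here is a short sketch that runs entirely inside a scratch directory created with mktemp, so nothing important is touched. The file and directory names (work, backup, notes.txt) are placeholders invented for the example.

```shell
# Sketch only: the basic commands above, run in a throwaway directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

mkdir -p work/version1/code          # create the whole hierarchy in one step
echo "first draft" > notes.txt

cp notes.txt work/version1/code/     # copy a file into the new directory
mv notes.txt .notes                  # rename: the same command as moving
cp -r work backup                    # -r copies a directory and its contents

listing=$(ls -a)                     # -a also shows the hidden .notes file
copied=$(cat backup/version1/code/notes.txt)

rm -r backup                         # -r removes a directory and its contents
pwd                                  # show the current working directory
echo "$listing"

cd / && rm -rf "$tmp"                # leave and clean up the scratch directory
```

Working in a temporary directory is a deliberate choice: rm -r and rm -rf do not ask for confirmation, so it is worth practicing them where a mistake costs nothing.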
This is just the beginning; there is more Unix to come, but this will get you started.
30.3.09
Issues programming on multicore architectures
It turns out many existing distributed memory parallel algorithms are poorly designed for parallel execution on multicore processors, because they have simply been optimized for the wrong design parameters.
In the past we have been striving for algorithms to maximize parallelism and at the same time minimize the communication between the threads. For multicore processors, however, the cost of thread communication is relatively cheap as long as the communicated data resides in a cache shared by the threads. Also, the amount of parallelism that can be explored by a multicore processor is limited by its number of cores multiplied by the number of threads running on each core. Instead, a third parameter is gaining importance for parallel multicore applications: the memory usage.
Read more here.
24.3.09
Customizing your login shell
Shell resource files:
Every time you log in, the UNIX shell searches your home directory for shell resource files to execute. These files, prefixed with a period, customize the UNIX session. Each shell has its own unique set of resource files. We'll discuss those for C-shell, the Korn shell, and the bash shell.
CSH/TCSH:
For the csh/tcsh shell, the two initialization files are the .cshrc (pronounced `dot-see-shirk') file (alternatively, tcsh accepts either a .cshrc or a .tcshrc file) and the .login (pronounced `dot-login') file. The .cshrc file is executed every time a new C shell is started. The .login file is executed after the .cshrc file, and only when you initially log in. Generally, so that every new copy of the C shell will be able to use them, any alias and set commands should be placed in the .cshrc file. It is also permissible to create a separate third file for aliases called .aliases. The .aliases file must then be defined and executed as part of the .cshrc/.tcshrc file, just like the .env file in the Korn shell. Environment variables are typically set in the .login file, as well as library and manual paths, and terminal settings. In many cases, this division isn't clear cut.
KORN:
On login the ksh shell executes a profile file called .profile. The .profile is used to set environment variables and shell options. Aliases can also be put in the .profile file, but it's considered good practice to put them in a separate environment file called .env. The environment file is defined and executed as part of the .profile. This is different from how the C-shell resource files are handled. An example of a Korn shell .profile shows that its syntax is based on that of the Bourne shell.
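As a rough sketch of that arrangement, a .profile can name the environment file through the ENV variable, which ksh consults when an interactive shell starts. The PATH entry, editor, and alias shown here are illustrative assumptions, not settings taken from the linked example.

```shell
# ~/.profile (sketch): read once, at login
PATH="$HOME/bin:$PATH"; export PATH
EDITOR=vi; export EDITOR
ENV="$HOME/.env"; export ENV     # tell ksh where the environment file lives

# ~/.env (sketch): the file named by ENV would hold the aliases, for example:
#   alias ll='ls -l'
#   set -o vi
```

Keeping aliases in the file named by ENV means they are also available in shells started after login, not just in the login shell itself.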
BASH:
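On login the bash shell executes .bash_profile (or, failing that, .bash_login or .profile), while every interactive non-login bash shell executes a resource file called .bashrc. Many users source .bashrc from .bash_profile so that a single file is responsible for all the shell customization. An example .bashrc shows that its syntax is similar to the Bourne and Korn shells.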
Critical Aliases:
In my mind, the three most important aliases are for the mv, cp, and rm commands. By default, the mv and cp commands overwrite any existing file without warning, and the rm command removes a file without warning. In UNIX, once a file is removed, there is no easy or guaranteed way to get it back. The flag -i forces the mv, cp, and rm commands to ask for confirmation before completing. I always add these aliases to any new account, because they can save you from a careless mistake that wipes out important work.
In csh/tcsh shells, the syntax is:
alias mv '/usr/bin/mv -i '
alias cp '/usr/bin/cp -i '
alias rm '/usr/bin/rm -i '
Otherwise (in the Bourne-family shells such as ksh and bash) the syntax is:
alias mv='/usr/bin/mv -i '
alias cp='/usr/bin/cp -i '
alias rm='/usr/bin/rm -i '
Useful Aliases:
Some other useful aliases are:
alias c 'clear' # shorthand for a command.
alias h 'history'
alias ls 'ls -F' # change the default behavior of a command.
alias cd 'cd \!*; ls' # show the contents of a directory when you go to it.
alias exit 'logout' # add new names to existing commands.
alias print 'enscript -G2rc -dprintername' # this one prints a text file as 2 columns.
In addition, with bash, useful functions can be defined, such as this one, which automatically picks the correct way to extract an archive.
function extract() # Handy Extract Program.
{
    # Quote "$1" throughout so file names containing spaces still work.
    if [ -f "$1" ] ; then
        case "$1" in
            *.tar.bz2) tar xvjf "$1" ;;
            *.tar.gz) tar xvzf "$1" ;;
            *.bz2) bunzip2 "$1" ;;
            *.rar) unrar x "$1" ;;
            *.gz) gunzip "$1" ;;
            *.tar) tar xvf "$1" ;;
            *.tbz2) tar xvjf "$1" ;;
            *.tgz) tar xvzf "$1" ;;
            *.zip) unzip "$1" ;;
            *.Z) uncompress "$1" ;;
            *.7z) 7z x "$1" ;;
            *) echo "'$1' cannot be extracted" ;;
        esac
    else
        echo "'$1' is not a valid file"
    fi
}
If you know of any others, send them to me and I'll add them to the list.
23.3.09
Customizing your work environment: Shells
One of the chief advantages of the UNIX OS is that it provides the user with a high level of customization. This is why one of my pet peeves is system administrators who refuse to set up machines with anything other than the default settings. Productivity is largely a factor of having a setup that works. One person's optimal work environment is not necessarily another's. As a software developer I don't use the Windows OS because I find it clumsy and non-intuitive, but I know plenty of managers who find it the exact opposite. To each their own.
Changing your shell:
Let's start with how to change your login shell and then discuss why you might want to. From any command line prompt you can just type the name of the shell you want, e.g. csh, tcsh, ksh, bash, etc. But this only gets you into the shell temporarily. Next time you log back in you are back to the same shell.
If you don't know, you can find out which shell you're currently using, by typing at the UNIX prompt:
echo $SHELL
To change the shell so that it remains changed when you log back in, use the chsh command. See this link for examples of how the chsh command differs on various platforms. Unfortunately, the bad news is that this doesn't always work, depending on how the system was set up. You may need to resort to bribing your system administrator. May I suggest junk food or beer!
Shells:
As a user, the first thing you interact with is the system shell. The UNIX world contains a menagerie of system shells.
In the beginning was the Bourne shell (/bin/sh). The Bourne shell, written by S. R. Bourne, is built around a powerful syntactical language, with all the features needed to produce structured programs. It has strong provisions for controlling input and output, and expression matching facilities. Its main disadvantage is that it was designed with nearly no concessions to the interactive user.
Next came a complete redesign called the C-shell (/bin/csh). Developed at UC Berkeley, the C-shell was designed for interactive use. It used a new input language designed to resemble C and added several new concepts including job control and aliasing. Unfortunately, the new shell was too buggy to produce robust shell scripts, thus the community split between the Bourne shell for scripts and the C-shell for interactive use.
The community eventually decided to fix the bugs in the C-shell, creating the T-shell (/bin/tcsh). The tcsh shell added numerous additional features including command line editing, and command line completion. Unfortunately the tcsh shell never got widely distributed by the various UNIX system manufacturers.
Around the same time David Korn, at AT&T, developed the Korn shell (/bin/ksh) as an extension of the original Bourne shell, including many features that made the C shell good for interactive work. Unfortunately the Korn shell wasn't free; you had to pay AT&T for it.
At about this time, the GNU project was underway and decided that they needed a free shell that took the best parts of the Bourne and Korn shells, as well as features from the C shell and other operating systems. This project became the bash shell (for "Bourne Again SHell"). The Bash shell was quickly adopted for LINUX (where it can be configured to perform just like the Bourne shell), and is the most popular of the free new generation shells.
The final two shells are the Z-shell and rc. The Z-shell (zsh) bears the most resemblance to the Korn shell. The zsh shell features command line editing, spelling correction, word completion and a history mechanism. The rc shell, by contrast, is a basic shell quite similar to sh. The syntax of rc contains more similarities to the C language than even csh.
Where do you find these shells:
Currently the bash shell dominates the LINUX world. Straight out of the box, Mac's Darwin also runs bash. The Korn shell seems to be common in the big iron supercomputing world.
References:
For a general introduction to basic UNIX commands, see here or here.
For a bit more on the history of UNIX shells check here or here.
See this thorough discussion of the Korn shell dot files.
Changing your shell:
Lets start with how to change your log in shell and then discuss why you might want to. From any command line prompt you can just type the name of the shell you want, e.g. csh, tcsh, korn, bash, etc. But this only gets you into the shell temporally. Next time you log back in you are back to the same shell.
If you don't know, you can find out which shell you're currently using, by typing at the UNIX prompt:
echo $SHELL
To change the shell so that it remains changed when log back in use the chsh command. See this link for examples of how the chsh command differs on various platforms. Unfortunately, the bad news is that this this doesn't always work ,depending on how the system was set up. You may need to resort to bribing your system administrator. May suggest junk food or beer!
Shells:
As a user, the first thing you interact with is the system shell. The UNIX world contains a menagerie of system shells.
In the beginning was the Bourne shell (/bin/sh). The Bourne shell, written by S. R. Bourne, is built around a powerful syntactical language, with all the features needed to produce structured programs. It has strong provisions for controlling input and output, and expression matching facilities. Its main disadvantage is that it was designed with nearly no concessions to the interactive user.
Next came a complete redesign called the C-shell (/bin/csh). Designed at UC Berkeley, the C-shell was designed for interactive use. It used a new input language designed to resemble C and added several new concepts including job control and aliasing. Unfortunately, the new shell was too buggy to produce robust shell scripts, thus the community split between the Bourne shell for scripts and the C-shell for interactive use.
The community eventually decided to fix the bugs in the C-shell, creating the T-shell (/bin/tcsh). The tcsh shell added numerous additional features including command line editing, and command line completion. Unfortunately the tcsh shell never got widely distributed by the various UNIX system manufacturers.
Around the same time David Korn, at AT&T, developed the Korn shell (/bin/ksh) as an extension of the original Bourne shell, including many features that made the C shell good for interactive work. Unfortunately the Korn shell wasn't free; you had to pay AT&T for it.
At about this time, the GNU project was underway and decided that they needed a free shell that took the best parts of the Bourne and Korn shells, as well as features from the C shell and other operating systems. This project became the bash shell (for "Bourne Again SHell"). The Bash shell was quickly adopted for LINUX (where it can be configured to perform just like the Bourne shell), and is the most popular of the free new generation shells.
The final two shells are the Z-shell and RC. The Z-shell (zsh) bears the most resemblance to the Korn shell. The zsh shell features command line editing, spelling correction, word completion and a history mechanism. The rc shell, by contrast, is a basic shell quite similar to sh. The syntax of rc contains more similarities to the C language than even csh.
Where do you find these shells:
Currently the bash shell dominates the LINUX world. Straight out of the box, Mac's Darwin also runs bash. The Korn shell seems to be common in the big iron supercomputing world.
References:
For a general introduction to basic UNIX commands, see here or here.
For a bit more on the history of UNIX shells check here or here.
For a thorough discussion of the Korn shell dot files.
19.3.09
How to link to any part of a video on youtube
Ya, I know it is a stretch to say this has anything to do with scientific computing, but it's still a pretty useful hack.
18.3.09
Creating & Using libraries in Scientific Code
For the discussion here we're going to restrict ourselves to Unix/Linux computing environments.
Suppose you have a program consisting of a collection of source files. Typically scientific applications are constantly evolving, but parts of that code base (utilities, linear algebra solvers, IO routines, etc.) may be static. This static code might be duplicated across numerous applications, and each application may have slightly differing versions. A simple way to control this static code is to create a library file out of it, and place it in a shared location.
Library Construction:
There are two types of libraries, static and dynamic, the latter also known as shared. Both versions of these libraries are functionally equivalent. Let's look at each type.
Static Libraries:
A static library is just an archive of object files, usually indicated by a .a suffix. Using the Unix ar command, a collection of object files may be combined to create a library. The benefit of a static library is that it gets attached to the executable of the target application. The executable is self-contained, meaning the library's presence is not required when running the program. Under some circumstances, the executable may run faster.
Dynamic Libraries:
Dynamic libraries, or shared libraries, differ from static libraries in that the library is not part of the executable, but is linked at run time. Therefore dynamic libraries need to be present when the application is run. A dynamic library is typically indicated by the .so suffix. The main advantage of dynamic libraries is that, since the executable does not include the library within it, the executable is significantly smaller. The Windows OS relies on dynamic libraries.
Let's look at static libraries and leave dynamic libraries for later. Creating a static library is pretty simple.
- Compile the source objects
$(FC) -c ftest1.f90 ftest2.f90 ftest3.f90
creating three object files, ftest1.o, ftest2.o and ftest3.o, and, if the source contains any modules, a series of module symbol files with the suffix .mod. Keep track of these because we will need them later, when compiling code that uses the library.
- Create an archive from the object files using the Unix archive ar command.
ar -cvq libftest.a ftest1.o ftest2.o ftest3.o
which creates an archive named libftest.a
- The last step is to create and add a table of contents to the archive. This is done using the UNIX command ranlib. This last step differentiates between an archive of object files and an actual library.
ranlib libftest.a
See, that was pretty easy.
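The three steps above can be sketched end to end as a small script. This is only a sketch under some assumptions: FC defaults to gfortran (substitute your own compiler), the three source files are throwaway stand-ins written by the script itself, and everything happens in a scratch directory named workdir.

```shell
# Assumption: gfortran is only a default guess; set FC to your own compiler.
FC=${FC:-gfortran}

# Work in a scratch directory so nothing in your real tree is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Write three tiny stand-in source files (hypothetical contents).
for n in 1 2 3; do
  cat > "ftest$n.f90" <<EOF
subroutine ftest$n()
  print *, 'hello from ftest$n'
end subroutine ftest$n
EOF
done

if command -v "$FC" >/dev/null 2>&1; then
  "$FC" -c ftest1.f90 ftest2.f90 ftest3.f90        # step 1: compile to object files
  ar -cvq libftest.a ftest1.o ftest2.o ftest3.o    # step 2: collect them in an archive
  ranlib libftest.a                                # step 3: add the table of contents
else
  echo "no Fortran compiler named $FC found; skipping the build"
fi
```

The compiler check is there only so the script fails politely on a machine without a Fortran compiler.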
Now you can see what's in the archive with the archive command
ar -t libftest.a
Running this produces the list:
ftest1.o
ftest2.o
ftest3.o
A second Unix command, nm, lists all the symbols contained in each of the object files in the archive. For our purposes, symbols are associations with routines, common data, and modules in the source code.
Library Use:
The last step here is actual library use. The exact procedure varies somewhat between languages and compilers. So we will focus on the case of Fortran 90/95 code.
Historically the compile, link, and load phases of creating an executable used separate commands. Modern Fortran compilers $(FC) combine all three steps in a single compiler call.
After the compiler compiles the source files, it uses the ld command to link the resulting .o files, any .o files that you specify as input files, and some of the .o and .a files in the product and system library directories. The compiler can then produce a single .o object file or a single executable output file from these object files.
If all the object files and libraries you need are local to the build, or are part of the library path environment variable, things should just build.
Typically this isn't the case and you need to specify the location and name of the libraries. Suppose our previous library, libftest.a, is located in the directory /mydirectory/mylibrary/lib. Linking to it is done by adding two terms to the compile statement. The first is the lowercase -l flag for the library name, and the second is the uppercase -L flag for the library path.
$(FC) -o myprogram myprogram.o -L/mydirectory/mylibrary/lib -lftest
This creates an executable called myprogram from the object file myprogram.o (compiled from the source file myprogram.f90), linked to the library libftest.a located in the directory /mydirectory/mylibrary/lib.
Note two important things:
- There are no spaces between the flags and the entries.
- The library name excludes the prefix lib and the suffix .a
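The naming rule in the second point is easy to get wrong by hand, so here is a small sketch of it in shell. The helper name lib_flag is hypothetical, made up for this illustration; it just strips the directory, the lib prefix and the .a suffix.

```shell
# Hypothetical helper: turn a static library file name into its -l flag.
lib_flag() {
  name=$(basename "$1")   # drop any directory part
  name=${name#lib}        # strip the leading "lib"
  name=${name%.a}         # strip the trailing ".a"
  printf '%s\n' "-l$name"
}

lib_flag /mydirectory/mylibrary/lib/libftest.a   # prints -lftest
```

The directory part that basename throws away is exactly what goes after the uppercase -L flag.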
Now you are not necessarily finished. If the library libftest.a contains Fortran modules, and those modules are used within the source code myprogram.f90, then you also need to specify the path to the .mod files created above during the compilation.
Module files contain an associated symbol file that holds information needed by program units, subprograms, and interface bodies that USE that module. By default, it is assumed that these symbol files exist in the current directory. For libraries, this is typically not the case, and a path to these .mod files must be specified.
Typically compilers provide a method to specify the module path. The -I flag is intended for specifying the path to any include and/or module files. Some compilers provide a second flag, -M, intended just for specifying the module file paths. The -I flag is the most common across compilers. Some compilers, like Absoft, use -p instead, so check the flag options for your compiler to be certain.
The following example creates an executable called myprogram.
$(FC) -o myprogram myprogram.f90 -I/mydirectory/mylibrary/mod -L/mydirectory/mylibrary/lib -lftest
The executable was created by compiling the source file myprogram.f90 against the module files which live in /mydirectory/mylibrary/mod, then linking the result with the library libftest.a which lives in the directory /mydirectory/mylibrary/lib. Note that the module path matters when the source that uses the modules is compiled, not when already-built object files are linked.
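To see the module path and the library path working together, here is a self-contained sketch that builds a one-module library and then a program that uses it. Everything named here is made up for the illustration: the module greet_mod, the scratch directory top, and its lib, mod and src subdirectories. It also assumes gfortran as the default compiler and that the compiler writes its .mod files into the current directory, which may differ for your compiler.

```shell
# Assumption: gfortran is only a default guess; set FC to your own compiler.
FC=${FC:-gfortran}

# Scratch layout: library in lib/, module files in mod/, sources in src/.
top=$(mktemp -d)
mkdir -p "$top/lib" "$top/mod" "$top/src"
cd "$top/src" || exit 1

# A tiny module that will live inside the library (hypothetical contents).
cat > greet_mod.f90 <<'EOF'
module greet_mod
contains
  subroutine greet()
    print *, 'hello from the library'
  end subroutine greet
end module greet_mod
EOF

# A main program that USEs the module.
cat > myprogram.f90 <<'EOF'
program myprogram
  use greet_mod
  call greet()
end program myprogram
EOF

if command -v "$FC" >/dev/null 2>&1; then
  "$FC" -c greet_mod.f90                      # makes greet_mod.o and a .mod file
  ar -cvq "$top/lib/libftest.a" greet_mod.o   # put the object into the library
  ranlib "$top/lib/libftest.a"
  mv ./*.mod "$top/mod/"                      # keep module files with the library
  rm greet_mod.o

  "$FC" -c -I"$top/mod" myprogram.f90         # -I is needed here, at compile time
  "$FC" -o myprogram myprogram.o -L"$top/lib" -lftest
  ./myprogram
else
  echo "no Fortran compiler named $FC found; skipping the build"
fi
```

If you drop the -I flag from the compile line, the compiler should complain that it cannot find the module, which is a quick way to convince yourself where the flag belongs.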
For further info see:
YoLinux
Fortran Programming guide: Libraries
17.3.09
A brief introduction to building Fortran Apps
While the Fortran programming language, developed by a team of computer scientists at IBM in the late 1950s, was the first high-level programming language, it has been largely superseded by C and C++ in fields of engineering, and Java and C# in commercial applications. The one place it retains significant use is in the research sciences, where large amounts of legacy code are still in use, and in computationally intensive tasks such as weather and climate modeling, computational fluid dynamics, computational chemistry, computational economics, and computational physics.
So let's talk about Fortran.
The current language standard is up to 2003, although the majority of compilers only adhere to the 1990 or 1995 standards. The big break in the language occurred between the 1977 standard and what came after. The 1990+ standard was a significant modernization of the language which added vector operations, pointers, and derived types, among other changes, such as free-form source input and relaxed capitalization rules.
Fortran source files can have a variety of possible suffixes (.f, .F, .f77, .F77, .f95, .f90, .F90, or .F95). A capital suffix (.F, .F77, .F95, or .F90) indicates that the source code should be preprocessed by the C preprocessor (cpp) before being compiled. Otherwise the only difference between the suffixes is the adherence to a specific standard.
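The suffix rule can be written down as a short shell check. This is just a sketch of the convention described above; the function name needs_cpp is hypothetical, and your compiler's documentation has the final say on which suffixes it actually preprocesses.

```shell
# Hypothetical helper: report whether a Fortran source file name carries
# one of the capital suffixes that conventionally trigger cpp.
needs_cpp() {
  case "$1" in
    *.F|*.F77|*.F90|*.F95) echo yes ;;
    *)                     echo no  ;;
  esac
}

needs_cpp myprogram.F    # prints yes
needs_cpp utility.f      # prints no
```

The same case pattern is handy in build scripts that have to treat the two kinds of source files differently.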
Compilation
The source code compilation occurs in three steps:
- First the source code is compiled using the Fortran compiler $(FC) to create the object files (*.o). These files consist of binary code and data for a source file.
- Next, the object files are linked. Linkers combine multiple object files along with any external objects such as libraries. The linking stage, invoked transparently by the compiler, calls the linker command ld.
- Lastly, the loader loads everything into memory.
The most basic build command looks like this:
$(FC) myprogram.f -o myprogram.exe
It builds the executable myprogram.exe from the source myprogram.f. The -o flag indicates the name of the executable. If omitted, the default name is a.out.
Next, suppose you have a C code and want to link to the math library libm.a. Because the math library is on the default library search path, you only need to indicate the name of the library with the lowercase -l flag, such as
$(CC) -o myprogram.exe myprogram.c -lm
Note that the prefix lib and suffix .a are not included when the library is named. Also note that there are no spaces between the lowercase -l flag and the library name. The library is listed after the source file because many linkers resolve symbols from left to right.
Suppose now that the library is not part of the system libraries, such as the netcdf IO libraries. In this case it is necessary to tell the compiler where to find this library. This is done with the uppercase -L library path flag.
$(FC) -o myprogram.exe myprogram.F -L/mylibrarypath/lib -lnetcdf
Because the name of the source file ends in the capital .F suffix, the source code is run through the C preprocessor (cpp) before compiling.
This build looks for the library libnetcdf.a in the directory /mylibrarypath/lib. If the linker finds the library, the build succeeds. If the linker cannot find the library, or the indicated library does not contain the library element or symbol being called, the linker will give you a message roughly saying that you have unresolved or undefined symbols. (See the blog entry on linking and loading for more information on that).
To compile a source file without creating the executable, thus only creating the object file, use the -c flag. Suppose there is a second file called utility.f. To produce just its object file, use
$(FC) -c utility.f
Now that we have the object file for utility.f, we can build the executable myprogram.exe by linking to utility.o with the command
$(FC) -o myprogram.exe myprogram.F utility.o -L/mylibrarypath/lib -lnetcdf
or, if there are multiple files to link, use a wildcard to represent the other object files.
$(FC) -o myprogram.exe myprogram.F *.o -L/mylibrarypath/lib -lnetcdf
15.3.09
New Blog
Hi,
this is a new blog for 2009. Its creation was motivated by a discussion I had with a colleague last week. He was helping me figure out why I was unable to link my main program to a library I had just created. And as he rattled off a series of things to try, I asked how he'd learned all this, since I've never seen a book discussing this particular topic. He answered that he'd just picked up these tips over time, and he's written them all down with the idea of writing a book when he has the free time. We both laughed at the concept of his having any free time with four kids at home and his being the lead on his project.
So this blog is intended to be my attempt at something like that book that he'll never write. Its goal is to discuss topics related to computationally intensive computing, parallel programming, numerical algorithms, computer hardware, work flow tips, and visualization.
I'm hoping along the way to get contributions from others. We'll see what happens.