Producing scientific results on your coworker's computer

Some keywords and points to consider when running possibly RAM-intensive computations on your own machine or on a coworker's. Some specifics relate to our local DUNE setup.

Apart from the machines of coworkers, you can use the compute servers. There is also a compute cluster, Allegro, to which more or less every member of our group can get access by asking the IT staff and providing a confirmation from your supervisor.

Finding a machine

Remotely log in to the machine

ssh username@machinename
Explore current usage and capabilities
htop
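
If you prefer a quick non-interactive summary over htop, the following minimal sketch reads the core count, the load average and the available memory. It assumes a Linux machine with /proc mounted and GNU coreutils installed; the variable names are purely illustrative.

```shell
# Number of CPU cores available to this shell
cores=$(nproc)
# 1-minute load average; a value close to $cores means the machine is busy
load=$(cut -d ' ' -f 1 /proc/loadavg)
# Memory that can be claimed without swapping, in KiB
mem_avail_kib=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
echo "cores: $cores, 1-min load: $load, available RAM: $mem_avail_kib KiB"
```

Comparing the load with the core count gives a rough idea of whether somebody else is already computing on the machine.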

Preparing your computation

Create a working directory, preferably on the local hard drive of the remote machine

mkdir -p /srv/public/username/workingdirectory
Prepare the executable and the required data. You will most likely want to run a release build of your program. If you compiled with the compiler flag -march=native or similar, make sure that the target machine supports the required instructions. If it does not, your program might fail with the message Illegal instruction or, in a German locale, Ungültiger Maschinenbefehl. When in doubt, or if your program does not run properly, compile without these flags. Copy the executable and related data, e.g.
scp your-local-executable username@machinename:/srv/public/username/workingdirectory
scp your-local-data.parset username@machinename:/srv/public/username/workingdirectory
You might additionally need to copy shared libraries.
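
To judge in advance whether a binary built with -march=native is likely to run, you can look at the instruction set extensions the target CPU reports. The sketch below assumes an x86 Linux machine, where /proc/cpuinfo contains a flags line. The helper name has_cpu_flag and the list of flags are illustrative assumptions; which extensions your binary actually uses depends on your compiler and on the build machine.

```shell
# has_cpu_flag FLAG: succeed if FLAG appears as a whole word in the
# first "flags" line of /proc/cpuinfo (x86 Linux only).
has_cpu_flag() {
    grep -m1 '^flags' /proc/cpuinfo | grep -qw -- "$1"
}

# Example list only; adapt it to the extensions your build may use.
for flag in avx avx2 avx512f fma; do
    if has_cpu_flag "$flag"; then
        echo "$flag: supported"
    else
        echo "$flag: not reported"
    fi
done
```

Run it on both the build machine and the target machine. A flag that is supported on the former but not reported on the latter is a likely cause of an Illegal instruction failure.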

Running your computations without constant supervision

Remotely log in to the machine (as above) and set up a tmux session

ssh username@machinename
tmux
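Limit the virtual memory of the processes started from this shell, so that your program fails early instead of exhausting the available RAM. The -v option selects the virtual memory limit (in KiB); without a resource option, bash's ulimit would set the maximum file size instead.
ulimit -S -v someValueLowerThanRAMinKiloBytes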
Start your process. I recommend writing its output to a separate logfile
cd /srv/public/username/workingdirectory
./your-local-executable > your-local-logfile.txt
If you had to copy additional shared libraries, you need to specify where they can be found. This can be done e.g. by replacing the above call to your executable with
LD_LIBRARY_PATH="path/to/your/copied/libs" ./your-local-executable > your-local-logfile.txt
You can now detach from the tmux session (Ctrl-B, then D) and close the ssh session; your program should keep running.
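
The memory limit and the logging steps above can be combined in a small helper. This is a minimal sketch, not a fixed recipe: the function name run_limited and the choice of 80 % of the installed RAM are assumptions for illustration. Unlike the plain redirection above, it also writes error messages (stderr) to the logfile.

```shell
# run_limited LIMIT_KIB LOGFILE COMMAND [ARGS...]
# Runs COMMAND with a soft virtual-memory limit of LIMIT_KIB KiB and
# writes both stdout and stderr to LOGFILE.
run_limited() {
    limit_kib=$1
    logfile=$2
    shift 2
    # The subshell keeps the limit from affecting the rest of the session.
    ( ulimit -S -v "$limit_kib" && exec "$@" ) > "$logfile" 2>&1
}

# Illustrative choice: allow 80 % of the installed RAM (Linux, /proc/meminfo).
total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
ram_limit_kib=$(( total_kib * 8 / 10 ))
```

Inside the tmux session it could be used as run_limited "$ram_limit_kib" your-local-logfile.txt ./your-local-executable. With such a limit in place, an allocation beyond it fails, which in a C++ program typically surfaces as std::bad_alloc.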

Checking your ongoing computations

Remotely log in to the machine and reconnect to your tmux session.

ssh username@machinename
tmux a
Check whether the program is still running and what it is currently doing
htop
tail your-local-logfile.txt
Or use e.g. less instead of tail if you want to go through the entire logfile.
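
Another cheap indicator of progress is the time since the logfile was last written. The following sketch assumes GNU stat (as found on typical Linux machines) and a program that writes to its logfile regularly; the function name log_age_seconds is made up for illustration.

```shell
# log_age_seconds FILE: print the number of seconds since FILE was last modified.
log_age_seconds() {
    now=$(date +%s)
    modified=$(stat -c %Y "$1")   # GNU stat: modification time as Unix timestamp
    echo $(( now - modified ))
}
```

For example, log_age_seconds your-local-logfile.txt prints a small number while the program is producing output. A large value does not necessarily mean the program has stopped; it may simply be in a long phase without output.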

Detach from your tmux session if the program is still running, or close it if the program is done. Then log off from the remote machine.

Results storage, good scientific practice

Copy results that should be kept permanently, e.g. to a large shared drive that is backed up, and remove temporary data to free the space once you are certain you will not need it anymore.

ssh username@machinename
cp -r /srv/public/username/workingdirectory/relevantdata /storage/mi/username/your/results/folder/hierarchy
rm -r /srv/public/username/workingdirectory/irrelevantdata
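
Before deleting anything from the working directory, it can be worth verifying that the copy is complete. The sketch below copies a file or directory and then compares source and copy with diff -r. The function name copy_verified is an assumption for illustration.

```shell
# copy_verified SRC DEST_DIR
# Copies SRC (file or directory) into DEST_DIR and succeeds only if the
# copy is identical to the source.
copy_verified() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir" &&
    cp -r "$src" "$dest_dir"/ &&
    diff -r "$src" "$dest_dir/$(basename "$src")" > /dev/null
}
```

A possible call is copy_verified /srv/public/username/workingdirectory/relevantdata /storage/mi/username/your/results/folder/hierarchy && echo "copy verified". Only remove data from the working directory after this succeeded.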

Consider tagging your results with additional information on how to reproduce them. I have yet to find a convenient way to incorporate this information for DUNE executables. At the very least, I would appreciate a simple build mechanism that records the git SHA-1 IDs of all relevant dune-modules and can write them to the logfile.
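
Until such a build mechanism exists, a run-time workaround is to record the revisions of the module source trees by hand. The sketch below is only that, a workaround: the function name print_module_revisions is made up, and it assumes that each module is a git checkout on the machine where you call it. It reports the state of the source trees at the time of the call, which is not necessarily the state the executable was built from.

```shell
# print_module_revisions DIR...
# Prints one line per directory: name, commit SHA-1, and whether tracked
# files have uncommitted changes.
print_module_revisions() {
    for dir in "$@"; do
        if rev=$(git -C "$dir" rev-parse HEAD 2>/dev/null); then
            # "modified" refers to tracked files only; untracked files are ignored
            if git -C "$dir" diff --quiet HEAD 2>/dev/null; then
                state=clean
            else
                state=modified
            fi
            echo "$(basename "$dir") $rev $state"
        else
            echo "$(basename "$dir") not-a-git-checkout"
        fi
    done
}
```

Called with hypothetical paths, e.g. print_module_revisions /path/to/dune-common /path/to/dune-grid >> your-local-logfile.txt, it appends the revisions to the logfile of the run.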

Note that data associated with your username might raise legal issues if group members try to access your results, since it is not clear a priori that this is non-personal data. To avoid the related hassle, set the group access rights accordingly or copy your results to group storage space, e.g. /group/ag_numerik (not sure about the policy there).
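
Setting the group access rights could look like the following sketch. The function name share_with_group is an assumption, and so is the Unix group you pass to it; ask the IT staff which group corresponds to our working group.

```shell
# share_with_group GROUP DIR
# Hands DIR and its contents to GROUP and grants read access; the capital X
# adds execute permission only to directories and already executable files.
share_with_group() {
    group=$1
    dir=$2
    chgrp -R "$group" "$dir" &&
    chmod -R g+rX "$dir"
}
```

For example, share_with_group your-group /storage/mi/username/your/results, where your-group is a placeholder. You need to be a member of the group yourself for chgrp to succeed.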

Created 2019-03-29, Last touched 2019-04-02.