Producing scientific results on your coworker's computer
Some key words and properties to consider when performing possibly RAM-intensive computations on your own machine or on one of your coworkers' machines. Some specifics relate to our local DUNE setup.
Apart from your coworkers' machines, you can use the compute servers. There is also a compute cluster, Allegro, to which more or less every member of our group should be able to get access by asking the IT staff and providing a confirmation from your supervisor.
Finding a machine
Remotely log in to the machine
ssh username@machinename
Explore current usage and capabilities
htop
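htop is interactive, which is inconvenient when you want to compare several candidate machines quickly. The following is a minimal sketch of a non-interactive summary, assuming a Linux machine that provides /proc/meminfo and /proc/loadavg. The function name machine_summary is made up for this example.

```shell
# Print a short, non-interactive summary of a Linux machine's capacity.
# Assumption: /proc/meminfo and /proc/loadavg exist (true on Linux).
machine_summary() {
    cores=$(nproc)
    # MemTotal and MemAvailable are reported in kB.
    mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    mem_avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    # The first field of /proc/loadavg is the 1-minute load average.
    load1=$(cut -d ' ' -f 1 /proc/loadavg)
    echo "cores:            $cores"
    echo "memory total:     $(( ${mem_total_kb:-0} / 1024 )) MiB"
    echo "memory available: $(( ${mem_avail_kb:-0} / 1024 )) MiB"
    echo "load (1 min):     $load1"
}

machine_summary
```

A load average well below the number of cores and plenty of available memory suggest that nobody else is currently running heavy jobs on the machine.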
Preparing your computation
Create a working directory, preferably on the local hard drive of the remote machine
mkdir -p /srv/public/username/workingdirectory
Prepare the executable and required data. You will most likely want to run a release build of your program. If you compiled with the compiler flag -march=native or similar, make sure that the target machine supports the required instructions. If not, your program might fail with the message Illegal instruction (or, with a German locale, Ungültiger Maschinenbefehl). When in doubt, or if your program does not run properly, compile without these flags. Copy the executable and related data, e.g.
scp your-local-executable username@machinename:/srv/public/username/workingdirectory
scp your-local-data.parset username@machinename:/srv/public/username/workingdirectory
You might also need to copy shared libraries.
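Both pitfalls mentioned above, unsupported CPU instructions and missing shared libraries, can be checked on the target machine before starting a long run. This is a sketch under the assumption of a Linux machine with /proc/cpuinfo and the ldd tool. The function names has_cpu_flag and missing_libs are made up, avx2 is only an example flag, and ls stands in for your own executable.

```shell
# Two checks to run on the target machine before a long computation.
# Assumptions: Linux with /proc/cpuinfo; "ldd" is available.

# Succeeds if the CPU advertises the given feature flag (e.g. avx2, fma).
# x86 lists the flags under "flags", ARM under "Features".
has_cpu_flag() {
    grep -E -m 1 '^(flags|Features)[[:space:]]*:' /proc/cpuinfo \
        | tr ' \t' '\n\n' | grep -q -x -- "$1"
}

# Prints the shared libraries that the dynamic loader cannot resolve for
# the given executable; prints nothing if all of them are found.
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found'
}

# Example flag only; which flags matter depends on the build machine.
if has_cpu_flag avx2; then
    echo "avx2: supported"
else
    echo "avx2: not supported, avoid binaries compiled for it"
fi

# A system binary stands in for your-local-executable here.
exe=$(command -v ls)
if [ -n "$(missing_libs "$exe")" ]; then
    echo "unresolved libraries for $exe:"
    missing_libs "$exe"
else
    echo "all shared libraries of $exe resolve"
fi
```

If libraries are reported as not found, copy them next to the executable and repeat the check with LD_LIBRARY_PATH pointing to the directory that contains the copies.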
Running your computations without constant supervision
Remotely log in to the machine (above) and set up a tmux session
ssh username@machinename
tmux
Set a memory limit so that the process fails early instead of exhausting the available fast memory (RAM)
ulimit -S -v someValueLowerThanRAMinKiloBytes
Start your process. I recommend writing its output to a separate logfile
cd /srv/public/username/workingdirectory
./your-local-executable > your-local-logfile.txt
If you had to copy additional shared libraries, you need to specify where they can be found. This can be done e.g. by replacing the above call to your executable with
LD_LIBRARY_PATH="path/to/your/copied/libs" ./your-local-executable > your-local-logfile.txt
You can now detach from the tmux session (Ctrl-B, D) and quit the ssh session; your program should keep running.
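The memory limit and the logging can be combined in a small launcher function. The sketch below is one possible arrangement, not a fixed recipe: it assumes Linux, limits virtual memory with ulimit -S -v (without a resource option, bash's ulimit would set the file size limit instead), and picks 80 % of the physical RAM as an arbitrary example value. Unlike the plain redirection above, it also captures error messages (stderr) and the exit status. The name run_limited is made up, and a short sh command stands in for your executable.

```shell
# Run a command under a soft virtual-memory limit and log everything.
# Assumptions: Linux (/proc/meminfo); 80 % of RAM is an example value.

# run_limited LOGFILE LIMIT_KB COMMAND [ARGS...]
run_limited() {
    logfile=$1
    limit_kb=$2
    shift 2
    {
        echo "host:  $(uname -n)"
        echo "start: $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
        echo "limit: $limit_kb kB virtual memory"
    } > "$logfile"
    # The subshell keeps the limit from affecting the calling shell.
    # stdout and stderr both go to the logfile.
    ( ulimit -S -v "$limit_kb" && exec "$@" ) >> "$logfile" 2>&1
    status=$?
    echo "exit status: $status" >> "$logfile"
    return "$status"
}

mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
limit_kb=$(( mem_total_kb * 8 / 10 ))

# Stand-in for ./your-local-executable
demo_log="${TMPDIR:-/tmp}/demo-logfile.txt"
run_limited "$demo_log" "$limit_kb" sh -c 'echo "computing..."; echo "a warning" >&2'
cat "$demo_log"
```

With such a limit in place, a program that tries to allocate more memory than allowed fails at the allocation, and the recorded exit status in the logfile tells you afterwards that the run did not finish normally.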
Checking your ongoing computations
Remotely log in to the machine and reconnect to your tmux session.
ssh username@machinename
tmux a
Check whether the program is still running and what it is currently doing
htop
tail your-local-logfile.txt
Or use e.g. less instead of tail if you want to go through the entire logfile. Detach from your tmux session (program still running) or close it (program done). Log off from the remote machine.
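A quick hint on whether the program is still making progress is the time since the logfile was last written. The sketch below assumes the stat command from GNU coreutils or BusyBox (the -c %Y format is not part of POSIX). The function name log_age_seconds is made up. Keep in mind that a program which buffers its output can leave the logfile untouched for a while although it is still computing.

```shell
# Report how many seconds ago a logfile was last modified.
# Assumption: GNU or BusyBox "stat" (the -c %Y format is not POSIX).
log_age_seconds() {
    now=$(date +%s)
    modified=$(stat -c %Y "$1") || return 1
    echo $(( now - modified ))
}

# Demonstration with a freshly written stand-in logfile.
demo_log=$(mktemp)
echo "iteration 1 done" > "$demo_log"
echo "last write: $(log_age_seconds "$demo_log") s ago"
```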
Results storage, good scientific practice
Copy results that should be persistent, e.g. to a large, backed-up shared drive, and remove temporary data that you will certainly not need anymore to free the space.
ssh username@machinename
cp /srv/public/username/workingdirectory/relevantdata /storage/mi/username/your/results/folder/hierarchy
rm -r /srv/public/username/workingdirectory/irrelevantdata
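If the results themselves are to be moved off the working directory, it is safer to verify the copy before deleting anything. A minimal sketch, assuming the results live in a directory. The function name archive_results is made up, and the demonstration uses temporary directories instead of the real /srv/public and /storage/mi paths.

```shell
# Copy a results directory, verify the copy, then delete the source.
# archive_results SOURCE_DIR DESTINATION_PARENT
archive_results() {
    src=$1
    dest_parent=$2
    name=$(basename "$src")
    mkdir -p "$dest_parent" || return 1
    cp -r "$src" "$dest_parent/" || return 1
    # diff -r exits non-zero if any file differs or exists on one side only.
    if diff -r "$src" "$dest_parent/$name" > /dev/null; then
        rm -r "$src"
    else
        echo "copy of $src differs from the original, source kept" >&2
        return 1
    fi
}

# Demonstration in a temporary directory.
work=$(mktemp -d)
mkdir -p "$work/scratch/relevantdata"
echo "42" > "$work/scratch/relevantdata/result.txt"
archive_results "$work/scratch/relevantdata" "$work/storage"
ls "$work/storage/relevantdata"
```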
Consider tagging your results with additional information on how to reproduce them. I have yet to find a convenient way to incorporate this information for DUNE executables. At the very least, I would appreciate a simple build mechanism that incorporates the git SHA-1 IDs of all relevant dune modules and can write them to the logfile.
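Until such a build mechanism exists, the commit IDs can be collected by hand at the time of the run. This is only a sketch of a workaround, not a DUNE feature: it assumes that every dune module is a separate git checkout, and the function name print_module_revisions is made up.

```shell
# Print one line per module directory with its current git commit.
# Assumption: each dune module is a separate git checkout.
print_module_revisions() {
    for dir in "$@"; do
        if sha=$(git -C "$dir" rev-parse --verify HEAD 2>/dev/null); then
            # Mark checkouts with uncommitted or untracked files, since the
            # commit ID alone would not describe them.
            if [ -n "$(git -C "$dir" status --porcelain 2>/dev/null)" ]; then
                sha="$sha-dirty"
            fi
            echo "$(basename "$dir") $sha"
        else
            # Also reached if git itself is not installed.
            echo "$(basename "$dir") not-a-git-repository"
        fi
    done
}

# Demonstration on the current directory.
print_module_revisions "$PWD"
```

Calling the function with all module directories and appending its output to your-local-logfile.txt before starting the executable keeps the revisions next to the results.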
Note that data associated with your username might raise legal issues if group members try to access your results, since it is not clear a priori that this is non-personal data. To avoid the related hassle, make sure to set the group access rights accordingly or copy your results to group storage space, e.g. /group/ag_numerik (not sure about the policy there).
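Setting the group access rights could look like the following sketch. It assumes that a Unix group for your research group exists and that you are a member of it; the actual group name is not known here, so the demonstration simply uses your primary group on a temporary directory. The function name share_with_group is made up.

```shell
# Hand a results directory over to a Unix group with read-only access.
# share_with_group GROUP DIRECTORY
share_with_group() {
    chgrp -R "$1" "$2" || return 1
    # g+rX: the group may read everything; X adds execute/search permission
    # only to directories and to files that are already executable.
    chmod -R g+rX "$2"
}

# Demonstration: a private temporary directory, shared with your own
# primary group (stand-in for the real group of your research group).
results=$(mktemp -d)
echo "42" > "$results/result.txt"
chmod 600 "$results/result.txt"
share_with_group "$(id -g)" "$results"
ls -l "$results"
```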