 Current hardware
 ================

-| Hostname                       | Status                | Role                       | CPU          | RAM   | Note     |
+| Hostname                       | Status                | Cluster Role               | CPU          | RAM   | Note     |
 |--------------------------------|-----------------------|----------------------------|--------------|-------|----------|
 | gondor-resolver.labs.nic.cz    | :white_check_mark:    | :warning: **submit**, exec | 4 @ 2.40GHz  | 16 GB | hw, Brno |
-| rivendell-resolver.labs.nic.cz | :white_check_mark:    | :white_check_mark: exec    | 4 @ 2.40GHz  | 16 GB | hw, Brno |
-| mordor-resolver.labs.nic.cz    | :white_check_mark:    | :white_check_mark: exec    | 4 @ 3.00GHz  | 64 GB | hw, Brno |
-| rohan-resolver.labs.nic.cz     | :x: pending reinstall | -                          | 16 @ 2.40GHz | 64 GB | hw, Brno |
+| rivendell-resolver.labs.nic.cz | :white_check_mark:    | :gear: exec                | 4 @ 2.40GHz  | 16 GB | hw, Brno |
+| mordor-resolver.labs.nic.cz    | :white_check_mark:    | :gear: exec                | 4 @ 3.00GHz  | 64 GB | hw, Brno |
+| rohan-resolver.labs.nic.cz     | :white_check_mark:    | :white_check_mark: none    | 16 @ 2.40GHz | 64 GB | hw, Brno |

 General
 -------
@@ -14,7 +14,7 @@ General
 - machines are part of a [*HTCondor cluster*](http://research.cs.wisc.edu/htcondor/)
 - *do not turn off condor* (or the machine) for **submit** role (cluster functioning and GitLab CI depends on it)
 - daily update/reboot happens at 2:30
-- login as user `respdiff` (with you gitlab ssh key)
+- login as user `respdiff` (with your gitlab ssh key)
 - read MOTD for basic usage
 - condor *can* be turned off for non-essential machines (all except **submit** role), see below
 - machines are managed with Ansible: [knot-resolver-ansible](https://gitlab.labs.nic.cz/knot/knot-resolver-ansible)
@@ -58,13 +58,12 @@ Executing respdiff
 Using machines for other testing/development
 --------------------------------------------

+- `rohan` is currently not part of the cluster and can be used freely
 - any machine except **submit** can be temporarily removed from the cluster and used for other workloads
-- **HOWTO**:
+- **HOWTO (temporarily remove machine from cluster)**:
 1. `condor_off`: removes machine from cluster once current job finishes (~10 mins)
 2. wait until `condor_status` no longer has the machine hostname in the list
 3. (optional) if you need machine overnight, turn off autoupdate (reboots at 2:30) `systemctl stop autoupdate.timer`
 4. run your workload
-5. make sure machine is in a "clean" state, as before (no additional lingering services, docker containers, processes...)
-5. (optional) re-enable autoupdates if you turned them off `systemctl start autoupdate.timer`
-6. return machine to cluster: `condor_on`
-- **NOTE**: reboot will cause the machine to return to cluster (handled by `condor.service`)
\ No newline at end of file
+5. `systemctl reboot -i`
+- **NOTE**: reboot will cause the machine to return to cluster (handled by `condor.service`)
\ No newline at end of file
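
The wait in step 2 of the HOWTO (after `condor_off`, until `condor_status` no longer lists the machine) could be scripted. Below is a minimal POSIX shell sketch, not part of the cluster setup. It assumes that `condor_status -autoformat Machine` prints one hostname per slot and that the local FQDN matches the name HTCondor reports. The names `STATUS_CMD`, `host_in_pool` and `wait_until_drained` are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Sketch only: poll the pool until a host is no longer listed.
# STATUS_CMD is a hypothetical override so the helpers can be dry-run
# without a real HTCondor pool; by default it queries condor_status.
STATUS_CMD=${STATUS_CMD:-"condor_status -autoformat Machine"}

host_in_pool() {
    # Succeeds if the exact hostname $1 appears in the status output.
    $STATUS_CMD | grep -Fxq "$1"
}

wait_until_drained() {
    # $1 = hostname, $2 = max number of polls (default 60),
    # $3 = seconds between polls (default 30).
    # Returns 0 once the host is gone, 1 if it is still listed
    # after the last poll.
    host=$1
    max_tries=${2:-60}
    delay=${3:-30}
    tries=0
    while host_in_pool "$host"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1
        fi
        sleep "$delay"
    done
    return 0
}

# Example use on the machine being removed (commands from the HOWTO):
#   condor_off
#   wait_until_drained "$(hostname -f)" && echo "machine left the cluster"
```

The defaults (60 polls, 30 seconds apart) are arbitrary and only chosen to comfortably cover the ~10 minutes a running job may need to finish.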