...
|
| [condor172.knot-resolver.cz](https://condor172.mnt.knot-resolver.cz) | :gun: [shotgun](https://gitlab.nic.cz/knot/resolver-benchmarking) | 2x 3C/6T @ 3.20GHz | 64 GB | 2U |
| [condor173.knot-resolver.cz](https://condor173.mnt.knot-resolver.cz) | :gun: [shotgun](https://gitlab.nic.cz/knot/resolver-benchmarking) | 2x 6C/6T @ 2.10GHz | 64 GB | 1U |

* *certset2* servers have remote management certificates that conflict with the rest; use a dedicated browser profile (`firefox -P`)

General
-------

- log in using your username (e.g. `tkrizek`) with your GitLab SSH key
- machines are managed with Ansible: [knot-resolver-ansible](https://gitlab.labs.nic.cz/knot/knot-resolver-ansible)

Condor
------

- **to execute condor commands manually, use `sudo -iu respdiff`**
- machines are part of an [*HTCondor cluster*](http://research.cs.wisc.edu/htcondor/)
- CI uses the `MAIN` cluster
- the machine's current cluster is shown in the MOTD
- *do not turn off condor* (or the machine) on machines with the **submit** role (cluster operation and GitLab CI depend on it)
- condor *can* be turned off for non-essential machines (all except the **submit** role); see below
- a detached cluster can be created for other testing/development (see [knot-resolver-ansible](https://gitlab.labs.nic.cz/knot/knot-resolver-ansible))
- a few useful commands:

```
condor_q                                  # on submit machine - display current queue
condor_status                             # list machines in cluster
condor_q -constraint 'ClusterId==42'      # list matching jobs; operators <=, <, >, >= also supported
condor_rm -constraint 'ClusterId==42'     # remove matching jobs - check them with condor_q first
condor_rm -constraint 'JobStatus==5'      # remove HELD jobs
condor_rm -all                            # remove ALL jobs - for use in detached cluster, use caution
```
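
The advice to check matching jobs with `condor_q` before running `condor_rm` can be combined into a single confirmation step. This is a minimal sketch with three assumptions: bash is the shell, it runs as the `respdiff` user (e.g. after `sudo -iu respdiff`), and the standard HTCondor `condor_q` and `condor_rm` tools are on `PATH`. The function name `condor_rm_checked` is hypothetical and is not part of HTCondor or knot-resolver-ansible.

```shell
# Hypothetical helper (not part of HTCondor): list the jobs matching a
# ClassAd constraint, ask for confirmation, and only then remove them.
# Assumes bash and that it is run as the respdiff user.
# Usage: condor_rm_checked 'ClusterId==42'
condor_rm_checked() {
    local constraint="$1"
    if [ -z "$constraint" ]; then
        echo "usage: condor_rm_checked '<ClassAd constraint>'" >&2
        return 2
    fi

    # Show exactly which jobs would be removed; stop if condor_q fails.
    condor_q -constraint "$constraint" || return 1

    # Anything other than an explicit "y"/"yes" (including EOF) aborts.
    local answer
    read -r -p "Remove the jobs listed above? [y/N] " answer
    case "$answer" in
        y|Y|yes) condor_rm -constraint "$constraint" ;;
        *) echo "aborted, nothing removed" >&2; return 1 ;;
    esac
}
```

For example, `condor_rm_checked 'ClusterId >= 40 && ClusterId <= 42'` lists the jobs of clusters 40 to 42 and removes them only if you answer `y`. Any other answer leaves the queue untouched.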

Automatic Events
----------------

- check current status in MOTD

...

- no NAT, public IP
- IPv4/IPv6 firewall: none for HW machines running tests (the VM only has port 22 open to the public)

Using the HTCondor cluster
==========================

These machines are used to execute respdiff and VM packaging tests.

Executing respdiff
------------------
|
...

Using machines for other testing/development
--------------------------------------------

- if possible, use a machine that isn't part of the MAIN cluster (role: `free`); these can be used as is, and the instructions below don't apply
- any `condor_exec` machine can be temporarily removed from the MAIN cluster and used for other workloads
- machines in detached clusters can be used with condor turned on (when the queue is empty and `autorespdiff.timer` is inactive)
- **HOWTO (temporarily turn off condor for a machine)**:

1. turn off condor and wait (~10m) until the current job finishes: `remove-from-cluster`
|
...