Current hardware
================

| Hostname | Status | CPU | RAM | Note |
|--------------------------------|-----------------------------------|----------------------|-------|------|
| isengard-resolver.labs.nic.cz | :warning: **condor submit** | 2T @ 2.10GHz | 4 GB | VM |
| [mordor-resolver.labs.nic.cz](https://mordor-resolver.mnt.labs.nic.cz) | :gear: condor exec | 4C/4T @ 3.00GHz | 64 GB | 1U |
| [doriath-resolver.labs.nic.cz](https://doriath-resolver.mnt.labs.nic.cz) | :gear: condor exec | 6C/12T @ 2.40GHz | 16 GB | 1U |
| [condor171.knot-resolver.cz](https://condor171.mnt.knot-resolver.cz) (*certset2*) | :gear: condor exec | 2x 6C/6T @ 2.10GHz | 16 GB | 1U |
| [condor174.knot-resolver.cz](https://condor174.mnt.knot-resolver.cz) | :gear: condor exec | 2x 6C/6T @ 2.10GHz | 16 GB | 1U, extra NIC |
| [condor175.knot-resolver.cz](https://condor175.mnt.knot-resolver.cz) | :gear: condor exec | 4C/8T @ 3.28GHz | 16 GB | 1U |
| [condor176.knot-resolver.cz](https://condor176.mnt.knot-resolver.cz) (*certset2*) | :gear: condor exec | 6C/6T @ 2.80GHz | 16 GB | 1U |
| [condor177.knot-resolver.cz](https://condor177.mnt.knot-resolver.cz) (*certset2*) | :gear: condor exec | 6C/6T @ 2.80GHz | 16 GB | 1U |
| [condor178.knot-resolver.cz](https://condor178.mnt.knot-resolver.cz) | :gear: condor exec | 2x 4C/8T @ 2.60GHz | 32 GB | supermicro |
| [condor179.knot-resolver.cz](https://condor179.mnt.knot-resolver.cz) | :gear: condor exec | 2x 4C/8T @ 2.60GHz | 32 GB | supermicro |
| [condor181.knot-resolver.cz](https://condor181.mnt.knot-resolver.cz) | :construction_site: [LXC](https://gitlab.nic.cz/labs/lxc-gitlab-runner) (3 jobs) | 2x 6C/6T @ 2.10GHz | 16 GB | 1U, extra NIC |
| [condor182.knot-resolver.cz](https://condor182.mnt.knot-resolver.cz) | :construction_site: [LXC](https://gitlab.nic.cz/labs/lxc-gitlab-runner) (8 jobs) | 2x 8C/16T @ 2.50GHz | 32 GB | 2U |
| [condor183.knot-resolver.cz](https://condor183.mnt.knot-resolver.cz) | :construction_site: [LXC](https://gitlab.nic.cz/labs/lxc-gitlab-runner) (4 jobs) | 8C/16T @ 2.50GHz | 32 GB | 2U |
| [condor184.knot-resolver.cz](https://condor184.mnt.knot-resolver.cz) | :hatching_chick: WIP | 2x 4C/8T @ 2.60GHz | 64 GB | supermicro |
| [condor185.knot-resolver.cz](https://condor185.mnt.knot-resolver.cz) | :hatching_chick: WIP | 2x 4C/8T @ 2.60GHz | 64 GB | supermicro |
| [condor186.knot-resolver.cz](https://condor186.mnt.knot-resolver.cz) | :hatching_chick: WIP | 2x 4C/8T @ 2.60GHz | 64 GB | supermicro |
| [condor187.knot-resolver.cz](https://condor187.mnt.knot-resolver.cz) | :hatching_chick: WIP | 2x 4C/8T @ 2.60GHz | 64 GB | supermicro |
| [mirkwood-resolver.labs.nic.cz](https://mirkwood-resolver.mnt.labs.nic.cz) | :arrow_right: free (benchmarks) | 2x 8C/16T @ 2.60GHz | 32 GB | 2U |
| [fangorn-resolver.labs.nic.cz](https://fangorn-resolver.mnt.labs.nic.cz) | :arrow_right: free (benchmarks) | 2x 4C/8T @ 3.10GHz | 32 GB | 2U |
| [condor172.knot-resolver.cz](https://condor172.mnt.knot-resolver.cz) | :gun: [shotgun](https://gitlab.nic.cz/knot/resolver-benchmarking) | 2x 3C/6T @ 3.20GHz | 64 GB | 2U |
| [condor173.knot-resolver.cz](https://condor173.mnt.knot-resolver.cz) | :gun: [shotgun](https://gitlab.nic.cz/knot/resolver-benchmarking) | 2x 6C/6T @ 2.10GHz | 64 GB | 1U |


* The remote management certificates of *certset2* servers conflict with those of the other servers; use a dedicated browser profile (`firefox -P`) to access them.


General
-------

- log in using your username (e.g. `tkrizek`) with your GitLab SSH key
- machines are managed with Ansible: [knot-resolver-ansible](https://gitlab.labs.nic.cz/knot/knot-resolver-ansible)


Automatic Events
----------------

- check current status in MOTD
- `autoupdate.timer` triggers a daily update/reboot at 2:30
- `autorespdiff.timer`
    - creates and updates reference data for current master
    - runs regularly on submit machine(s)
    - keeps adding jobs with `-p 0` (default priority is `5`) and updates reference afterwards
    - deletes reports older than 3 days from reference

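
Besides the MOTD, the state of both timers can be checked by hand. The following is a minimal sketch, not part of the machines' tooling: the function name `timer_states` and the `SYSTEMCTL` override variable are hypothetical and exist only for illustration. It relies on `systemctl is-active`, and it assumes that a host without `autorespdiff.timer` (anything other than a submit machine) simply reports that unit as not active.

```shell
# Print the state of the two timers described above, one per line.
# SYSTEMCTL is a hypothetical override (handy for testing); the default
# is the real systemctl binary.
timer_states() {
    local timer
    for timer in autoupdate.timer autorespdiff.timer; do
        # "is-active" prints a state such as "active" or "inactive"
        printf '%s: %s\n' "$timer" "$("${SYSTEMCTL:-systemctl}" is-active "$timer")"
    done
}

# usage: timer_states
```

To see when a timer fires next, `systemctl list-timers autoupdate.timer autorespdiff.timer` should also work on a host running systemd.
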

Networking
----------

- dual-stack, IPv4 and IPv6 (https://wiki.nic.cz/labs/labs)
- no NAT, public IP
- IPv4/IPv6 firewall: none for HW machines running tests (VM has only port 22 open to public)


Using the HTCondor cluster
==========================

These machines are used to execute respdiff and VM packaging tests.


Executing respdiff
------------------

1. *automatic*
    - is executed automatically for every commit in GitLab
    - not executed:
        - for the `master` branch
        - when there are no C/Lua code changes
2. *manual, triggered from GitLab CI/CD schedule*
    - can execute/evaluate multiple runs at once
    - forces execution even if there are no code changes
    - will not work for `master`; use `nightly` instead if needed
    - **USAGE**:
        1. select the appropriate schedule from https://gitlab.labs.nic.cz/knot/knot-resolver/pipeline_schedules
        2. change the target branch
        3. save
        4. run the schedule manually
3. *manual, directly from* **submit** *machine*
    - `sudo -iu respdiff`
    - basic example: `respdiff-job-submit $(respdiff-job-create 88e78c66)`
    - `respdiff-job-create --help`
    - `respdiff-job-submit --help`
    - works with knot-resolver-security as well
    - results in `/var/tmp/respdiff-jobs` (symlink: `/home/respdiff/jobs`)
    - helper scripts (graphs, stats) in `/var/opt/respdiff/contrib/job_manager/`

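
When several commits need to be tested from the submit machine, the basic example above can be wrapped in a loop. This is a sketch under assumptions, not part of the respdiff tooling: `submit_respdiff_jobs` and the `DRY_RUN` variable are hypothetical names, and the wrapper presumes you have already switched to the `respdiff` user with `sudo -iu respdiff`.

```shell
# Submit one respdiff job per commit, reusing the basic example above.
# With DRY_RUN=1 the commands are only printed, not executed.
submit_respdiff_jobs() {
    local commit
    for commit in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "respdiff-job-submit \$(respdiff-job-create $commit)"
        else
            # The command substitution is left unquoted on purpose, exactly
            # as in the basic example.
            respdiff-job-submit $(respdiff-job-create "$commit")
        fi
    done
}

# usage: DRY_RUN=1 submit_respdiff_jobs 88e78c66
```

Running it with `DRY_RUN=1` first shows what would be submitted before any job reaches the queue.
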

Using machines for other testing/development
--------------------------------------------

- if possible, use a machine that isn't part of the MAIN cluster (role: `free`); these can be used as is, and the instructions below don't apply
- any `condor_exec` machine can be temporarily removed from the MAIN cluster and used for other workloads
- machines in detached clusters can be used with condor turned on (when the queue is empty and `autorespdiff.timer` is inactive)
- **HOWTO (temporarily turn off condor for a machine)**:
    1. turn off condor and wait (~10m) until the current job finishes: `remove-from-cluster`
    2. (optional) if you need the machine overnight, turn off autoupdate (reboots at 2:30): `systemctl stop autoupdate.timer`
    3. run your workload
    4. `systemctl reboot -i`
    - **NOTE**: the reboot will cause the machine to return to the cluster (handled by `condor.service`)

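
The HOWTO steps can be strung together in a small wrapper. The sketch below is illustrative only: `run_step`, `borrow_machine` and the `RUN` variable are hypothetical names, it assumes the commands are run with sufficient privileges, and it prints the commands by default instead of executing them. Whether `remove-from-cluster` itself blocks until the current job finishes is not stated above, so allow for the ~10 minute wait before starting a time-sensitive workload.

```shell
# Print a command by default; execute it only when RUN=1 is set.
run_step() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# usage: borrow_machine <yes|no: needed overnight> <workload command...>
borrow_machine() {
    if [ "$#" -lt 2 ]; then
        echo "usage: borrow_machine <yes|no> <workload command...>" >&2
        return 2
    fi
    local overnight="$1"
    shift
    # 1. leave the cluster so that no new condor jobs start here
    run_step remove-from-cluster || return 1
    # 2. optional: prevent the 2:30 update/reboot
    if [ "$overnight" = "yes" ]; then
        run_step systemctl stop autoupdate.timer || return 1
    fi
    # 3. the workload itself
    run_step "$@"
    # 4. reboot even if the workload failed; this returns the machine
    #    to the cluster
    run_step systemctl reboot -i
}
```

For example, `borrow_machine yes make check` prints the four commands in order; prefix it with `RUN=1` to execute them. Because step 4 reboots immediately after the workload, copy any results off the machine as part of the workload command.
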

Moved into internal docs