reduce-questions-by-nsset: filter questions with the same name server sets
Description
Filter questions by selectively omitting those that are served by the same name server sets.
This is the fifth tool in the respdiff query filter toolchain. Its purpose is to make it possible to filter (reduce) the full set of questions in the dataset based on the frequency or uniqueness of their name server set.
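As a rough illustration of what "frequency of a name server set" means, the sketch below counts how many questions share each nsset. It assumes a hypothetical in-memory model: a Question struct for the (qname,qtype,qclass) key and an nsset stored as a sorted set of name server names. The real respdiff-rs types and LMDB encoding may differ.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical in-memory form of a question key: (qname, qtype, qclass).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Question {
    qname: String,
    qtype: u16,
    qclass: u16,
}

/// Hypothetical nsset representation: name server names, sorted and de-duplicated,
/// so the same set compares equal regardless of the order the NS names arrived in.
type NsSet = BTreeSet<String>;

/// Count how many questions are served by each name server set.
fn nsset_frequencies(nssets: &HashMap<Question, NsSet>) -> HashMap<NsSet, usize> {
    let mut counts = HashMap::new();
    for nsset in nssets.values() {
        // One increment per question that maps to this nsset.
        *counts.entry(nsset.clone()).or_insert(0) += 1;
    }
    counts
}
```

An nsset with a count of 1 is unique to a single question; nssets with large counts are the natural candidates for reduction.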
Example
respdiff-rs [--lmdb <ENVDIR>] reduce-questions-by-nsset [--factor FACTOR] [--min-questions MIN_QUESTIONS]
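The subcommand options in the usage line above can be collected with plain std parsing, as in the following sketch. It covers only the arguments that follow reduce-questions-by-nsset; the value types (f64 for --factor, usize for --min-questions) and the absence of defaults are assumptions, since this page does not document them.

```rust
/// Hypothetical container for the reduce-questions-by-nsset options.
#[derive(Debug, Default, PartialEq)]
struct Options {
    factor: Option<f64>,
    min_questions: Option<usize>,
}

/// Parse the arguments that follow the subcommand name.
/// Both options take exactly one value; anything else is reported as an error.
fn parse_options(args: &[&str]) -> Result<Options, String> {
    let mut opts = Options::default();
    let mut it = args.iter();
    while let Some(&flag) = it.next() {
        let value = it
            .next()
            .ok_or_else(|| format!("missing value for {flag}"))?;
        match flag {
            "--factor" => {
                let f = value.parse::<f64>().map_err(|e| format!("--factor: {e}"))?;
                opts.factor = Some(f);
            }
            "--min-questions" => {
                let n = value
                    .parse::<usize>()
                    .map_err(|e| format!("--min-questions: {e}"))?;
                opts.min_questions = Some(n);
            }
            other => return Err(format!("unknown option {other}")),
        }
    }
    Ok(opts)
}
```

A real CLI would more likely rely on an argument-parsing crate; std-only parsing simply keeps the sketch dependency-free.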
Input
questions
LMDB - Read all questions
- Format:
(qname,qtype,qclass) = None
selected_questions
LMDB - Read selected questions
- Format:
(qname,qtype,qclass) = None
nssets
LMDB - Read value nsset for each domain
- Format:
(qname,qtype,qclass) = nsset
nsset_frequencies
LMDB - Read value count for each nsset
- Format:
nsset = count
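The four inputs chain together: a question key leads to its nsset, and the nsset leads to its count. A minimal in-memory model of that chain, with HashMap and BTreeSet standing in for the LMDB databases, could look like this. The Question struct, the NsSet representation, the Inputs struct and the nsset_count helper are hypothetical names for illustration only.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical key type shared by questions, selected_questions and nssets.
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct Question {
    qname: String,
    qtype: u16,
    qclass: u16,
}

impl Question {
    fn new(qname: &str, qtype: u16, qclass: u16) -> Self {
        Question { qname: qname.to_string(), qtype, qclass }
    }
}

/// Hypothetical nsset value: a sorted set of name server names.
type NsSet = BTreeSet<String>;

/// In-memory stand-ins for the four LMDB databases this tool reads.
#[allow(dead_code)]
struct Inputs {
    /// questions: (qname,qtype,qclass) = None
    questions: BTreeSet<Question>,
    /// selected_questions: same format; None if no earlier tool has written it
    selected_questions: Option<BTreeSet<Question>>,
    /// nssets: (qname,qtype,qclass) = nsset
    nssets: HashMap<Question, NsSet>,
    /// nsset_frequencies: nsset = count
    nsset_frequencies: HashMap<NsSet, usize>,
}

impl Inputs {
    /// Follow question -> nsset -> count; returns None if either lookup misses.
    fn nsset_count(&self, q: &Question) -> Option<usize> {
        let nsset = self.nssets.get(q)?;
        self.nsset_frequencies.get(nsset).copied()
    }
}
```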
Output
selected_questions
LMDB - Write key domain for every selected domain
- Format:
(qname,qtype,qclass) = None
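The output writes keys only (the value is empty), and it speaks of selected domains while the key is a full (qname,qtype,qclass) triple. One possible reading, sketched below as an assumption rather than the tool's documented behavior, is that every question whose qname was selected gets written. A BTreeMap<Question, ()> stands in for the selected_questions database, and write_selected is a hypothetical helper.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical key type: (qname, qtype, qclass).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Question {
    qname: String,
    qtype: u16,
    qclass: u16,
}

/// Stand-in for the selected_questions database: key = question, value = empty.
type SelectedDb = BTreeMap<Question, ()>;

/// Write every question whose qname is in `domains`.
/// Returns how many keys were newly added (existing keys are left as they are).
fn write_selected(
    source: &BTreeSet<Question>,
    domains: &BTreeSet<String>,
    out: &mut SelectedDb,
) -> usize {
    let mut written = 0;
    for q in source.iter().filter(|q| domains.contains(&q.qname)) {
        // insert returns None when the key was not present before.
        if out.insert(q.clone(), ()).is_none() {
            written += 1;
        }
    }
    written
}
```

Because an existing key is simply overwritten with the same empty value, repeating the write is harmless; the returned count only reflects newly added keys.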
Operation
- questions = LMDB selected_questions if it exists, else LMDB questions
- for each qname in questions
  - get value nsset for key qname in LMDB nssets
  - get value count for key nsset in LMDB nsset_counts
  - [detect if domain is already included - for different QTYPEs?]
  - decide whether to include domain based on tool parameters
  - if domain should be included
    - add question to questions
- get value
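The steps above can be sketched end to end in Rust. This section does not say how --factor and --min-questions decide inclusion, so the quota function below encodes one hypothetical policy: keep up to max(ceil(count / factor), min_questions) questions per nsset. It also answers the open QTYPE question one particular way, by including further QTYPEs of an already included domain without charging the quota, and it skips questions that have no nsset or count entry. All names are illustrative, not the respdiff-rs API, and the result is returned as a set instead of being written to LMDB.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical key type: (qname, qtype, qclass).
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct Question {
    qname: String,
    qtype: u16,
    qclass: u16,
}

/// Hypothetical nsset value: a sorted set of name server names.
type NsSet = BTreeSet<String>;

/// Hypothetical policy: per nsset, keep ceil(count / factor) questions,
/// but never fewer than min_questions. Assumes factor > 0.
fn quota(count: usize, factor: f64, min_questions: usize) -> usize {
    let scaled = (count as f64 / factor).ceil() as usize;
    scaled.max(min_questions)
}

fn reduce_by_nsset(
    questions: &BTreeSet<Question>,
    selected_questions: Option<&BTreeSet<Question>>,
    nssets: &HashMap<Question, NsSet>,
    nsset_counts: &HashMap<NsSet, usize>,
    factor: f64,
    min_questions: usize,
) -> BTreeSet<Question> {
    // Start from an earlier selection if it exists, otherwise from all questions.
    let source = selected_questions.unwrap_or(questions);
    let mut taken: HashMap<&NsSet, usize> = HashMap::new();
    let mut included_domains: HashSet<&str> = HashSet::new();
    let mut out = BTreeSet::new();
    for q in source {
        // Assumption: questions without an nsset or a count are skipped.
        let Some(nsset) = nssets.get(q) else { continue };
        let Some(&count) = nsset_counts.get(nsset) else { continue };
        // Assumption: other QTYPEs of an included domain come along for free.
        if included_domains.contains(q.qname.as_str()) {
            out.insert(q.clone());
            continue;
        }
        let used = taken.entry(nsset).or_insert(0);
        if *used < quota(count, factor, min_questions) {
            *used += 1;
            included_domains.insert(q.qname.as_str());
            out.insert(q.clone());
        }
    }
    out
}
```

Because the questions are iterated in sorted key order, which domains fill an nsset's quota is deterministic; a real implementation might prefer random sampling instead.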