Commit 0cf743a3 authored by David Flynn

tools: add scripts to generate config files from yaml master

To update the generated config files, run the following:

  $ cd cfg
  $ ../scripts/
parent 0a0366d0
# Warning
The following sub-directories containing configuration files are
automatically generated from the .yaml master files and should not
be manually edited:
Any manual changes that are not reflected in the .yaml master files
are likely to be lost or to cause re-integration issues.
# How to generate the per-data point config files
Run the `../scripts/` from within the cfg directory.
# Format and processing of cfg/*.yaml by ``
All YAML config spec files are merged together (see merging rules) prior to
processing. This has an important side-effect that two configuration
categories with the same name will be merged together rather than being
evaluated separately.
The YAML config spec contains two top-level structures:
- `categories`, a set of configuration categories, each containing
  a set of sequence names and common encoder and decoder options.
  All sequences in a category will use the same common options.
- `sequences`, a set of sequence names, each describing common values
  used in the generation of the configuration files in all categories.
  The main use is to specify properties of the source data, such as its
  file location, or processing options specific only to the sequence.

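
As a rough illustration of these two top-level structures, a minimal spec might look like the sketch below. The category name `example-cat`, the sequence name `example_seq`, and all option names and values are hypothetical placeholders, not taken from any real configuration.

```yaml
# Hypothetical minimal spec: one category that refers to one sequence.
categories:
  example-cat:
    # options common to every sequence in this category
    encflags:
      - exampleEncoderOption: 1
    decflags:
      - exampleDecoderOption: 1
    # sequences to generate configurations for in this category
    sequences:
      example_seq: {}     # no category-sequence-specific options

sequences:
  example_seq:
    # properties of the source data for this sequence
    src: example_seq.ply
```

The sequence name under `categories` only selects the sequence; its source properties come from the matching entry in the global `sequences` map.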
Configuration generation proceeds as follows:
- For each category, a set of sequences are determined
- For each sequence, a set of variants are determined
- For each variant, configuration files are generated and written as:
  - `$prefix/$category/$sequence/$variant/encoder.cfg`
  - `$prefix/$category/$sequence/$variant/decoder.cfg`
  - `$prefix/$category/$sequence/$variant/pcerror.cfg`
To generate an encoder or decoder configuration, cfgOptions are gathered in
the following order:
- `encflags`/`decflags` from the global 'sequences'
for the current sequence name
- `encflags`/`decflags` for the current variant
from the current category
- `encflags`/`decflags` for the current variant
from the current category-sequence
To generate a pcerror configuration, options are the pcerrorflags from the
global sequences for the current sequence name.
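
The gathering order can be sketched with a spec that sets one option at each level. All names here (`example-cat`, `example_seq`, and the `optionFrom...` keys) are hypothetical and chosen only to make the origin of each option visible.

```yaml
sequences:
  example_seq:
    src: example_seq.ply
    encflags:
      - optionFromSequence: 1            # gathered first
    pcerrorflags:
      - optionForPcerror: 1              # pcerror.cfg uses only these

categories:
  example-cat:
    encflags:
      - optionFromCategory: 2            # gathered second
    sequences:
      example_seq:
        encflags:
          - optionFromCategorySequence: 3  # gathered third
```

Under the ordering described above, the encoder options for `example_seq` in `example-cat` would be gathered as `optionFromSequence`, then `optionFromCategory`, then `optionFromCategorySequence`. This sketch defines no variants, so a single configuration is generated for the sequence.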
## Semantics of a yaml-cfg-file
The description of the YAML config spec below uses the following notation:
- `.a=b` represents a map (associative array, dictionary, etc.,) with
the key `a` and value `b`. YAML in-line style: `{ a: b }`
- `[]` represents a list of values. YAML in-line style: `[ ... ]`
- `$x` represents a value (the value may also be a key in a map).
- `/` represents the top-level YAML document.
### Definition of $cfgOption
A $cfgOption represents one of the following structures to generate a
configuration option in the form `$key: $value`
- `.$key=$value` — General case
- `.$key=[].$variant=.$value` — Applies only to the given $variant
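
A short sketch of the two forms, as they would appear inside an `encflags` or `decflags` list. The option names and the variant names `r01` and `r02` are hypothetical.

```yaml
encflags:
  # general case: the same value for every variant
  - exampleOption: 1
  # variant-specific case: the value is a map keyed by variant name
  - exampleRateOption:
      r01: 10
      r02: 20
```

With this fragment, the configuration for variant `r01` would contain `exampleOption: 1` and `exampleRateOption: 10`, while variant `r02` would receive `exampleRateOption: 20` instead.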
### Top-level definitions
- `/.categories=.$categoryName=`...
A configuration category
- `/.sequences=.$sequenceName=`...
A global sequence definition
- `/.sequence-base-dir=$value`
A global base directory that may be overridden by a sequence's
`.base-dir` and `.base-norm-dir` values.
### Inside `/.sequences=.$sequenceName=`...
- `.src=$value`
The source PLY filename for encoding
- `.src-dir=$value`
(optional) The directory name containing the .src file
- `.base-dir=$value`
(optional) A path to a directory containing .src-dir
- `.norm=$value`
(optional) The source PLY filename with normals data
- `.norm-dir=$value`
(optional) The directory containing the .norm file
- `.base-norm-dir=$value`
(optional) A path to a directory containing `.norm-dir`
- `.pcerrorflags=[].$cfgOption`
(optional) an ordered list of sequence-global options for pcerror.cfg
- `.encflags=[].$cfgOption`
(optional) an ordered list of sequence-global options for encoder.cfg
- `.decflags=[].$cfgOption`
(optional) an ordered list of sequence-global options for decoder.cfg
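
A global sequence definition using these keys might look like the following sketch. The paths, file names, and option names are hypothetical; only the key names come from the description above.

```yaml
sequence-base-dir: /path/to/sequences      # hypothetical global base directory

sequences:
  example_seq:
    src: example_seq.ply                   # source PLY for encoding
    src-dir: example_seq                   # directory containing .src
    norm: example_seq_normals.ply          # source PLY with normals data
    norm-dir: example_seq                  # directory containing .norm
    pcerrorflags:
      - examplePcerrorOption: 1
    encflags:
      - exampleEncoderOption: 1

  other_seq:
    src: other_seq.ply
    base-dir: /another/path                # overrides sequence-base-dir
```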
### Inside `/.categories=.$categoryName=`...
- `.encflags=[].$cfgOption`
(optional) an ordered list of category-specific options for encoder.cfg
- `.decflags=[].$cfgOption`
(optional) an ordered list of category-specific options for decoder.cfg
- `.sequences=...`
A set of sequences to generate configurations for in the context
of the current category
### Inside `/.categories=.$categoryName=.sequences=.$sequenceName=...`
- `.encflags=[].$cfgOption`
(optional) an ordered list of category-sequence-specific options
for encoder.cfg
- `.decflags=[].$cfgOption`
(optional) an ordered list of category-sequence-specific options
for decoder.cfg
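
The sketch below shows a category in which one sequence carries its own options and another relies on the category defaults only. The category, sequence, and option names are hypothetical.

```yaml
categories:
  example-cat:
    encflags:
      - exampleCategoryOption: 1
    decflags:
      - exampleCategoryOption: 1
    sequences:
      plain_seq: {}                  # category options only
      tuned_seq:
        encflags:
          - exampleTuningOption: 4   # applies to tuned_seq in this category only
```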
## Merging rules
Multiple YAML config spec files are recursively merged as follows:
- src:* → dst:undef ⇒ assign src to dst
- src:scalar → dst:scalar ⇒ replaced
- src:hash → dst:hash ⇒ recursive merge of key-value pairs
- src:list → dst:scalar ⇒ assign [src, dst] to dst, removing duplicate values
- src:list → dst:list ⇒ assign [src, dst] to dst, removing duplicate values
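
The scalar, hash, and undefined-destination rules can be illustrated with two small spec files. The file names, paths, and sequence names are hypothetical. The sketch assumes files are merged in command-line order, so a later file acts as src and the accumulated result as dst.

```yaml
# first.yaml (hypothetical), loaded first
sequence-base-dir: /data/a
sequences:
  seq_one:
    src: seq_one.ply
---
# second.yaml (hypothetical), loaded second
sequence-base-dir: /data/b
sequences:
  seq_one:
    src-dir: one
  seq_two:
    src: seq_two.ply
---
# merged result, following the rules as stated above
sequence-base-dir: /data/b      # scalar -> scalar: replaced
sequences:                      # hash -> hash: merged key by key
  seq_one:
    src: seq_one.ply            # kept from first.yaml
    src-dir: one                # undefined in dst: assigned
  seq_two:                      # undefined in dst: assigned
    src: seq_two.ply
```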
\ No newline at end of file
use Digest::MD5;
use File::Path qw(make_path);
use Getopt::Long;
use List::MoreUtils;
use Pod::Usage;
use YAML;
use strict;
=head1 NAME - Generate experiment configuration from yaml specification
=head1 SYNOPSIS [options] [yaml-config-spec ...]
=head1 OPTIONS
=over 4
=item B<--prefix>=dir
Sets the output path for the generated configuration tree.
=item B<--output-src-glob-sh>
=item B<--no-output-src-glob-sh> (default)
Do not generate files describing source locations
=item B<--skip-sequences-without-src> (default)
=item B<--no-skip-sequences-without-src>
Do not generate configuration files for sequences that have an empty 'src'
field in the yaml specification.  This option permits a later yaml spec
to effectively remove a sequence from being used in an experiment.
It may be useful to disable this option when generating config files when
the source location of the input data is not known.

=back

=head1 Config specification files

=cut

# Command line processing
my $do_help = '';
my $output_src_glob_sh = 0;
my $skip_sequences_without_src = 1;
my $prefix = '.';
GetOptions(
    'help' => \$do_help,
    'prefix=s' => \$prefix,
    'output-src-glob-sh!' => \$output_src_glob_sh,
    'skip-sequences-without-src!' => \$skip_sequences_without_src,
) or pod2usage(1);
# display help text and exit if asked, or if no config is provided
pod2usage(0) if $do_help;
pod2usage(1) unless @ARGV;
# load all yaml snippets and merge into a single description
my @origins = @ARGV;
my %cfg;
while (@ARGV) {
    my $fname = shift @ARGV;
    my $cfg = YAML::LoadFile($fname) or die "$fname: $!";
    merge(\%cfg, $cfg);
}
# dump the merged configuration (allows reproduction)
YAML::DumpFile("$prefix/config-merged.yaml", \%cfg);
# generate encoder/decoder configuration files
# list of configured jobs
my @jobs;
# this just makes later code look simpler
my $cfg = \%cfg;
# iterate over each configuration and described sequences
foreach my $cat_name (sort keys %{$cfg->{categories}}) {
    my $cat = $cfg->{categories}{$cat_name};

    foreach my $seq_name (sort keys %{$cat->{sequences}}) {
        my $cat_seq = $cat->{sequences}{$seq_name};
        my $seq = $cfg->{sequences}{$seq_name};

        # a sequence without gops is processed as a single unit
        unless (exists $seq->{gops}) {
            genSeqVariants($cat, $cat_name, $cat_seq, $seq_name, $seq, $seq);
            next;
        }

        # split sequence into groups of pictures for parallel execution
        my $gop_idx = 0;
        foreach my $gop (@{$seq->{gops}}) {
            my $gop_label = sprintf "%03d", $gop_idx++;
            my $gop_name = "${seq_name}_gop${gop_label}";
            genSeqVariants($cat, $cat_name, $cat_seq, $gop_name, $gop, $seq);
        }
    }
}

sub genSeqVariants {
my ($cat, $cat_name, $cat_seq, $seq_name, $gop, $seq) = @_;
# if sequence source isn't defined at top level, skip
if ($skip_sequences_without_src) {
    return unless defined $gop->{src};
}
# generate the list of variants (if any)
my @variants = List::MoreUtils::uniq (
# $cat.sequences.$name.$variant:
(grep {
my $ref = $cat_seq->{$_};
ref $ref eq 'HASH'
and (exists $ref->{encflags} || exists $ref->{decflags})
} keys %$cat_seq),
# $cat.sequences.$name.encflags[].$param.$variant:
# $cat.encflags[].$param.$variant
# $seq.$name.encflags[].$param.$variant
# handle the case of no variants: single case with defaults
push @variants, undef unless @variants;
# for each variant, derive the encoder options
# NB: in the case of no variants, $var = undef
foreach my $var (sort @variants) {
my $cfgdir =
join '/', grep {defined} ($prefix,$cat_name,$seq_name,$var);
print "$cfgdir\n";
push @jobs, "$cfgdir/";
# input sequence file name
if ($gop->{src} && $output_src_glob_sh) {
my $src_seq = join '/', grep {defined} (
(List::MoreUtils::firstval {defined}
open my $fd, ">", "$cfgdir/";
print $fd "$src_seq\n";
if ($gop->{norm} && $output_src_glob_sh) {
my $norm_seq = join '/', grep {defined} (
(List::MoreUtils::firstval {defined}
open my $fd, ">", "$cfgdir/";
print $fd "$norm_seq\n";
# encoder configuration
my @encflags = (
params_from_node($cat->{encflags}, $var),
params_from_node($cat_seq->{encflags}, $var),
# evaluate any value expressions
eval_exprs(\@encflags, $cat_seq, $seq);
write_cfg("$cfgdir/encoder.cfg", \@encflags);
# decoder configuration
my @decflags = (
params_from_node($cat->{decflags}, $var),
params_from_node($cat_seq->{decflags}, $var),
# evaluate any value expressions
eval_exprs(\@decflags, $cat_seq, $seq);
write_cfg("$cfgdir/decoder.cfg", \@decflags);
# pcerror configuration
my @pcerrorflags = (
# evaluate any value expressions
eval_exprs(\@pcerrorflags, $cat_seq, $seq);
write_cfg("$cfgdir/pcerror.cfg", \@pcerrorflags) if (@pcerrorflags);
# utilities
# keywise merge $src into $dst, using the following merge rules:
#  - *      -> undef  = copy
#  - scalar -> scalar = replace
#  - hash   -> hash   = recurse
#  - list   -> scalar = merge unique items (scalars only)
#  - list   -> list   = merge unique items (scalars only)
# $dst is a reference to a hash, an array or a scalar slot; $src is a value.
sub merge {
    my ($dst, $src) = @_;

    # copy into an undefined destination slot
    if (ref $dst eq 'SCALAR' and not defined $$dst) {
        $$dst = $src;
        return;
    }

    # overwrite existing scalar
    unless (ref $src) {
        $$dst = $src;
        return;
    }

    if (ref $src eq 'HASH') {
        foreach my $key (keys %$src) {
            # copy sub-tree if key does not exist
            unless (exists $$dst{$key}) {
                $$dst{$key} = $$src{$key};
                next;
            }

            # recurse to merge sub-tree
            if (ref $$dst{$key}) {
                merge($$dst{$key}, $$src{$key});
            }
            else {
                merge(\$$dst{$key}, $$src{$key});
            }
        }
        return;
    }

    # merge arrays
    # -- this is really only for an array of scalars
    if (ref $src eq 'ARRAY') {
        my @vals;
        push @vals, $$dst if ref $dst eq 'SCALAR' and defined $$dst;
        push @vals, @$dst if ref $dst eq 'ARRAY';
        push @vals, @$src;
        @vals = List::MoreUtils::uniq(@vals);

        # store the result in place, whatever kind of slot $dst refers to
        if (ref $dst eq 'ARRAY') { @$dst = @vals }
        else { $$dst = [@vals] }
    }
}

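# Return the variant names used by the variant-specific options in the
# list @$node, i.e. the keys of any map-valued option.
sub variants_from_node {
    my ($node) = @_ or return ();
    return () unless ref $node eq 'ARRAY';
    return
        map {keys %$_}
        grep {ref $_ eq 'HASH'}
        map {values %$_}
        grep {ref $_ eq 'HASH'}
        @$node;
}
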
sub params_from_node {
    my ($node, $variant) = @_;
    return () unless $node;

    my @params;
    my @todo = @$node;
    while (my $item = shift @todo) {
        # an unformatted string (not key:value)
        unless (ref $item) {
            push @params, [$item];
            next;
        }

        if (ref $item eq 'HASH') {
            while (my ($key, $value) = each %$item) {
                unless (ref $value) {
                    # key:value without variants
                    push @params, [$key, $value];
                    next;
                }

                if (ref $value eq 'HASH') {
                    # key:value with variants
                    push @params, [$key, $value->{$variant}]
                        if exists $value->{$variant};
                    next;
                }

                warn "unhandled node for $value";
            }
            next;
        }

        # flatten nested lists, preserving order
        if (ref $item eq 'ARRAY') {
            unshift @todo, @$item;
        }
    }
    return @params;
}

# Expand in-place all variables and eval statements in the list @$params,
# searching for substitutions as members of @context
sub eval_exprs {
    my ($params, @context) = @_;

    foreach my $param (@$params) {
        $param->[1] = eval_expr($param->[1], \@context) if exists $param->[1];
    }
}

# Return the expansion of $str given the context of the maps in @$contexts.
# Any variable substitutions found in $str are searched in order
# of @$contexts.
sub eval_expr {
    my ($str, $contexts) = @_;

    # first find all variables and substitute their values
    while ($str =~ m/\$\{([^}]+)\}/gc) {
        my $var = $1;
        my $var_start = $-[0];
        my $var_len = $+[0] - $-[0];

        foreach my $ctx (@$contexts) {
            next unless exists $ctx->{$var};
            substr $str, $var_start, $var_len, $ctx->{$var};
            pos $str = $var_start + length($ctx->{$var});
            # the first context that defines the variable wins
            last;
        }
    }

    # finally evaluate any eval expressions
    pos $str = 0;
    while ($str =~ m/\$eval\{([^}]+)\}/gc) {
        my $expr = $1;
        my $expr_start = $-[0];
        my $expr_len = $+[0] - $-[0];

        my $val = eval "$expr";
        substr $str, $expr_start, $expr_len, $val;
        pos $str = $expr_start + length($val);
    }

    return $str;
}

# Print configuration @$opts, to $fd; with one entry per line and where
# each entry in @$opts is either a [key, value] pair to be joined with
# ": ", or just [key].
sub print_cfg {
    my ($fd, $opts) = @_;

    print $fd "# This file was automatically generated from:\n";
    print $fd "# $_\n" foreach (@origins);

    local $\ = "\n";
    foreach my $opt (@$opts) {
        print $fd join(": ", @$opt);
    }
}

# print config to file iff it differs from file's contents.
# (ie, don't touch mtime if unchanged)
sub write_cfg {
    my ($filename, $flags) = @_;

    # format config in memory
    my $new_cfg = "";
    open my $fd, ">", \$new_cfg;
    print_cfg($fd, $flags);
    close $fd;

    # hash the new contents and, if present, the existing file
    my $md5_new = Digest::MD5->new;
    $md5_new->add($new_cfg);

    my $md5_old = Digest::MD5->new;
    if (-f $filename) {
        open $fd, "<", $filename or die "$filename: $!";
        $md5_old->addfile($fd);
        close $fd;
    }

    # only rewrite the file when the contents have changed
    if ($md5_new->digest ne $md5_old->digest) {
        print "writing $filename\n";
        open $fd, ">", $filename or die "$filename: $!";
        print $fd $new_cfg;
        close $fd;
    }
}

# Generate a configuration tree in $PWD from YAML files in the same
# directory.
set -e
script_dir=$(dirname $0)
for f in ctc-*.yaml
do
    $script_dir/ --no-skip-sequences-without-src $f
done
rm config-merged.yaml