Anchor Families

The Family class is the fundamental visual building block in pyflow. Families serve two distinct roles within suites:

  1. Visually grouping related families/tasks

  2. Logically grouping related families/tasks from an execution perspective

Because of the order in which ecFlow searches for scripts within the configured files location, by default all tasks with the same name must share the same script, located in the files directory (scripts deployed by pyflow are deployed to this directory). Tasks with the same name must therefore either be avoided or be written to use identical scripts, which is a significant constraint on encapsulation in object-oriented suite design.

For simple aggregation of tasks, it is encouraged to use pf.Family or derive from it. This provides minimal encapsulation of tasks, but not of scripts: all tasks with the same name will share the same script. We build such a library of classes and objects so that these components (Tasks, Families, Suites) can be reused in different contexts. For example, a given task class could be used in a research workflow and then reused in an operational workflow.

However, different contexts may require some differences in suite execution. To keep the suite concise, maintainable and easy to check, these differences should preferably be handled in a single entity rather than spread throughout the suite.

To that end, we introduce a configuration object that handles these differences and configures our objects for each context.

The result is suites that are configurable for different use cases and contexts, and that build fundamentally different generated suites from the same components.

A configuration object can be constructed manually for different use cases or as a result of parsing configuration files. It can be used to:

  • Provide constants and data for specific cases that will be needed in the suites

  • Switch functionality on/off or modify it

  • Configure the hosts on which the tasks run

  • Specify the locations and details of the data to process

Most importantly, because they are objects, these configuration objects are themselves programmable (they can include code). Suite components can delegate part of the suite definition to these configurators, so the structure of the suite can be determined by logic in the configuration object if necessary.
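The idea of a programmable configuration object can be sketched in plain Python, independently of pyflow. Everything named here (Config, task_names, build_suite, the host names and the data paths) is hypothetical and chosen only for illustration; the "suite" is a plain dictionary standing in for real suite components.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical configuration object holding context-specific choices."""
    host: str               # host on which the tasks would run
    data_dir: str           # location of the data to process
    archive: bool = False   # switch a piece of functionality on or off

    def task_names(self):
        # Logic inside the configurator decides the structure of the suite.
        names = ["retrieve", "process"]
        if self.archive:
            names.append("archive")
        return names


def build_suite(name, config):
    """Build a plain-dict stand-in for a suite from shared components."""
    return {
        "suite": name,
        "variables": {"HOST": config.host, "DATA_DIR": config.data_dir},
        "tasks": config.task_names(),
    }


# The same builder yields structurally different suites per context.
research = build_suite(
    "research", Config("localhost", "/path/to/research/data"))
operational = build_suite(
    "operational", Config("hpc-login", "/path/to/operational/data", archive=True))
```

In this sketch all context-specific differences live in the Config instance, so the builder itself never branches on the context.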

[2]:
with CourseSuite('family_example') as s:
    with pf.Family('simple', labels={'example': ''}) as f:
        LabelSetter((f.example, 'example text'))
    MyFamily('derived_family', 5)

s
[2]:
suite family_example
  defstatus suspended
  edit ECF_FILES '/path/to/scratch/files/family_example'
  edit ECF_HOME '/path/to/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/usr/local/apps/ecflow/%ECF_VERSION%/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  family simple
    label example ""
    task set_labels
  endfamily
  family derived_family
    label total_counters "5"
    task derived_family_0
      edit HALF '0'
      edit LIMIT '0'
      label counter_label "count to 0"
    task derived_family_1
      trigger derived_family_0 eq complete
      edit HALF '1'
      edit LIMIT '2'
      label counter_label "count to 2"
    task derived_family_2
      trigger derived_family_1 eq complete
      edit HALF '2'
      edit LIMIT '4'
      label counter_label "count to 4"
    task derived_family_3
      trigger derived_family_2 eq complete
      edit HALF '3'
      edit LIMIT '6'
      label counter_label "count to 6"
    task derived_family_4
      trigger derived_family_3 eq complete
      edit HALF '4'
      edit LIMIT '8'
      label counter_label "count to 8"
  endfamily
endsuite

For more complex functionality, where groups of tasks require encapsulation, we encourage the use of AnchorFamily.

The AnchorFamily class updates the files location according to the relative path of the family from the suite (or previous AnchorFamily). Within an AnchorFamily, all script lookups are relative to this new location, providing isolation and encapsulation.
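The lookup rule described above can be modelled in a few lines of plain Python. This is only a sketch of the rule as stated in this section, not the actual implementation in pyflow or ecFlow; the function script_location and the (name, is_anchor) representation of a node path are hypothetical.

```python
from pathlib import PurePosixPath


def script_location(files_dir, node_path, task_name):
    """Model where the script for a task is expected to live.

    node_path is a list of (name, is_anchor) pairs describing the
    families from the suite down to the family containing the task.
    """
    location = PurePosixPath(files_dir)
    pending = []  # family names seen since the suite or the last anchor
    for name, is_anchor in node_path:
        pending.append(name)
        if is_anchor:
            # An anchor moves the files location by its relative path
            # from the suite (or from the previous anchor).
            location = location.joinpath(*pending)
            pending = []
    # Plain families do not change the files location.
    return location / f"{task_name}.ecf"


FILES = "/path/to/scratch/files"

# Plain families only: same task name, same script.
outside_f1 = script_location(FILES, [("f1", False)], "test1")
outside_f2 = script_location(FILES, [("f2", False)], "test1")

# Inside an anchor named 'f': the lookup is relative to <files>/f.
inside = script_location(FILES, [("f", True), ("f1", False)], "test1")
```

Under this model, a task named test1 in two plain families resolves to one shared script, while a task named test1 below the anchor resolves to a separate script in the anchor's own directory.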

All tasks with the same name within an AnchorFamily must share the same script located in the files location for that AnchorFamily.

As such it is encouraged to:

  • Use AnchorFamily to encapsulate independent units within a suite. Typically these are the subtrees that make sense to deploy as a whole.

  • Use Family to aggregate tasks that could share scripts with each other. This can be within an AnchorFamily.

The following example shows a suite in which tasks with identical names use different scripts, by scoping them with an AnchorFamily.

[3]:
with CourseSuite('anchor_families', files=filesdir) as s:
    with pf.Family('f1'):
        pf.Task('test1')        # Script <files>/test1.ecf
    with pf.Family('f2'):
        pf.Task('test1')        # Script <files>/test1.ecf
    with pf.AnchorFamily('f'):
        with pf.Family('f1'):
            pf.Task('test1')    # Script <files>/f/test1.ecf
            pf.Task('test2')    # Script <files>/f/test2.ecf
        with pf.Family('f2'):
            pf.Task('test2')    # Script <files>/f/test2.ecf

s
[3]:
suite anchor_families
  defstatus suspended
  edit ECF_FILES '/path/to/scratch/files'
  edit ECF_HOME '/path/to/scratch/out'
  edit ECF_JOB_CMD 'bash -c 'export ECF_PORT=%ECF_PORT%; export ECF_HOST=%ECF_HOST%; export ECF_NAME=%ECF_NAME%; export ECF_PASS=%ECF_PASS%; export ECF_TRYNO=%ECF_TRYNO%; export PATH=/usr/local/apps/ecflow/%ECF_VERSION%/bin:$PATH; ecflow_client --init="$$" && %ECF_JOB% && ecflow_client --complete || ecflow_client --abort ' 1> %ECF_JOBOUT% 2>&1 &'
  edit ECF_KILL_CMD 'pkill -15 -P %ECF_RID%'
  edit ECF_STATUS_CMD 'true'
  edit ECF_OUT '%ECF_HOME%'
  label exec_host "localhost"
  family f1
    task test1
  endfamily
  family f2
    task test1
  endfamily
  family f
    edit ECF_FILES '/path/to/scratch/files/f'
    edit ECF_INCLUDE '/path/to/scratch/files/f'
    family f1
      task test1
      task test2
    endfamily
    family f2
      task test2
    endfamily
  endfamily
endsuite

This supports two ways of attaching scripts to tasks that are identical apart from their parameters:

  • Generate one script per task containing the parameters

  • Use one script that is parameterised by the Variables on the Families and Tasks
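The two options can be contrasted with a small plain-Python sketch. The helper names (generate_script, substitute) and the task names are hypothetical. The substitute function is only a tiny stand-in for the %VAR% substitution that ecFlow performs when it creates a job, and the LIMIT variable mirrors the one set on the tasks in the derived family shown earlier.

```python
def generate_script(limit):
    """Option 1: generate one script per task with the parameter baked in."""
    return f"for i in $(seq 0 {limit}); do echo $i; done\n"


# Option 2: a single shared script that refers to a variable by name.
SHARED_SCRIPT = "for i in $(seq 0 %LIMIT%); do echo $i; done\n"


def substitute(script, variables):
    """Simplified stand-in for %VAR% substitution at job creation."""
    for name, value in variables.items():
        script = script.replace(f"%{name}%", str(value))
    return script


limits = [0, 2, 4]

# Option 1: three distinct scripts, one per task.
per_task = {f"task_{i}": generate_script(limit)
            for i, limit in enumerate(limits)}

# Option 2: one script, three different variable values.
shared = {f"task_{i}": substitute(SHARED_SCRIPT, {"LIMIT": limit})
          for i, limit in enumerate(limits)}
```

Both approaches produce the same job content in this sketch. The difference is where the parameter lives: in the generated script file for the first option, and in the variables on the families and tasks for the second.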