Processing Tools

    Processing and Characterisation

    The processing pipelines used by ASTRON are described below.

    Pre-processing Pipeline

    Processing of the raw uv data, which consists of calibration and imaging steps, is handled offline via a series of automated pipelines (as detailed here).

    The first standard data processing step performed by ASTRON, the Pre-processing Pipeline, is described below:

    Pre-Processing Pipeline: Flags the data in time and frequency, and optionally averages them in time, frequency, or both (the software that performs this step is labelled DPPP - Default Pre-Processing Pipeline). This stage of the processing also includes, if requested, a subtraction of the contributions of the brightest sources in the sky (the so-called "A-team": Cygnus A, Cassiopeia A, Virgo A, etc.) from the visibilities through the 'demixing' algorithm (B. van der Tol, PhD thesis). Currently, users should specify whether demixing is to be used and which sources should be demixed. Visibility averaging should be chosen so that it reduces the data volume to a manageable size while minimizing the effects of time and bandwidth smearing. The averaging parameters, as well as the estimated storage capacity required in the LOFAR Long Term Archive, are also specified by the users through the North Star proposal submission tool.
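    The sketch below illustrates, in broad strokes, what such a pre-processing run looks like when driven by a DPPP parset. It is a minimal example rather than the ASTRON production configuration: the file names and averaging factors are illustrative assumptions, and demixing would be added as an extra 'demixer' step with its 'subtractsources' option set to the A-team sources to subtract.

```python
# Minimal sketch of a DPPP/DP3 pre-processing run: RFI flagging followed by
# time/frequency averaging. File names and averaging factors are illustrative.
import subprocess

parset = """\
msin = L123456_SB000_uv.MS       # hypothetical raw input measurement set
msout = L123456_SB000_uv.avg.MS
steps = [flag, avg]

flag.type = aoflagger            # flag the data in time and frequency

avg.type = averager              # average to reduce the data volume
avg.freqstep = 16                # input channels per output channel
avg.timestep = 4                 # input time slots per output time slot
"""

with open("preprocess.parset", "w") as f:
    f.write(parset)

# DPPP (DP3 in recent LOFAR releases) reads all of its settings from the parset.
subprocess.run(["DPPP", "preprocess.parset"], check=True)
```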

    Users wishing to further process data products generated by the ASTRON pre-processing pipeline can use advanced calibration/imaging pipelines. Currently these pipelines are not offered in production; users can instead request CEP3 processing time to perform advanced calibration in a standalone fashion, as described in the LOFAR Imaging Cookbook. If you wish to adopt this CEP3 offline option, please make this clear in your proposal by answering the relevant question in the technical section of the proposal, "Off-line data processing on ASTRON facilities (CEP3) requirement". Alternatively, proposers may download, install and run the available advanced pipelines on their own computing facilities.

    Access to Data Products

    All final products will be stored at the LOFAR Long Term Archive (LTA) where, in the future, significant computing facilities may become available for further re-processing. Users can retrieve datasets from the LTA for reduction and analysis on their own computing resources or through the use of suitable resources on the GRID. More information is available here.

    From the moment the data are made available at the LTA, users have four weeks to check the quality of their data and report problems to the Observatory. After this window has passed, no requests for re-observation will be considered.

    Advanced post-processing calibration

    Advanced calibration strategies

    Direction-dependent calibration has been shown to significantly improve the quality of the images. An improvement by a factor of 3-5 in the image noise was obtained when it was applied to MSSS LBA observations (see B. Adebahr's report at: http://www.lofar.org/operations/lib/exe/fetch.php?media=msss:adebahr-wee...). In HBA observations, thermal-noise-limited images have been obtained using SAGECal (see E. Orru' report at: http://www.lofar.org/operations/lib/exe/fetch.php?media=public:lsm_new:2012_09_12_orru_lsm.pdf), Factor (https://github.com/lofar-astron/factor) and Kill-MS+DDFacet (https://github.com/saopicc/killMS).

     

    Clock/TEC separation: Initial tests demonstrated that the frequency-dependent effects due to clock delays between stations and the delays due to TEC differences can in principle be disentangled, separating instrumental effects from ionospheric direction-dependent effects (see M. Mevius' report at: http://www.lofar.org/operations/lib/exe/fetch.php?media=commissioning:maaijke_report.pdf).
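    The separation exploits the different frequency behaviour of the two terms: a clock offset produces a phase that scales linearly with frequency, while a differential TEC produces a phase that scales with the inverse of the frequency. The snippet below is an illustrative least-squares fit of these two terms to a simulated, unwrapped phase spectrum; it is a toy version of the idea, not the Observatory's implementation.

```python
# Toy clock/TEC separation: fit phase(nu) = 2*pi*nu*dclock + K*dTEC/nu per
# time slot, with dclock in seconds and dTEC in TECU. Phases must be unwrapped.
import numpy as np

K = 8.44797245e9  # rad Hz per TECU (approximate ionospheric conversion constant)

def fit_clock_tec(freqs_hz, phases_rad):
    A = np.column_stack([2.0 * np.pi * freqs_hz, K / freqs_hz])
    (dclock, dtec), *_ = np.linalg.lstsq(A, phases_rad, rcond=None)
    return dclock, dtec

# Simulated example: a 100 ns clock offset plus a 0.05 TECU difference.
freqs = np.linspace(120e6, 180e6, 40)
phases = 2.0 * np.pi * freqs * 100e-9 + K * 0.05 / freqs
print(fit_clock_tec(freqs, phases))  # recovers (1e-7 s, 0.05 TECU)
```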

    Ionospheric correction is crucial in order to reach the thermal noise and obtain high-quality images at LOFAR wavelengths. Two main approaches have been attempted and are currently under investigation. One involves solving for many directions during the calibration phase (direction-dependent calibration, DDC) on short time scales, so that the ionospheric effects are absorbed in the calibration solutions (see e.g. Yatawatta et al. 2013, A&A, 550, A136). The other involves fitting a phase screen to the directional TEC phase solutions and applying it during the imaging stage (a method under commissioning, inspired by the SPAM algorithm; see Intema et al. 2009, A&A, 501, 1185).
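    The following sketch conveys the screen-fitting idea in its simplest form: a smooth surface is fitted to the differential TEC values solved for towards a handful of calibrator directions, so that a correction can then be evaluated towards any direction at imaging time. A bilinear surface stands in here for the more sophisticated screens used in practice, and all numbers are invented for illustration.

```python
# Toy phase-screen fit: model dTEC(l, m) across the field with a bilinear surface
# fitted to directional solutions. Directions and dTEC values are simulated.
import numpy as np

rng = np.random.default_rng(1)
l = rng.uniform(-2.0, 2.0, 30)                    # calibrator directions (deg)
m = rng.uniform(-2.0, 2.0, 30)
dtec = 0.02 * l - 0.01 * m + 0.005 * l * m + rng.normal(0.0, 0.002, 30)

# Design matrix for dTEC(l, m) = a + b*l + c*m + d*l*m
A = np.column_stack([np.ones_like(l), l, m, l * m])
coeffs, *_ = np.linalg.lstsq(A, dtec, rcond=None)

def screen(l0, m0):
    """Evaluate the fitted screen towards an arbitrary direction."""
    return coeffs @ np.array([1.0, l0, m0, l0 * m0])

print("Predicted dTEC towards (0.5, -1.0) deg:", screen(0.5, -1.0))
```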

    The CITT produced two semi-automatic pipelines, prefactor and factor, implementing the concept of facet calibration described in van Weeren et al. 2016. Both pipelines run as options of the generic pipeline framework, which is available in the LOFAR software. A stable version of the prefactor pipeline and some documentation are available at https://github.com/lofar-astron/prefactor . The factor pipeline is available at https://github.com/lofar-astron/factor and its documentation can be found in the cookbook.
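    For reference, a prefactor run is typically launched by handing one of its parsets to the generic pipeline framework, roughly as sketched below. The configuration file and parset names are placeholders; use the ones shipped with, and documented for, the release you have installed.

```python
# Hedged sketch of launching a prefactor parset through the generic pipeline
# framework. File names are placeholders taken from typical prefactor setups.
import subprocess

subprocess.run(
    [
        "genericpipeline.py",
        "-c", "pipeline.cfg",             # generic pipeline configuration file
        "Pre-Facet-Calibrator.parset",    # example prefactor calibrator parset
    ],
    check=True,
)
```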

    Advanced Offline Post Processing Reduction Strategies 

    Expert users have adopted standard LOFAR software to produce images, exploring various analysis strategies, in some cases also involving self-calibration and direction-dependent calibration. Their results are reported in Table 1 for both HBA and LBA observations.

     

    1. Cycles and commissioning fields - HBA & LBA

    Commissioner(s)                        | de Gasperin     | Yatawatta       | van Weeren      | Retana Montenegro
    Band                                   | LBA 30-90 MHz   | HBA 110-190 MHz | HBA 110-190 MHz | HBA 180-220 MHz
    Target field                           | RXSJ0603        | NCP             | Toothbrush      | J1427385+331241
    Total observing time (hrs)             | 6               | 260             | 8               | 8
    Bandwidth (MHz)                        | 6               | 46              | 60              | 0.2
    Resolution (arcsec)*                   | 23 x 15         | 6 x 6           | 6 x 6           | 28 x 18
    Imaged FOV (deg)                       | 9               | 12              | 7               | 3.3
    Final RMS noise (mJy/beam)             | 8               | 0.03            | 0.093           | 5.0
    Equivalent noise over 2 MHz bandwidth
    and 6 hours (mJy/beam)                 | 14              | 0.8             | 0.8             | -
    Noise / thermal noise ratio            | 4.4             | 1.2             | 2               | 10
    Calibration strategy                   | Transfer of amplitude solutions, FR calibration, clock/TEC separation, self-calibration; use of the Pill LBA pipeline | Initial BBS calibration with 2000 sources, robust SAGECal with 20000 sources, excon imaging | Facet calibration with extreme peeling; similar results obtained using Factor | LOFAR pipeline (prefactor) for continuum and custom scripts for line study

    * The resolution depends on the stations used for the imaging.

    Table 1: Examples of sensitivities reached in Cycles and commissioning observations in HBA and LBA.
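    The "equivalent noise" row in Table 1 rescales each measured image RMS to a common reference of 2 MHz bandwidth and 6 hours of integration, using the standard radiometer scaling in which the noise varies as the inverse square root of bandwidth times integration time. A small sketch of that rescaling (with the reference values as parameters) is shown below; the LBA entry of the table is used as a check.

```python
# Radiometer rescaling used for the "equivalent noise" row of Table 1:
# sigma_equivalent = sigma_measured * sqrt((bw / 2 MHz) * (time / 6 h)).
import math

def equivalent_noise(rms_mjy, bw_mhz, time_hr, ref_bw_mhz=2.0, ref_time_hr=6.0):
    return rms_mjy * math.sqrt((bw_mhz / ref_bw_mhz) * (time_hr / ref_time_hr))

# LBA example from Table 1: 8 mJy/beam over 6 MHz and 6 h -> ~14 mJy/beam.
print(equivalent_noise(8.0, 6.0, 6.0))
```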

    In the LBA band, expert users have demonstrated that, using a simple analysis strategy that employs neither direction-dependent calibration nor self-calibration, total intensity sensitivities of ~10 times the theoretical thermal noise can be reached in relatively long observations (6-10 hours). Using more involved calibration techniques (such as self-calibration or direction-dependent calibration), sensitivities of 4-5 times the theoretical thermal noise have already been achieved.

    For the HBA, direction-independent calibration (i.e. prefactor) has been shown to reach sensitivities of the order of 5 times the theoretical thermal noise in images at a resolution of 20"-30". On the other hand, a noise of 1-2 times the thermal noise, a resolution of about 5"-10" and high-fidelity images can be achieved using direction-dependent calibration techniques (e.g. factor, SAGECal or Kill-MS+DDFacet).

    In order to generate high-sensitivity images, advanced calibration recipes need to be applied to the pre-processed data released by ASTRON. These are being captured in the development of the RAPTHOR pipeline, which we expect to offer to the community in 2022. The computing time and resources required to process LOFAR data through the pre-processing pipeline can be found in the Computational Requirements section below.

    Computational requirements

    Computational Requirements for the Pre-processing pipeline

    The computational requirements of the imaging mode can be substantial and depend both on observing parameters and image characteristics.

    In the following, we present practical estimates for the Processing over Observing time ratio (P/O ratio) separately for the pre-processing and the imaging steps. Note that when considering the computational requirements for the observing proposals, users should account for BOTH of these factors.

    Pre-processing Time

    Each of the software elements in the pre-processing pipelines has a varied and complex dependence on both the observation parameters and the cluster performance, and hence a scaling relation is difficult to determine.

    To have realistic estimates of pipeline processing times, typical LBA and HBA observations with durations longer than 2 hours and adopting the full Dutch array were selected from the LOFAR pipeline archive and were statistically analyzed. The results are summarized in the following table:

    Type | Nr CHAN | Nr Demixed Sources | Nr SB | P/O
    LBA  |  64     | 0                  | 244   | 0.25
    LBA  |  64     | 2                  | 244   | 0.51
    LBA  |  64     | 0                  |  80   | 0.2 [CEP2]
    LBA  |  64     | 1                  |  80   | 0.3 [CEP2]
    LBA  |  64     | 2                  |  80   | 1.0 [CEP2]
    LBA  | 256     | 2                  | 244   | 0.72
    HBA  |  64     | 0                  | 244   | 0.81
    HBA  |  64     | 2                  | 244   | 3.0
    HBA  |  64     | 0                  | 122   | 0.9 [CEP2]
    HBA  |  64     | 1                  | 122   | 1.0 [CEP2]
    HBA  |  64     | 0                  | 366   | 1.4 [CEP2]
    HBA  |  64     | 1                  | 392   | 2.0
    HBA  |  64     | 0                  | 380   | 1.5 [CEP2]
    HBA  |  64     | 0                  | 480   | 1.4 [CEP2]
    HBA  | 256     | 2                  | 244   | 4.0

    Table 4: Pre-processing performance for >2 h observations with different observation parameters and demix settings, for HBA and LBA. Although the case of 3 demixed sources has not been characterized, a large increase of the P/O ratio is expected for both LBA and HBA. Note that for setups with no CEP4 statistics we report the P/O values measured on the old CEP2 cluster; these values should be considered upper limits for CEP4.

    These guidelines have been implemented in NorthStar, such that pipeline durations are automatically computed for the user.
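    As an illustration of how the tabulated P/O values translate into pipeline durations (the same multiplication NorthStar performs), the pre-processing time is simply the P/O ratio times the observing time. The snippet below encodes a few of the setups from Table 4; the dictionary and its keys are just a convenient illustration, not part of any LOFAR tool.

```python
# Pre-processing duration estimate: processing time = P/O * observing time.
# Keys are (array type, channels per subband, demixed sources, subbands).
P_OVER_O = {
    ("LBA", 64, 0, 244): 0.25,
    ("LBA", 256, 2, 244): 0.72,
    ("HBA", 64, 0, 244): 0.81,
    ("HBA", 64, 2, 244): 3.0,
}

def pipeline_hours(setup, observing_hours):
    return P_OVER_O[setup] * observing_hours

# An 8 h HBA run with 244 sub-bands, 64 channels and 2 demixed sources
# needs roughly 24 h of pre-processing.
print(pipeline_hours(("HBA", 64, 2, 244), 8.0))
```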

    Note that:

    • Demixing 3 sources is expected to drastically increase the P/O ratio, for both LBA and HBA, and the computing resources claimed. To safeguard the overall operations of the LOFAR system, the Radio Observatory does not support 3 demixed sources on the CEP4 cluster.
    • The processing of data at a resolution of 256 channels with demixed source(s) is only granted on the basis of a solid scientific justification.

     

    Current Offline Calibration Status and Performance

    In the last couple of years the calibration and imaging software has been expanded considerably, allowing users to reach noise levels within a few factors of the thermal noise. An advanced, direction-independent calibration pipeline (prefactor) has been developed; documentation is available in the LOFAR Imaging Cookbook and here.

    Users can request processing time on CEP3 to perform offline calibration and imaging using the most recent tested version of the prefactor or factor pipelines if they do not have the requisite resources available themselves. If you are interested in this offline option, please make this clear in your proposal by answering the relevant question in the technical section of the proposal, "Off-line data processing on RO facilities (CEP3) requirement". This is further discussed on the Upcoming Cycle page. Alternatively, proposers may describe how they plan to carry out offline processing on their own compute resources to achieve the required image quality.

    Based on user experience, a typical observation of 243 sub-bands, grouped in blocks of 10 sub-bands, needs a P/O ratio of ~80 to be fully processed on one CEP3 node. Consequently, a typical 8-hour observation requires about 640 hours (8 h x 80) of processing on one node, which is within the duration of a default CEP3 reservation block.

    The noise level of the images obtained using the prefactor calibration pipeline can reach 4 times the thermal noise (calculated using the noise calculator tool). These values are based on a limited set of cases and on a fraction of the total frequency band; we advise users to take this number as an indication of the best possible result achievable with this pipeline. More detailed information can be found here.

     

    Installing the LOFAR Software Stack 

    The LOFAR LTA software stack is the collection of software needed to run the LOFAR imaging pipeline, including all required libraries at specific versions. An overview of the LOFAR Software Stack, together with a discussion of its various aspects, is given on this Wiki page. Currently, a Docker image including the latest LOFAR software can be found in the LOFAR Imaging Cookbook.

    Software Processing Tools

    Installing the LOFAR Software at external computing facilities

    This page will redirect you to build instructions for the LOFAR common software packages.

     

    LOFAR Beam formed / pulsar tools

    A GitHub repository of scripts to use for analysis of beam formed / pulsar data.

     

    LOFAR Imaging Software (external packages)

    A collection of links of documented data reduction tools developed and maintained by external experienced users.

    • Dysco (a compressing storage manager for Casacore measurement sets)
    • LoSoTo (LOFAR solutions tool)
    • LSM Tool (LOFAR Local Sky Model Tool)
    • PyBDSF (Python Blob Detector and Source Finder)
    • RMextract (extract TEC, vTEC, Earth magnetic field and Rotation Measures from GPS and WMM data for radio interferometry observations)
    • Sagecal (GPU/MIC accelerated radio interferometric calibration program)
    • WSClean (fast widefield interferometric imager)

     

    LOFAR (Generic) Pipeline Framework

    Documentation about how to define a pipeline for the execution with the Generic Pipeline Framework is available here.

     

    LOFAR Imaging Pipelines

    Several imaging pipelines are available for processing LOFAR LBA and HBA data.

    • Factor (Facet calibration for LOFAR)
    • Long Baseline Generic Pipeline (a generic pipeline implementation of the LOFAR long baseline pipeline reduction); documentation can be found here.
    • prefactor (Pre facet calibration pipeline)
    • Pill (pipeline for LOFAR LBA imaging data reduction - under development)
    • DDFacet (direction dependent calibration software used for the LOTSS survey)

     

    LOFAR User Script Repository

    A GitHub repository of 3rd party contributions for LOFAR data processing.

    DYSCO

    Summary

    As of 10 September 2018, all LOFAR HBA data products ingested to the Long Term Archive (LTA) will be compressed using Dysco. This decision was made after evaluating the effect of visibility compression on LOFAR measurement sets (see below for more information).

    • Our tests indicate that compressing the LBA and HBA measurement sets with Dysco does not produce any visible differences in the calibrator solutions or the recovered source properties.

    To process Dysco-compressed data, you will need to run LOFAR software version 3.1 or later (built with the Dysco library). The Dysco compression specifications (using 10 bits per float to compress visibility data) and the tests carried out as part of the commissioning effort are valid for any HBA imaging observation with a frequency resolution of at least four channels per subband and a time resolution of 1 second. Note that using 10 bits is a conservative choice and the compression noise should be negligible.
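    For users who preprocess data themselves, Dysco compression is normally switched on when DPPP/DP3 writes the output measurement set, roughly as sketched below. The storage-manager keys shown follow the documented Dysco/DP3 interface with the 10 bits per float used for LTA products, but they should be verified against the software version you are running; the file names are placeholders.

```python
# Hedged sketch: ask DPPP/DP3 to write its output with the Dysco storage manager
# at 10 bits per visibility float. File names are placeholders.
import subprocess

parset = """\
msin = input.MS
msout = output_dysco.MS
msout.storagemanager = dysco
msout.storagemanager.databitrate = 10
steps = []
"""

with open("dysco.parset", "w") as f:
    f.write(parset)

subprocess.run(["DPPP", "dysco.parset"], check=True)
```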

     

    Need for dysco visibility compression

    Modern radio interferometers like LOFAR contain a large number of baselines and record visibility data at high time and frequency resolution, resulting in significant data volumes. A typical 8-hour observing run for the LOFAR Two-metre Sky Survey (LoTSS) produces about 30 TB of preprocessed data. It is important to manage the data growth in the LTA, especially in view of the increasing observing efficiencies. One way to achieve this is to compress the recorded visibility data. Recently, Offringa (2016) proposed a new technique called Dysco to compress interferometric visibility data. The compression is fast, the noise it adds is small (within a few per cent of the system noise in the image plane), and that noise has the same characteristics as the normal system noise (for specific information on the compression technique, see Offringa (2016) and the casacore storage manager available here).

     

    Commissioning tests

    Before integrating the Dysco compression technique in the production pipelines, SDC operations carried out a commissioning effort to characterise how compressing visibility data using Dysco affects the calibration solutions and the images produced.

    Compressing HBA data

    To validate Dysco compression on LOFAR HBA data, we carried out a test observation using the standard LoTSS setup (2x244 subbands, 16 ch/sb, 1 s time resolution). The raw visibilities were preprocessed (RFI flagging and averaging) using three different preprocessing pipelines: (i) the standard production pipeline without any compression, (ii) with Dysco compression enabled on the visibility data, and (iii) with Dysco compression enabled on both the visibility data and the visibility weights. The data products produced by the three pipeline runs were processed using the direction-independent prefactor and direction-dependent Factor pipelines.

    Comparing the gain solutions and the images produced by the prefactor and Factor runs shows that compressing visibility data and visibility weights has little impact on the final output data products. The key results from this exercise can be summarized as follows:

    • Compressing the measurement sets with dysco does not produce any visible differences in the calibrator gain amplitudes, clock and TEC solutions.
    • Gain solutions for a given facet derived as part of the Factor direction-dependent calibration scheme are similar for the Dysco-compressed and uncompressed datasets (see Fig. 1).
    • For one facet (containing the brightest facet calibrator), we found that the gain solutions for a few remote stations were different for the dysco compressed case (See Fig. 2). This is caused by the different clean-component models used during the facet selfcal step. However, since the image-domain comparisons are identical between different pipeline products, this is not a cause for concern.
    • The mean ratio of source fluxes (see Fig. 3) between the uncompressed and the dysco compressed datasets is 1.004 +- 0.06.
    • The mean positional offset in both right ascension and declination is less than 0.08 arcsec.
    • For a typical LoTSS observation, the disk space occupied by the compressed visibility data is about a factor of 3.6 smaller than uncompressed data.
    • Since compressing and uncompressing the visibility data is faster than the typical disk read/write times, Dysco compression does not increase the computational cost of the Radio Observatory production pipelines and the processing pipelines used by the users.

     

    Fig 1. Plot showing that Dysco compression has little influence on the amplitude solutions derived for a given facet as part of the Factor direction-dependent calibration pipeline. The three colours indicate the three different datasets used to derive the solutions: red indicates solutions derived from uncompressed data, black indicates solutions for a dataset where only the visibilities were compressed, and green points correspond to solutions from a dataset where both the visibilities and the visibility weights were compressed.

    Fig 2. Plot showing gain solutions for the facet containing the brightest facet calibrator. The gain solutions for the Dysco-compressed data are different for a few remote stations due to the difference in the clean-component model used in the facet selfcal step.

    Fig. 3. Flux ratio between catalogs of sources produced using the uncompressed and the dysco compressed datasets. The mean flux ratio is 1.004 +- 0.064.

    Compressing LBA data

    We used an 8-hour scan on 3C 196 to validate applying Dysco compression to LBA data. The observed data were preprocessed by the Radio Observatory with two different pipelines: (i) with Dysco visibility compression enabled, and (ii) without Dysco compression. Further processing was carried out by Francesco de Gasperin using the standard LBA calibrator pipeline. Comparing the intermediate data products produced by the two pipelines, we find that Dysco compression has no significant impact on the data products produced by the calibration pipeline. The key results from this exercise are listed below:

    • Based on visual inspection, the calibrator solutions are identical.
    • The mean ratio of source fluxes between the uncompressed and the dysco compressed datasets is 1.007.
    • The largest difference in the pixel values is at the 0.01 Jy/beam level, close to the bright central source (3C 196).

     

    How do I know if my data have been compressed?

    Since 10 September 2018, the Radio Observatory has been recording all HBA imaging observations in Dysco-compressed measurement sets. A new column has been introduced in the LTA to identify whether a given data product has been compressed with Dysco. When you browse through your project on the LTA, on the page displaying the correlated data products, the new column Storage Writer identifies whether your data have been compressed with Dysco. For example, Fig 4 shows the list of correlated data products for an averaging pipeline; the Storage Writer column specifies that the preprocessed data products have all been stored using the DyscoStorageManager, implying that the data have been compressed with Dysco.

    To process these data you will need to run the LOFAR software version 3.1 or later (built with the dysco library) so that DPPP can automatically recognise the way the visibilities have been recorded. Note that compressing already dysco-compressed visibility data will add noise to your data and hence should be avoided.
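    If you want to check a measurement set directly rather than via the LTA view, the storage manager of the DATA column can be inspected with python-casacore (assumed to be installed), roughly as follows; the MS path is a placeholder.

```python
# Check whether the DATA column of a measurement set uses the Dysco storage
# manager, so that already-compressed data are not compressed a second time.
from casacore.tables import table

t = table("L123456_SB244_uv.MS", ack=False)
dminfo = t.getdminfo("DATA")   # data-manager info for the DATA column
t.close()

print("Storage manager type:", dminfo["TYPE"])
# A value of 'DyscoStMan' indicates Dysco-compressed visibilities.
```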

    For further questions/comments, please contact SDC Operations using our JIRA helpdesk.

    Fig 4. A new column called Storage Writer has been introduced in the LTA correlated data products view to indicate whether it has been compressed with Dysco. This figure shows a list of correlated data products for a given averaging pipeline and the Storage Writer column (containing the string DyscoStorageManager) indicates that these data products have been compressed with Dysco.