The processing pipelines used by ASTRON are described below.
Processing of the raw uv data, which consists of calibration and imaging steps, is handled offline via a series of automated pipelines (as detailed here).
The first standard data processing step performed by ASTRON is the Pre-processing Pipeline:
Pre-Processing Pipeline: Flags the data in time and frequency and optionally averages them in time, frequency, or both (the software that performs this step is called DPPP, the Default Pre-Processing Pipeline). If requested, this stage also subtracts the contributions of the brightest sources in the sky (the so-called "A-team": Cygnus A, Cassiopeia A, Virgo A, etc.) from the visibilities using the 'demixing' algorithm (B. van der Tol, PhD thesis). Currently, users should specify whether demixing is to be used and which sources should be demixed. The averaging factors should be chosen to reduce the data volume to a manageable level while minimizing the effects of time and bandwidth smearing. The averaging parameters, as well as the estimated storage capacity required in the LOFAR Long Term Archive, are also specified by users through the North Star proposal submission tool.
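The trade-off between data volume and smearing can be checked with a quick calculation. The sketch below is illustrative only: it assumes the common order-of-magnitude smearing indicators (channel fractional bandwidth, or Earth rotation during one interval, multiplied by the source offset from the phase centre in units of the synthesized beam), where values well below 1 indicate mild smearing. The function names and every input number are hypothetical; they are not DPPP parameters.

```python
OMEGA_EARTH = 7.2921e-5  # Earth's rotation rate in rad/s


def averaged_volume(raw_volume_tb: float, time_avg: int, freq_avg: int) -> float:
    """Visibility data volume after averaging by integer factors in time and frequency."""
    return raw_volume_tb / (time_avg * freq_avg)


def smearing_indicators(nu_hz: float, chan_width_hz: float, interval_s: float,
                        offset_deg: float, beam_arcsec: float) -> tuple[float, float]:
    """Order-of-magnitude (bandwidth, time) smearing indicators.

    Both grow linearly with the source offset from the phase centre measured
    in synthesized beams; values well below 1 mean the smearing is mild.
    """
    offset_in_beams = offset_deg * 3600.0 / beam_arcsec
    bandwidth = (chan_width_hz / nu_hz) * offset_in_beams
    time = OMEGA_EARTH * interval_s * offset_in_beams
    return bandwidth, time


if __name__ == "__main__":
    # Hypothetical HBA setup: 4 channels per 195.3125 kHz subband, 8 s interval,
    # source 2.5 deg from the phase centre, 6 arcsec synthesized beam.
    bw, t = smearing_indicators(150e6, 195312.5 / 4, 8.0, 2.5, 6.0)
    print(f"bandwidth smearing ~{bw:.2f}, time smearing ~{t:.2f}")
    print(f"100 TB after 8x time and 16x frequency averaging: {averaged_volume(100.0, 8, 16):.2f} TB")
```

Because both indicators scale with the offset in beams, a setup that is acceptable for low-resolution imaging of the field centre can smear noticeably at high resolution near the edge of the field.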
Users wishing to further process data products generated by the ASTRON pre-processing pipeline can use advanced calibration/imaging pipelines. These pipelines are currently not offered in production; instead, users can request CEP3 processing time to perform advanced calibration in a standalone fashion, as described in the LOFAR Imaging Cookbook. If you wish to use this CEP3 offline option, please make this clear in your proposal by answering the relevant question in the technical section of the proposal "Off-line data processing on ASTRON facilities (CEP3) requirement". Alternatively, proposers may download, install and run the available advanced pipelines on their own computing facilities.
All final products will be stored in the LOFAR Long Term Archive (LTA), where significant computing facilities may become available in the future for further re-processing. Users can retrieve datasets from the LTA for reduction and analysis on their own computing resources or on suitable resources on the GRID. More information is available here.
From the moment the data are made available to users at the LTA, users have four weeks to check the quality of their data and report problems to the Observatory. After this window has passed, no requests for re-observation will be considered.
Direction-dependent calibration has been shown to significantly improve image quality. An improvement by a factor of 3-5 in image noise was obtained when it was applied to MSSS LBA observations (see B. Adebahr's report at: http://www.lofar.org/operations/lib/exe/fetch.php?media=msss:adebahr-wee...). In HBA observations, thermal-noise-limited images have been obtained using SAGECal (see E. Orrù's report at: http://www.lofar.org/operations/lib/exe/fetch.php?media=public:lsm_new:2012_09_12_orru_lsm.pdf), factor (https://github.com/lofar-astron/factor) and Kill-MS+DDFacet (https://github.com/saopicc/killMS).
Clock/TEC separation: Initial tests demonstrated that the frequency-dependent phase effects of clock delays between stations and of TEC differences can in principle be separated, so that instrumental effects can be distinguished from ionospheric direction-dependent effects (see M. Mevius' report at: http://www.lofar.org/operations/lib/exe/fetch.php?media=commissioning:maaijke_report.pdf).
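The separation works because the two effects have different frequency dependences: a clock offset produces a phase proportional to frequency, while a differential TEC produces a phase proportional to 1/frequency. The sketch below fits both terms with linear least squares. It assumes already-unwrapped phases and a two-term model with no constant offset, and uses the dispersive constant -8.44797245e9 rad·Hz/TECU that is commonly used in LOFAR tools; it is not the procedure of the report cited above, which also has to deal with phase wrapping and noise.

```python
import numpy as np

# Dispersive phase constant: phase [rad] = TEC_CONST * dTEC [TECU] / freq [Hz]
TEC_CONST = -8.44797245e9


def clock_tec_model(freqs_hz, clock_ns, dtec_tecu):
    """Differential phase (rad) from a clock delay (ns) plus a differential TEC (TECU)."""
    return 2 * np.pi * freqs_hz * clock_ns * 1e-9 + TEC_CONST * dtec_tecu / freqs_hz


def separate_clock_tec(freqs_hz, phases_rad):
    """Least-squares split of unwrapped phases into a clock (~nu) and a TEC (~1/nu) term."""
    # Clock column is expressed per ns so that both columns have similar magnitudes.
    design = np.column_stack([2 * np.pi * freqs_hz * 1e-9, TEC_CONST / freqs_hz])
    (clock_ns, dtec_tecu), *_ = np.linalg.lstsq(design, phases_rad, rcond=None)
    return clock_ns, dtec_tecu


if __name__ == "__main__":
    freqs = np.linspace(120e6, 180e6, 50)  # hypothetical HBA frequency grid
    phases = clock_tec_model(freqs, clock_ns=5.0, dtec_tecu=0.05)
    print(separate_clock_tec(freqs, phases))
```

The wider the fractional bandwidth, the less the ν and 1/ν columns resemble each other, which is why separation is easier over a broad band than within a few subbands.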
Ionospheric correction is crucial for reaching the thermal noise and obtaining high-quality images at LOFAR wavelengths. Two main approaches have been attempted and are currently under investigation. The first solves for many directions during the calibration phase (direction-dependent calibration, DDC) on short time scales, so that the ionospheric effects are absorbed into the calibration solutions (see e.g. Yatawatta et al. 2013, A&A, 550A, 136Y). The second fits a phase screen to the directional TEC phase solutions and applies it during the imaging stage (a method under commissioning, inspired by the SPAM algorithm; see Intema et al. 2009, A&A, 501, 1185I).
The CITT produced two semi-automatic pipelines, prefactor and factor, which implement the facet calibration concept described in van Weeren et al. (2016). Both pipelines run within the generic pipeline framework available in the LOFAR software. A stable version of prefactor, with some documentation, is available at https://github.com/lofar-astron/prefactor . The factor pipeline is available at https://github.com/lofar-astron/factor and its documentation can be found in the cookbook.
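The phase-screen idea can be illustrated with a deliberately simple model: fit a smooth function of direction (l, m) to the differential TEC solved in each calibration direction, then evaluate it in any other direction. The sketch below uses a low-order 2-D polynomial for one station and one time slot; this is a swapped-in simplification, not the Karhunen-Loève screen model of SPAM, and the function names are hypothetical.

```python
import numpy as np


def _poly_terms(l, m, order):
    """Design matrix with all monomials l**i * m**j for i + j <= order."""
    l = np.atleast_1d(np.asarray(l, dtype=float))
    m = np.atleast_1d(np.asarray(m, dtype=float))
    cols = [l**i * m**j for i in range(order + 1) for j in range(order + 1 - i)]
    return np.column_stack(cols)


def fit_phase_screen(l, m, dtec, order=1):
    """Least-squares polynomial screen dTEC(l, m) from per-direction solutions."""
    coeffs, *_ = np.linalg.lstsq(_poly_terms(l, m, order), dtec, rcond=None)
    return coeffs


def evaluate_phase_screen(coeffs, l, m, order=1):
    """Interpolated dTEC at arbitrary directions (e.g. each image facet or pixel)."""
    return _poly_terms(l, m, order) @ coeffs
```

For example, with solutions in five hypothetical directions, `coeffs = fit_phase_screen(l, m, dtec)` followed by `evaluate_phase_screen(coeffs, 0.01, -0.02)` gives the screen value at a direction that had no calibrator. Applying such a screen at the imaging stage means each image region sees a smoothly varying correction rather than the piecewise-constant one of a direction-by-direction solve.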
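In facet calibration, the field is divided into facets, each calibrated with solutions from one calibrator direction; facet layouts are commonly built as Voronoi cells around the calibrator positions, i.e. every sky position belongs to the facet of its nearest calibrator. The sketch below shows only that assignment step, using great-circle separations; it is a hypothetical helper, not code from prefactor or factor.

```python
import numpy as np


def assign_to_facets(src_ra, src_dec, cal_ra, cal_dec):
    """Index of the nearest facet calibrator for each source (Voronoi assignment).

    All angles are in degrees; separations are computed on the sphere.
    """
    src_ra, src_dec, cal_ra, cal_dec = (
        np.radians(np.atleast_1d(np.asarray(a, dtype=float)))
        for a in (src_ra, src_dec, cal_ra, cal_dec)
    )
    # cos(separation) between every source (rows) and every calibrator (columns)
    cos_sep = (np.sin(src_dec)[:, None] * np.sin(cal_dec)[None, :]
               + np.cos(src_dec)[:, None] * np.cos(cal_dec)[None, :]
               * np.cos(src_ra[:, None] - cal_ra[None, :]))
    return np.argmax(cos_sep, axis=1)  # largest cosine = smallest separation
```

Working with the cosine of the separation avoids an `arccos` call, since the nearest calibrator is simply the one with the largest cosine.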
Expert users have adopted standard LOFAR software to produce images, exploring various analysis strategies, in some cases involving also self-calibration and direction dependent calibration. Their results are reported in Table 1 for both HBA and LBA observations.
1. Cycles and commissioning fields - HBA & LBA
|Commissioner(s)||de Gasperin||Yatawatta||van Weeren||Retana Montenegro|
|Band||LBA 30-90 MHz||HBA 110-190 MHz||HBA 110-190 MHz||HBA 180-220 MHz|
|Total observing time (hrs)||6||260||8||8|
|Resolution$ (arcsec)||23 x 15||6 x 6||6 x 6||28 x 18|
|Imaged FOV (deg)||9||12||7||3.3|
|Final RMS noise (mJy/beam)||8||0.03||0.093||5.0|
|Equivalent noise over 2 MHz bandwidth and 6 hours (mJy/beam)|| || || || |
|Noise/thermal noise ratio||4.4||1.2||2||10|
|Calibration strategy||Transfer of amplitude solutions, Faraday rotation (FR) calibration, clock/TEC separation, self-calibration. Use of Pill LBA pipeline||BBS initial calibration with 2000 sources, with 20000 sources, excon imaging||Facet calibration (extreme peeling). Similar results obtained using factor||LOFAR pipeline (prefactor) for continuum and custom scripts for line study|
$ The resolution depends on the stations used for the imaging
Table 1: Examples of sensitivities reached in Cycles and commissioning observations in HBA and LBA
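The equivalent-noise row of Table 1 rescales each RMS to a common 2 MHz bandwidth and 6 hour integration so that observations of different depth can be compared. Assuming thermal-like noise that scales as 1/sqrt(bandwidth × time), the rescaling is sketched below. The imaged bandwidths are not listed in the table, so the 48 MHz used in the example is hypothetical. The same scaling explains why lowering the noise/thermal-noise ratio matters: reducing the noise by a factor k would otherwise require k² times more observing time.

```python
import math


def equivalent_noise(rms_mjy: float, bandwidth_mhz: float, time_h: float,
                     ref_bandwidth_mhz: float = 2.0, ref_time_h: float = 6.0) -> float:
    """Rescale an image RMS to a reference bandwidth and integration time."""
    return rms_mjy * math.sqrt((bandwidth_mhz * time_h) / (ref_bandwidth_mhz * ref_time_h))


def time_factor_for_noise_reduction(k: float) -> float:
    """Extra integration time needed to lower thermal-like noise by a factor k."""
    return k**2


if __name__ == "__main__":
    # Hypothetical example: 0.093 mJy/beam over an assumed 48 MHz in 8 hours.
    print(f"{equivalent_noise(0.093, 48.0, 8.0):.3f} mJy/beam over 2 MHz and 6 h")
    print(time_factor_for_noise_reduction(3.0), time_factor_for_noise_reduction(5.0))
```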
In the LBA band, expert users have demonstrated that, using a simple analysis strategy without direction-dependent calibration or self-calibration, total intensity sensitivities of ~10 times the theoretical thermal noise can be reached in relatively long observations (6-10 hours). Using more involved calibration techniques (such as self-calibration or direction-dependent calibration), sensitivities of 4-5 times the theoretical thermal noise have already been achieved.
For the HBA, direction-independent calibration (i.e. prefactor) has been shown to reach sensitivities of the order of 5 times the theoretical thermal noise in images at a resolution of 20"-30". Direction-dependent calibration techniques (e.g. factor, SAGECal or Kill-MS+DDFacet), on the other hand, can achieve a noise of 1-2 times the thermal noise, a resolution of about 5"-10" and high-fidelity images.
To generate high-sensitivity images, advanced calibration recipes need to be applied to the pre-processed data released by ASTRON. These recipes are being captured in the development of the RAPTHOR pipeline, which we expect to offer to the community in 2022. The computing time and resources required to process LOFAR data through the preprocessing pipeline can be found in the
The computational requirements of the imaging mode can be substantial and depend both on observing parameters and image characteristics.
In the following, we present practical estimates of the processing-to-observing time ratio (P/O ratio) separately for the pre-processing and the imaging steps. Note that when estimating the computational requirements for observing proposals, users should account for BOTH of these factors.
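Since each P/O ratio is processing time divided by observing time, the time for each step is the observing time multiplied by that step's ratio, and the total is the sum over steps. A minimal sketch, in which the step names and P/O values are hypothetical placeholders rather than measured figures:

```python
def processing_hours(observing_hours: float, po_ratios: dict[str, float]) -> dict[str, float]:
    """Processing time per step and in total, given a P/O ratio for each step."""
    hours = {step: observing_hours * po for step, po in po_ratios.items()}
    hours["total"] = sum(hours.values())  # evaluated before the "total" key is added
    return hours


if __name__ == "__main__":
    # Hypothetical P/O ratios for an 8-hour observation.
    print(processing_hours(8.0, {"pre-processing": 3.0, "imaging": 10.0}))
```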
Each of the software elements in the pre-processing pipelines has a varied and complex dependence on both the observation parameters and the cluster performance, and hence a scaling relation is difficult to determine.
To obtain realistic estimates of pipeline processing times, typical LBA and HBA observations longer than 2 hours and using the full Dutch array were selected from the LOFAR pipeline archive and analyzed statistically. The results are summarized in the following table:
Nr Demixed Sources
Table 4: Pre-processing performance for >2h observations with different observation parameters and demix settings, for HBA and LBA. Although the case of 3 demixed sources has not been characterized, a large increase in the P/O ratio is expected for both LBA and HBA. Note that for setups without CEP4 statistics, we report the P/O values for the old CEP2 cluster; these values must therefore be considered upper limits for CEP4.
These guidelines have been implemented in NorthStar, such that pipeline durations are automatically computed for the user.
Over the last couple of years the calibration and imaging software has been expanded considerably, allowing users to reach noise levels within a few times the thermal noise. An advanced direction-independent calibration pipeline (prefactor) has been developed; documentation is available in the LOFAR Imaging Cookbook and here.
Users who do not have the required resources themselves can request processing time on CEP3 to perform offline calibration and imaging using the most recent tested version of the prefactor or factor pipelines. If you are interested in this offline option, please make this clear in your proposal by answering the relevant question in the technical section of the proposal "Off-line data processing on RO facilities (CEP3) requirement". This is further discussed on the Upcoming Cycle page. Alternatively, proposers may describe how they plan to carry out offline processing on their own compute resources to achieve the required image quality.
Based on users' experience, a typical observation of 243 subbands, grouped in blocks of 10 subbands, needs a P/O ratio of ~80 to be fully processed on one CEP3 node. Consequently, a typical 8-hour observation requires 640 hours of processing on one node, which fits within the time of a default CEP3 reservation block.
The noise level of images obtained with the prefactor calibration pipeline can reach 4 times the thermal noise (calculated using the noise calculator tool). These values are based on a limited set of cases and on a fraction of the total frequency band, so users should treat this number as an indication of the best result achievable with this pipeline. More detailed information can be found here.
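The numbers in the estimate above follow from simple arithmetic: 243 subbands in blocks of 10 give 25 blocks (the last one holding only 3 subbands), and 8 hours × 80 gives 640 node-hours. A small sketch with hypothetical helper names:

```python
import math


def subband_groups(n_subbands: int, group_size: int) -> list[int]:
    """Sizes of the subband blocks; the last block may be smaller."""
    n_groups = math.ceil(n_subbands / group_size)
    return [min(group_size, n_subbands - i * group_size) for i in range(n_groups)]


def single_node_hours(observing_hours: float, po_ratio: float) -> float:
    """Processing time on one node for a given P/O ratio."""
    return observing_hours * po_ratio


if __name__ == "__main__":
    groups = subband_groups(243, 10)
    print(len(groups), groups[-1], single_node_hours(8.0, 80.0))
```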
The LOFAR software stack is the collection of software, including all required libraries at specific versions, needed to run the LOFAR imaging pipeline. An overview of the LOFAR Software Stack, together with a discussion of its various aspects, is given on this Wiki page. Currently, a Docker image including the latest LOFAR software can be found in the LOFAR Imaging Cookbook.
Installing the LOFAR Software at external computing facilities
This page will redirect you to build instructions for the LOFAR common software packages.
LOFAR Beam formed / pulsar tools
A GitHub repository of scripts to use for analysis of beam formed / pulsar data.
LOFAR Imaging Software (external packages)
A collection of links of documented data reduction tools developed and maintained by external experienced users.
LOFAR (Generic) Pipeline Framework
Documentation about how to define a pipeline for the execution with the Generic Pipeline Framework is available here.
LOFAR Imaging Pipelines
Several imaging pipelines are available for processing LOFAR LBA and HBA data.
A GitHub repository of 3rd party contributions for LOFAR data processing.
As of 10 September 2018, all LOFAR HBA data products ingested to the Long Term Archive (LTA) will be compressed using Dysco. This decision was made after evaluating the effect of visibility compression on LOFAR measurement sets (see below for more information).
To process the Dysco compressed data, you will need to run the LOFAR software version 3.1 or later (built with the dysco library). The Dysco compression specifications (using 10 bits per float to compress visibility data) and the tests carried out as part of the commissioning effort are valid for any HBA imaging observation with a frequency resolution of at least four channels per subband and a time resolution of 1 second. Note that using 10 bits is a conservative choice and the compression noise should be negligible.
Modern radio interferometers like LOFAR contain a large number of baselines and record visibility data at high time and frequency resolution, resulting in significant data volumes. A typical 8-hour observing run for the LOFAR Two-metre Sky Survey (LoTSS) produces about 30 TB of preprocessed data. It is important to manage the data growth in the LTA, especially in view of increasing observing efficiencies. One way to achieve this is to compress the recorded visibility data. Offringa (2016) proposed a new technique called Dysco to compress interferometric visibility data. The technique is fast, and the noise added by compression is small (within a few per cent of the system noise in the image plane) and has the same characteristics as the normal system noise (for specific information on the compression technique, see Offringa (2016) and the casacore storage manager available here).
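A rough estimate of the storage saving: visibilities are normally stored as 32-bit floats (two per complex value), so encoding them with 10 bits per float shrinks the visibility column by a factor of 3.2. Other columns are not affected, so the overall saving depends on what fraction of the data volume the visibilities occupy. The sketch below assumes that fraction (0.8 in the example is hypothetical) and ignores Dysco's small normalization overhead.

```python
def dysco_column_ratio(bits_per_float: int = 10, original_bits: int = 32) -> float:
    """Compression ratio of the visibility column alone."""
    return original_bits / bits_per_float


def compressed_volume_tb(total_tb: float, data_fraction: float,
                         bits_per_float: int = 10, original_bits: int = 32) -> float:
    """Total volume when only the visibility fraction of the data is compressed."""
    ratio = dysco_column_ratio(bits_per_float, original_bits)
    return total_tb * ((1.0 - data_fraction) + data_fraction / ratio)


if __name__ == "__main__":
    # 30 TB LoTSS-like run; the 80% visibility fraction is a hypothetical assumption.
    print(f"{compressed_volume_tb(30.0, 0.8):.1f} TB")
```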
Before integrating the Dysco compression technique in the production pipelines, SDC operations carried out a commissioning effort to characterise how compressing visibility data using Dysco affects the calibration solutions and the images produced.
To validate Dysco compression on LOFAR HBA data, we carried out a test observation using the standard LoTSS setup (2x244 subbands, 16 ch/sb, 1s time resolution). The raw visibilities were preprocessed (RFI flagging and averaging) using three different preprocessing pipelines: (i) the standard production pipeline without any compression, (ii) with Dysco compression enabled on the visibility data, and (iii) with Dysco compression enabled on both the visibility data and the visibility weights. The data products from the three pipeline runs were then processed using the direction-independent prefactor and direction-dependent factor pipelines.
Comparing the gain solutions and the images produced by the prefactor and factor runs shows that compressing the visibility data and visibility weights has little impact on the final data products. The key results from this exercise can be summarized as follows:
We used an 8-hour scan on 3C 196 to validate Dysco compression on LBA data. The observed data were preprocessed by the radio observatory with two different pipelines: (i) with Dysco visibility compression enabled, and (ii) without Dysco compression. Further processing was carried out by Francesco de Gasperin using the standard LBA calibrator pipeline. Comparing the intermediate data products produced by the pipeline, we find that Dysco compression has no significant impact on the products of the calibration pipeline. The key results from this exercise are listed below:
Since 10 September 2018, the radio observatory has been recording all HBA imaging observations in Dysco-compressed measurement sets. A new column has been introduced in the LTA to identify if a given data product has been compressed with Dysco. When you browse through your project on the LTA, on the page displaying the correlated data products, the new column Storage Writer identifies if your data has been compressed with Dysco. For example, Fig 4 shows the list of correlated data products for an averaging pipeline. The column Storage Writer specifies that the preprocessed data products have all been stored using the DyscoStorageManager implying that the data has been compressed with Dysco.
To process these data you will need to run the LOFAR software version 3.1 or later (built with the dysco library) so that DPPP can automatically recognise the way the visibilities have been recorded. Note that compressing already dysco-compressed visibility data will add noise to your data and hence should be avoided.
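Before re-writing a measurement set with compression enabled, it can be worth checking whether its visibility column is already Dysco-compressed. The sketch below uses python-casacore's `table.getdminfo()`, which returns the data manager description of a column; it assumes the Dysco storage manager registers itself under the type name `DyscoStMan`. The helper names and the file name in the usage note are hypothetical.

```python
def uses_dysco(dminfo: dict) -> bool:
    """True if a column's data manager description names the Dysco storage manager."""
    return dminfo.get("TYPE") == "DyscoStMan"


def column_is_dysco_compressed(ms_path: str, column: str = "DATA") -> bool:
    """Inspect one column of a measurement set (requires python-casacore)."""
    from casacore.tables import table  # imported here so uses_dysco() works without casacore

    with table(ms_path, ack=False) as t:
        return uses_dysco(t.getdminfo(column))
```

For example, `column_is_dysco_compressed("my_observation.ms")` returning `True` indicates that the data should be processed as-is rather than compressed a second time.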
For further questions/comments, please contact SDC Operations using our JIRA helpdesk.