This is a short manual on how to search for and retrieve data from the LOFAR Long Term Archive (LTA).
To access the LTA, go to: https://lta.lofar.eu
In case of problems, please refer to the Frequently Asked Questions section below.
All use of LOFAR data, whether public or otherwise, must adhere to the LOFAR data policy. Note that any publications resulting from use of LOFAR data must include one of the standard acknowledgement phrases given at the end of this policy document.
A LOFAR user account is not needed to browse or search the LTA catalogue, which covers both public and proprietary data. However, to stage and subsequently download data from the LTA, you need a LOFAR user account with the appropriate privileges. To set this up, follow the steps below:
When you login to the LTA for the first time after being granted access, you may need to reset your password. To reset your password, follow the procedure stated below:
This procedure is necessary to properly couple the permissions of your LOFAR account to your LTA account.
In the case of a lost password, please visit https://webportal.astron.nl/pwm/public/ForgottenPassword to request a new one.
The LTA catalogue can be searched directly without an account. Search queries return results from the entire catalogue, across all projects, because the metadata of all LTA content are public.
Staging and subsequent downloading of public data always requires an account with LTA user access privileges. Users who are members of a submitted LOFAR proposal can use the associated account to access the LTA. If you do not have an account yet, you can register through "Create account". LTA access privileges need to be granted by Science Data Centre Operations (SDCO): once you have created your account, submit a support request to the ASTRON SDC Helpdesk asking for LTA privileges, clearly stating your username.
To stage and retrieve proprietary project-related data in the LTA, you need a LOFAR account that is enabled for the archive and coupled with the projects of interest. If your account should be coupled with a project but currently is not, you can request the SDCO to add you to the list of co-authors of the project; when you send such a request, add the project's PI in cc. After SDCO adds you to the project, you might get an email asking you to set a new password in the ASTRON Web Applications Password Self Service. Please note that this sets a new password not just for the LTA but for the other LOFAR services as well.
Please read the LOFAR Data Policy for more information about proprietary vs public data.
Data navigation in the archive
The LTA menu, as shown below, gives access to the main functionalities.
Users can log in to the LTA through the LOGIN button at the top right.
To start a search in the LTA, click on the SEARCH DATA button on the menu. This will, by default, show the basic search page, where users can select the data product type of interest and perform a cone search. An advanced search mode, with more parameters per data type, can also be selected by clicking on the drop-down menu on the left side.
A “project” can be selected by clicking on the BROWSE PROJECTS button on the menu. This is particularly useful for project-related searches, as well as for checking public projects or your co-author membership. At this stage, several actions are allowed. A quick overview of how to search data is given below, followed by how to stage and download the selected data.
The Basic Search module allows searching for data within a specified pointing (coordinates) and specifying whether to perform a search on observations and/or pipelines.
Several reference systems are available. In particular, to search for Solar datasets, the Sun reference system should be selected in combination with the type of process of interest (e.g. Observation and/or Averaging Pipeline) before pressing the Search button. Note that when a project has been preselected, the search will be confined to only that project.
The Advanced Search modules allow you to specify coordinates and specific parameters of the observation or pipeline products that you are looking for.
To discover Solar datasets, users should select the Sun reference system, together with a specification of the parameters of interest before pressing the Search button.
A search on observations will return the setup of the telescope at the time of observing, but may not return any downloadable data. Typically, only pipeline products are archived and these can be directly searched for by selecting the Pipeline modules. If observations are selected for the query, it will be possible to find and select related pipelines on the results page.
This figure shows one of the different drop-down options for Advanced Search, which will return any observation for a known SAS ID.
Under the Project Search menu is a table showing projects that can be selected to restrict all subsequent data searches to that project only. Use the “Search” button to select the project and go to the search page; use the "Show data" button to select the project and show all data in it. Alternatively, click on the project name to view the project details. The first column shows whether you are a member of the project or whether the project is public.
There are some useful ways to find your data in the LTA that allow for easy selection and navigation. This section elaborates on a few more advanced options for browsing the LTA.
While using the Advanced Search tab, you can select a range of SAS IDs by using a colon. For example, if you want all observations and pipelines with SAS IDs in the range 432000 to 432190, you can supply 432000:432190 in the search box. Similarly, you can insert a comma-separated query for a (non-successive) list of SAS IDs. Note that you can query only up to 100 items at a time.
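When you have more than 100 IDs, the list has to be split over several queries. A small helper can generate comma-separated query strings that respect the 100-item limit (a sketch; the function name is illustrative):

```python
def sas_id_queries(sas_ids, max_items=100):
    """Split a list of SAS IDs into comma-separated query strings of at
    most max_items entries each (the LTA accepts up to 100 per query)."""
    sas_ids = list(sas_ids)
    for i in range(0, len(sas_ids), max_items):
        yield ",".join(str(s) for s in sas_ids[i:i + max_items])

# 191 IDs are split into one query of 100 IDs and one of 91
queries = list(sas_id_queries(range(432000, 432191)))
```

Each resulting string can then be pasted into the Advanced Search box in turn.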
Once you have submitted a search query, you land on a page showing a table-like overview of the submitted object/SAS IDs. This table has several headers that change depending on the type of data that you searched for. For a more advanced way to inspect the returned query, you can change the columns displayed through the "edit columns" button. This will make a new window pop up with a range of options that you can select or deselect. In order to apply the changes, you must click the submit button at the very bottom of the checklist.
When you are looking at the results of your query, there are several different menus and levels you can inspect. For each searched SAS ID or object, you can dive into the blue-coloured entries. One of the most useful and important entries is the "data products" tab. Once you click on 'data products', it shows all the subbands or pointings that were part of this SAS ID and each can be individually selected for download. Similarly, if you searched for Observation Data through the Basic search tab that only has pipeline products on the LTA, you can navigate from the Observation information of that SAS ID to the downloadable pipeline products.
For advanced use cases, a service is available at https://lta-dbview.lofar.eu/ that gives the option to run your own queries on the database or build them using a tables view. General help on usage is accessible from the service page.
A potentially useful query is shown below, that gives you all files for a certain SAS ID.
SELECT fo.URI, dp."dataProductType", dp."dataProductIdentifier",
dp."processIdentifier"
FROM AWOPER."DataProduct+" dp,
AWOPER.FileObject fo,
AWOPER."Process+" pr
WHERE dp."processIdentifier" = pr."processIdentifier"
AND pr."observationId" = '123456'
AND fo.data_object = dp."object_id"
AND dp."isValid" > 0
In this query, '123456' should be replaced with the SAS ID of the Observation/Pipeline you are looking for. Somewhat confusingly, pipeline SAS IDs are also stored under "observationId", just as for observations. To run this query, go to the link above, log in as the right user, select the right project, and paste the query into the “Manual SQL” field.
Example: You can also modify these queries. Available tables and fields can be inspected by following the "Tables" link at the top of the DBView page. For example if you want to also know the MD5 checksum, you can run:
SELECT fo.URI, fo.hash_md5, dp."dataProductType", dp."dataProductIdentifier",
dp."processIdentifier"
FROM AWOPER."DataProduct+" dp,
AWOPER.FileObject fo,
AWOPER."Process+" pr
WHERE dp."processIdentifier" = pr."processIdentifier"
AND pr."observationId" = '123456'
AND fo.data_object = dp."object_id"
AND dp."isValid" > 0
Depending on the search parameters, e.g., which data products were requested (observation, pipeline), lists of observations and/or pipelines will be returned (see observations example above). Note that not all observations and pipelines will have archived data that can be retrieved. Please verify availability of data first. If the data is available and the user has access/rights to download this data, then there are several options to select data for staging:
Note that observations often have no raw data in the archive, but the metadata is visible because subsequent pipelines have processed the raw data further which are then ingested into the LTA. To get to the pipelines related to observations, use “Show Pipelines”.
To see whether observations or pipelines have data products in the LTA, look for the “Number of Correlated/BeamFormed Data Products” column. These columns, as well as a few others, can also be used to navigate to the relevant data products.
Once you have a list of data products on your screen, the “Release Date” will tell you when the data are available for public download. If the data is public, or you are a member of the project, the “checkbox” column will be selectable and staging can proceed. You can also hover with your mouse over the checkbox and get more information, like the size, location and checksums.
Some data could have had problems somewhere in the automation and control part of the LOFAR software during observation or processing. Sometimes a few subbands might be affected, sometimes an entire observation. Science Data Centre Operations will check the data, (re)run things manually or fix things if needed and then archive the data. This does mean that the automation and control sometimes loses track of the files and the archiving process has no information beyond the Observation ID and filename itself. In such cases, a few subbands or an entire observation might end up under “Unspecified Process”. We do attempt to fix things at a later date, but that is not always feasible. If the files were archived, the data itself is usable. What is missing is the information that the LTA needs to properly label and query the data.
If an Observation is missing, or is missing subbands, please check if it ended up under Unspecified.
Once you have a list of data products, observations or pipelines, you can use the check boxes to select which files you want to download. The first check box can be used to select or deselect all files/observations on a page.
The LOFAR Archive stores data on magnetic tape. This means that it cannot be downloaded right away; it has to be copied from tape to disk first. This process is called 'staging'.
When you have made your selection of files, click on "stage selected". This shows the message below, where you can press submit. It means that a request has been sent to the LTA staging service to start retrieving the requested files from tape and make them available on disk. You will get a confirmation e-mail acknowledging that your staging request was received and queued. When the files are staged, you will get a notification email informing you that your data are ready for retrieval.
The e-mail that you get when the staging on disk is complete gives you a list of files and has several attachments, among them two files: html.txt and srm.txt.
There are two different ways to download your files with these attachments: http and srm.
We also attach plain lists of the files/SURLs that were scheduled for staging (in the confirmation mail), those that were successfully staged, and those that could not be staged (in the success / partial success notifications).
TBB data needs to be staged by hand. Please send a request via https://support.astron.nl/sdchelpdesk to stage the data for you, specifying the filenames to be staged. To download the data, please follow the instructions under Download Data for proper authentication. Data will then be available for download using:
wget --no-check-certificate https://lofar-download.grid.surfsara.nl/lofigrid/SRMFifoGet.py?surl=<filename> .
The filename should start with srm://
You will need a valid LTA account to access this data.
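For scripted retrieval of several TBB files, the download URL can be assembled from each srm:// filename following the wget pattern above (a sketch; the example SURL is a made-up placeholder, and the SURL is appended unencoded as in the wget example):

```python
# Base URL taken from the wget example above; the SURL is appended as-is.
DOWNLOAD_BASE = "https://lofar-download.grid.surfsara.nl/lofigrid/SRMFifoGet.py?surl="

def tbb_download_url(surl):
    """Build the HTTP download URL for a staged TBB file."""
    if not surl.startswith("srm://"):
        raise ValueError("TBB filenames must start with srm://")
    return DOWNLOAD_BASE + surl

# Hypothetical example filename:
print(tbb_download_url("srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/lofar/L123456_tbb.h5"))
```

The resulting URLs can be fed to wget or curl with the same authentication setup as described under Download Data.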
You can download your requested data with the files attached to your e-mail notification. There are different possibilities and tools to do this. If you are unsure which one to use, please refer to the corresponding FAQ answer at the bottom of this page.
The file html.txt contains a list of http links that you can feed to a Unix command-line tool like wget or curl, or even use in a browser.
For wget you can use the following command line:
wget -i html.txt
This will download the files listed in html.txt to the current directory (option '-i' reads the URLs from the specified file).
Preferably, especially when downloading large files, you should also use option '-c'. This will continue unfinished earlier downloads instead of starting a fresh download of the whole file. (Make sure to first delete existing files that contain error messages instead of data, if you use this option):
wget -ci html.txt
Note that wget does not overwrite existing files. If you use the continue option ('-c') it will append any missing parts to the existing file. If you don't use the continue option and there is a file present (e.g. from a stopped earlier download), wget creates a new file by appending a number (e.g., '.1') to the filename.
NB: Do not set the username and password on the wget command line because this allows other users on the system to view them in the process list. Instead you should create a file ~/.wgetrc with two lines according to the following example:
user=lofaruser
password=secret
NB: This is only an example, you have to edit the file and enter your own personal user name and password!
Make sure that file access authorizations for the .wgetrc file only allow access by you (the owner) to avoid leaking your credentials. e.g.:
chmod 600 .wgetrc
There is no easy way to have wget rename the files as part of the command directly. It does not accept the -O flag inside a file it gets with -i. You can either rename files afterward, e.g. using the following command:
find . -name "SRMFifoGet*" | awk -F %2F '{system("mv "$0" "$NF)}'
or add the -O option to each line in html.txt and then feed each line to wget separately, e.g.: cat html.txt | xargs -L 1 wget. By default, the html.txt file does not contain such options.
The following Python script will take care of renaming and untarring the downloaded files:
#!/usr/bin/env python
"""Rename and untar downloaded LTA measurement-set tar files.
One input is needed: the directory containing the downloaded files.
AUTHOR: J.B.R. OONK (ASTRON/LEIDEN UNIV. 2015) EDITED BY: M. IACOBELLI (ASTRON)"""
import glob
import os
import sys

usage = 'Usage: %s inDIR' % sys.argv[0]
try:
    path = sys.argv[1]
except IndexError:
    print(usage)
    sys.exit(1)
if not path.endswith(os.sep):
    path += os.sep

filelist = sorted(glob.glob(path + '*.tar'))
print('LIST:', filelist)

# File name string separators: downloaded names look like
# 'SRMFifoGet.py?surl=srm:%2F%2F...%2FLxxxxxx_SByyy_uv.MS.tar'
sp1d = '%'
sp2d = '2F'
extn = '.MS'
extt = '.tar'

print('##### STARTING THE LOOP #####')
for infile_orig in filelist:
    infile = os.path.basename(infile_orig)
    print('doing file:', infile)
    # Recover the original file name from the percent-encoded URL;
    # this assumes the standard depth of an LTA SURL path
    spl1 = infile.split(sp1d)[11]
    spl2 = spl1.split(sp2d)[1]
    spl3 = spl2.split(extn)[0]
    newname = spl3 + extn + extt
    inFILE = path + infile
    newFILE = path + newname
    # Rename, untar, then remove the tar file
    # (comment out the os.system calls below for a dry run)
    command = 'mv ' + inFILE + ' ' + newFILE
    print(command)
    os.system(command)
    os.system('tar -xvf ' + newFILE)
    os.system('rm -r ' + newFILE)
    print('finished rename / untar / remove of:', newFILE)
The file srm.txt contains a list of srm locations which you can feed to a grid client like gfal-copy. SRM is a Grid-specific protocol that is currently supported for data at the LTA locations. It is faster, especially if you have significantly more than 1 Gbit/s bandwidth. It requires a valid Grid certificate and the installation of grid client tools such as the gfal suite.
Contact SDC Operations via ASTRON SDC helpdesk if you think you might need a GRID account but it cannot be provided by your own institute.
We advise users to look into using the gfal client library and command line tools. SURF provides good documentation on usage of grid storage client software, where they show examples of how some gfal scripts can be used.
Below is a collection of frequently asked questions, including general questions together with some troubleshooting questions.
It is important to understand that LOFAR data volumes are very large, and handling them requires different technologies than those we know and use in everyday life. For instance, LTA data is stored on magnetic tape and has to be copied to a hard drive (getting 'staged') before it can be retrieved. Transferring these amounts of data within a reasonable time requires careful consideration and special tools. We try to make the LTA as convenient to use as possible, e.g. by providing http downloads for users without a Grid certificate and supporting the use of Grid client tools for those who want or need the extra performance. We are aware that data retrieval is quite close to the backend technology and we hope to provide solutions with a higher level of abstraction in the future. But it will always be necessary to prepare data for download, so users are expected to plan a bit ahead (sorry!).
This depends. As a rule of thumb, we ask you to keep your requests below 5 TB in volume and fewer than 1000 files. Also, the total file count in all your running requests should not exceed 5000 files at any point in time. Specifically, there are essentially two things to consider: the capabilities of your own system and the capabilities of the LTA services.
The most important thing to know about LTA capabilities, is that the disk pool that temporarily holds your data and from where it can be downloaded, has a limited capacity. This means that the data you requested is only available for download for a limited time (since the space is needed for new requests at some point). Your data is only guaranteed to stay available for 7 days. It can be re-requested after that, but you should never request more data than you can download within a few days. In most cases this is limited by the capabilities of your own system, especially your network connection (and available local storage space, of course).
The second most important LTA limit is the number of files that can be processed at the same time. Some projects do not have a lot of data volume, but the data is distributed over very many files. With large file counts, the management of the request itself puts a large load on the system. There is a maximum queue size of 10,000 files for all user requests together. So make sure to only occupy a fraction of that and wait until earlier submitted requests have finished (you get notified) before you submit new requests.
Note that the larger your request, the longer it takes until you can retrieve the first file. Also, please limit the number of requests running in parallel to a few, especially when they contain many files. In principle, we avoid introducing hard limits, but rely on reasonable user behaviour. This also means that you can block the system for a long time or, in the worst case, even bring it down. So please act responsibly or we might have to enforce some limits in the future to keep the system available for other users. Be aware, that we may cancel your request(s) in excessive cases to maintain LTA operation.
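The rule-of-thumb limits of 5 TB and 1000 files per request can be respected programmatically when planning staging requests. A minimal sketch, assuming you have a list of (SURL, size-in-bytes) pairs:

```python
def batch_requests(files, max_files=1000, max_bytes=5 * 1024**4):
    """Group (surl, nbytes) pairs into staging batches that stay below
    the advised limits of 1000 files and 5 TB per request."""
    batch, size = [], 0
    for surl, nbytes in files:
        if batch and (len(batch) >= max_files or size + nbytes > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(surl)
        size += nbytes
    if batch:
        yield batch

# Four hypothetical 2-TB files end up in two batches of two files each
demo = [("srm://host/file%d" % i, 2 * 1024**4) for i in range(4)]
batches = list(batch_requests(demo))
```

Submit each batch as a separate request, and wait until earlier requests have finished before submitting new ones.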
If you, by accident, staged some 100,000 files or 100 TB of data, please contact the ASTRON SDC helpdesk, so that we can stop these requests, thanks!
These are technical terms that refer to the storage backend of the LTA. Each of the three LTA sites (in Amsterdam, Juelich, and Poznan) operates an SRM (Storage Resource Management) system. Each SRM system consists of magnetic tape storage and hard disk storage. Both are addressed by a common file system, where each file has a specific locality: it can be either on disk ('online') or on tape ('nearline') or both. The usual case for LTA data is, that it is on tape only. Since the tape is not directly accessible but only through an (automated) tape library, the data on it first has to be copied from tape to disk, in order to retrieve it. This process is called 'staging'. Only while the data is (also) on disk, you will be able to download it. To save cost, the disk pool is of limited capacity and only meant for temporary caching data that a user wants to access. After 7 days, all data is automatically 'released', which means that it may be deleted from the disk storage, as soon as the space is required for other data. It then has to be staged again in order to become accessible again.
Usually, you don't have to worry about the details. But be aware, that data retrieval is a two-step procedure: 1) preparation for download ('staging') and 2) the download itself. Also, take care not to request too much data at the same time.
In principle, yes, this is the only supported procedure, at the moment. There are advanced methods based on programming interfaces and libraries but these require a certain level of expertise to install and use (see this page for more information). When using the advanced access methods you are expected to take extra care to apply fair use practices and not to overly stress the systems. If you are an 'expert user', are self-dependent enough to figure out how to work with this, and have a good reason, please contact the ASTRON SDC helpdesk for some instructions and an emphatic admonition to take extra care.
That depends. In short: Http downloads are the easiest (e.g. via wget), but downloads via SRM tools can be faster and are encouraged for large amounts.
The SRM systems which the LTA sites operate are integrated in the Grid. To work with them directly, you need a Grid certificate. To allow users without a Grid certificate to download LTA data, we operate webservers as a frontend to the SRM backend. These webservers provide the requested data via http downloads. The webservers are not excessively capable machines and are meant for occasional users. If you retrieve huge amounts of data on a regular basis, please work with SRM directly, especially if you own a Grid certificate.
You may want to read this FAQ Answer as well to make a decision: My downloads are too slow. What can I do?
First of all, you have to check how slow your download really is. If wget shows an estimated time remaining of several hours, this does not necessarily mean that the download is 'slow': some files in the LTA are just really huge. In most cases, your local network connection will be the bottleneck. For instance, a standard Gigabit Ethernet connection allows download speeds of around 120 MB/s at a maximum. Our systems can handle that easily. In case you can rule out your network connection as the bottleneck: there are different ways to download your data and not all provide the same performance. In our experience, this is the order of performance:
You may also want to read this FAQ answer for further explanation: There are different ways to download. Which one is the best?
You are welcome to contact Science Data Centre Operations in case of problems that you cannot solve yourself. However, we kindly ask you to include all important information in your inquiry, so that we can quickly help you with your problem without too much back and forth:
If the LTA catalog did not show any error when you submitted your request, then it is safe to assume that your request was registered in our staging system. Usually, you should get a notification mail within a few minutes. If you did not receive the notification within an hour of submission, then our staging service may be down. Note that your request is not lost in this case and will be picked up after the service is back online. In urgent cases or if you are not sure that something went wrong while submitting your request, please contact the ASTRON SDC helpdesk.
After you got a notification that your request was scheduled, it is in our database and there is hardly a possibility that it got lost. Staging requests can take up to a day or two, but will finish a lot sooner in most cases. This depends on your request's size, but also on how busy the storage systems are with other users' requests at that moment. Sometimes, the LTA storage systems are down for maintenance and this can delay the whole procedure. You can check for downtimes here.
It is not alarming when your request did not finish in 24 hours, even when your last request finished within 10 minutes. In urgent cases or if you did not receive a notification after 48 hours, please contact the ASTRON SDC helpdesk.
This means that the storage backend could not fulfil the request at all. This might mean that the system itself is fine, but none of the files from your request could be staged (e.g. missing files). Check the error message from your mail notification for details. The notification can also indicate that there is a general problem with the storage system or with the staging service itself, i.e. something is broken or down for maintenance. We try to detect all temporary issues and only inform users in case that something is wrong with their request itself, but we cannot foresee all eventualities. If you cannot make sense out of the error message, or don't know how to deal with it, please contact the ASTRON SDC helpdesk.
If you used the stager API to submit your request, please first check whether you made a mistake, e.g. entered the wrong SURLs.
Note: We get notified of these issues as well and will usually re-schedule failed requests due to server issues after the problem was solved. So please first check whether you got a 'Data ready for retrieval' notification for the same request id after the error notification. If you did, the problem was already resolved.
In general, this means that the SRM system works fine, but there was a problem processing your request. As a result, some of your files could be staged and some could not. Your mail notification should include a list of the files that could not be prepared for download, together with an error message indicating the cause. If the error message says 'Incorrect URL: host does not match', this means that you combined files in one request that are stored at two different SRM locations (e.g. one file at SURF (Amsterdam) and one file at Juelich). When one SRM location gets the request, it can only stage the local files. To avoid this error, request the files from the different locations independently. Other messages should be self-explanatory, e.g. if a file is missing. If you cannot make sense of the error message, or don't know how to deal with it, please contact the ASTRON SDC helpdesk.
If you used the stager API to submit your request, please first check whether you made a mistake, e.g. entered the wrong SURLs.
Note: We get notified of these issues as well and will usually re-schedule failed requests due to server issues after the problem was solved. So please first check whether you got a 'Data ready for retrieval' notification for the same request id after the error notification. If you did, the problem was already resolved.
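The 'host does not match' situation can be avoided by splitting a file list by storage site before staging. A sketch using only the standard library (the example SURLs are hypothetical):

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_surls_by_host(surls):
    """Group SURLs by their storage host so that each staging request
    only contains files from a single SRM location."""
    groups = defaultdict(list)
    for surl in surls:
        # urlparse extracts the host part from any scheme://host/path URL
        groups[urlparse(surl).netloc].append(surl)
    return dict(groups)

# SURLs at two different sites end up in two separate groups
groups = group_surls_by_host([
    "srm://siteA.example:8443/pnfs/data/lofar/a.tar",
    "srm://siteB.example:8443/pnfs/data/lofar/b.tar",
])
```

Each group can then be submitted as its own staging request.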
It is currently not possible to stop a staging request via the web interface. It is possible to use the stager API (see here) for this. Alternatively, stay calm and ask ASTRON SDC helpdesk to stop the request for you.
Most errors should result in a 404/50x return code. However, some error messages are still returned as a message. Please read the error message carefully. In many cases, it should give you some indication of what went wrong. If this does not help you, please contact the ASTRON SDC helpdesk or retry after a few hours.
Important: If you use wget with option '-c', please note the following: wget does not check the contents of an existing file, so when restarting wget with option '-c' (continue) to retrieve the failed files, it will append the later data chunk to the existing file that contains the error message (and not the first section of your data). Make sure to delete the existing error files (usually obvious from the small file size) before calling 'wget -ci' again, to avoid corrupted data. If you already ended up with a corrupted file, you have to delete it and re-retrieve the whole file.
Check if the files are much smaller than you expect. Something might have gone wrong with the transfer. Please check the beginning of your files, e.g. with the linux 'head' or 'less' commands. If there is an error message, please refer to the question above. Otherwise, please try to re-retrieve an affected file. If this does not help, please contact the ASTRON SDC helpdesk.
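The size and content checks above can be scripted when many files were downloaded. A minimal sketch; the size threshold and error markers are assumptions, so adjust them to what you actually observe in your files:

```python
import os

def looks_like_error_file(path, max_size=64 * 1024,
                          markers=(b"<html", b"<HTML", b"Error", b"error")):
    """Heuristic: a download that is suspiciously small and whose first
    bytes contain an error marker probably holds an error message, not data."""
    if os.path.getsize(path) > max_size:
        return False
    with open(path, "rb") as f:
        head = f.read(4096)
    return any(m in head for m in markers)
```

Run it over all downloaded files, delete the flagged ones, and re-retrieve them.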
This usually means the storage backend system is overloaded and you should try again in a few hours.
Maybe the SRM system is down for maintenance, please check the LTA portal or the LOFAR Downtimes page. If there is nothing going on, there is probably something wrong with the download service. Please try again a bit later and submit a support request to the ASTRON SDC helpdesk if the issue persists.
This can indicate that too many users are downloading at the same time. Please try again a bit later. There is also a limit on the number of simultaneous downloads you are allowed to start yourself. Please limit yourself to four simultaneous downloads; the overall download rate will not improve with a larger number of connections.
This happens when you try to select a project when you were not logged into the LTA. Please first select another tab, e.g. search, then try to select your project again.
SRM tools are no longer widely supported and should be avoided. With that warning in mind: the SRM tools ignore the system's default Java heap space settings, and their own default is rather low. You are probably trying to process a long list of files. Either reduce the number of files in the request or increase the SRM-specific heap space by setting the environment variable 'SRM_JAVA_OPTIONS' to a higher value (e.g. '-Xms256m -Xmx256m'; the default is '-Xms64m -Xmx64m').
SRM tools are no longer widely supported and should be avoided. With that warning in mind: your firewall is probably not allowing active ftp transfers. Make sure that you call srmcp with option '-server_mode=passive'.
SRM tools are no longer widely supported and should be avoided. With that warning in mind: ensure you have run 'voms-proxy-init' to generate an up-to-date proxy file. In case the error persists: the SRM tools apparently do not always use the default proxy file location $HOME/.proxy, or you used a non-standard proxy location in voms-proxy-init. In that case, either:
Set the X509_USER_PROXY environment variable to point to your .proxy file, e.g. export X509_USER_PROXY=$HOME/.proxy
Or pass the option -x509_user_proxy=<path-to-.proxy-file> on the command line, e.g. srmcp -x509_user_proxy=$HOME/.proxy <rest-of-command>
SRM tools are no longer widely supported and should be avoided. With that warning in mind: this indicates an issue with creating a secure connection to the server. There is either an issue with your personal certificate/proxy/key or with the set of trusted server certificates.
A possible cause is a private key stored in an encryption format that the (older) Grid tools cannot read; re-encrypting the key with DES3 may help:
openssl rsa -des3 -in .globus/userkey.pem -out .globus/userkey.pem
Retry with options for maximum verbosity and/or debug (see the command help and/or man pages), which will print a lot of debug information to stdout. If this does not help you figure out what is going wrong, submit a support request to the ASTRON SDC helpdesk. (Please refer to "I want to contact the Science Data Centre Operations" for what to include in your request.)
Last updated: May 2024