Research Computation Facility for GOSAT-2
Background
The Greenhouse gases Observing SATellite, GOSAT (IBUKI), was launched in January 2009 as the world’s first satellite dedicated to observing greenhouse gases from space. The large volume of GOSAT observational data has been processed and distributed through the GOSAT Data Handling Facility (GOSAT DHF). To support research and development of the data processing algorithms, the National Institute for Environmental Studies (hereinafter referred to as “NIES”) installed the GOSAT Research Computation Facility (hereinafter referred to as “RCF”) in March 2010.
During its six years of operation, RCF processed a total of 58 years of short-wavelength infrared data. Based on the results of this research processing, the processing algorithms for the short-wavelength infrared data were revised, and the accuracy of the column-averaged dry-air mole fractions of carbon dioxide and methane improved greatly.
This achievement demonstrated the need for a computing facility dedicated to the research and development of the algorithms, which led to the introduction of the Research Computation Facility for GOSAT-2 (hereinafter referred to as “RCF2”). The main purpose of RCF2 is to support steady research and development of the GOSAT-2 data processing algorithms across the whole GOSAT-2 project, building on the GOSAT data.
The main users of RCF were limited to researchers at NIES, whereas RCF2 is also available to outside researchers involved in the GOSAT-2 project.
RCF2 operation
RCF2 is operated by the GOSAT-2 project of the NIES Satellite Observation Center. Its operational milestones are as follows:
March 2016 | Installed RCF2
September 2016 | Started service for users at NIES
December 2016 | Started service for users outside NIES
Specifications of RCF2
The specifications of RCF2 are as follows (those of RCF are shown in parentheses).
The name of main parts
Theoretical peak performance
Energy efficiency
* Ranked eighth on the Green500 list of energy-efficient supercomputers (as of June 2017).
https://www.top500.org/green500/lists/2017/06/
Shared storage capacity
Interconnect performance
Characteristics of RCF2
RCF2 runs EcoManager2, the successor to EcoManager, a function originally developed for RCF. EcoManager was initially designed only to save energy in coordination with simple jobs. Based on experience gained during RCF operation, functions such as adjusting the timing of compute node startup were added to EcoManager. EcoManager2 adds further functions: auto-balancing of compute node utilization, auto-confirmation of compute node soundness, and assignment of compute node redundancy.
New functions added to EcoManager2 are as follows:
With EcoManager, compute nodes were statically linked to job queues. As a result, nodes linked to a frequently used job queue accumulated more utilization time and more start/stop cycles than average.
In general, the number of failures increases with utilization time and the number of start/stop cycles. EcoManager2 therefore removes the static links between job queues and compute nodes. Instead, it allocates compute nodes to each job dynamically based on past utilization, which automatically balances utilization across compute nodes.
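The balancing idea can be illustrated with a minimal Python sketch. It is not EcoManager2's actual implementation: the `Node` record, the `pick_nodes` and `run_job` helpers, and the `cycle_weight` parameter (how many hours of wear one start/stop cycle counts for) are all hypothetical, chosen only to show how allocating the least-used nodes first evens out utilization over time.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical per-node usage record (not an EcoManager2 structure)."""
    name: str
    hours_used: float = 0.0  # cumulative utilization time
    start_stops: int = 0     # cumulative number of start/stop cycles


def pick_nodes(nodes, count, cycle_weight=1.0):
    """Return the `count` nodes with the lowest accumulated wear.

    Wear is assumed here to be hours_used + cycle_weight * start_stops;
    ties are broken by node name so the choice is deterministic.
    """
    if count > len(nodes):
        raise ValueError("not enough compute nodes for this job")
    ranked = sorted(
        nodes,
        key=lambda n: (n.hours_used + cycle_weight * n.start_stops, n.name),
    )
    return ranked[:count]


def run_job(nodes, count, hours):
    """Allocate nodes for one job and record the resulting usage."""
    chosen = pick_nodes(nodes, count)
    for node in chosen:
        node.start_stops += 1     # the node is started for this job
        node.hours_used += hours  # and runs for the job's duration
    return [node.name for node in chosen]
```

With four idle nodes, two successive two-node jobs land on different pairs of nodes, whereas a static queue-to-node link would send both jobs to the same pair.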
When EcoManager started a compute node, failures such as unexecuted jobs and irregular stops occasionally occurred because of startup failures and other problems. In each case, operators isolated the cause of the failure and restarted the node manually. EcoManager2 automatically checks compute node soundness immediately after a compute node starts, which covers the first phase of manually isolating the cause of a failure.
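A post-start soundness check of this kind might look like the following Python sketch. The source does not say which checks EcoManager2 performs, so the check names and the `check_node` and `triage` helpers are purely illustrative assumptions; each check is modeled as a function that takes a node name and returns `True` when the node passes.

```python
from typing import Callable, Dict, List, Tuple

# A check takes a node name and returns True if the node passes.
Check = Callable[[str], bool]


def check_node(node: str, checks: Dict[str, Check]) -> List[str]:
    """Run every check against one node; return the names of failed checks."""
    failed = []
    for name, check in checks.items():
        try:
            ok = check(node)
        except Exception:
            ok = False  # a check that crashes is treated as a failure
        if not ok:
            failed.append(name)
    return failed


def triage(started: List[str],
           checks: Dict[str, Check]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split freshly started nodes into healthy ones and suspects.

    Suspects map to the list of failed checks, which gives operators a
    starting point for isolating the cause.
    """
    healthy: List[str] = []
    suspect: Dict[str, List[str]] = {}
    for node in started:
        failed = check_node(node, checks)
        if failed:
            suspect[node] = failed
        else:
            healthy.append(node)
    return healthy, suspect
```

Recording which check failed, rather than a bare pass/fail flag, mirrors the role the text describes: the automatic check replaces only the first step of the manual investigation.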
In RCF, some spare compute nodes were kept as static cold standbys. In RCF2, EcoManager2 instead implements a dynamic hot standby function: it starts more compute nodes than the number requested and assigns those that are available and operating normally.
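The over-provisioning behind a dynamic hot standby can be sketched in Python as follows. This is one reading of the description above, not EcoManager2's code: `start_with_spares`, the `spares` count, and the `start` and `is_healthy` callbacks are hypothetical stand-ins for whatever mechanism actually powers on and verifies a node.

```python
def start_with_spares(idle, requested, spares, start, is_healthy):
    """Start `requested + spares` nodes and assign only the healthy ones.

    idle       -- names of nodes that are currently powered off
    start      -- callback: start(node) -> True if the node came up
    is_healthy -- callback: is_healthy(node) -> True if the node is normal
    Returns (assigned, standby, failed).
    """
    to_start = idle[: requested + spares]
    started = [node for node in to_start if start(node)]
    healthy = [node for node in started if is_healthy(node)]
    failed = [node for node in to_start if node not in healthy]
    if len(healthy) < requested:
        raise RuntimeError("too few healthy nodes to satisfy the request")
    # Healthy nodes beyond the requested number stay up as hot standbys.
    return healthy[:requested], healthy[requested:], failed
```

If one of the started nodes fails, the job still receives the requested number of nodes without waiting for a cold spare to boot; if none fails, the extra node simply remains on standby.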
* The photo on the top banner: Interconnect switch of RCF2