Space Weather Analytics Dataset
We have built SWAN-SF (Space Weather Analytics for Solar Flares), a large-scale dataset of solar active regions designed specifically for space weather analytics. The dataset contains metadata from over 4,000 trajectories of solar active region patches, integrated with carefully curated solar flare data covering the best part of one complete solar cycle, almost the entire period since SDO/HMI was commissioned.
This dataset is machine learning ready. It lets different machine learning tools and techniques use the same designated sections of the dataset for training, testing, and validation, so that their performance can be compared accurately and the most promising ones identified.
See the publication here
Overall workflow of the local trajectory outlier detection method.
Outlier detection has become one of the core tasks in spatio-temporal data mining. It plays an essential role in improving data quality for machine learning models and in recognizing anomalous patterns that deviate markedly from the expected patterns in trajectory datasets. In this work, we propose a clustering-based technique to detect local outliers in trajectory datasets by utilizing the spatial and temporal attributes of moving objects.
See the publication here
Local Outlier Detection for Spatiotemporal Trajectories
This local outlier detection method involves three phases. In the first phase, we apply a temporal partition procedure to divide each raw trajectory into multiple trajectory segments and extract trajectory features from the spatial and temporal attributes of each segment. In the second phase, we generate template features of the trajectory segments by applying a clustering schema. Finally, we compute the abnormal score, a novel dissimilarity measure that quantifies the disparity between the query and template trajectory segments in terms of trajectory features, and determine the local outliers based on the distribution of abnormal scores. To demonstrate the effectiveness of our method, we conduct three case studies on real-life spatio-temporal trajectory datasets from the solar astroinformatics domain (i.e., solar active regions, coronal mass ejections, and polarity inversion lines (PILs)). Our experimental results show that our local outlier detection approach can effectively discover erroneous reports from the reporting module and abnormal phenomena in various spatio-temporal trajectory datasets.
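The three phases can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the publication: the two segment features (mean speed and net displacement), plain k-means with farthest-first initialization as the clustering schema, distance to the nearest template as a stand-in for the abnormal score, and a z-score cutoff on the score distribution.

```python
import math

def partition(trajectory, window):
    """Phase 1a: split a list of (t, x, y) points into fixed-length temporal segments."""
    return [trajectory[i:i + window]
            for i in range(0, len(trajectory) - window + 1, window)]

def segment_features(segment):
    """Phase 1b: hypothetical features, (mean speed, net displacement)."""
    speeds = [math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
              for (t0, x0, y0), (t1, x1, y1) in zip(segment, segment[1:])]
    net = math.hypot(segment[-1][1] - segment[0][1], segment[-1][2] - segment[0][2])
    return (sum(speeds) / len(speeds), net)

def kmeans_templates(points, k, iterations=20):
    """Phase 2: k-means centroids serve as template features (assumed schema)."""
    centroids = [points[0]]
    while len(centroids) < k:  # farthest-first initialization, deterministic
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        centroids = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids

def abnormal_scores(points, templates):
    """Phase 3a: stand-in score, the distance to the nearest template."""
    return [min(math.dist(p, c) for c in templates) for p in points]

def local_outliers(scores, z_cutoff=2.0):
    """Phase 3b: flag segments whose score is far out in the score distribution."""
    mean = sum(scores) / len(scores)
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    if std == 0:
        return []
    return [i for i, s in enumerate(scores) if (s - mean) / std > z_cutoff]

# Synthetic trajectory: a slow regime, a fast regime, and one injected glitch.
trajectory = [(t, float(t), 0.0) for t in range(24)]
trajectory += [(t, 24.0 + 3.0 * (t - 24), 0.0) for t in range(24, 48)]
trajectory[10] = (10, 12.0, 0.0)  # position glitch inside segment 2

segments = partition(trajectory, window=4)
features = [segment_features(s) for s in segments]
templates = kmeans_templates(features, k=2)
outliers = local_outliers(abnormal_scores(features, templates))
print(outliers)
```

In this toy run the glitch is small enough that the two templates still settle on the slow and fast regimes, so only the glitched segment receives a high score. A large glitch can capture a centroid of its own under plain k-means, which is one reason a real template-generation step needs more care than this sketch takes.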
Polarity Inversion Lines Detection
Magnetic polarity inversion lines (PILs) in solar active regions have been recognized as essential features for the occurrence of solar flares and for the prediction of the flaring phenomenon. In this work, we provide a software framework that detects PILs from the line-of-sight (LoS) or the radial component of the magnetic field vector in active region magnetogram patches. The PIL detection procedure is based on an edge detection technique combined with magnetic field strength and PIL size filters. First, we identify positive and negative polarity regions using a magnetic field strength threshold. Then, we apply the Canny edge detector and morphological operations to both the positive and negative regions to identify coarse PILs. Finally, we generate the final PILs by applying the magnetic field strength and PIL size filters to these coarse PILs.
Three binary masks overlay the magnetogram with a normalized magnetogram field strength map. The color bar indicates magnetic field strength, ranging from -1500 Gauss to +1500 Gauss.
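The threshold, overlap, and filter steps can be sketched in plain Python on a small grid. This is a simplified illustration, not the framework's code: it replaces the Canny edge detector with a plain overlap of dilated polarity masks, and the function names, threshold value, and minimum size are hypothetical.

```python
def polarity_masks(magnetogram, threshold):
    """Step 1: strong positive and strong negative regions (field strength filter)."""
    pos = [[v >= threshold for v in row] for row in magnetogram]
    neg = [[v <= -threshold for v in row] for row in magnetogram]
    return pos, neg

def dilate(mask, radius=1):
    """Morphological dilation with a square structuring element."""
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = True
    return out

def coarse_pil(pos, neg, radius=1):
    """Step 2 (simplified): pixels where dilated opposite polarities overlap."""
    dp, dn = dilate(pos, radius), dilate(neg, radius)
    return [[a and b for a, b in zip(rp, rn)] for rp, rn in zip(dp, dn)]

def size_filter(mask, min_size):
    """Step 3: keep only 8-connected components with at least min_size pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    kept = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            stack, component = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for ny in range(max(0, y - 1), min(rows, y + 2)):
                    for nx in range(max(0, x - 1), min(cols, x + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            if len(component) >= min_size:
                for y, x in component:
                    kept[y][x] = True
    return kept

def detect_pil(magnetogram, threshold, min_size):
    pos, neg = polarity_masks(magnetogram, threshold)
    return size_filter(coarse_pil(pos, neg), min_size)

# Toy patch in Gauss: one extended bipole on the left, one tiny bipole at the top right.
magnetogram = [[0.0] * 10 for _ in range(6)]
for r in range(6):
    magnetogram[r][0] = magnetogram[r][1] = 400.0
    magnetogram[r][2] = magnetogram[r][3] = -400.0
magnetogram[0][7], magnetogram[0][8] = 400.0, -400.0

pil = detect_pil(magnetogram, threshold=100.0, min_size=5)
pil_pixels = sum(v for row in pil for v in row)
print(pil_pixels)
```

The size filter is what discards the small bipole here: its overlap region has only four pixels, while the extended bipole produces a two-pixel-wide band along the full height of the patch.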
Moreover, we provide feature extraction functions to obtain the properties of PILs (i.e., PIL size, the area of polarity inversion, the masked unsigned flux of the enclosing PIL, convexity, eigenvalues, fractal dimension, and Hu moments of the PIL shape) and to produce three PIL-related binary masks (i.e., the PIL, the region of polarity inversion, and the convex hull of the PIL) for each longitudinal magnetogram patch.
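As a small illustration of two of the listed properties, the sketch below computes the PIL size (pixel count) and the first Hu moment invariant from a binary mask, using the standard definition of the invariant as the sum of the normalized second-order central moments. The function name and the returned dictionary keys are hypothetical and do not reflect the framework's actual interface.

```python
def pil_shape_features(mask):
    """Return the pixel count and the first Hu invariant of a binary mask."""
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    size = len(pixels)  # zeroth moment of a binary shape
    if size == 0:
        return {"size": 0, "hu1": 0.0}
    cx = sum(x for x, _ in pixels) / size
    cy = sum(y for _, y in pixels) / size
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)  # central moment along x
    mu02 = sum((y - cy) ** 2 for _, y in pixels)  # central moment along y
    # Normalized central moments of order 2 divide by size**2; hu1 = eta20 + eta02.
    return {"size": size, "hu1": (mu20 + mu02) / size ** 2}

horizontal = pil_shape_features([[1, 1, 1]])
vertical = pil_shape_features([[1], [1], [1]])
print(horizontal, vertical)
```

The horizontal and vertical three-pixel lines give the same value, which reflects the rotation invariance that makes Hu moments useful as shape descriptors.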
See the publication here
Qualitative evaluation of Sub-Pixel CNN and ResNet models trained using cropped (2k by 2k) magnetograms
Our experimental evaluation shows that our models outperform the baselines and that the Sub-Pixel CNN super-resolution model provides viable results for magnetogram super-resolution.
Magnetograms Super-Resolution
Image super-resolution is a branch of image processing concerned with enhancing the spatial resolution and quality of images by learning the intrinsic details and relations between lower-resolution input images and higher-resolution output images. It is widely accepted as an ill-posed problem, and it has seen tremendous advances with deep learning based models. In this work, we present two magnetogram super-resolution models, a Sub-Pixel Convolutional Neural Network (CNN) and an Enhanced Deep Residual Network (ResNet), which can be used to improve the spatial resolution of solar magnetograms. While the ill-posed nature of the problem remains a challenge, several application areas, including space weather prediction, can benefit greatly from the improved spatial resolution of solar magnetograms.
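Sub-pixel CNNs are generally built around a rearrangement step, often called pixel shuffle or depth-to-space, that turns r*r low-resolution feature channels into one image upscaled by a factor of r. The sketch below shows only that rearrangement for a single-channel output such as a magnetogram; it is an illustration of the general technique and makes no claim about the layers or training setup of the models presented here.

```python
def pixel_shuffle(channels, r):
    """Interleave r*r channels of shape H x W into one (H*r) x (W*r) image.

    Channel k supplies the output pixels at row offset k // r and column
    offset k % r inside every r x r block.
    """
    if len(channels) != r * r:
        raise ValueError("expected r*r input channels")
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for k, channel in enumerate(channels):
        dy, dx = divmod(k, r)
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = channel[y][x]
    return out

# Four 1 x 2 feature maps become one 2 x 4 image (upscaling factor r = 2).
low_res_channels = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
upscaled = pixel_shuffle(low_res_channels, r=2)
print(upscaled)  # [[1, 3, 2, 4], [5, 7, 6, 8]]
```

Because the convolutions run at low resolution and the upscaling is a pure reindexing, this design tends to be cheaper than upsampling the input first and convolving at full resolution.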
See the publication here
Rare-Event Time Series Prediction
We present a case study of time series prediction models in extreme class-imbalance problems. To create the forecasting dataset used in this study, we extracted multiple properties from the Space Weather Analytics for Solar Flares (SWAN-SF) benchmark dataset, which comprises magnetic features from more than 4,075 active regions over a period of 9 years. In the extracted dataset, the class-imbalance ratio is 1:60, where the minority class is formed by instances of strong solar flares (GOES M- and X-class). This ratio reaches 1:800 if we consider only the strongest class of flares (GOES X-class). We have explored remedies to tackle the class-imbalance issue, such as undersampling, oversampling, and misclassification weights.
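The three remedies can be sketched on a toy label vector with the same 1:60 ratio. The helper names, the random sampling strategy, and the inverse-frequency weighting formula are illustrative assumptions and are not necessarily the variants used in the study.

```python
import random
from collections import Counter

def class_weights(labels):
    """Misclassification weights inversely proportional to class frequency."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}

def undersample(labels, seed=0):
    """Indices that keep a random subset of each class, sized to the rarest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    kept = []
    for cls in counts:
        indices = [i for i, y in enumerate(labels) if y == cls]
        kept.extend(rng.sample(indices, target))
    return sorted(kept)

def oversample(labels, seed=0):
    """Indices that replicate rarer classes at random up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    kept = []
    for cls, count in counts.items():
        indices = [i for i, y in enumerate(labels) if y == cls]
        kept.extend(indices)
        kept.extend(rng.choices(indices, k=target - count))
    return sorted(kept)

# Toy labels: 10 flaring instances (1) against 600 non-flaring instances (0).
labels = [1] * 10 + [0] * 600
weights = class_weights(labels)
under = undersample(labels)
over = oversample(labels)
print(weights[1] / weights[0], len(under), len(over))
```

With these weights a misclassified minority instance costs 60 times as much as a misclassified majority instance, which mirrors the imbalance ratio without discarding or duplicating any data.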
Frequency and imbalance ratio of all five flare classes across different partitions of SWAN-SF benchmark dataset.
In the process, we elaborate on common mistakes and pitfalls caused by ignoring the side effects of these remedies, including how and why they weaken the robustness of the trained models while seemingly improving the performance.
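One way such a side effect can arise is shown in the short calculation below. It is a hypothetical example and may not be among the specific pitfalls the study analyzes: a classifier with a fixed true-positive rate and false-positive rate is scored once on a test set balanced by undersampling and once at the natural 1:60 ratio. The rates 0.8 and 0.1 are made-up values.

```python
def precision_at_ratio(tpr, fpr, negatives_per_positive):
    """Precision per positive instance when each positive faces that many negatives."""
    true_positives = tpr
    false_positives = fpr * negatives_per_positive
    return true_positives / (true_positives + false_positives)

# Same hypothetical classifier, two class ratios in the evaluation data.
balanced = precision_at_ratio(tpr=0.8, fpr=0.1, negatives_per_positive=1)
natural = precision_at_ratio(tpr=0.8, fpr=0.1, negatives_per_positive=60)
print(round(balanced, 3), round(natural, 3))
```

The classifier is unchanged, yet its precision drops from roughly 0.89 on the balanced data to roughly 0.12 at the natural ratio, so a score obtained on resampled evaluation data can look far better than what the model would deliver in operation.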
See the publication here