Abstract: Remote sensing images exhibit inter-class homogeneity and intra-class multimodality, and existing remote sensing scene recognition methods based on the fully supervised paradigm depend heavily on labeled data. To address these problems, this paper proposes a self-supervised contrastive learning method based on global attention for remote sensing scene classification (GACL). First, a data augmentation module based on spatio-temporal invariance is constructed to learn features of remote sensing images that remain consistent across different spatio-temporal conditions. Then, to fully explore and establish the spatial contextual relationships between images, a feature extraction module based on residual global attention is constructed. Finally, to fully learn the invariant information in multilayer features and to reduce the interference of sample imbalance on recognition accuracy, a composite contrastive loss function is constructed from focal loss and multilayer contrastive loss. Experiments on the NWPU, UCM and MLRSNet datasets achieve accuracies of 79.83%, 83.01% and 94.46%, respectively, verifying the effectiveness and superiority of GACL.
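
The abstract does not give the form of the composite loss, so the following is only a minimal sketch of one way a focal term and a multilayer contrastive term could be combined. It assumes an InfoNCE-style contrastive loss between two augmented views, a focal modulation factor (1 - p)^gamma applied to each positive pair, and a weighted sum over feature layers. The function names, the temperature, gamma, and the layer weights are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def focal_info_nce(view_a, view_b, temperature=0.5, gamma=2.0):
    """Focal-weighted InfoNCE loss (assumed form, not the paper's definition).

    view_a[i] and view_b[i] are embeddings of two augmentations of image i;
    every other embedding in view_b serves as a negative for view_a[i].
    """
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        shift = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - shift) for x in logits]
        p_pos = exps[i] / sum(exps)  # softmax probability of the positive pair
        # The focal factor down-weights pairs that are already well separated.
        total += -((1.0 - p_pos) ** gamma) * math.log(p_pos)
    return total / n


def composite_loss(layers_a, layers_b, layer_weights, temperature=0.5, gamma=2.0):
    """Weighted sum of the focal contrastive loss over several feature layers."""
    return sum(
        w * focal_info_nce(a, b, temperature, gamma)
        for w, a, b in zip(layer_weights, layers_a, layers_b)
    )


if __name__ == "__main__":
    # Toy embeddings: two images, one feature layer, two views each.
    a = [[1.0, 0.0], [0.0, 1.0]]
    b = [[0.9, 0.1], [0.2, 0.8]]
    print(composite_loss([a], [b], layer_weights=[1.0]))
```

With gamma set to 0 the focal factor equals 1 and the sketch reduces to a plain InfoNCE loss, which makes the effect of the focal term easy to isolate when experimenting.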