 
    
  Bit-depth enhancement (BDE) plays an important role in providing high bit-depth data for high-dynamic-range (HDR) displays. Although convolutional neural network (CNN) based BDE methods have achieved top performance, their multiscale feature extraction and fusion still suffer from inherent architectural flaws. Moreover, the scenario in which training data are scarce has not been effectively explored. To this end, this paper proposes a multiscale recurrent fusion transformer (MRFT) framework with three key components: a multiscale transformer feature encoder, a recurrent feature fusion module, and prior knowledge injection. Specifically, the multiscale transformer feature encoder consists of a prior-injected context encoder (PICE) and a multiscale local feature encoder (MLFE). PICE uses the vanilla self-attention mechanism to extract global context that correlates spatially distant content, which helps distinguish long-distance false contours. MLFE uses local self-attention with varied window sizes to capture detail features at different scales. A hierarchical recurrent decoder (HRD) then serves as the recurrent feature fusion module, fusing multiscale visual information under global guidance. Through a circular query-key mechanism, global-to-local information is fused progressively. Furthermore, we propose a two-stage alternating optimization strategy for prior knowledge injection. Pre-parameterizing the global auxiliary priors significantly alleviates the training difficulty in data-scarce domains. Extensive analyses on multiple benchmark datasets demonstrate the superiority of MRFT in terms of both quantitative measures and visual quality.

  Xin Wen, Weizhi Nie, Jing Liu, Yuting Su: MRFT: Multiscale Recurrent Fusion Transformer Based Prior Knowledge for Bit-Depth Enhancement. IEEE Trans. Circuits Syst. Video Technol. 33(10): 5562-5575 (2023).
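
  As background to the false contours mentioned in the abstract, the following minimal sketch shows how they arise when a low bit-depth signal is expanded naively. It is not part of MRFT: the bit depths (16 and 4), the names `quantize` and `zero_pad`, and the zero-padding baseline are all assumptions chosen for illustration, with the 4-bit source exaggerated so the effect is obvious.

```python
HIGH_BITS = 16  # assumed target bit depth
LOW_BITS = 4    # assumed source bit depth, exaggerated for clarity


def quantize(value: int) -> int:
    """Drop the least significant bits of a HIGH_BITS-bit sample."""
    return value >> (HIGH_BITS - LOW_BITS)


def zero_pad(value: int) -> int:
    """Naive bit-depth expansion: shift back, filling the lost bits with zeros."""
    return value << (HIGH_BITS - LOW_BITS)


# A smooth ramp: 256 evenly spaced samples spanning the 16-bit range.
ramp = list(range(0, 1 << HIGH_BITS, 256))
restored = [zero_pad(quantize(v)) for v in ramp]

# The ramp has 256 distinct values; the restored signal keeps only 2**LOW_BITS.
distinct_levels = len(set(restored))

# Largest gap between an original sample and its restored value.
max_error = max(v - r for v, r in zip(ramp, restored))

# Indices where the restored signal jumps: the edges of the false contours.
contour_edges = [i for i in range(1, len(restored)) if restored[i] != restored[i - 1]]
```

  In this sketch the smooth ramp collapses into flat bands separated by abrupt steps. On a real image, such bands can stretch across large regions, which is one reason global context may help tell false contours apart from genuine edges.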