Goal: To investigate the contextual and temporal distribution of false positives (FPs) produced by a state-of-the-art deep learning (DL)-based atrial fibrillation (AF) detection algorithm when applied to an electrocardiogram (ECG) dataset collected under free-living ambulatory conditions. We hypothesize that under such conditions, the FPs produced by a DL model may be correlated with the patient’s ambulatory context. Method: First, a DL model is trained and evaluated on three public arrhythmia datasets from PhysioNet, where it is confirmed to achieve state-of-the-art performance. The same model is then applied to a 215-day-long contextualized single-channel ECG dataset collected under free-living ambulatory conditions. Ground truth is obtained through manual examination of the model’s output, and the correlations between the patient’s ambulatory contexts and the true/false positive rates are analyzed. Results: Nearly 62% of the segments marked as AF by the model were ≤ 50 seconds in length, and 99.9% of them were FPs. Among these non-trivial short FP segments, almost 78% were mainly associated with three specific contextual events: change in activity, change in body position (especially during the night), and sudden movement acceleration. Moreover, the number of FPs produced by the DL model was higher in female than in male participants. Finally, true positive (TP) AF segments were found more often in the morning and late evening. Significance: These findings may have significant implications for the current use and future design of DL models for AF detection, and may help clarify the role of context information in reducing the FP rate in real-time AF detection under free-living conditions.