- Research
- Open access
- Published:
Reliability of pre-admission patient-reported outcome measures postoperatively assessed via proxies: a prospective, multicenter observational study
Critical Care volume 29, Article number: 200 (2025)
Abstract
Background
Pre-admission status obtained through patient-reported outcome measures is an essential metric in both clinical and research settings for prognostication and treatment decisions. It is frequently collected by proxies, although its reliability has yet to be thoroughly investigated. The objective was to determine the reliability of proxy assessments regarding pre-ICU admission status via patient-reported outcome measures and to explore the impact of the ICU setting on these assessments.
Methods
Prospective multicentre observational study in two tertiary care university hospitals in Germany, including surgical adult patients able to independently answer the patient-reported outcome measures (SF-36, EQ-5D-5L, WHODAS 2.0, IADL, and Barthel Index) with a proxy available. Patients were interviewed pre-operatively, while proxies were interviewed post-operatively in the ICU or normal ward, depending on the patient's location. The reliability of patient-reported outcome measures was analyzed using Bland–Altman plots and Cohen’s kappa.
Results
Of 204 patient-proxy pairs, 102 were admitted to an ICU. The median patient and proxy age were 69 and 64 years, with 41% and 68% female, respectively. Bland–Altman plots demonstrated insufficient reliability of proxy ratings, as the 95% limits of agreement fell outside the minimal clinically important difference (MCID) for all questionnaires. However, a significant bias was evident only among ICU patients, showing worse ratings from proxies for the WHODAS 2.0, IADL, and subsets of the SF-36. For the EQ-5D, bias appeared in both the ICU and non-ICU cohorts. The dichotomous analysis of the within-pairs-difference supported the findings, revealing a high proportion of pairs with differences outside the MCID (n (%)—SF-36 PCS normal ward: 47 (46%), ICU: 58 (57%); SF-36 MCS normal ward: 59 (58%), ICU: 62 (61%); WHODAS 2.0 normal ward: 58 (57%), ICU: 78 (77%); EQ-5D-5L normal ward: 40 (39%), ICU: 46 (45%); Barthel-Index normal ward: 22 (22%), ICU: 21 (21%)). Cohen’s kappa indicated moderate reliability for the IADL.
Conclusion
The reliability of proxy assessment with the instruments used was insufficient, exhibiting a significant bias in the pre-admission status of ICU patients; therefore, it should be applied with caution.
Trial registration: ClinicalTrials.gov (NCT03785444—28th of December 2018).
Graphical abstract

Introduction
Patients'physical, mental and cognitive status before intensive care is a critical factor influencing the course of treatment and its outcome, especially in an ageing population [1, 2]. In particular, the pre-admission status is an independent and robust predictor for intensive care unit (ICU) and hospital mortality [3, 4]. Additionally, pre-admission status affects morbidity, as it has been shown to impact dependency, quality of life, and readmission rates after discharge [5]. Therefore, it is logical—albeit ethically debated—that the inclusion of pre-admission status in triage tools has been proposed in situations of ICU bed shortages, such as during the COVID-19 pandemic [6]. Typically, pre-admission status is assessed through various patient-reported outcome measures (PROM), which demonstrate a strong correlation with actual performance status following discharge from the ICU [7].
Unfortunately, the pre-admission status of patients in ICUs is often unknown due to the frequently unexpected onset of critical illness. Therefore, it was proposed that the missing information on pre-admission status be obtained through proxy assessment [8]. However, the proxy assessment had an inherent bias in a propensity score-matched analysis of community-dwelling Medicare beneficiaries [9]. Similarly, a propensity score-matched cohort, consisting of more than 22,000 participants selected from the general population in a national health survey in China, demonstrated a systematic bias wherein proxies undervalued the health status [10]. A systematic review concluded that the quality of proxy assessment was heterogeneous and depended on the instrument and the setting [11]. Consequently, it is challenging to extrapolate results to different cohorts and outcome measures. Earlier studies suggested insufficient reliability in the ICU cohort due to multiple biases. The trial by Rogers et al. gathered patient and proxy responses at ICU discharge, thus eliminating the confounder that might impact proxy response when the patient's outcome is unclear. Hofhuis et al. included only patients able to respond at ICU admission, rendering proxy assessment redundant. Hence, the evidence remains inconclusive [12, 13]. Furthermore, there is a lack of studies investigating the impact of the ICU setting on proxy assessment.
Although the reliability of ICU proxy assessment of PROMs is uncertain, it is widely utilized in clinical settings for prognostication and treatment decisions, as well as in research for study inclusion. This study aimed to prospectively investigate the reliability of proxy assessments concerning pre-ICU admission status acquired through PROMs and to demonstrate the influence of the ICU setting on the proxy assessment.
Methods
This prospective two-centre observational study was approved by the institutional ethics committee of the medical faculty of the Technical University of Munich (542/18S) and conducted at Charité-Universitätsmedizin Berlin and Klinikum rechts der Isar of the Technical University of Munich, following the Declaration of Helsinki. It was registered at ClinicalTrials.gov (NCT03785444) on December 28th, 2018.
Participants
Patients were eligible for recruitment if they were at least 18 years old and a proxy was able to be present during the post-operative period. A proxy was seen as a person that is emotionally or socially connected to the patient. A familial or legal tie was not a requirement in order to reflect the modern and diverse environment of social relationships. The inability of the patient to answer the PROMs independently before the operation was an exclusion criterion. Patients and proxies were enrolled in the study after obtaining written informed consent. Screening was conducted using a convenient sampling method.
Data collection
Data was collected by trained and dedicated study staff after enrolment to accurately represent the scenario in which proxies are asked to respond when patients cannot. Consequently, patients were interviewed preoperatively in the normal ward or during the pre-anaesthesia consultation. As a result, proxies were interviewed postoperatively, i.e., when patients would be non-responsive, either in the normal ward, in the ICU, or by telephone. This depended mainly on the availability of the proxy, mirroring a real-world clinical scenario. The responses were provided by both patients and proxies in written form or verbally, and the interviewer documented them in the case report form. The interview covered baseline demographic variables and all the items from the different PROMs. The interview duration depended on the respondent and was not a priori set or limited.
Variables
Demographic data was recorded for the patients (i.e., age, sex, hospital length of stay, pre-existing medical conditions, pre-existing medication for those illnesses, number of prior surgeries as well as surgical discipline for the upcoming surgery) and proxies (i.e., age and sex), as also collected, along with qualitative and quantitative values regarding their relationships, including relationship status, whether they share a household, and how often they visit, call, or text. Various validated patient-reported outcome measures (PROMs), commonly used in studies investigating long-term outcomes of critical illness and recommended by multiple core outcome sets, were employed to assess the patients’ current status [14, 15]. Health-related quality of life was evaluated using the Medical Outcome Study Short-Form Health Survey 36 (SF-36), as recommended by the core outcome sets for acute respiratory failure, along with nutritional and metabolic interventions. Overall disability and health were assessed with the World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0), due to its strong correlation with performance-based outcome measures, as well as the EQ-5D-5L, which is recommended by the core outcome set for respiratory failure. Independence in daily living was measured using the instrumental activities of daily living (IADL) scale and the Barthel index, as advised by the core outcome set for nutritional and metabolic interventions, with the modification that these two measures were to be reported for the situation two weeks prior to the current hospital admission [7, 16,17,18,19,20,21,22].
Sample size
The results of patient versus proxies were compared using Bland-Altman diagrams. For that, a test size of 100 patients has been recommended [23]. This resulted in 200 patients (100 ICU patients and 100 non-ICU patients).
Statistical analysis
For the SF-36, the Mental and Physical Component Score (MCS/PCS) were calculated, and missing data was handled according to the manual by Ware et al. [16, 17]. The sum of the 12 items from the WHODAS 2.0 was calculated and converted to a percentage, as previously done by Hodgson et al. and Shulman et al.; missing data were handled as outlined by the manual [24,25,26]. The index score for the EQ-5D-5L was calculated according to the German value set [27]. IADL and Barthel sum scores were calculated as described in the initial publication [20, 22].
Missing items in the scores were imputed using a classification and regression tree. Following this, patient scores were calculated, incorporating both reported and imputed items. A sensitivity analysis excluding imputed values is provided in the supplement.
Descriptive statistics are presented for continuous variables based on the evaluation of normality of distribution using the Shapiro–Wilk test and reported as either median (interquartile range—IQR) or mean (standard deviation—SD). Categorical variables are presented as counts (percentages). Differences between groups were assessed similarly with either a t-test or Mann–Whitney U test for continuous variables, depending on the results of the Shapiro–Wilk test, and using the Chi-Square test for categorical variables.
Interrater reliability for continuous variables was evaluated via Bland–Altman plots and unweighted and weighted Cohen’s kappa for categorical variables. The clinical acceptability of the deviation was assessed by comparing the 95%-limits of agreement to the minimal clinically important difference (MCID) for the respective questionnaire (5 for the SF-36 PCS, 5.5 for the MCS, 5 for the WHODAS 2.0, 0.18 for the EQ-5D and 9.8 for the Barthel Index) [26, 28,29,30]. The rationale behind this comparison is that if the deviation between patients and proxies is larger than the MCID a clinically significant difference between two assessments could be caused only by the difference in responder. A systematic bias was evaluated via the 95% confidence interval of the mean difference. The difference was derived by subtracting the proxy score from the patient scores. This difference was also analyzed dichotomously with the MCID as the cut-off value. Statistical analysis was performed with IBM SPSS Statistics for Macintosh, Version 27.0 (IBM Corp, Armonk, NY, USA) and R Foundation for Statistical Computing (Vienna, Austria). Figures were created using GraphPad Prism version 7.0.0 for Macintosh (GraphPad Software, San Diego, CA, USA) and R Foundation for Statistical Computing (Vienna, Austria).
Results
Study cohort
A total of 368 patient proxy pairs were screened and 204 of those were recruited between 25 th of February 2019 and 27 th of July 2020. 102 (50%) patients were postoperatively admitted to the ICU (Figure S1). Patients were 41% females, had a median age of 69 [interquartile range (IQR) 59–77] years and were mainly admitted for visceral or cardiac surgery. The median hospital length of stay was 13.0 [IQR 8.0–21.0] days, with a median ICU length of stay of 3.0 [IQR 1.0–5.0] days. Patients admitted to the ICU postoperatively were significantly older, had more comorbidities requiring medication, were mostly cardiac surgical patients, and presented worse pre-operative PROMs compared to those admitted to the normal ward postoperatively (Table 1). Their proxies had a median age of 64 [IQR 53–71] years, were 68% females and were mostly spouses with a common household, therefore seeing each other daily. Thirty-eight postoperative interviews were conducted in the ICU (Table 2).
Study short-form health survey 36 (SF-36)
The Bland–Altman plots showed 95% limits of agreement for the PCS and MCS widely outside the MCID of 5 for the PCS and 5.5 for the MCS [28] for both the ICU (Fig. 1a, c) and normal ward (Fig. 1b, d) patients, indicating insufficient reliability. This was corroborated by the dichotomous analysis, which showed a high proportion of pairs that had scoring difference outside the MCID. Normal ward and ICU patients displayed similar proportions in this analysis (Table 3). While for normal ward patients, no bias was present, a significant bias towards a worse evaluation by the proxies for the PCS and MCS in ICU patients was observed (Fig. 1a–d). Sensitivity analysis without imputation showed similar results (Supplementary File 1: Figure S1 and Table S1).
Reliability for regular ward and ICU in Bland–Altman plots for the SF-36. Displayed are the mean value of the response from patients and proxies on the x-axis and the difference between their answers on the y-axis. The blue lines indicate the mean difference and the corresponding 95% CI. The yellow lines indicate the upper and lower 95% limits of agreement with the respective 95% CI. Panel a shows the Physical Component Score for the ICU cohort and b for the normal ward cohort while c shows the Mental Component Score for the ICU cohort and d for the normal ward cohort. The SF-36 PCS (normal ward – mean difference [95% limits of agreement]: 0.9 [− 18.6 to 20.4]; ICU – mean difference [95% limits of agreement]: 3.5 [− 15.4 to 22.5]) shows insufficient reliability and a significant bias in the ICU patients (normal ward – mean difference [95% CI]: 0.9 [− 1.0 to 2.9]; ICU – mean difference [95%CI]: 3.5 [1.6–5.4]). The SF-36 MCS (normal ward – mean difference [95% limits of agreement]: 0.6 [− 20.4 to 21.5]; ICU – mean difference [95% limits of agreement]: 3.8 [− 20.0 to 27.5]) shows insufficient reliability and a significant bias in the ICU patients (normal ward – mean difference [95% CI]: 0.6 [− 1.5 to 2.7]; ICU – mean difference [95%CI]: 3.8 [1.4–6.2]). ICU intensive care unit; NW normal ward
WHO disability assessment schedule 2.0
Similar to the SF-36, the Bland–Altman plots demonstrated 95% limits of agreement for the WHODAS 2.0 widely outside the MCID of 5 [26] for both the ICU (Fig. 2a) and normal ward (Fig. 2b) patients, indicating insufficient reliability. Moreover, a significant bias towards a worse evaluation by the proxies was evident in ICU patients, which could not be identified in normal ward patients (Fig. 2a, b). This was again corroborated by the dichotomous analysis, which showed a more significant proportion of ICU patients outside the MCID than normal ward patients (Table 3). Sensitivity analysis without imputation showed similar results (Supplementary File 1 Figure S2 and Table S1).
Reliability for regular ward and ICU in Bland–Altman plots for the WHODAS 2.0, Barthel-Index and IADL. Displayed are the mean value of the response from patients and proxies on the x-axis and the difference between their answers on the y-axis. The blue lines indicate the mean difference and the corresponding 95% CI. The yellow lines indicate the upper and lower 95% limits of agreement with the respective 95% CI. Panel a shows the WHODAS 2.0 for the ICU cohort and b for the normal ward cohort, c shows the EQ-5D for the ICU cohort and d for the normal ward cohort while e shows the Barthel Index for the ICU cohort and f for the normal ward cohort. The WHODAS 2.0 (normal ward – mean difference [95% limits of agreement]: − 2.0 [− 38.7 to 34.8]; ICU – mean difference [95% limits of agreement]: − 8.2 [− 43.7 to 27.3]) shows insufficient reliability with a significant bias in the ICU patients (normal ward – mean difference [95% CI]: − 2.0 [− 5.6 to 1.7]; ICU – mean difference [95%CI]: − 8.2 [− 11.8 to − 4.6]). The EQ-5D (normal ward – mean difference [95% limits of agreement]: 0.1 [− 0.5 to 0.6]; ICU – mean difference [95% limits of agreement]: 0.2 [− 0.5 to 0.9]) shows insufficient reliability without a bias (normal ward – mean difference [95% CI]: 0.1 [0.0–0.1]; ICU – mean difference [95%CI]: 0.2 [0.1 to − 0.3]). The Barthel (normal ward – mean difference [95% limits of agreement]: 1.1 [− 19.9 to 22.0]; ICU – mean difference [95% limits of agreement]: 3.5 [− 20.6 to 27.5]) shows insufficient reliability with a significant bias in ICU patients (normal ward – mean difference [95% CI]: 1.1 [− 1.0 to 3.2]; ICU – mean difference [95%CI]: 3.5 [1.0 to 5.9]). ICU intensive care unit; NW normal ward
EQ-5D
The Bland–Altman presented 95% limits of agreement again outside the MCID of 0.18 [29] in the ICU (Fig. 2c) and normal ward (Fig. 2d) patients, indicating insufficient reliability also for EQ-5D. Additionally, a significant bias towards a worse evaluation by the proxies was present in both cohorts. The dichotomous analysis also pointed towards insufficient reliability, although there was no significant difference between ICU and normal ward patients (Table 3). Sensitivity analysis without imputation showed similar results (Supplementary File 1: Figure S3 and Table S1).
Barthel Index
The Bland–Altman plots showed that the 95% limits of agreement extend beyond the MCID of 9.8 [30] in ICU (Fig. 2e) and normal ward (Fig. 2f) patients, indicating insufficient reliability for the Barthel Index. Furthermore, a significant bias towards a worse evaluation by the proxies could be observed in ICU patients. These results were underlined by the dichotomous analysis, with a relevant proportion of ICU and normal ward patients displaying a difference outside the MCID (Table 3). Sensitivity analysis without imputation showed similar results (Supplementary File 1 Figure S2 and Table S1).
Instrumental activities of daily living
Agreement for IADL was “fair” when evaluated with unweighted values and improved to “moderate” when evaluated with weighted kappa values (Table S2). The sensitivity analysis confirmed those findings (Supplementary File 1 Table S3).
Subgroup analyses
We conducted a subgroup analysis considering the frequency of contact between the patient and proxy, along with shared household status for both parties, as indicators of a close relationship. Neither variable demonstrated a significant difference in the proportion of assessments showing a difference below the MCID (Supplementary File 1 Tables S4 and S5).
Discussion
This is the first study investigating the reliability of the pre-admission PROM assessment via proxies in a collective of surgical patients focusing on post-operative ICU patients who reported their PROMs prior to surgery and the respective ICU admission. We were able to demonstrate that the reliability of the assessment of the SF-36, WHODAS 2.0, IADL, EQ5D-5L, and Barthel Index by a proxy was insufficient. Furthermore, a significant bias was present in ICU patients with a worse evaluation by the proxies compared to the normal ward patients, indicating that the ICU admission or stay appeared to be a substantial confounder in the assessment via proxies.
Proxies, such as children, spouses, and friends, experience severe sleep disorders, anxiety, and fatigue during the ICU treatment of their loved ones, which may affect their judgment [31, 32]. These factors could cause recall bias and consequently influence the assessment of pre-admission status via PROMs, explaining the insufficient reliability and observed bias. Our findings and this assumption align with the investigation by Rogers et al., who also evaluated the reliability of proxy assessments of the SF-36 in ICU patients. They demonstrated that reliability is compromised, showing a significant bias towards a poorer evaluation by proxies, which is consistent with our results. However, they collected scores for assessing admission status at ICU discharge and six months later, revealing improved reliability 6 months after discharge, thus reinforcing the argument for the significant impact of the ICU setting on proxy assessments [12]. At the time of ICU admission, when the patient's outcome is uncertain, this bias may be even more pronounced. Furthermore, it remains unclear whether patients and proxies with certain characteristics, such as a prolonged relationship or cohabitation, exhibit better reliability. This question should be investigated in future studies.
The impact of the setting was further supported by an evaluation of the reliability of proxy assessments in 96 elective cardiac surgery patients, conducted prior to surgery, at hospital discharge, and 6 months after discharge. The lowest level of agreement was found at hospital discharge [33]. These studies confirm our observation of bias within the ICU cohort that was absent in the normal ward cohort, suggesting that this bias emerged from the influence of the proxies'current situation.
Since no objective measures were performed, it remains unclear whether patients’ or proxies’ responses were more accurate. Multiple factors that could influence patients’ self-reports exist in the ICU context. Response shift, or the change in self-reported quality of life concerning the time point and circumstances, was a confounding factor in previous trials that retrospectively assessed pre-admission status [34]. This shift was observed in other groups, such as severely mobility-disabled individuals [35]. On the other hand, PTSD and depression are common challenges following critical illness [36, 37]. Furthermore, lower PROM ratings and more chronic conditions may be evident in the ICU cohort, which could contribute to the observed bias between patients and proxies in this group. These factors could be confounders skewing the patients’ self-evaluation, either positively or negatively. This was highlighted in the work by Dinglas et al., who assessed the differences between patient and proxy evaluations conducted at hospital discharge regarding pre-admission baseline status. They identified a non-linear bias in their Bland–Altman analysis, showing that proxies tended to diminish patients’ responses; specifically, when high patient scores were reported, lower proxy ratings followed, and conversely, when lower patient scores were noted, higher proxy ratings appeared [34]. This further indicates that bias is not uniform and depends on variables such as the instrument used, severity of illness, outcomes of critical illness, and prior functional status. Our results reflect this, as the reliability of the IADL, which employs a clearly defined scale, tends to exhibit superior reliability compared to scores utilizing a more subjective scale.
Health-related quality-of-life instruments are subjective, making them more challenging to report by a proxy. These factors are illustrated in the study by Elliott et al., who found better agreement on more objective measurements, such as physical function, and worse agreement for subjective areas, such as vitality [33]. This emphasizes the need for further research to identify and implement outcome measures that are resistant to confounding by proxy assessment for critically ill patients.
The study by Hofhuis et al. is the only one that examined the reliability of a proxy assessment of patients' health-related quality of life at ICU admission, similar to ours. They utilised the SF-36 to evaluate health-related quality of life. Their study demonstrated a significant difference between patients and proxies across the various SF-36 domains, except for physical and emotional roles [13]. This aligns with our findings, which indicate poor reliability. However, the applicability of their study to real-life scenarios is limited, as their patients could complete the questionnaires themselves upon ICU admission. Nonetheless, Scales et al. produced similar results in a cohort of ARDS patients when the proxy was interviewed at admission and the patient later [38]. Consistent with the SF-36 results, the EQ-5D also performs poorly when obtained via proxies [34]. No studies are available on the other assessment tools that were explored. A strength of the conducted study is its multicentric design and the assessment of patients and proxies at different time points—patients before surgery and ICU admission, as well as proxies after surgery and ICU admission. This design reflects a real-world scenario where proxies are interviewed after patients have been admitted to the ICU. This approach allows for assessing the ICU as a confounding factor for the proxy response while excluding it as a confounder for the patient’s response, as might occur if the patient were evaluated in the ICU. Another strength is the heterogeneity within the cohort, which facilitates generalisation to a broad population of surgical ICU patients.
The results were limited by the subjective nature of the assessment performed, as they did not permit identification of whether the patient's or proxy's response was closer to the objective performance status. Nevertheless, the patient’s ratings serve as the gold standard, independent of objective assessments, for patient-reported outcome measures like quality of life. Furthermore, accurate functional measurements at ICU admission are often unattainable due to ventilation or sedation. A second limitation was that our cohort included solely elective surgical patients who were able to complete the questionnaires preoperatively. It is unclear whether medical ICU patients, emergency admissions, and their proxies are similarly biased. Additionally, emergency admissions likely introduced a greater bias, as the aforementioned confounding factors may increase due to the unexpected nature of the illness and its impact on proxies. Both these aspects should be investigated in future trials. Cultural and socioeconomic factors were not examined separately and could be potential confounders. These factors might also result in response bias, as external expectations may skew the proxy responses. To date, there is no established core outcome set for assessing patients prior to surgeries associated with a high likelihood of ICU admission, nor for the pre-admission ICU status of surgical patients. The questionnaires used represent a sample of various core outcome sets applied in different critical care research topics. Furthermore, part of the recruitment occurred during the SARS-CoV-2 pandemic, and its impact on critical care, as a central focus of mainstream media highlighting high mortality rates, cannot be evaluated.
In conclusion, the proxy assessment of pre-admission PROMs demonstrated insufficient reliability and exhibited significant bias in ICU patients when using commonly utilised questionnaires (SF-36, WHODAS 2.0, ED-5L, and Barthel Index). Therefore, it should be applied cautiously and not interchangeably in analyses with patient reports. Future research into objective or robust subjective measures is urgently needed to adequately inform treatment decisions, study inclusion, and the development of an adequate ICU pre-admission assessment.
Availability of data and materials
The datasets generated and/or analysed during the current study are not publicly available due to German data protection laws but are available from the corresponding author in anonymised form on reasonable scientific request.
Abbreviations
- ICU:
-
Intensive care unit
- PROM:
-
Patient-reported outcome measure
- MCID:
-
Minimal clinically important difference
- IADL:
-
Instrumental activities of daily living
- WHODAS 2.0:
-
World Health Organization Disability Assessment Schedule
- SF-36:
-
Medical Outcome Study Short-Form Health Survey 36
- MCS:
-
Mental Component Score
- PCS:
-
Physical Component Score
- IQR:
-
Interquartile range
- SD:
-
Standard deviation
References
Needham DM, Davidson J, Cohen H, Hopkins RO, Weinert C, Wunsch H, et al. Improving long-term outcomes after discharge from intensive care unit: report from a stakeholders’ conference. Crit Care Med. 2012;40(2):502–9.
Bagshaw SM, Webb SA, Delaney A, George C, Pilcher D, Hart GK, et al. Very old patients admitted to intensive care in Australia and New Zealand: a multi-centre cohort analysis. Crit Care. 2009;13(2):R45.
Wehler M, Strauss R, Bost A, Geise A, Mueller A, Martus P, et al. Pre-admission functional status and outcome in medical intensive care. Crit Care. 2000;4(Suppl 1):P225.
Krinsley JS, Wasser T, Kang G, Bagshaw SM. Pre-admission functional status impacts the performance of the APACHE IV model of mortality prediction in critically ill patients. Crit Care. 2017;21(1):110.
Bagshaw SM, Stelfox HT, McDermid RC, Rolfson DB, Tsuyuki RT, Baig N, et al. Association between frailty and short- and long-term outcomes among critically ill patients: a multicentre prospective cohort study. CMAJ. 2014;186(2):E95-102.
Sprung CL, Joynt GM, Christian MD, Truog RD, Rello J, Nates JL. Adult ICU triage during the coronavirus disease 2019 pandemic: who will live and who will die? recommendations to improve survival. Crit Care Med. 2020;48(8):1196–202.
Paton M, Lane R, Paul E, Linke N, Shehabi Y, Hodgson CL. Correlation of patient-reported outcome measures to performance-based function in critical care survivors: predictable. Aust Crit Care. 2023;36(4):485–91.
Cohen RI, Eichorn A, Silver A. Admission decisions to a medical intensive care unit are based on functional status rather than severity of illness. A single center experience. Minerva Anestesiol. 2012;78(11):1226–33.
Li M, Harris I, Lu ZK. Differences in proxy-reported and patient-reported outcomes: assessing health and functional status among medicare beneficiaries. BMC Med Res Methodol. 2015;15:62.
Liang Y, Che T, Zhang H, Shang L, Zhang Y, Xu Y, et al. Assessing the proxy response bias of EQ-5D-3 L in general population: a study based on a large-scale representative household health survey using propensity score matching. Health Qual Life Outcomes. 2020;18(1):75.
Hernandez JD, Spir MA, Payares K, Posada AM, Salinas FA, Garcia HI, et al. Assessment by proxy of the SF-36 and WHO-DAS 2.0. A systematic review. J Rehabil Med. 2023;55:jrm4493.
Rogers J, Ridley S, Chrispin P, Scotton H, Lloyd D. Reliability of the next of kins’ estimates of critically ill patients’ quality of life. Anaesthesia. 1997;52(12):1137–43.
Hofhuis J, Hautvast JLA, Schrijvers AJP, Bakker J. Quality of life on admission to the intensive care: can we query the relatives? Intensive Care Med. 2003;29(6):974–9.
Davies TW, van Gassel RJJ, van de Poll M, Gunst J, Casaer MP, Christopher KB, et al. Core outcome measures for clinical effectiveness trials of nutritional and metabolic interventions in critical illness: an international modified Delphi consensus study evaluation (CONCISE). Crit Care. 2022;26(1):240.
Needham DM, Sepulveda KA, Dinglas VD, Chessare CM, Friedman LA, Bingham CO, et al. Core outcome measures for clinical research in acute respiratory failure survivors. An International Modified Delphi Consensus Study. Am J Resp Crit Care. 2017;196(9):1122–30.
Ware JE Jr, Kosinski M, Bayliss MS, McHorney CA, Rogers WH, Raczek A. Comparison of methods for the scoring and statistical analysis of SF-36 health profile and summary measures: summary of results from the Medical Outcomes Study. Med Care. 1995;33(4 Suppl):AS264–79.
Ware JE Jr, Sherbourne CD. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. Med Care. 1992;30(6):473–83.
Ustun TB, Chatterji S, Kostanjsek N, Rehm J, Kennedy C, Epping-Jordan J, et al. Developing the World Health Organization Disability Assessment Schedule 2.0. Bull World Health Organ. 2010;88(11):815–23.
Herdman M, Gudex C, Lloyd A, Janssen M, Kind P, Parkin D, et al. Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Qual Life Res. 2011;20(10):1727–36.
Lawton MP, Brody EM. Assessment of older people: self-maintaining and instrumental activities of daily living. Gerontologist. 1969;9(3):179–86.
Nasreddine ZS, Phillips NA, Bédirian V, Charbonneau S, Whitehead V, Collin I, et al. The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment. J Am Geriatr Soc. 2005;53(4):695–9.
Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Md State Med J. 1965;14:61–5.
Bland JM. How can I decide the sample size for a study of agreement between two methods of measurement? 2004; updated 12.01.2004. Available from: https://www-users.york.ac.uk/~mb55/meas/sizemeth.htm.
Hodgson CL, Udy AA, Bailey M, Barrett J, Bellomo R, Bucknall T, et al. The impact of disability in survivors of critical illness. Intensive Care Med. 2017;43(7):992–1001.
Üstün TB KN, Chatterji S, Rehm J. Measuring health and disability manual for WHO Disability Assessment Schedule WHODAS 2.0: World Health Organization; 2010.
Shulman MA, Kasza J, Myles PS. Defining the minimal clinically important difference and patient-acceptable symptom state score for disability assessment in surgical patients. Anesthesiology. 2020;132(6):1362–70.
Ludwig K, Graf von der Schulenburg JM, Greiner W. German value set for the EQ-5D-5L. Pharmacoeconomics. 2018;36(6):663–74.
Haines KJ, Berney S, Warrillow S, Denehy L. Long-term recovery following critical illness in an Australian cohort. J Intensive Care. 2018;6:8.
(ISPOR) TPSfHEaOR. Minimal clinically important difference in EQ-5D: we can calculate it—but does that mean we should? [PowerPoint]. Webpage: ISPOR; 2017 [PowerPoint]. Available from: https://www.ispor.org/docs/default-source/presentations/1066.pdf.
Unnanuntana A, Jarusriwanna A, Nepal S. Validity and responsiveness of Barthel index for measuring functional recovery after hemiarthroplasty for femoral neck fracture. Arch Orthop Trauma Surg. 2018;138(12):1671–7.
Williamson AM, Feyer AM. Moderate sleep deprivation produces impairments in cognitive and motor performance equivalent to legally prescribed levels of alcohol intoxication. Occup Environ Med. 2000;57(10):649–55.
Day A, Haj-Bakri S, Lubchansky S, Mehta S. Sleep, anxiety and fatigue in family members of patients admitted to the intensive care unit: a questionnaire study. Crit Care. 2013;17(3):R91.
Elliott D, Lazarus R, Leeder SR. Proxy respondents reliably assessed the quality of life of elective cardiac surgery patients. J Clin Epidemiol. 2006;59(2):153–9.
Dinglas VD, Gifford JM, Husain N, Colantuoni E, Needham DM. Quality of life before intensive care using EQ-5D: patient versus proxy responses. Crit Care Med. 2013;41(1):9–14.
Stensman R. Severely mobility-disabled people assess the quality of their lives. Scand J Rehabil Med. 1985;17(2):87–99.
Patel MB, Jackson JC, Morandi A, Girard TD, Hughes CG, Thompson JL, et al. Incidence and risk factors for intensive care unit-related post-traumatic stress disorder in veterans and civilians. Am J Respir Crit Care Med. 2016;193(12):1373–81.
Rabiee A, Nikayin S, Hashem MD, Huang M, Dinglas VD, Bienvenu OJ, et al. Depressive symptoms after critical illness: a systematic review and meta-analysis. Crit Care Med. 2016;44(9):1744–53.
Scales DC, Tansey CM, Matte A, Herridge MS. Difference in reported pre-morbid health-related quality of life between ARDS survivors and their substitute decision makers. Intensive Care Med. 2006;32(11):1826–31.
Acknowledgements
Not applicable.
Funding
Open Access funding enabled and organized by Projekt DEAL. Institutional funding only.
Author information
Authors and Affiliations
Contributions
JJG performed data analysis and interpretation and drafted and reviewed the manuscript. LH acquired data and reviewed the manuscript. BU performed data analysis and interpretation and reviewed the manuscript. MM performed data interpretation and reviewed the manuscript. MB performed data analysis and interpretation and reviewed the manuscript. KS acquired data and reviewed the manuscript. LS acquired data and reviewed the manuscript. SWC performed data interpretation and reviewed the manuscript. KF performed data analysis and interpretation and reviewed the manuscript. TW designed the study, performed data acquisition, analysis and interpretation and drafted and reviewed the manuscript. SJS designed the study, performed data acquisition, analysis and interpretation and drafted and reviewed the manuscript. All authors approved the final manuscript and agree to be accountable for all aspects of the work.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
This study was approved by the institutional ethics committee of the medical faculty of the Technical University of Munich (542/18S) and conducted at Charité-Universitätsmedizin Berlin and Klinikum rechts der Isar of the Technical University of Munich according to the declaration of Helsinki. It was registered at ClinicalTrials.gov (NCT03785444—28 th of December 2018). All patients gave written informed consent to participate.
Consent for publication
Not applicable.
Competing interests
JJG, LH, BU, MM, KS, LS, SWC, KF and TW have no conflict of interest to declare. MB participated in a DSMB sponsored by GE Healthcare, Helsinki, Finland. SJS received grants and non-financial support from Reactive Robotics GmbH (Munich, Germany), ASP GmbH (Attendorn, Germany), STIMIT AG (Biel, Switzerland), ESICM (Geneva, Switzerland), grants, personal fees, and non-financial support from Fresenius Kabi Deutschland GmbH (Bad Homburg, Germany), grants from the Innovationsfond of The Federal Joint Committee (G-BA), personal fees from Springer Verlag GmbH (Vienna, Austria) for educational purposes and Advanz Pharma GmbH (Bielefeld, Germany), non-financial support from national and international societies (and their congress organisers) in the field of anesthesiology and intensive care medicine, outside the submitted work. Dr. Schaller holds stocks in small amounts from Alphabet Inc., Ascendis Pharma Inc., Bayer AG, and Siemens AG; these holdings have not affected any decisions regarding his research or this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Grunow, J.J., Hartmann, L., Ulm, B. et al. Reliability of pre-admission patient-reported outcome measures postoperatively assessed via proxies: a prospective, multicenter observational study. Crit Care 29, 200 (2025). https://doi.org/10.1186/s13054-025-05431-6
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13054-025-05431-6