Study links prenatal tobacco exposure to high blood pressure risk
Study indicates that body weight is a key factor linked to elevated blood pressure in adolescence.
America Forever Bytes
Background: Large language models (LLMs) require specialized methodologies to quantify model confidence for safe deployment in health care systems; however, there is a lack of established methods for confidence assessment.
Objective: This study aimed to evaluate confidence metrics for multimodal LLMs interpreting ultrasound-based radiology cases and to compare self-reported, consistency-based, and hybrid methods.
Methods: From a total of 330 quizzes on the Korean Society of Ultrasound in Medicine digital platform, we selected 94 multiple-choice cases. Four multimodal LLMs were evaluated: 3 reasoning models (GPT-5, Claude-4.5-Sonnet, and Gemini-3-Pro) and 1 general model (GPT-4o). Temperature was fixed at 1.0. Multiple confidence metrics were assessed: (1) self-reported metrics generated by LLMs using prompts that elicited direct confidence percentages with answers, including first self-reported confidence and mean self-reported confidence; (2) consistency-based metrics derived from 20 repeated outputs per case, including relative entropy calculated as 1 − H/log k (H=Shannon entropy, k=number of answer choices) and majority-vote percentage; and (3) a Top Weighted Score combining response frequency with self-reported confidence. Discrimination was assessed with receiver operating characteristic analysis, and the Spearman correlation between accuracy and each confidence metric was computed. Additionally, model calibration was assessed using expected calibration error and Brier score. Processing time and token consumption (input, output, and total) were recorded for each application programming interface call to evaluate resource use across models.
Results: Diagnostic accuracy varied across models, with Gemini-3-Pro achieving the highest accuracy (70/94, 74.47%), surpassing the median human accuracy (59%, IQR 40.3%-75%). Top Weighted Score was the only metric achieving statistically significant correlations across all 4 models: Gemini-3-Pro (ρ=0.52), GPT-5 (ρ=0.43), Claude-4.5-Sonnet (ρ=0.30), and GPT-4o (ρ=0.22). Receiver operating characteristic analysis revealed that Top Weighted Score demonstrated the highest discriminative ability, with area under the curve values of 0.826 (95% CI 0.731-0.920) for Gemini-3-Pro and 0.767 (95% CI 0.668-0.866) for GPT-5. Top Weighted Score was the only metric achieving statistical significance in GPT-4o. Calibration analysis showed that Top Weighted Score achieved the lowest expected calibration error in GPT-5 (0.098) and Claude-4.5-Sonnet (0.192), while Gemini-3-Pro showed comparable calibration between relative entropy (0.119) and Top Weighted Score (0.122). Resource use analysis demonstrated that reasoning models required substantially longer processing times and higher token consumption than the general model.
Conclusions: In multimodal LLMs applied to ultrasound-based radiology cases, the hybrid Top Weighted Score was significantly associated with accuracy across all evaluated models and appears to be a more reliable indicator of diagnostic confidence than self-reported or consistency-based metrics alone. However, the strength of these associations varied across models, and external validation is warranted before broader clinical application. These findings support integrative confidence estimation approaches that incorporate response consistency while highlighting the need for resource-efficient sampling strategies to enable practical clinical deployment.
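The consistency-based and calibration metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the 10-bin equal-width scheme for expected calibration error, and the sample answers are assumptions made for illustration only. The Top Weighted Score is left out because the abstract does not give its formula.

```python
import math
from collections import Counter


def relative_entropy_confidence(answers, k):
    """Return 1 - H/log(k), where H is the Shannon entropy of the sampled answers.

    `answers` holds the repeated outputs for one case; `k` is the number of
    answer choices. The result is 1 for perfect agreement and 0 when the
    answers are spread evenly over all k choices.
    """
    n = len(answers)
    counts = Counter(answers)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(k)


def majority_vote_percentage(answers):
    """Return the most frequent answer and the percentage of samples choosing it."""
    answer, count = Counter(answers).most_common(1)[0]
    return answer, 100.0 * count / len(answers)


def brier_score(confidences, correct):
    """Mean squared difference between confidence (0..1) and correctness (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(confidences, correct)) / len(correct)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between accuracy and mean confidence per bin.

    Assumption: equal-width bins over [0, 1]; the abstract does not state
    which binning scheme was used.
    """
    n = len(correct)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(confidences, correct):
        idx = min(int(p * n_bins), n_bins - 1)  # a confidence of 1.0 goes in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical case: 20 repeated outputs on a question assumed to have 4 choices.
    sampled = ["B"] * 14 + ["A"] * 4 + ["D"] * 2
    print(relative_entropy_confidence(sampled, k=4))
    print(majority_vote_percentage(sampled))
```

Both consistency metrics need only the sampled answers, so they work for models that do not report a confidence value. The calibration functions expect confidences rescaled to the 0 to 1 range.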
A new study suggests chemicals from tire wear may pose health risks to humans when inhaled. Resea...
Vaping is just as bad for health as smoking traditional cigarettes and can even alter genetic makeup in ways that could raise the risk of several diseases.
A new study shows that urban male bowerbirds are ditching otherwise natural decorations and turning to human trash to woo mates.
Patients taking the daily pill lived, on average, about six months longer than those who received chemotherapy alone, results one expert called a "grand slam."
The study found average concentrations of 13.7 nanograms per cubic meter in March 2025, with most of the toxic metal in particles smaller than 56 nanometers. Ai...
A new study challenges skill-based matchmaking, finding that equal-skill contests can quietly hurt retention while smarter systems kept players engaged longer i...
A new study compares emissions from herbal and tobacco cigarettes, revealing that herbal cigarettes produce harmful emissions that can exceed those of tradition...
A writing test analyzing pauses and stroke patterns during dictation may detect cognitive impairment earlier than traditional methods, according to a study from...
The study suggests that sex-change surgeries and drugs cause more psychological problems than they claim to solve.
A new meta-analysis of 57 psychological studies found that essentially identical patient interviews can lead to different diagnoses for the same patient.
A life that leads to dementia can take many paths, but there are some common risk factors that make a diagnosis more likely.
A four-day workweek reduced burnout while maintaining productivity.
Quitting smoking may reduce dementia risk, according to a study of more than 32,000 adults over 25 years. Risk declined the longer a person remained smoke-free.
The key to heart health isn't cutting down on pasta or potatoes, new evidence suggests; it's not even a low-fat diet.
Research now shows that mosquitoes may bite the hand that feeds them Deet — at least in time. The commonly used chemical — which is the gold standard ingred...
Discover how unplanned downtime is costing global businesses an astounding $600 billion each year. This article explores the findings of a recent study, the imp...
Background: Reducing 30-day hospital readmissions has been a long-standing goal across health systems in the United States. While nurse-led phone outreach has b...
Background: Mindfulness meditation has been reported to reduce stress and enhance well-being. However, its effects on heart rate variability (HRV)—a physiolog...
Model created by researchers shows better outcomes are often more likely when people are not too ambitious
A few breathless minutes of exercise each week could have a far bigger impact on your health than most people realize.
Stay caffeinated out there.
A warmer world will likely make bigger and more damaging hail, a new study said.
"Mental health is deeply connected to the brain and body, and food is one of the most fundamental biological inputs we have."