Ehud Rivlin
Authored Publications
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Alon Levkovitch
Roy Hirsch
Chulayuth Asawaroengchai
ICLR (2024)
We present Spectron, a novel approach to adapting pre-trained large language models (LLMs) to perform spoken question answering (QA) and speech continuation. By endowing the LLM with a pre-trained speech encoder, our model becomes able to take speech inputs and generate speech outputs. The entire system is trained end-to-end and operates directly on spectrograms, simplifying our architecture. Key to our approach is a training objective that jointly supervises speech recognition, text continuation, and speech synthesis using only speech-text pairs, enabling a ‘cross-modal’ chain-of-thought within a single decoding pass. Our method surpasses existing spoken language models in speaker preservation and semantic coherence. Furthermore, the proposed model improves upon direct initialization in retaining the knowledge of the original LLM, as demonstrated through spoken QA datasets. We release our audio samples and spoken QA dataset via our website.
Artificial intelligence for phase recognition in complex laparoscopic cholecystectomy
Tomer Golany
Amit Aides
Nadav Avraham Rabani
Wisam Khoury
Hanoch Kashtan
Petachia Reissman
Surgical Endoscopy (2022)
Background: The potential role and benefits of artificial intelligence (AI) in surgery have yet to be determined. This study is a first step in developing an AI system for minimizing adverse events and improving patient safety. We developed an AI algorithm and evaluated its performance in recognizing surgical phases of laparoscopic cholecystectomy (LC) videos spanning a range of complexities.
Methods: A set of 371 LC videos with various complexity levels and containing adverse events was collected from five hospitals. Two expert surgeons segmented each video into 10 phases including Calot’s triangle dissection and clipping and cutting. For each video, adverse events were also annotated when present (major bleeding; gallbladder perforation; major bile leakage; and incidental finding) and complexity level (on a scale of 1–5) was also recorded. The dataset was then split in an 80:20 ratio (294 and 77 videos), stratified by complexity, hospital, and adverse events to train and test the AI model, respectively. The AI-surgeon agreement was then compared to the agreement between surgeons.
Results: The mean accuracy of the AI model for surgical phase recognition was 89% [95% CI 87.1%, 90.6%], comparable to the mean inter-annotator agreement of 90% [95% CI 89.4%, 90.5%]. The model’s accuracy was inversely associated with procedure complexity, decreasing from 92% (complexity level 1) to 88% (complexity level 3) to 81% (complexity level 5).
Conclusion: The AI model successfully identified surgical phases in both simple and complex LC procedures. Further validation and system training are warranted to evaluate its potential applications, such as increasing patient safety during surgery.
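The agreement figures reported above can be illustrated with a minimal sketch. It assumes that agreement is measured as per-frame label accuracy averaged over videos; the abstract does not state the exact protocol, and the function names and toy data below are hypothetical.

```python
from statistics import mean


def frame_accuracy(predicted, reference):
    """Fraction of frames on which two phase-label sequences agree."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("label sequences must be non-empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def mean_agreement(videos):
    """Mean per-video agreement over (predicted, reference) pairs."""
    return mean(frame_accuracy(p, r) for p, r in videos)


# Hypothetical toy data: two short videos, phases encoded as integers.
videos = [
    ([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2]),  # 5 of 6 frames agree
    ([0, 1, 1, 1], [0, 1, 1, 1]),              # all frames agree
]
print(round(mean_agreement(videos), 3))
```

The same function can compare two surgeons' annotations, which is one way to put AI-surgeon and inter-annotator agreement on a common scale.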
Detection of Elusive Polyps via a Large Scale AI System
Dan Livovsky
Danny Veikherman
Tomer Golany
Amit Aides
Valentin Dashinsky
Nadav Rabani
David Ben Shimol
Yochai Blau
Ilan Moshe Shimshoni
Ori Segol
Eran Goldin
Jesse Lachter
Gastrointestinal Endoscopy (2021)
Colorectal cancer (CRC) is the second leading cause of cancer death worldwide, resulting in an estimated 900,000 deaths per year. Colonoscopy is the gold standard for detection and removal of precancerous lesions, and has been amply shown to reduce mortality. However, the miss rate for polyps during colonoscopies is 22-28%, while 20-24% of the missed lesions are histologically confirmed adenomas. To address this shortcoming, we propose a polyp detection system based on deep learning, which can alert the operator in real-time to the presence and location of polyps during a colonoscopy. We dub the system DEEP^2: DEEP DEtection of Elusive Polyps. The DEEP^2 system was trained on 3,611 hours of colonoscopy videos derived from two sources, and was validated on a set comprising 1,393 hours of video, coming from a third, unrelated source. For the validation set, the ground truth labelling was provided by offline GI annotators, who were able to watch the video in slow-motion and pause/rewind as required; two or three such annotators examined each video.
Overall, DEEP^2 achieves a sensitivity of 96.8% at 4.9 false alarms per video, which improves substantially on the current state of the art. These results are attained using a neural network architecture which is designed to provide fast computations, and can therefore run in real-time at greater than 30 frames per second. We further analyze the data by examining its performance on elusive polyps, those polyps which are particularly difficult for endoscopists to detect. First, we show that on fast polyps that are in the field of view for less than 5 seconds, DEEP^2 attains a sensitivity of 88.5%, compared to a sensitivity of 31.7% for the endoscopists performing the procedure. On even shorter duration polyps, those that are in the field of view for less than 2 seconds, the difference is even starker: DEEP^2 attains a sensitivity of 84.9% vs. 18.9% for the endoscopists. Second, we examine procedures which are apparently clean, in that no polyps are detected by either the performing endoscopist or the offline annotators. In these sequences, DEEP^2 is able to detect polyps -- not seen by either live endoscopists or offline annotators -- which were later verified to be real polyps: an average of 0.22 polyps per sequence, of which 0.10 are adenomas. Finally, a preliminary small clinical validation indicates that the system will be useful in practice: on 32 procedures, DEEP^2 discovered an average of 1.06 polyps per procedure that would have otherwise been missed by the GI performing the procedure. Future work will be needed to measure the clinical impact on a larger scale.
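The two headline metrics above, sensitivity (overall and restricted to short-duration polyps) and false alarms per video, can be computed as in the following minimal sketch. The `Polyp` record, the function names, and the toy data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Polyp:
    seconds_in_view: float  # how long the polyp stayed in the field of view
    detected: bool          # whether the detector flagged it


def sensitivity(polyps, max_seconds=None):
    """Detected fraction, optionally restricted to polyps in view < max_seconds."""
    subset = [
        p for p in polyps
        if max_seconds is None or p.seconds_in_view < max_seconds
    ]
    if not subset:
        return None  # undefined when no polyp matches the filter
    return sum(p.detected for p in subset) / len(subset)


def false_alarms_per_video(false_alarm_counts):
    """Average number of false alarms across videos."""
    return sum(false_alarm_counts) / len(false_alarm_counts)


# Hypothetical toy data, for illustration only.
polyps = [Polyp(1.5, True), Polyp(1.0, False), Polyp(4.0, True), Polyp(12.0, True)]
print(sensitivity(polyps))                 # all polyps
print(sensitivity(polyps, max_seconds=5))  # polyps in view for under 5 seconds
print(sensitivity(polyps, max_seconds=2))  # polyps in view for under 2 seconds
print(false_alarms_per_video([3, 5, 7]))
```

Restricting the same metric to progressively shorter durations mirrors the abstract's comparison between the system and the performing endoscopists on elusive polyps.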
Detecting Deficient Coverage in Colonoscopies
Amit Aides
Ariel Gordon
Danny Veikherman
Ilan Moshe Shimshoni
Tomer Golany
Yochai Blau
IEEE Transactions on Medical Imaging (2020)
Colorectal Cancer (CRC) is a global health problem, resulting in 900K deaths per year. Colonoscopy is the tool of choice for preventing CRC, by detecting polyps before they become cancerous, and removing them. However, colonoscopy is hampered by the fact that endoscopists routinely miss an average of 22-28% of polyps. While some of these missed polyps appear in the endoscopist's field of view, others are missed simply because of substandard coverage of the procedure, i.e. not all of the colon is seen. This paper attempts to rectify the problem of substandard coverage in colonoscopy through the introduction of the C2D2 (Colonoscopy Coverage Deficiency via Depth) algorithm, which detects deficient coverage and can thereby alert the endoscopist to revisit a given area. More specifically, C2D2 consists of two separate algorithms: the first performs depth estimation of the colon given an ordinary RGB video stream, while the second computes coverage given these depth estimates. Rather than compute coverage for the entire colon, our algorithm computes coverage locally, on a segment-by-segment basis; C2D2 can then indicate in real-time whether a particular area of the colon has suffered from deficient coverage, and if so the endoscopist can return to that area. Our coverage algorithm is the first such algorithm to be evaluated in a large-scale way, while our depth estimation technique is the first calibration-free unsupervised method applied to colonoscopies. The C2D2 algorithm achieves state-of-the-art results in the detection of deficient coverage: it is 2.4 times more accurate than human experts.
We propose two solutions for both nearest neighbors and range search problems. For the nearest neighbors problem, we propose a c-approximate solution for the restricted version of the decision problem with bounded radius, which is then reduced to the nearest neighbors problem by a known reduction. For range searching, we propose a scheme that learns the parameters in a learning stage, adapting them to the case of a set of points with low intrinsic dimension that are embedded in a high-dimensional space (a common scenario for image point descriptors). We compare our algorithms to the best known methods for these problems, i.e. LSH, ANN and FLANN. We show analytically and experimentally that we can do better for a moderate approximation factor. In contrast to tree structures, our algorithms are trivial to parallelize. In the experiments conducted, running on a couple of million images, our algorithms show meaningful speed-ups when compared with the above-mentioned methods.
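To make the bounded-radius decision problem and its reduction to nearest neighbor search concrete, here is a minimal sketch. It uses a brute-force scan as a stand-in for the paper's data structure and a simple geometric scan over radii, which is not necessarily the reduction the abstract refers to. The function names and the `r_min`/`r_max` parameters are assumptions made for illustration.

```python
import math


def near_neighbor(points, query, radius, c):
    """Brute-force stand-in for the (c, r) decision problem.

    If some point lies within `radius` of `query`, a point within
    c * radius is returned. None means no point lies within `radius`.
    """
    for p in points:
        if math.dist(p, query) <= c * radius:
            return p
    return None


def approx_nearest(points, query, r_min, r_max, c):
    """Scan radii r_min, c*r_min, c^2*r_min, ... and return the first hit."""
    if c <= 1 or r_min <= 0:
        raise ValueError("need c > 1 and r_min > 0")
    radius = r_min
    while radius <= r_max:
        hit = near_neighbor(points, query, radius, c)
        if hit is not None:
            return hit
        radius *= c
    return None  # no point found within the scanned radii


# Hypothetical toy data.
points = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
print(approx_nearest(points, (0.5, 0.0), r_min=0.1, r_max=100.0, c=2.0))
```

In this naive scan, a miss at one radius followed by a hit at the next bounds the returned point's distance by roughly c² times the true nearest distance, provided the true nearest distance is at least `r_min`. Each decision query is independent of the others, which is consistent with the abstract's remark that such schemes are easy to parallelize.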