Det medisinske fakultet

Clinical utility of strain rate imaging

by Asbjørn Støylen, dr. med.



This section updated: December 2009

This section deals with the documentation of the clinical value of strain rate imaging, with the emphasis on clinical studies. A lot can also be inferred from studies of tissue velocity, so much information from these will also be taken into account. The limitations and pitfalls of the strain rate method still apply, however, so the results of studies have to be interpreted with caution. It is important to be aware that clinical studies often show differences between groups. To evaluate the results of a study in terms of clinical utility, one has to consider how this will translate into the examination of a single patient, compared to normal values, or in terms of changes in measurements in one patient over time. To evaluate this, an understanding of some basic statistical concepts is necessary. Although known to most readers, this section starts with a short chapter about this, which may be skipped at will. Further, this section deals with the clinical use of strain rate imaging according to different clinical conditions, and important studies will be reviewed.

In February 2008 I am in the process of adding a paragraph on how ultrasound applications are tested in trials. A knowledge of this is useful in order to assess the quality of evidence in favor of a new application.

 


Validity, reliability and discriminatory ability.

The performance of any measurement method is closely linked to the concepts of validity and reliability. Roughly, it can be said that the validity of a measurement is the answer to the question: Does it measure what we believe it measures? Reliability, on the other hand, is a measure of how well it returns the same answer if performed more than once. It can be illustrated as target shooting, as in the figure below:


Target shooting with two different weapons. The weapon on the left shows high reliability, as the shots are well gathered. However, the whole group, and hence the average, is off centre, thus the method is less valid. The weapon on the right shows better validity, as the average of the shots is on centre, but the shots are less well gathered (more scattered). The weapon will tend to hit a different location each time, so it is less reliable.

Different echo methods may have different validity and reliability, as demonstrated by this study (151):



Comparison of three different ultrasound methods for deformation imaging, against tagged MR as reference: left, 2D strain; middle, segmental strain by combined tissue Doppler and speckle tracking; right, strain by dynamic velocity gradient. Top row: Bland-Altman plots; bottom row: scatterplots with the identity line shown. It can be seen that there is a significant bias between 2D strain and MR, while the measurements are fairly well gathered together, but on average below the identity line. The segmental method has a small bias, but this is not significant. The method is less reliable, as seen by greater scatter (and lower correlation). The method on the right shows no bias, i.e. good validity, but even greater scatter, and is clearly the least reliable.

Validity is related to the comparison with a reference method, but it may also be related to basic physiological theory. It may be maintained, for instance, that deformation measures alone have a low physiological validity, as the main point is the stress-strain relation, i.e. the deformation in relation to the load, and that measurements of deformation alone, without incorporating some measure of load, have little physiological validity. However, for clinical purposes, the main point is not physiological validity, but the discriminatory ability of the method. As different methods use different approaches, the validity against a reference may vary. However, for each method, both cut off and normal values may be established, and measurements may be evaluated in terms of these in themselves. For instance, the lower normal value for EF is 50% for ultrasound, but 60% for ventriculography, as this method uses contrast and thus activates the Frank-Starling mechanism by volume loading. The main interest for the clinician is the method's discriminatory ability, the ability to discriminate between normal and abnormal function. And given a set of reference values for the method, the main point is reliability.

Reliability is used interchangeably with reproducibility or repeatability, or its inverse, variability.


Trials in Ultrasound technology - methodology

Whenever a new ultrasound application is introduced, it will undergo several trial phases, analogous to the clinical trials of a new treatment.

Validation.

The initial trials after mathematical simulations are validation trials. Validation is a trial of the accuracy of the application (does it measure what we think it measures?) and the precision (variability of measurements compared to the reference). This will comprise the use of the application for measurements in a setting where the result is known; in the first stage, often measurements on a physical phantom. In addition, tests can be done in experimental settings with a reference method, which can be for instance ultrasonomicrometry, or post mortem measurements with microspheres or staining techniques. Finally, validation studies can be performed in patient studies, with the reference being another, established method (e.g. MR tagging for deformation), if available. In the case of valve regurgitation, references are sorely lacking, as the reference has often been radiological grading by contrast leakage, which is notoriously imprecise as well as semi-quantitative. The study should give both the mean bias, with the significance of the bias, and the variability of the method compared to the reference, as well as the inherent variability of the method by repeated measurements. However, the reference method should have a known variability itself, and ideally this reference ought to be evaluated by repeated measurements as well in a validation study, as the reference method may have different variability at different centers. At least, the variability of the reference method should be known at the study center.

The statistics used will vary with the type of measurements. Correlation between measurements should not be used, as explained below. A significant bias does not necessarily invalidate the application, as most biological measurements are method dependent, but the bias should be taken into account in the clinical studies as well as in eventual clinical work. If two methods are tested against the same reference for comparison, differences in variability should be tested for significance.


Validation studies should also be performed by using a prospective material. Often, the development of an application comprises a fitting of a model or correction formula (e.g. the formula for LV mass), and thus, the cases that have gone into this cannot be used to test the validity of the application afterwards, as the cases are no longer independent of the reference, thus voiding the null hypothesis.


Feasibility (Phase 1 clinical trial).

The first step in testing a new application is the feasibility. The main point of this is to evaluate if the method can separate between normal and pathological. This means that the diagnosis has to be established by independent means, but it is not necessary to use the exact same measurements. In myocardial infarction, the diagnosis could be by the standard diagnostic criteria, coronary angiography, MR etc. In the case of diagnosing infarcted segments by deformation, for instance, the reference should be MR late enhancement, which does not measure deformation, but images necrosis or scar. Feasibility could be tested in the same study as the validation, but typically, diagnostic criteria and reference measurements may not be the same. In addition, feasibility studies need to be slightly larger, as the study should show how many of the patients with a specific disease (or segments etc.) the method can be applied to.

Finally, the feasibility study should again give the variability by repeated measurements, as well as the size and significance of differences between groups of patients or between patients and controls. Of course, if the method is to be applied to new groups, for instance children vs. adults, or new patient groups, new feasibility studies need to be done for each group. At this stage, the method may be used for pathophysiological studies that use group data, and may already yield valuable clinical information. However, the evidence is still insufficient for diagnostic use in individual patients.

 

Clinical utility (phase 2) trials

The feasibility studies are not  enough to determine if an application is of clinical value. Any study that only gives the significance of group values must be classified as a feasibility study only. A phase two study should elucidate the diagnostic utility of the new method, generally in terms of sensitivity and specificity. In general, the numbers have to be somewhat larger for this, than for a feasibility study.

Ideally, a phase two study should have a diagnostic gold standard. For instance, in the case of myocardial infarction this could be the general diagnostic criteria, or in the case of ischemia it could be coronary angiography (although angiography actually does not show ischemia). If the diagnosis is infarction at the segmental level, it could be late enhancement MR. The study should test the sensitivity and specificity in terms of area under the curve, as well as determine the optimal cut off values. The AUC itself is a diagnostic accuracy indicator, but ideally the cut off values should be tested prospectively in a new part of the patient material, not used for determination of the original cut off values. The cases that have gone into the original ROC analysis cannot be used to test the cut off values afterwards, as the cases are no longer independent of the reference, thus voiding the null hypothesis. For an ideal utility trial, the study should include a comparison with the performance of older and established methods against the same reference (in ischemia, for instance, wall motion score), as well as new variability studies of both old and new methods.

Even so, the phase 2 studies do not establish normal limits for measurements, but only the optimal cut off between patients and controls, as discussed below. These cut off values may vary according to which patient groups are compared to normals. For instance, the limit between normal and reduced annular velocity may vary considerably depending on whether controls are compared to patients with reduced EF (37), or athletes are compared to patients with pathological hypertrophy (39).

Thus population studies are needed to establish the normal range, and this is a must for decision making when an unselected subject is examined for diagnostic purposes. Only then is the method established as a fully fledged clinical method. For strain rate imaging by tissue Doppler, this is appearing, for speckle tracking this is still far off.

Outcome (phase 3) trials


Finally, the ideal is to establish the beneficial effect of a diagnostic method for the patients in terms of clinical endpoints, for instance event free survival or overall survival. This is analogous to the large multicentre trials of drugs. However, this involves randomized multicentre trials with randomized use or non-use of a certain diagnostic method, with thousands of patients, and at a cost of tens of millions of Euros. Such trials are more or less totally lacking in the field of diagnostics. Thus, there is very little evidence that new diagnostic methods actually improve outcome.

If such trials should be undertaken, due to the size of the trials, they should comprise a whole diagnostic field, applied to groups of patients. But as diagnostic methods become established, this raises serious ethical problems as well as the problem of financing such trials. Thus outcome studies will probably remain an ideal.







Basic statistical concepts


To evaluate the value of a study of a diagnostic test, as well as the clinical implications, some basic statistical concepts are necessary. Basically, all statistics is about quantifying the degree of uncertainty of knowledge. This section covers the following topics:

Mean and standard deviation

Univariate statistics: The mean of a group is the representative value of the group. It has little predictive value for the individual, or for what the normal values are. The standard deviation (SD) represents the spread of the values around the mean. Mean ±1SD includes 68%, and mean ±2SD 95%, of the population. A common way of defining normal values is the mean of a normal population ±2SD. The specificity and sensitivity of this approach, however, depend on both the SD and the difference in mean value between normals and patients for a given measurement. The SD includes both biological and measurement variability, but the SD will still show how dispersed measurement values are and give an idea of how useful a measurement or method is. If data are not normally distributed, or if they are ordinal, the use of mean and standard deviation is meaningless, and median and percentiles should be used instead.
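As a minimal sketch of the two ways of describing a normal range mentioned above, the following Python functions compute mean ± 2SD and, as the alternative for non-normally distributed data, the 2.5th to 97.5th percentile. The function names are hypothetical and chosen for illustration only, and the percentile version assumes the "inclusive" interpolation rule of the standard library.

```python
import statistics


def normal_range_sd(values):
    """Normal range as mean ± 2 SD (only meaningful for roughly normal data)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    return mean - 2 * sd, mean + 2 * sd


def normal_range_percentile(values):
    """Normal range as the 2.5th to 97.5th percentile (no normality assumed)."""
    # n=40 gives 39 cut points in steps of 2.5%; the first and the last
    # are the 2.5th and the 97.5th percentile.
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]
```

Both functions expect the measurements of a healthy reference group; with skewed or ordinal data the two ranges can differ considerably, which is the reason for preferring percentiles in that case.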



Confidence intervals and significance

The standard error of the mean (SEM) is the error in the estimate of the mean value in a group. This is a function of the SD and the number of patients in the study: SEM = SD / √n. Thus the mean is more precisely estimated in large studies. Mean ± 2 SEM is the 95% confidence interval for the mean, and if the 95% confidence intervals of the means of two groups do not overlap, the difference between the groups is significant at the 5% level (p < 0.05). The p value is the probability of the study giving the result by random chance; a p value below 0.05 is usually considered significant. If multiple comparisons are made, the limit for the p value of each pair has to be adjusted by the number of pairs. When data are presented with the variability, it is important to see whether SD, 2SD or SEM are given. The p value and SEM only tell about the precision of the study, not of the method, as both depend on the number of subjects in the study as well as on the spread of the values.
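The relations above can be sketched in a few lines of Python. This is only an illustration under the approximations used in the text (± 2 SEM for the 95% confidence interval, and division of the significance limit by the number of comparisons); the function names are hypothetical.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: SD / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def ci95(values):
    """Approximate 95% confidence interval of the mean: mean ± 2 SEM."""
    mean = statistics.mean(values)
    half_width = 2 * sem(values)
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def adjusted_p_limit(p_limit, n_comparisons):
    """Significance limit per pair when several pairs are compared."""
    return p_limit / n_comparisons
```

Note that the SEM shrinks as the number of subjects increases, while the SD does not; a very large study can therefore give a narrow confidence interval for a measurement that is still widely scattered in individuals.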



A study giving only mean and significance will show the difference between groups, and is useful in defining normalcy and in pathophysiology. However, this gives limited information about the clinical utility of a method:
  1. Group comparison of means is an a posteriori analysis of the data, where the groups are separated by some (hopefully) independent criterion, and the average measurement values in the groups are compared. The clinical use of the method is dependent on the predictive value of the measurement, which should be addressed by an a priori analysis where patients are separated by the measurement values, and the number of patients from each group classified by the measurement categories is then counted: sensitivity and specificity analysis.
  2. The sensitivity and specificity are dependent on the variability of the method. The more variable, the less predictive. Group differences may be highly significant, but if the difference between the group means is smaller than the variation between repeated measurements, the method is useless even if findings are highly significant. A clinical situation is thus comparable to a study where N = 2, either in comparing findings in a patient with normal values, comparing two patients or comparing serial measurements in one patient (in the time course of a disease or treatment), and the clinical value of a method is dependent on the smallest significant difference between two measurements.
The difference between a posteriori analysis by group data and a priori analysis by sensitivity and specificity is equivalent to the difference between case-control and prospective studies of treatment.

Repeatability / Variability

To assess this, repeatability or variability has to be taken into account. Repeatability and variability are inversely related. This is usually assessed by repeated measurements. The variability of repeated measurements in the same recordings reflects measurement variability, and should be done both for intra (repeated by the same person) and inter (two different persons) observer measurement. Repeated measurement in different recordings tells about the additional biological variation.

The variation coefficient is the variability as a percentage of the measured value, but is less used, as the measurement error is often constant, independent of the measured value. Most used is the Bland-Altman method (23). With repeated measurements, the standard deviation of the difference between the first and second measurement defines the variability. The mean difference between the two measurements ± 2SD of the difference represents the 95% limits of agreement. The same method can be used to compare two methods for the same measurement. If the limits of agreement are given, they can be compared to the differences between means of different groups. However, in some cases, the Bland-Altman plot shows that the difference increases with increasing value. In that case, the variation coefficient is more appropriate.

2SD of the difference is considered to be the coefficient of reproducibility, the least difference that is significant in one individual. Thus, to consider a clinical study, first consider the difference in the measurements between the groups being studied, and then the reproducibility coefficient (if only limits of agreement are given, this is equal to half of the total interval). This will tell if strain rate imaging is useful in the individual patient, and which difference should be considered significant in the clinic.
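The Bland-Altman quantities described above can be sketched as follows. This is a minimal illustration, assuming paired first and second measurements of the same subjects and the ± 2SD convention used in the text; the function name and the dictionary keys are hypothetical.

```python
import statistics


def bland_altman(first, second):
    """Bias, limits of agreement and coefficient of reproducibility
    for paired repeated measurements (or two methods on the same subjects)."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = statistics.mean(diffs)          # mean difference
    sd_diff = statistics.stdev(diffs)      # SD of the differences
    reproducibility = 2 * sd_diff          # least significant difference
    return {
        "bias": bias,
        "sd_diff": sd_diff,
        "lower_limit": bias - reproducibility,
        "upper_limit": bias + reproducibility,
        "reproducibility_coefficient": reproducibility,
    }
```

The reproducibility coefficient is half the width of the limits of agreement, so it can be recovered from a paper that reports only the limits. A difference between group means that is smaller than this coefficient cannot be expected to be detectable in a single patient.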

For categorical data, the kappa coefficient is considered instead. The kappa coefficient represents how much better agreement is than what would result from chance only. A kappa of 0 means that the agreement is no better than chance. The interpretation of the kappa coefficient is as follows (96):


Kappa value      Strength of agreement
< 0.20           Poor
0.21 - 0.40      Fair
0.41 - 0.60      Moderate
0.61 - 0.80      Good
0.81 - 1.0       Very good

If the categorical data are ordered in an ordinal scale, the kappa can be weighted for the degree of difference; weighted kappa.
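As an illustration of how the (unweighted) kappa coefficient relates observed agreement to chance agreement, a minimal sketch is given below. The function names are hypothetical, the strength labels follow the table above (a value of exactly 0.20 is assumed to belong to "Poor"), and the weighted kappa is not covered.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted kappa for two raters classifying the same cases."""
    n = len(rater_a)
    # Proportion of cases where the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Undefined (division by zero) if both raters use one single category.
    return (observed - expected) / (1 - expected)


def strength_of_agreement(kappa):
    """Verbal label according to the table in the text."""
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"
```

Two raters who agree in half of the cases, when half would be expected by chance alone, thus get a kappa of 0 despite 50% raw agreement.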

Abuse of correlations

Correlation between two data sets means the degree of correspondence. It is measured by the correlation coefficient, the magnitude of which is between 0 and 1 (or 0 and 100%), with the sign giving the direction of the relation. A correlation should not be mistaken for causality; it just tells about a degree of covariation.

The statistical significance of the correlation depends on the magnitude of the correlation coefficient and the sample size. As with all significance testing, it presumes a null hypothesis that the two variables are unrelated, and calculates the probability of the correlation from this, i.e. from random chance.

It has been customary to compare repeated measurements by correlation. This is a total misunderstanding of basic statistics. The correlation tells about the significance and degree of relation between two different variables. The significance measures the probability of the correlation between the two measurements being the result of random chance.

Repeated measurements of the same variable
It stands to reason that repeated measurements of the same variable have to be related, as they are the same quantity, and thus the correlation is statistically meaningless, as one cannot postulate independence between repeated measurements. On the other hand, as repeated measurements measure the same variable twice, low correlation indicates high variability, as variability is the only source of difference between the measurements. Thus, a high correlation between repeated measurements does not add information, but a low correlation tells about low repeatability.

Two - population correlation:
An even worse instance of the abuse of correlations occurs in a data set containing values from two widely separated groups. An example will illustrate this:



Random A       Random B       Random A + 1   Random B + 1
-0.15785511     0.24518677     0.84214489     1.24518677
-0.00206696     0.39771828     0.99793304     1.39771828
 0.35256079     0.44670743     1.35256079     1.44670743
 0.15056601    -0.10828912     1.15056601     0.89171088
-0.34869335     0.21256035     0.65130665     1.21256035
-0.46608042     0.30885905     0.53391958     1.30885905
 0.19459726    -0.31834638     1.19459726     0.68165362
-0.05706528     0.13382845     0.94293472     1.13382845
-0.2382372     -0.12532215     0.7617628      0.87467785
-0.20069113    -0.03678429     0.79930887     0.96321571

Random A       Random B       Random A - 1   Random B - 1
 0.31954698    -0.14691161    -0.68045302    -1.14691161
 0.39753125    -0.11305625    -0.60246875    -1.11305625
 0.30898395    -0.43904442    -0.69101605    -1.43904442
 0.46917397     0.20024285    -0.53082603    -0.79975715
 0.41984839    -0.27716311    -0.58015161    -1.27716311
-0.46824326    -0.33975619    -1.46824326    -1.33975619
 0.30650135     0.27807874    -0.69349865    -0.72192126
-0.41892469    -0.38139578    -1.41892469    -1.38139578
-0.39099663     0.22625706    -1.39099663    -0.77374294
-0.08575413     0.39689965    -1.08575413    -0.60310035

Random A vs. Random B (original rows): R = -0.06637163 (P = 1)
Transformed rows (± 1): R = 0.92132846 (P < 0.001)


Random A and Random B are two rows of 20 numbers between -0.5 and +0.5, generated by a random number generator. There is no correspondence between the two rows; they vary independently (covariance = 0), and hence there is no correlation. In the transformed rows, 1 has been added to the upper ten numbers in each row, and 1 subtracted from the lower ten. By this we have introduced a covariance (covariance = 1) greater than the variation interval in the original data set. In this case, the correlation coefficient is 0.92, and apparently highly significant. However, this is solely because of the non-random element introduced.
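The example can be reproduced with a short Python sketch, using the twenty pairs of values tabulated above. The helper `pearson_r` is a hypothetical name for a plain implementation of the product-moment correlation coefficient; the significance test is not included.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# The values of Random A and Random B from the table in the text.
random_a = [-0.15785511, -0.00206696, 0.35256079, 0.15056601, -0.34869335,
            -0.46608042, 0.19459726, -0.05706528, -0.2382372, -0.20069113,
            0.31954698, 0.39753125, 0.30898395, 0.46917397, 0.41984839,
            -0.46824326, 0.30650135, -0.41892469, -0.39099663, -0.08575413]
random_b = [0.24518677, 0.39771828, 0.44670743, -0.10828912, 0.21256035,
            0.30885905, -0.31834638, 0.13382845, -0.12532215, -0.03678429,
            -0.14691161, -0.11305625, -0.43904442, 0.20024285, -0.27716311,
            -0.33975619, 0.27807874, -0.38139578, 0.22625706, 0.39689965]

# Add 1 to the first ten pairs and subtract 1 from the last ten,
# which splits the data into two widely separated clusters.
shift = [1.0] * 10 + [-1.0] * 10
shifted_a = [a + s for a, s in zip(random_a, shift)]
shifted_b = [b + s for b, s in zip(random_b, shift)]

r_raw = pearson_r(random_a, random_b)
r_shifted = pearson_r(shifted_a, shifted_b)
print(f"original rows:    r = {r_raw:.3f}")
print(f"transformed rows: r = {r_shifted:.3f}")
```

Within each cluster the values are exactly as unrelated as before; the high coefficient in the transformed rows is produced entirely by the distance between the two clusters.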



Scatter plot of the values of two different rows of random numbers (Random A and Random B) plotted against each other. The values are scattered all over the range, with no correlation.

The same two rows transformed by adding 1 to the first half and subtracting 1 from the second half  of both rows. This clumps the values in two distinct groups, and the correlation is suddenly 92%.


This is the same situation as seen in testing correlation between repeated measurements, between two methods or between two variables, if correlation is tested in a whole data set consisting of two groups or experimental situations with widely separated values. By choosing the groups in this way, a similar non-random element is introduced, equivalent to the manoeuvre done with the random numbers. This renders the statistics useless, but is nevertheless frequently seen in publications, including some in the reference list.

This can easily be seen if one looks at the plots, both scatter and Bland Altman will show the clustering of the values. This is the explanation why many authors have found  high correlations despite wide limits of agreement.

In fact, this situation is equivalent to having only two data points, i.e. two data sets with only two values, representing the mean of each cluster. And even with a high correlation, this will not be significant because of the low number of data points. At best, this kind of correlation analysis can be used to compare the performance of two different methods against a gold standard in this selected subset.



Sensitivity and specificity.

The diagnostic value of a test is often given as sensitivity and specificity, the ability of the test to diagnose the presence and absence of a condition, respectively:




                   Condition present               Condition absent                  Sum
Test positive      True positive (Tp)              False positive (Fp)               All positive (Tp + Fp)
Test negative      False negative (Fn)             True negative (Tn)                All negative (Fn + Tn)
Sum                All with condition (Tp + Fn)    All without condition (Fp + Tn)   All (Tp + Fp + Fn + Tn)


Sensitivity is the percentage of patients with the condition who have a positive test, i.e. Sensitivity = Tp / (Tp + Fn).
Specificity is the percentage of subjects without the condition who have a negative test. Specificity = Tn / (Tn + Fp).
Diagnostic accuracy is the number of true tests as a proportion of all tests. Accuracy = (Tp + Tn) / (Tp + Fp + Fn + Tn).

Ideally, sensitivity and specificity should be given with their confidence intervals.
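The three definitions can be written out as a small Python function. This is a minimal sketch taking the four cell counts of the table as input; the function name is hypothetical, and confidence intervals are not included.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity and accuracy from the four cell counts."""
    return {
        # Proportion of those with the condition who test positive.
        "sensitivity": tp / (tp + fn),
        # Proportion of those without the condition who test negative.
        "specificity": tn / (tn + fp),
        # Proportion of all tests that are true.
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }
```

For instance, 9 true positives, 9 false positives, 1 false negative and 81 true negatives give a sensitivity, specificity and accuracy of 90% each.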

For quantitative measurements, the data can be analysed by receiver operating characteristic (ROC) curve analysis. The ROC curve is the sensitivity plotted against 1 - specificity for all possible cut-off values. For a ROC curve the test must be evaluated against known presence of disease, or a diagnostic gold standard. The area under the curve (AUC) is a measure of the diagnostic power of a test: an AUC of 0.5 represents random chance, and an AUC of 1.0 represents a perfect test, with sensitivity and specificity of 100%. The curve can identify the cut off value with the optimal relation between sensitivity and specificity, i.e. the cut off giving the highest accuracy.
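The construction of a ROC curve can be sketched as below. This is an illustration only, under the assumption that higher values indicate disease (a test is counted as positive when the value is at or above the cut-off); for measurements where low values are pathological, the comparison has to be reversed. The function names are hypothetical.

```python
def roc_points(diseased, healthy):
    """(1 - specificity, sensitivity) for every possible cut-off value."""
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # cut-off above all values: nobody tests positive
    for c in cutoffs:
        sensitivity = sum(v >= c for v in diseased) / len(diseased)
        false_positive_rate = sum(v >= c for v in healthy) / len(healthy)
        points.append((false_positive_rate, sensitivity))
    return points


def area_under_curve(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def best_cutoff(diseased, healthy):
    """Cut-off giving the highest accuracy, returned as (cut-off, accuracy)."""
    total = len(diseased) + len(healthy)
    best = None
    for c in sorted(set(diseased) | set(healthy)):
        true_pos = sum(v >= c for v in diseased)
        true_neg = sum(v < c for v in healthy)
        accuracy = (true_pos + true_neg) / total
        if best is None or accuracy > best[1]:
            best = (c, accuracy)
    return best
```

With two completely separated groups the area is 1.0, and with two identical groups it is 0.5, corresponding to the perfect test and random chance, respectively.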



As with correlations, ROC analysis cannot tell about the diagnostic power of a test if the data are pre-selected to consist of two groups with very different measurements (e.g. infarcted segments compared to normal segments, or patients with large transmural infarcts compared with normal subjects). The AUC will in all such cases be high, but does not tell about the diagnostic power over the whole range of measurements. However, ROC methods can still be used to compare methods, to see if one performs better in the selected population. It should be emphasised, however, that differences between AUCs should be significant in order to conclude that one method is better than another; mere differences in AUC are not sufficient. Thus the AUC for different methods should be given with 95% confidence intervals in order to assess this, and only methods where the 95% CIs do not overlap are truly different in terms of diagnostic power (discriminatory ability).

The AUC is dependent on:


But the diagnostic accuracy is not the only point of this:
 

Positive and negative predictive value


The positive predictive value of a test is the probability of having the condition with a positive test: PPV = Tp / (Tp + Fp).
The negative predictive value is the probability of being well with a negative test: NPV = Tn / (Tn + Fn).

As can be seen, this is dependent on the pretest probability of the condition, i.e. the prevalence of the condition in the population tested.



For a given test with 90% sensitivity and specificity (good values), and prevalence of 1 in 10:


            Present    Absent    Sum
Positive    9          9         18
Negative    1          81        82
Sum         10         90        100


The PPV is only 50%, while the NPV is 99%. The relation between pre- and post-test probability of disease is accurately described by Bayes' theorem.

The same test, applied to 1000 subjects with a prevalence of one in 100, will result in 9 true positives, 99 false positives, 1 false negative and 891 true negatives. This results in a PPV of 8% and a negative predictive value of 99.9%, i.e. the probability of disease after a negative test is 0.1% (pretest 1%). Thus the utility of a test varies in different settings.
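The dependence of the predictive values on prevalence can be sketched in Python as follows. The function name is hypothetical; the calculation is simply the 2 x 2 table expressed in proportions, which is equivalent to applying Bayes' theorem.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive value for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# The two settings discussed in the text: the same test (90% sensitivity
# and specificity) at a prevalence of 1 in 10 and of 1 in 100.
for prevalence in (0.10, 0.01):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The sensitivity and specificity are the same in both settings; only the pretest probability changes, and with it the meaning of a positive test.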

Regression towards the mean

Regression towards the mean is important both in clinical studies of intervention or in monitoring therapy and in the clinic, as it may be the cause of apparent changes in measurements, often in the desired direction. The basic mechanism is that any measurement has a certain variability. This means that any measurement will group around the true value, which coincides with the mean if the variation is random. In that case, the measurements will be normally distributed. This distribution also represents the probability distribution of any single measurement.



If patients are selected on the basis of a certain measurement being below or above a cut off limit, measurements at one of the extreme ends of the variability distribution are overrepresented. When the patient is then re-examined, for instance after an intervention, the next measurement has the same probability distribution, i.e. the probability of a measurement closer to the mean is high. Thus, it is the fact that selection and outcome measurement are the same that results in a skewed distribution in the first measurement, which is normalised in the second measurement, and this leads to the apparent changes. This is typically a problem in those situations:
Regression towards the mean is not a problem if patient selection is by completely independent criteria. In intervention studies, the inclusion of a control group should eliminate this problem, as both treatment and control groups should show the same regression to the mean if the groups are randomised. (This is probably much of the effect described as "the positive effect of participating in the studies", and of the "placebo effect" as well.)
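The mechanism can be illustrated by a small simulation. This is a sketch under stated assumptions, not a model of any particular measurement: true values and measurement errors are both assumed to be normally distributed with an SD of 1, nothing changes between the two measurements, and patients are selected when the first measurement exceeds an arbitrary cut off of 2. All names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_selection(n_subjects=20000, cutoff=2.0, seed=0):
    """Mean of first and second measurement in subjects selected
    because the first measurement was above the cut off."""
    rng = random.Random(seed)
    first_selected = []
    second_selected = []
    for _ in range(n_subjects):
        true_value = rng.gauss(0.0, 1.0)
        # Two measurements of the same, unchanged true value,
        # each with its own random measurement error.
        first = true_value + rng.gauss(0.0, 1.0)
        second = true_value + rng.gauss(0.0, 1.0)
        if first > cutoff:
            first_selected.append(first)
            second_selected.append(second)
    return statistics.mean(first_selected), statistics.mean(second_selected)


mean_first, mean_second = simulate_selection()
print(f"selected subjects, first measurement:  {mean_first:.2f}")
print(f"selected subjects, second measurement: {mean_second:.2f}")
```

The second measurement is on average clearly closer to the population mean than the first, although no intervention has taken place. A randomised control group selected by the same criterion would show the same apparent change.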


The importance of post processing

Finally, the reported reproducibility should be considered against the amount of post processing used in the study to achieve this. In research, extensive post processing is often used to get more reliable data, and this is often not feasible in the clinic. However, the reported reproducibility is only valid if the same method for post processing is applied, and thus, reproducibility data from studies may have limited validity in the clinic. Thus careful attention should be given to the method and amount of post processing, as well as percentage of exclusion of patients for poor image quality in the studies as this is pertinent for the routine clinical use.

Post processing is experience dependent. The method in inexperienced hands (especially manual analysis) may result in artefacts that will reduce both sensitivity and specificity. Post processing gives results in the hands of one who knows what to expect, and almost all studies so far have been done manually, with post processing while looking at the motion. Thus the studies are about the added value of strain rate imaging. The post processing may even be biased; due to the high number of artefacts, the curves that correspond best to the visual impression may be manipulated by integrating just the right amount of artefacts. An example of this can be seen here. Velocity imaging is also vulnerable to biased post processing, but less vulnerable to artefacts, and thus more robust.

As discussed in the section on "how to use strain rate imaging", only segments with few artefacts should be included in the analysis. This pertains to clinical studies as well, and studies reporting a very high feasibility in terms of analysable segments may be viewed as having been susceptible to artefacts.

For clinical use, the main utility of strain rate imaging may still be about adding information to the 2D image about motion, synchronicity and the presence, extent and degree of hypokinesia. In applying manual analysis, a rigid attitude should be maintained in discarding segments with poor quality data. Strain rate imaging is not about compensating for poor echogenicity, as poor data are poor data whatever the method.

Strain rate imaging is basically about imaging of regional dysfunction. Thus, it is to be expected that the main use of this application is in the field of ischemic heart disease, the main cause of regional dysfunction. Left ventricular resynchronisation in heart failure also opens up a field for using strain rate imaging.

However, the method may also serve to give additional information about heterogeneity of function in myocardial disease, especially in cardiomyopathies.

Annular velocities and displacement are robust indices of global left ventricular function. However, in myocardial disease resulting in inhomogeneous function, the regional function analysis with strain and strain rate may lead to the method being more sensitive to minimal changes in function compared to annular motion.


The difference between cut off- and normal values.

Even if one knows cut off values from clinical studies, these do not necessarily reflect the true normal range of a variable. A common definition of the normal range, if the variable is normally distributed, is mean ± 2 SD (corresponding to about 95%) of a healthy population. For a non normally distributed parameter, it can similarly be defined as the range from the 2.5 to the 97.5 percentile. This is illustrated below.


Fig. Normal range of a variable, defined as mean ± 2SD.
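
The two definitions of the normal range described above can be sketched in a few lines of Python. This is only an illustration, not the method of any of the cited studies: the function names are hypothetical, and the percentile version assumes a sample large enough for the 2.5 and 97.5 percentiles to be meaningful.

```python
import statistics

def normal_range_parametric(values, k=2.0):
    """Normal range as mean ± k SD (k = 2 covers about 95% if the
    variable is normally distributed)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - k * sd, mean + k * sd

def normal_range_percentile(values):
    """Normal range as the 2.5 to 97.5 percentile, for variables
    that are not normally distributed."""
    # quantiles(n=1000) returns 999 cut points; cut point i (0-based)
    # is the (i + 1)/1000 quantile, so 2.5% is index 24 and 97.5% is 974.
    cuts = statistics.quantiles(values, n=1000, method="inclusive")
    return cuts[24], cuts[974]
```

For a normally distributed variable the two functions should give similar limits; for a skewed variable they will differ, and the percentile definition is then the more appropriate one.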


Normal values in a healthy population have now been studied (153).

The main point is that a sick population may have its own distribution, and the cut off value is only the value that gives the optimal separation between the normal and patient population as shown below.


Difference between a normal and patient population. The two populations each have a separate distribution, but the two distributions are widely separated, and the cut off point corresponds to the upper normal limit. In this case, there will be no difference between what is normal by any definition.

If the cut off point is close to the normal limit, there will be little difference between the definition of normality by either definition, and the sensitivity, specificity and AUC of the method will be high. However, this can be achieved if the patient population has a high degree of disease, not necessarily reflecting the whole spectrum.

On the other hand, many measurements have a high variability, both due to measurement variability and biological variability, and the patient and normal populations may overlap to a high degree as shown below.


In this case, the two populations have a higher degree of overlap. The optimal cut off point is the one that defines the best separation, i.e. the point that gives the highest AUC. However, this point can be seen to be far below the upper normal limit of the healthy population.


Where there is a high degree of overlap, the cut off point will not coincide with the normal limit, and in that case the AUC will be low, and the cut off point does not define abnormality in an unspecified patient.
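
The difference between a cut off value and a normal limit can be made concrete with a small numerical sketch. Everything here is assumed for illustration: both populations are modelled as normal distributions with invented means and SDs, a higher value is taken to be abnormal, and the "optimal" cut off is taken as the point maximising sensitivity + specificity - 1 (the Youden index), which is one common convention and not necessarily the one used in the studies reviewed.

```python
from math import sqrt
from statistics import NormalDist

def optimal_cutoff(healthy, patients, steps=2000):
    """Scan candidate cut offs and return the one with the highest
    Youden index (sensitivity + specificity - 1)."""
    lo = min(healthy.mean - 4 * healthy.stdev, patients.mean - 4 * patients.stdev)
    hi = max(healthy.mean + 4 * healthy.stdev, patients.mean + 4 * patients.stdev)
    best_cut, best_j = lo, -1.0
    for i in range(steps + 1):
        cut = lo + (hi - lo) * i / steps
        sensitivity = 1.0 - patients.cdf(cut)  # patients above the cut off
        specificity = healthy.cdf(cut)         # healthy below the cut off
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut

def auc(healthy, patients):
    """Area under the ROC curve for two normal distributions."""
    z = (patients.mean - healthy.mean) / sqrt(healthy.variance + patients.variance)
    return NormalDist().cdf(z)

healthy = NormalDist(0.0, 1.0)            # hypothetical healthy population
upper_normal_limit = healthy.mean + 2 * healthy.stdev

separated = NormalDist(4.0, 1.0)          # hypothetical, widely separated patients
overlapping = NormalDist(1.5, 1.0)        # hypothetical, overlapping patients
```

Under these assumptions the widely separated populations give a cut off at about 2.0, equal to the upper normal limit, with an AUC close to 1. The overlapping populations give a cut off at about 0.75, far below the upper normal limit of 2.0, with an AUC of about 0.86.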




Normal values for strain and strain rate in a healthy population:

In a recent population study, the north Tröndelag population (HUNT) study, 1266 subjects without known heart disease, hypertension and diabetes were randomly selected from the total study population of 49 827, and subjects with clinically significant findings on echocardiography (a total of only 30) were excluded (153). This is the largest strain rate echocardiographic population study ever. End systolic strain and peak systolic strain rate were measured by the combined tissue Doppler / speckle tracking segmental strain application of the Norwegian University of Science and Technology, but the results were compared to other methods in a subset of subjects, showing small differences.

The population had the following characteristics:

The study consisted of 673 women with a mean BP of 127/71, mean age of 47.3 years and BMI of 25.8, and 623 men with a mean BP of 133/77, mean age of 50.6 years and BMI of 26.5. Age was normally distributed in both sexes, with an SD of 13.6 and 13.7 years, respectively. 20% of both sexes were current smokers.

Ordinary echo findings were:

Mean               Female    Male
IVSd (mm)          8.1       9.5
LVIDd (mm)         49        53
LVPWd (mm)         8.2       9.6
FS (%)             36        36
Mitral E (cm/s)    75        66
Dec-T (ms)         218       238
IVRT (ms)          93        103


These findings are in accordance with other studies, like the findings of Schirmer et al (156, 157), so the study population may be assumed to be representative.



The results were as follows:


                 Female                                             Male
                 End systolic strain   Peak systolic strain rate    End systolic strain   Peak systolic strain rate
< 40 years       -17.9% (2.1)          -1.09 s-1 (0.12)             -16.8% (2.0)          -1.06 s-1 (0.13)
40 - 60 years    -17.6% (2.1)          -1.06 s-1 (0.13)             -18.8% (2.2)          -1.01 s-1 (0.12)
> 60 years       -15.9% (2.4)          -0.97 s-1 (0.14)             -15.5% (2.4)          -0.97 s-1 (0.14)
Over all         -17.4% (2.3)          -1.05 s-1 (0.13)             -15.9% (2.3)          -1.01 s-1 (0.13)
 
The customary definition of normal values as mean ± 2SD, giving about 95% of the normal population, results in wider normal limits than previously shown as cut off values in small patient studies.
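
As a worked example of the mean ± 2SD definition, the sketch below turns the "Over all" rows of the results above into normal limits. The values are read as mean (SD), which is an assumption about the layout of the results; the dictionary and function names are hypothetical.

```python
def normal_limits(mean, sd, k=2.0):
    """Return (lower, upper) normal limits as mean ± k SD."""
    return mean - k * sd, mean + k * sd

# "Over all" values from the results above, read as (mean, SD).
OVERALL = {
    ("female", "end systolic strain (%)"): (-17.4, 2.3),
    ("female", "peak systolic strain rate (s-1)"): (-1.05, 0.13),
    ("male", "end systolic strain (%)"): (-15.9, 2.3),
    ("male", "peak systolic strain rate (s-1)"): (-1.01, 0.13),
}

for (sex, measure), (mean, sd) in OVERALL.items():
    lower, upper = normal_limits(mean, sd)
    print(f"{sex:6s} {measure:32s} {lower:7.2f} to {upper:7.2f}")
```

With these numbers the limits for end systolic strain are about -22.0% to -12.8% in women and -20.5% to -11.3% in men, and for peak systolic strain rate about -1.31 to -0.79 s-1 in women and -1.27 to -0.75 s-1 in men.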

The findings were quite similar to the findings of Marwick et al (155), in a study with the 2D strain application in 250 healthy volunteers; showing normal mean strain of -18.6% and strain rate of -1.10s-1.

Thus, age and sex specific normal values are available and, as the comparison (153) shows, are comparable to other applications. In addition, the study shows little difference between different levels (basal / midwall / apical), or between the different walls. Although some differences were statistically significant, the differences were so small as to be clinically insignificant.

The study had a high rejection rate of segments (slightly above 40%). This was partly due to the fact that a kernel tracking poorly excludes two segments as explained elsewhere, but also to the fact that the aim of the study was to cleanse the material of any biases that might arise from artefacts.

In addition, the strain AND strain rate were shown to be normally distributed.

Ischemic heart disease:


It was in the field of ischemic heart disease that the method was first tried and validated. Examining patients with myocardial infarction at rest was relatively easy, and the patients had regional dysfunction that could be located from ECG and seen by grey scale ultrasound. In addition, ischemic heart disease is the main cause of regional dysfunction. The first feasibility study was about the ability to detect myocardial infarctions (4). In this field, the clinical studies preceded the experimental.

Myocardial infarction

In 1999, Garot et al (98) showed a reduction in peak systolic transmural velocity gradient (transmural strain rate) in the infarcted wall in acute myocardial infarction compared to controls: anterior 0.0 (0.5) vs 1.1 (0.7) s-1 during systole, P<0.01, and inferior 0.9 (0.6) vs 1.8 (1.2) s-1, with compensatory hyperkinesia in the remote wall. The study, however, excluded 22% of the screened patients. The study reports inter observer variations of 0.6 (1.0) (2.4%), 0.8 (0.7) (2.2%), 0.8 (0.3) (2.9%) and 0.9 (0.2) mm s-1 (1.8%) for peak systolic epicardial velocity in the anterior septum, peak systolic endocardial velocity in the anterior septum, peak systolic epicardial velocity in the posterior wall, and peak systolic endocardial velocity in the posterior wall, respectively. As the variability of a compound measure is the sum of the variability of the single measurements, this should mean that the relative error in velocity gradient should be in the order of 4 - 5%, but the wide standard errors given indicate wide limits of agreement, although that analysis is not performed in the paper. The study also reports high correlations!
Strain rate imaging was first tested in acute myocardial infarction. Wall motion score by colour strain imaging (see parametric imaging, fig. 20), which showed longitudinal shortening, was compared to wall thickening by standard echocardiography (6), with a kappa coefficient of 0.45, which is moderate, but the study has later been recalculated to a weighted kappa of 0.63, which is good. In a second study of myocardial infarction (7), weighted kappa was 0.64, but for repeated measurement, both inter- and intra observer, the weighted kappa was of the same order of magnitude. Both had a sensitivity of about 70%, a specificity of 90%, and an overall accuracy of about 84%. Interestingly, combining the two methods did not change the overall accuracy, indicating that the methods gave the same information. Thus, the parametric method may seem to have the same value as ordinary echocardiography. However, the parametric images were post processed unblinded, an important reservation, as explained above, and one might argue that as an add on, the method did not add information. As a study cannot be resolved into individual instances, it might still be argued that in some instances 2D might be best, where strain rate data are noisy, while in others strain might be decisive. Peak strain rate was also measured in this study, but with no smoothing the variability was so great that it was concluded that quantitative measurement would not be clinically useful compared to wall motion score.
Another study (10) concluded that both strain rate and strain could describe regional dysfunction in infarction, but the variability in this study is about the same as in the others, so the overlap in values between segments with different WMS is too great for the method to be clinically useful in the individual patient, although the numbers needed for significance in those studies were quite low (10 - 25 patients).
Strain by ultrasound showed a fair correspondence with strain by MR in another clinical validation study (9), with no significant bias, but with limits of agreement of about ± 7%, as compared to normal strain of 18% in controls and 15% in remote segments in infarction patients. In this study mean strain in infarct segments was 1-2%, showing that only akinetic segments were considered, and with a repetition coefficient of 7%, hypokinesia may be difficult to separate from normokinesia.
All these early studies were done with no smoothing or other post processing that could increase precision of measurements, so the studies fall mainly into the category of validation and feasibility. In another study (91), strain and strain rate were compared to velocity imaging in identifying infarcted segments. Velocities diagnosed infarcts, but failed to identify the correct segments, as expected due to the tethering effects. A cut off value of -0.8 for strain rate and 13% for strain gave sensitivity and specificity of about 85%, i.e. quite similar to the other studies. In this study both smoothing and cine-compound techniques were used in post processing to reduce noise, but repeatability evaluation was not done. In addition to peak systolic values, inverted isovolumic strain rate, delayed onset of systolic shortening, post systolic shortening and reduced early diastolic lengthening were also described.
A comparative study (40), of ring motion by M-mode and tissue Doppler vs segmental analysis by peak systolic strain rate showed that neither ring velocity nor displacement could identify the infarct site in terms of myocardial sector affected, while segmental analysis by strain rate could. However, this was also only by significance for group data. The interesting point was that mean strain rate of a sector could not identify the infarct site either, although segmental strain rate could, showing that infarct distribution is not limited to  discrete sectors.

Finally, a study comparing segmental velocities to segmental strain rate (41) concludes that peak systolic strain rate is superior to segmental peak systolic velocities in identifying infarcted segments, against M-SPECT fixed perfusion defects as reference. Sensitivity and specificity for recognition of infarct segments were 91% and 84% for colour SRI, 63% and 73% for colour DTI, 78% and 71% for B-mode echocardiography (WMS), and 87% and 77% for anatomic M-mode (AMM), respectively. Inter method agreement (kappa) was 0.65 with AMM, 0.52 with WMS and only 0.44 with DTI. Repeatability with colour DTI was 0.85. Colour analysis was considered feasible in 100% of segments. The results are similar to previous studies. In quantitative analysis, peak SR was measurable in 84% of segments, while peak segmental velocity was feasible in 91%. Peak SRs correlated with wall-motion assessment by B-mode echocardiography better than peak velocities (R = 0.66 vs. 0.10), with less overlap between groups, but still the study showed overlap between peak systolic strain rate in segments grouped by grey scale WMS. The variation (SD of differences) was reported as 6 - 10% or 0.04 to 0.06 s-1. This corresponds to a repetition coefficient of 0.10 s-1, which is quite acceptable. This study was done by averaging measured values from three cycles.
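
The step from an SD of differences to a repetition coefficient can be sketched as follows. The factor of 2 is an assumption that matches the arithmetic above (about 2 x 0.05 s-1 = 0.10 s-1); some authors use 1.96 instead. The function names and the paired readings are hypothetical and are not data from any of the cited studies.

```python
import statistics

def repetition_coefficient_from_sd(sd_of_differences, k=2.0):
    """Repetition coefficient as k times the SD of paired differences."""
    return k * sd_of_differences

def repetition_coefficient(first, second, k=2.0):
    """Repetition coefficient from two series of repeated measurements
    of the same segments (e.g. two observers or two analyses)."""
    differences = [a - b for a, b in zip(first, second)]
    return k * statistics.stdev(differences)

# Reported range of SD of differences: 0.04 to 0.06 s-1
low = repetition_coefficient_from_sd(0.04)   # about 0.08 s-1
high = repetition_coefficient_from_sd(0.06)  # about 0.12 s-1
```

A difference between two measurements in the same patient that is smaller than the repetition coefficient cannot be distinguished from measurement variability, which is the relevant comparison when following one patient over time.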

In conclusion, parametric strain rate imaging seems to have a sensitivity and specificity comparable to grey scale imaging (about 85%), both in locating infarct segments and in semi quantitative analysis of wall motion. Repeatability also seems to be of the same magnitude. In quantitative analysis, careful post processing may give sufficient precision for clinical work, but this is still an open question, as the results are diverse.

The presence of post systolic shortening (PSS) in acute myocardial infarction was observed by Jamal et al (91) and might represent another diagnostic criterion. This was addressed in a longitudinal study (92) showing the presence of post systolic shortening in 60% of infarct segments (73% of mid infarct segments, but in all patients), 29% of the border zone segments and 5% of presumed non infarct segments. The finding that the area of PSS exceeds the area of hypokinesia was also observed in a study of 3D parametric imaging of myocardial infarction (22). PSS disappeared in virtually all border segments in one week, and in half the infarct segments after 3 months. Thus PSS has neither the sensitivity nor the specificity needed to identify infarcted segments, as the presumed ischemic border zone also shows PSS, but it may be important in identifying acute ischemia, and in identifying infarct segments in combination with peak strain rate / strain. Post systolic shortening has been shown to be present in 30% of normal segments, but in those cases always in combination with normal systolic strain (97). The best cut off between normal and pathological PSS was considered to be post systolic strain > 2.5% absolute or 2=% of total strain. In patients with acute ischemia, PSS was present in 78% of ischemic segments and 40% of non ischemic segments; in scarred segments the percentages were about the same. The last finding contrasts with another study, where PSS was reduced both in magnitude and extent from the acute (1 day) to the chronic (3 months) phase of myocardial infarction (92).

Diastolic function is also reduced in myocardial infarction. The study by Garot et al (98) of myocardial velocity gradient (transmural strain rate) showed significantly reduced early diastolic strain rate in acute myocardial infarction. The same finding was reported in the study by Jamal et al (91). In the longitudinal study by Ingul et al (92), the diastolic function was found to be less reduced in the acute phase, but with less improvement during three months. Diastolic function as a diagnostic criterion is rarely addressed in terms of mean differences vs. repeatability, sensitivity or specificity. Repeatability is rarely assessed for diastolic measurements, although a similar repeatability as in systole may, with some reservations, be inferred. Diagnostic value would have to be studied in comparison to other patients with reduced diastolic function.

Viability

Determination of viability is important. Sicari et al (134) showed the presence of viability after myocardial infarction to be a stronger risk factor for subsequent events than ischemia, presumably in the absence of revascularisation. A number of studies (135, 136, 137) have shown improved prognosis of revascularisation in the presence of viability, both compared to medical treatment in the presence of viability and revascularisation in the absence of viability. The impact on prognosis seems to be best in the setting of heart failure (138). Finally, the meta analysis of Allmann (139) seems to confirm that there is no benefit of revascularisation in the absence of viability.

Thallium uptake with a traditional cut off of 60% activity seems to be too sensitive to the presence of residual myocardial viability, detecting too small an amount of viable myocytes embedded in fibrous tissue. This results in a low specificity in predicting segmental functional outcome after revascularisation (140, 141). Wall thickness < 5 mm alone seems to have a high negative predictive value for functional recovery after revascularisation. Demonstration of residual contractility may be improved by strain rate imaging, but the real issue in functional recovery is contractile reserve, which has to be addressed with low dose dobutamine.

Low dose dobutamine stress echo (LDDE) demonstrates contractile reserve in akinetic myocardium, and has a high positive predictive value for functional improvement after revascularisation (140). This has been shown to be comparable to the predictive value of PET (142), and increase in EF during LDDE was the best predictor of increase in EF post revascularisation. As compared to both endocardial excursion and segmental velocities, strain rate imaging is tethering independent. Hoffmann et al (143) have demonstrated that contractility increase with low dose dobutamine can be visualised by colour SRI, as well as measured by peak systolic strain rate. The study was a comparison with FDG PET as reference. The accuracy of 2D assessment of viability, compared with PET, was 66%. Feasibility was 92% of segments, for both TDI and SRI. Tissue velocities showed an accuracy of 66% and an AUC of 0.63, with an optimal cut off velocity of 1.05 cm/s giving a sensitivity of 69% and specificity of 64%. Strain rate imaging showed an accuracy of 83% and an AUC of 0.89, and with a cut off for peak systolic SR the sensitivity was 83% and specificity 84%. Finally, a study addressing the diagnostic value of low dose dobutamine SRI echocardiography against the true end point of functional recovery has recently been published by Hanekom et al (144). The end point was segmental recovery 9 months after revascularisation assessed by 2D echocardiography. Feasibility was 95% of segments by SRI. WMS by 2D echo had a sensitivity of 73% and a specificity of 77%. Peak systolic strain rate had an AUC of 0.844, and with a cut off of -0.7 the sensitivity was 78% and specificity 77%; end systolic strain had an AUC of 0.839, a cut off of -10% and a sensitivity of 75% / specificity of 76%. Increase in SR and strain with dobutamine showed similar results, and no results were significantly different from each other or from WMS.
Logistic regression, however, showed that the combined information from WMS and SRI gave significantly better accuracy. Thus, this study demonstrates that SRI is equal to, if not better than, WMS in predicting functional recovery. The problem with this study is that, as 2D WMS was the reference method after revascularisation, this would tend to favor the 2D method, at least in the area of specificity.


Acute ischemia:

The acute phase of myocardial infarction may be considered as an acute ischemic event. However, several studies have addressed the presence of acute ischemia in other settings. Kukulski et al (99) did a study during PCI, demonstrating a reduction in peak systolic velocities, strain rate and strain in both longitudinal (LAD occlusion) and transmural (RCA/CX occlusion) direction. SR and strain had the highest sensitivity / specificity (75% / 80% and 80%, respectively) compared to 68% / 65% for velocity in identifying ischemia. In ROC analysis, the AUC was 0.62 for reduction in systolic velocities, 0.84 for strain rate and 0.82 for strain. In post systole, the AUC was 0.67, 0.80 and 0.85, respectively, for increase in post systolic velocity or strain rate / strain, demonstrating the diagnostic value of post systolic shortening in ischemia. This study also showed the reversal of post systolic shortening after reperfusion. The main implication of the study is that it demonstrates the difference in sensitivity of deformation vs. motion imaging, due to tethering effects. On the other hand, reproducibility data are not given. As strain rate is more noisy than velocity, the repetition coefficient may well be substantially higher, and the clinical value similar. The other main point of the study is that the presence of post systolic shortening is established as an important marker of acute ischemia in a clinical setting, being present after very few seconds. As the duration of ischemia is short during PCI, however, the reversibility may not be the same after prolonged ischemia (stunning) or myocardial infarction (92, 97). The clinical setting was such that the study has more bearing on the method and the pathophysiology than on the actual clinical utility. This is further developed in another paper by the same group (100), where the post systolic strain index is defined as PSI = (peak systolic strain - end systolic strain) / peak systolic strain.
As ischemia is shown to induce reduction in systolic strain as well as increase in post systolic strain, the combined index was shown to be more sensitive, AUC of 0.95 with a cut off value of 0.25 giving a sensitivity and specificity of 89%, as compared to 0.84 for end systolic strain alone (cut off -10%, sensitivity/specificity 86/83%). Repeatability is not given.
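
The post systolic strain index defined above is simple to compute once the peak and end systolic strain of a segment are measured. The sketch below is an illustration only: the function names and the example strain values are hypothetical, and the cut off of 0.25 is the value quoted above from the study (100), not a validated threshold for any other data.

```python
PSI_CUTOFF = 0.25  # cut off quoted above from the study (100)

def post_systolic_index(peak_strain, end_systolic_strain):
    """PSI = (peak strain - end systolic strain) / peak strain.

    Both strains are given in the same unit (e.g. %), with shortening
    as negative values. If peak strain occurs at end systole, PSI is 0.
    """
    if peak_strain == 0:
        raise ValueError("peak strain must be non-zero")
    return (peak_strain - end_systolic_strain) / peak_strain

def exceeds_cutoff(peak_strain, end_systolic_strain, cutoff=PSI_CUTOFF):
    """True if the PSI of the segment is above the chosen cut off."""
    return post_systolic_index(peak_strain, end_systolic_strain) > cutoff

# Hypothetical segment: -8% at end systole, shortening further to -12%
psi = post_systolic_index(peak_strain=-12.0, end_systolic_strain=-8.0)
```

In this hypothetical segment one third of the total shortening occurs after end systole (PSI about 0.33), which is above the 0.25 cut off. A segment where peak strain coincides with end systolic strain has a PSI of 0.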

The setting of an angiography laboratory differs considerably from a clinical setting, so the value of these studies is mainly about the feasibility of detecting acute ischemia by SRI and the pathophysiology of acute ischemia as seen by SRI. The results, however, may give an indication of what to look for in the setting of acute ischemia during stress.

Stress echocardiography:



Typical dobutamine stress echo. 4 chamber recordings from baseline, low dose (10 ug/kg/min) showing contractility increase without HR increase, intermediate dose (20 ug/kg/min) showing a hint of asynchrony in the apex and peak dose (30 ug/kg/min, the stress test terminated because of evident ischemia) showing substantial hypokinesia in the apex.

The interpretation of stress echocardiography is dependent on the assessment of wall thickening (eventually substituted by wall motion, meaning endocardial excursion, but this may be less specific for preserved function as segments may move by tethering). This is subjective, and provides only semi quantitative data. It has been shown to be extremely experience dependent, as trained echocardiographers with no specific training in stress echo have a sensitivity of only 65%, i.e. no better than exercise ECG, while expert stress echocardiographers have about 85% to 90%, comparable to myocardial SPECT perfusion imaging (101). Furthermore, it has also been shown that visual assessment has poor temporal resolution (usually about 100 ms, with training down to 80 ms), and therefore has limited ability to detect more subtle changes in myocardial function (102), although this can be compensated by increased frame rate and lower replay rate, a point not raised in the study. Inter institutional reproducibility has been shown to be low: a study from 1996 (103) showed a kappa coefficient of 0.37, sensitivity of 76% and specificity of 87%. Introducing second harmonic imaging increased the reproducibility to 0.69 in intra institution agreement (104) and 0.55 inter institution (105). The sensitivity was 92%, substantially better than in the study from 1996, but still at the same level as reported in other studies (101). Fundamental imaging, however, did show a decrease in sensitivity compared to 1996. This illustrates a general principle: whenever a new method becomes available, the accuracy of older methods decreases. However, without harmonic imaging, more patients may be classified as non-echogenic, indicating that with harmonic imaging more patients became eligible for stress echo at sufficient diagnostic accuracy.

Myocardial velocities

Still, the method remains experience dependent and semi quantitative. Tissue Doppler has the promise of increased temporal resolution as well as quantitative and objective measurement. Peak systolic velocity is a robust measurement, as well as closely related to contractility. Peak segmental systolic velocity during DSE was shown to be reduced in segments with reduced wall motion score and segments supplied by a stenosed artery (106). This was further elucidated in a study where patients and normal subjects were compared (107). Feasibility was 92% of segments, normal values were established in the normal group, and the cut off was set to give a specificity of 80%. The definition of the normal dobutamine response was set in each segment, derived from normal subjects, patients with a normal 2D dobutamine response and patients with normal coronary angiography. The study measured all feasible segments in the basal and midwall levels. The sensitivity and specificity of systolic velocities for affected vascular territories were 83 and 72%, vs. 88 and 81% by wall motion scoring. Limits of agreement were 0.2 cm/s for inter observer and concordance 86%. (The illustration in fig 1 of this paper, however, may indicate that the low velocity curve during stress is measured in a reverberation, if so probably accepted during post processing, as probably being in concordance with visual wall motion.) Analysis was not feasible in the apex, due to the low velocity and poor depth resolution in the near field. Thus, systolic velocities seem to give comparable results, but not better, than wall motion scoring. However, the diagnostic accuracy by tissue Doppler was the same for novice interpreters (76%) and expert echocardiographers (74%), and slightly lower than for expert stress echocardiographers (6%), as compared to wall motion scoring (68, 71 and 88% respectively) (108). Velocities were measured in the middle of each segment.
Of the 77 patients investigated, 55 had significant coronary artery disease. Nineteen patients (25%) had 1-vessel disease, 17 (22%) had 2-vessel disease and 19 (25%) had 3-vessel disease. Of all the patients studied, 40 (52%) had disease of the left anterior descending artery, 33 (43%) had involvement of the left circumflex artery, and 37 (48%) had involvement of the right coronary artery. The criteria for a positive test by tissue velocity (one or more segments, how much below the cut off limit) are not reported, but all twelve midwall and basal segments were analysed.

Another study, the multi centre MYDISE study, reported a similar feasibility but slightly less reproducibility in using the segmental velocities (109), with coefficients of variation of 11–18% for peak systolic velocity at peak stress in basal, 14–28% in mid and 29–69% in apical segments. This study also concludes that the apical velocities are too low to give reproducible results. In this feasibility study, 10 normal studies were analysed by nine different observers. Feasibility was reported to be 90% of midwall and basal segments in 92 normal subjects. In the second part of that study (110), the diagnostic value was addressed in 289 patients. Cut off values were established by ROC analysis in the 92 normal subjects from the previous study and 48 patients with known coronary artery disease. Sensitivity and specificity were then studied in a prospective study of 149 unselected patients referred for chest pain, with coronary angiography (>50% stenosis) as reference. This group included 59 normal, 36 (24%) patients with single vessel, 27 (18%) with double vessel and 27 (18%) with triple vessel disease.


Peak systolic velocity at peak stress, rather than change in velocity from baseline, was the best discriminator of disease, but sensitivity was only 63% - 69% and specificity 60 – 67% for the different vascular regions, which is somewhat lower than the values reported by the Brisbane group, and with cut off values of 10 - 12 cm/s in the basal segments. However, when a regression model including age, gender and peak heart rate was applied, sensitivity increased to 80 – 93% and specificity to 80 – 82%. These results imply that not only heart rate, but also age and gender should be taken into account when interpreting stress echo by tissue Doppler.

The differences in cut off values between the two studies can in part be explained by the fact that segmental velocities in Brisbane were measured in mid segment, in MYDISE in the base of the segment. As velocities increase from the apex to the base, this means that normal segmental velocities (and hence, cut off values) will be higher in the MYDISE study. The difference in sensitivity in the two studies for peak velocity alone may in part be explained by the number of segments analysed. In the MYDISE study, only 7 segments were analysed, and it seems that positivity is defined by segmental velocities being reduced only in the specific vascular areas (LAD: BA and MS; Cx: BL and BP; RCA: BI, MI and BS). If so, the sensitivity may be substantially reduced.

It has previously been shown that the segmental specificity of velocities is low (40). Thus, reducing the number of segments necessary for a positive test will reduce the specificity. Basically, using velocities in stress echo should be considered a screening for an ischemic response. The actual location of the ischemic areas should hypothetically be shown better in strain rate and strain.

Frame rates are not reported in either study, but tended to be somewhat lower than what is customary at present (especially in the MYDISE study). This might result in some under sampling, so the cut off values might be higher with higher frame rate. So far no studies have addressed the timing of motion by tissue velocities as an additional variable. Peak velocities do not include the asynchrony induced by the delayed onset and post systolic shortening that is a marker of ischemia. Just looking at the timing of peak velocities if there is a suspicion of asynchrony will often answer this, as illustrated in this clinical example from Euroecho 8.

Even though peak velocities show accuracy comparable to wall motion score, at least for detection of ischemia, tethering makes the true location of ischemia difficult to determine. That may be part of the problem of the MYDISE study as well, analysing only typical segments and considering them positive only for stenoses in the vessels of the vascular territories considered.


The true location of the ischemic areas, as well as the reason for asynchrony, however, have to be answered by strain rate / strain as shown below, taken from a stress echo study:
In this study at peak stress, there is delayed motion of the whole of the lateral wall (Cx). Peak velocities, at 6 cm/s, are slightly reduced, but with no difference between the septum and the lateral wall.


Strain rate shows this to be due to apical dyskinesia in systole, followed by post systolic shortening in the same area, resulting in the motion pattern shown in velocity imaging. In this case, the presence of ischemia is detected by velocity imaging, but the location by strain rate.

Strain rate imaging


Feasibility of strain rate imaging was addressed in a study by Davidaviticus et al (111). They found that 95% of segments were analysable during dobutamine stress. Due to noise problems strain rate imaging was not feasible during treadmill or bicycle stress. The study, however, was small and was limited to healthy individuals. The normal response during dobutamine stress was an increase in velocity, strain rate and strain at low dose dobutamine, and a further increase in velocity and strain rate at high dose, when strain showed a plateau. This is intuitive, concordant with an initial increase in contractility at low dose dobutamine, giving increased stroke volume, but with increased heart rate without increased venous return at higher dobutamine levels resulting in a plateau or even diminished stroke volume. Velocity/SR of contraction, however, continues to increase as ejection time shortens. This is in contrast to exercise, where increased venous return increases stroke volume even at high heart rate.

Kowalski et al (112) extended the testing of SRI to patients with coronary artery disease: 20 patients with chest pain, 16 with positive coronary angiography, were examined. Feasibility was over 95% of segments. Both narrow angle and wide angle sector gave similar results. Peak systolic strain rate showed a linear increase from baseline to peak. Ischemic segments (critical stenoses) showed no increase in strain or strain rate during low and high dose dobutamine. However, they found that some ischemic segments showed normal velocity responses to dobutamine, and suggest that this is due to tethering. A different explanation can be that isovolumic contraction velocities are mistaken for peak velocity during ejection, as shown in this clinical example. No overall analysis of the diagnostic criteria of ischemia was done beyond clinical examples. Their study confirms that SRI may have a clinical potential, but was not designed to determine the ability of SRI to diagnose coronary artery disease.

The clinical value of SRI was addressed in a study by Voigt et al (113). The study included 44 patients, with single photon emission computed tomography (SPECT) as the reference method for ischemia, and coronary angiography performed as well. The study reports 100% sensitivity and specificity of SPECT compared with coronary angiography. This is somewhat higher than usual, as the sensitivity of SPECT against coronary angiography in general is around 90%, and indicates that the material is somewhat selected. SPECT is then used as the gold standard for ischemia. It can easily be argued that angiography does not show ischemia, and that SPECT thus is a better reference. But that assumes perfect sensitivity of SPECT. It can equally well be argued that both SPECT and stress echo have limited sensitivity. In that case, not all examinations will show ischemia by both methods, and coronary angiography will serve as an external reference that is the same for both methods.

In this study the feasibility was 92% for tissue velocities, and 85% for SRI, which is reasonable in our experience with SRI artefacts.

In non ischemic segments, peak systolic strain rate increased significantly with dobutamine stress, from 1.6 ± 0.6 s⁻¹ to 3.4 ± 1.4 s⁻¹ (absolute values), while strain during ejection time changed only minimally, from 17 ± 6% to 16 ± 9%. During dobutamine, 47 myocardial segments in 19 patients developed scintigraphy-proven ischemia. In these segments, strain rate increased from 1.6 ± 0.8 s⁻¹ to 2.1 ± 1.1 s⁻¹ and strain decreased from 16 ± 7% to 10 ± 8%, both significantly different from non ischemic segments. Post systolic shortening (PSS) was found in all ischemic segments. By ROC analysis, the AUC was 0.57 for peak strain (not surprising, as this includes post systolic strain), 0.65 for end systolic strain, 0.74 for peak systolic strain rate, 0.8 for time to end of negative strain rate and 0.9 for the ratio of post systolic strain to peak strain. The ratio of post systolic shortening to maximum segmental shortening was thus the best parameter for identifying stress-induced ischemia. Furthermore, in qualitative analysis, parametric strain rate imaging by curved M-mode improved sensitivity/specificity from 81/82% to 86/89% compared with conventional grey scale reading. The statistical significance of this difference, however, is not given in the paper.
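
For readers less familiar with ROC analysis, the AUC can be read as the probability that a randomly chosen ischemic segment has a higher value of the index than a randomly chosen non ischemic segment. The following is a minimal sketch of that calculation, not the analysis used in the paper. The function name `roc_auc` and all numbers are made up for illustration, and it assumes an index such as the ratio of post systolic strain to peak strain, where higher values indicate ischemia.

```python
def roc_auc(ischemic_scores, normal_scores):
    """Probability that a randomly chosen ischemic segment scores higher
    than a randomly chosen non ischemic one (ties count one half)."""
    if not ischemic_scores or not normal_scores:
        raise ValueError("both groups need at least one value")
    wins = 0.0
    for pos in ischemic_scores:
        for neg in normal_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(ischemic_scores) * len(normal_scores))


# Hypothetical ratios of post systolic strain to peak strain.
# These are invented values, NOT data from the study.
ischemic = [0.35, 0.50, 0.20, 0.45]
non_ischemic = [0.05, 0.10, 0.25, 0.00]

print(roc_auc(ischemic, non_ischemic))
```

An AUC of 0.5 means the index does no better than chance, and 1.0 means perfect separation of the two groups, which is why 0.57 for peak strain is poor and 0.9 for the post systolic ratio is good.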
 


Parametric imaging in dobutamine stress echo. Curved M-modes are drawn along the septum from base (bottom) to apex (top) in the images at the beginning of this paragraph. The apex shows normal contraction at baseline and low dose, post systolic shortening with delayed onset of relaxation at 20 µg, and hypo- to akinesia with post systolic shortening (i.e. tardykinesia) at peak stress. This demonstrates visual assessment by SRI instead of wall motion.


In a further paper from the same study (114), giving much the same data, SRI is also compared to tissue velocities and displacement for diagnostic accuracy. Visual wall motion had a sensitivity/specificity of 81/82%, the ratio of post systolic strain to peak strain 81/82% and segmental tissue velocities 74/63%. These numbers, however, refer to sensitivity against SPECT. If one assumes the usual SPECT sensitivity of about 90% against angiography, the corresponding sensitivity of WMS against angiography would be about 73% (0.81 × 0.90), which is definitely lower than in other studies using harmonic imaging (104, 105). This is again an instance of the introduction of a new method being accompanied by a decrease in the sensitivity of an established method.
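
The conversion from sensitivity against SPECT to sensitivity against angiography can be written out as a small calculation. This is only a rough sketch under a simplifying assumption of my own: that the index test detects none of the angiographic disease that SPECT has missed, so that the two sensitivities simply multiply. The function name is hypothetical, and the 90% figure is the general SPECT sensitivity quoted earlier, not a result from this study.

```python
def sensitivity_against_angiography(sens_vs_spect, spect_sens_vs_angio):
    """Chain two sensitivities, assuming the index test detects no
    angiographic disease that SPECT has missed (simplifying assumption)."""
    for value in (sens_vs_spect, spect_sens_vs_angio):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivities must lie between 0 and 1")
    return sens_vs_spect * spect_sens_vs_angio


# 81% sensitivity of wall motion against SPECT, and an assumed
# 90% sensitivity of SPECT against coronary angiography.
wms = sensitivity_against_angiography(0.81, 0.90)
print(f"{wms:.0%}")
```

If wall motion does detect some of the patients missed by SPECT, the true sensitivity against angiography will be higher than this estimate, so the figure is best read as a lower bound.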

This might be due to several reasons:
First, the grey scale images are acquired with tissue Doppler data in the background, which reduces image quality and frame rate slightly. Second, recordings may to a certain degree be less optimised for endocardial visibility, as proper alignment is more important for tissue Doppler. Third, patients are included with less regard to good grey scale image quality, as one expects to rely on both grey scale and tissue Doppler information. The last point will make stress echo available to more patients, but as the general grey scale sensitivity may be expected to decline, one has to utilise both 2D and tissue Doppler information.

This is analogous to the effect seen with harmonic imaging, and a parallel effect is described when using contrast stress echo for left ventricular opacification (123), although that study does not give the number of substandard recordings without contrast, nor the impact on sensitivity.

The accuracy of velocity alone in detecting ischemia is comparable with the MYDISE study (110). This again illustrates the main principle: as segment velocities are dependent on overall function and on adjacent segments, they are not suited to segmental analysis (40). If one only analyses the velocities in the ischemic segments, the sensitivity will be low as well. The overall sensitivity of peak velocity in any segment for the presence of ischemia is not given. However, by the same principle, the overall sensitivity for detecting ischemia anywhere is good if all segments are analysed, as shown by the Brisbane group. Thus, peak velocities are a fair screening tool for ischemia. For locating the area of ischemia, on the other hand, the strain rate indices are probably best.

New December 2005. Two other studies were presented with preliminary results at the ESC congress 2005 (127, 132). The first (127), a cooperative study between Trondheim and Brisbane reporting 137 patients and using automated analysis, showed a feasibility of 80% of segments, and a sensitivity above 80% and specificity around 90% with SRI. Peak systolic strain rate at peak stress was the best parameter, compared to timing and post systolic index, as opposed to the previous study. WMS had significantly lower sensitivity than strain rate.

The second (132), with 170 patients from the Brisbane group, showed a feasibility of 91% of segments at peak stress for SRI. Sensitivity for both WMS and peak systolic strain rate was over 90%, but specificity was significantly higher with SRI (ca 60 vs 35%).

Finally, a large prospective study from Brisbane was presented at Euroecho 9 in December 2005 (133). In this study of 515 patients with an average follow up time of 4.8 years, resting akinesia was predictive of death, ischemia at peak stress (defined as new or worsening wall motion abnormality) gave incremental predictive value, and both end systolic strain and peak systolic strain rate (mean of all segments) gave independent and incremental information above that, while segmental strain rate and strain did not to the same degree. It was pointed out in the discussion (by Marwick) that this may be because the mean values identified patients with regional ischemia who did not have the capacity to compensate with hyperkinesia in other segments. In that case, this may be the incremental information that is not obtained by WMS. Peak systolic strain rate again had better predictive value than strain.

The evidence from several large studies seems to indicate that strain rate imaging is ready for clinical use in dobutamine stress echo. The clinical evidence concerns dobutamine stress only. Early experience (111) seems to indicate that it is less feasible during exercise stress due to increased motion artefacts, although this evidence is limited. Still, strain rate has a lot of pitfalls, and they tend to become even more pronounced with increasing stress. A critical eye should always be applied to data quality before analysis, and all segments with low data quality should be discarded. Parametric imaging is probably superior for visualising the extent of ischemia (and indeed for judging whether the curves are credible at all, as a true finding should have a certain extent), as well as for timing, especially tardykinesia. Only one stress echo study so far (113) has addressed the qualitative visual assessment of colour SRI.

Finally, as with all echo measurements, SRI should always be considered a part of the total echo examination.

As a conclusion: I consider the evidence strong enough to support that SRI is now emerging as a clinical application in dobutamine stress echocardiography, in expert hands, as the post processing and evaluation of artefacts are as expertise dependent as wall motion assessment.










References


Editor: Head of department Contact address: isb-post@medisin.ntnu.no, Updated: XXX