This page explains what EOF analysis is and how customers can use it. It describes one part of the data analysis services we offer at CRI. Please click the "Data Analysis" button above to see the other types of data analysis we offer.
We have prepared explanatory pages, with examples, for the words underlined in blue. To see those pages, please click the underlined blue words below.
What is EOF analysis?
Why should I bother computing EOF?
Since these new sets of time series data are uncorrelated with each other, it is probably reasonable to assume that each set represents a different phenomenon. We therefore usually study each component individually. The magnitudes of the eigenvalues show how important each of these new sets of time series data is.
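The idea can be sketched in a few lines of Python. This is only a minimal illustration, assuming NumPy and a synthetic data set in place of real records; the function name `eof` and the variable names are ours, not part of any particular package.

```python
import numpy as np

def eof(data):
    """EOF of `data` with shape (n_times, n_series).

    Returns the eigenvalues (largest first), the eigenvectors (one per
    column) and the new time series (one column per mode).
    """
    anomalies = data - data.mean(axis=0)               # remove each series' mean
    cov = anomalies.T @ anomalies / (len(data) - 1)    # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)             # symmetric eigenproblem
    order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    modes = anomalies @ eigvecs                        # project data onto eigenvectors
    return eigvals, eigvecs, modes

# Synthetic stand-in for real records: 3 correlated series, 500 samples.
rng = np.random.default_rng(0)
signal = rng.standard_normal(500)
data = np.column_stack([
    signal + 0.3 * rng.standard_normal(500),
    -0.5 * signal + 0.3 * rng.standard_normal(500),
    0.8 * signal + 0.3 * rng.standard_normal(500),
])

eigvals, eigvecs, modes = eof(data)
corr = np.corrcoef(modes, rowvar=False)  # off-diagonal entries are numerically zero
```

In this sketch the variance of each new time series equals the corresponding eigenvalue, which is why the eigenvalues measure the importance of each mode.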
The vertical axis scale at the upper left is for the record at 50m; we moved all other records downward to make comparison easier. The zero level for each record is shown as a horizontal black line. We downloaded these data from NOAA, U.S.A. (http://www.pmel.noaa.gov/tao) and applied a band-pass filter to remove variations with periods longer than 150 days or shorter than 20 days.
This figure suggests that the variations of current at 100m (2nd line from top) are inversely correlated with those at 50m (1st line), while those at 200m (4th line) might be inversely correlated with those at 100m. There are 23 sets (depths) of time series data at this location if we ignore bad data. Studying the relations among these 23 sets of time series data in this way would be an overwhelming task.
So, we applied EOF to extract coherent variations among these 23 data sets. Figure 2a shows time series plots of the two major components extracted from the original time series data. Let us call these components mode 1 (blue line) and mode 2 (red line), following oceanographic tradition. Although they look somewhat correlated, the correlation coefficient between these two time series is actually zero (yes, we actually computed it).
Figure 2b shows how much of the variation in the entire original time series data sets (all 23 of them) is included in each of these components. For this purpose we used a statistical quantity called variance, which is often considered to be related to the "power" or "energy" of variations. The vertical axis of this figure is the percent of variance and the horizontal axis is the mode number (there are 23 modes). Mode 1 contains 53.5% and mode 2 contains 14.3% of the variance in the entire original time series data sets. This information is what we obtain from the eigenvalues. The variance contained in each mode decreases progressively as the mode number increases. Thus, we can concentrate our analysis efforts on a few lower modes. Even if we ignore the many higher modes, we do not lose much information.
In this case, if we analyze mode 1 and mode 2, we have practically analyzed 67.8% (53.5% + 14.3%) of the variations included in the original 23 sets of time series data. The effort and cost of analyzing only two sets of time series data are considerably lower than those of analyzing the original 23 sets. This is why EOF is a useful tool.
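The percent of variance per mode follows directly from the eigenvalues. The sketch below assumes NumPy; the three eigenvalues are hypothetical numbers chosen only to make the arithmetic easy to follow, and only the last line uses the percentages quoted above.

```python
import numpy as np

def percent_variance(eigvals):
    """Percent of the total variance carried by each mode."""
    eigvals = np.asarray(eigvals, dtype=float)
    return 100.0 * eigvals / eigvals.sum()

# Hypothetical eigenvalues, chosen only for illustration.
eigvals = [6.0, 3.0, 1.0]
pct = percent_variance(eigvals)   # 60%, 30%, 10%
cumulative = np.cumsum(pct)       # 60%, 90%, 100%

# With the percentages quoted in the text for mode 1 and mode 2:
first_two = 53.5 + 14.3           # 67.8
```

The cumulative sum is a convenient way to decide how many modes are worth analyzing.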
Figure 2c, the eigenvectors of mode 1 and mode 2, shows how the amplitude of the variations shown in Figure 2a varies with depth. Mode 1 is positive at 50m, negative at 100m and positive at 200m. This means that the variations at 100m look like mirror images of those at 50m and at 200m, except that their amplitudes are different. This pattern matches our previous description of Figure 1.
Beyond these qualitative observations, we now know quantitatively how the amplitude of the variations shown in Figure 2a varies with depth. Information like this might help us identify the cause of these variations. Even if we do not need to know the cause, it is useful to know the amplitude of variations at different depths quantitatively rather than only qualitatively.
So, what is mode 1, which includes 53.5% of the variance in the original time series data? Figure 3a shows a time series plot of mode 1 (blue line) and the east-west wind speed (red line) at 147.5E on the equator, obtained from NOAA, U.S.A. (http://www.cdc.noaa.gov/cdc/reanalysis).
This figure suggests that the mode 1 variations of ocean current at this location are related to the wind speed variations. The variations of ocean current lag behind those of the wind. The correlation coefficient reaches its maximum, 0.68, when we shift the wind data to the right by 9 days. We computed the 95% confidence interval assuming that the effective sampling interval is 10 days, since the cut-off period of the band-pass filter we applied is 20 days; this correlation is statistically significant (meaningful).
Figure 2c indicates that the wind accelerates the ocean current in the down-wind direction near the surface and at depths below 160m, but accelerates it against the wind direction at depths between about 80 and 160m.
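A lag search of this kind might be sketched as follows. This is an illustration only, assuming NumPy and synthetic series in which the "current" is built to lag the "wind" by 9 samples; the function name `lagged_correlation` and all variable names are ours, and the confidence interval computation is not shown.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for lag = 0..max_lag (in samples)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(max_lag + 1)
    corr = np.empty(len(lags))
    for i, lag in enumerate(lags):
        # Pair x at time t with y at time t + lag (y lags behind x).
        corr[i] = np.corrcoef(x[:len(x) - lag], y[lag:])[0, 1]
    return lags, corr

# Synthetic example: a smooth random series and a noisy copy delayed by 9 samples.
rng = np.random.default_rng(1)
n, true_lag = 400, 9
base = np.convolve(rng.standard_normal(n + true_lag), np.ones(10) / 10, mode="same")
wind = base[true_lag:]                               # wind[t] = base[t + 9]
current = base[:n] + 0.1 * rng.standard_normal(n)    # current[t + 9] is close to wind[t]

lags, corr = lagged_correlation(wind, current, max_lag=30)
best_lag = lags[np.argmax(corr)]                     # lag with the largest correlation
```

Shifting the wind record to the right by `best_lag` samples is the same as pairing each wind value with the current value `best_lag` samples later.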
We will stop our analysis at this point since this is not a scientific paper, but we would like to mention that we published results of more detailed analysis applied to the older data at the same location in a scientific journal.
1-3 Some cautions when using EOF
(1) This type of EOF cannot deal with phase or time lag among time series data sets.
Figure 4a shows the data sets for the second experiment. The first time series (top) is the one at 50m. The second time series is the same as the first one except that we shifted it backward (to the right on a time series plot) by 5 days. We shifted the third one by another 5 days, for a total of 10 days. We generated the fourth and the fifth time series in a similar manner.
Figure 4b shows time series plots of mode 1 (blue line) and mode 2 (red line). Figure 4c shows the eigenvalues for this experiment. Mode 1 contains only about 60% and mode 2 contains about 34% of the variance of the input data sets. This result indicates that mode 2 is no longer negligible.
Figure 4d shows the eigenvectors of mode 1 and mode 2. The mode 1 eigenvector varies in amplitude with "depth" (it is not a straight vertical line), although all the time series are exactly the same except for the time shift among them.
You might have noticed that the time series plots of mode 1 and mode 2 (Figure 4b) look suspiciously similar. The correlation coefficient between them is zero, but it becomes 0.81 if we shift the mode 2 time series to the left by 13.3. The zero correlation among the time series generated by EOF is guaranteed only if we do not shift the resulting time series at all.
To avoid a result like this, we have to adjust the data sets before computing EOF if the variations in the original time series data sets have time lags among them. Alternatively, we might use time domain complex EOF or frequency domain EOF if we do not know how much we need to shift the data. We will describe these methods later on this page.
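The effect of time lags can be reproduced with a small synthetic experiment. The sketch below assumes NumPy and uses a pure sine wave with a hypothetical 60-day period instead of the measured 50m record, so the numbers differ from those in Figure 4; it is meant only to show the mechanism.

```python
import numpy as np

t = np.arange(600)                    # time in days (synthetic)
period = 60.0                         # hypothetical period, in days
shifts = [0, 5, 10, 15, 20]           # each copy is delayed by another 5 days
data = np.column_stack([np.sin(2 * np.pi * (t - s) / period) for s in shifts])

anomalies = data - data.mean(axis=0)
cov = np.cov(anomalies, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]     # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
modes = anomalies @ eigvecs

# One signal, but its variance is split over two modes
# (about 60% and 40% for this synthetic input).
pct = 100 * eigvals / eigvals.sum()

# Zero correlation holds only when neither time series is shifted.
zero_lag = np.corrcoef(modes[:, 0], modes[:, 1])[0, 1]

# Shift mode 2 against mode 1 and record the largest |correlation|.
lagged = [abs(np.corrcoef(modes[:-k, 0], modes[k:, 1])[0, 1]) for k in range(1, 26)]
best = max(lagged)                    # close to 1 at a quarter-period shift
```

Although all five inputs are copies of one signal, EOF needs two modes to represent them, and those two modes become almost perfectly correlated once one of them is shifted.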
(2) Variables represented by eigenvectors and eigenvalues are supposed to be constant in time
What we can do to deal with this kind of problem is to create a time series of eigenvectors in the following manner. First, we took the initial 91.25-day (1/4-year) long segment of data and applied EOF to that segment. Next, we took another 91.25-day long segment of data starting from day 91.25/2 = 45.625 (day 0 is the start date of the entire data) and applied EOF to this new segment. We repeated this procedure until we reached the end of the data. Figure 5, the result of these computations, shows how the eigenvector of mode 1 changes in time. This figure shows that the mode 1 eigenvector has a two-layer structure (negative near the surface and positive below) at the beginning of the data. It then changes to a three-layer structure (positive near the surface, negative below that to about 180m, and positive again further below) in early 2003, and this three-layer structure persists until the end of the data. Thus, in our example (Figures 2 and 3), we used data starting from February 2003 and discarded the data prior to that month.
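The half-overlapping window procedure described above might be sketched as follows. This is a minimal illustration under assumptions that are not taken from the original analysis: NumPy, synthetic data, daily sampling (so the 91.25-day window and the 45.625-day step are rounded to 91 and 45 samples), and a sign convention of our own choosing, since the sign of an eigenvector is arbitrary.

```python
import numpy as np

def leading_eigenvector(segment):
    """Mode 1 eigenvector of one segment with shape (n_times, n_series)."""
    cov = np.cov(segment, rowvar=False)      # np.cov removes the mean itself
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    vec = eigvecs[:, -1]                     # eigenvector of the largest eigenvalue
    # Assumed convention: make the largest-magnitude component positive
    # so that successive windows can be compared.
    return vec if vec[np.argmax(np.abs(vec))] > 0 else -vec

def sliding_eof(data, window, step):
    """Apply EOF to successive windows of `window` samples, advancing by `step`."""
    starts = np.arange(0, len(data) - window + 1, step)
    vectors = np.array([leading_eigenvector(data[s:s + window]) for s in starts])
    return starts, vectors

# Synthetic data: one common signal with a fixed, hypothetical depth structure.
rng = np.random.default_rng(2)
n_times, n_series = 365, 4
common = rng.standard_normal(n_times)
pattern = np.array([1.0, 0.8, -0.6, 0.4])
data = np.outer(common, pattern) + 0.1 * rng.standard_normal((n_times, n_series))

starts, vectors = sliding_eof(data, window=91, step=45)   # one row per window
```

Each row of `vectors` is the mode 1 eigenvector for one window, so plotting the rows against `starts` gives a picture comparable in spirit to Figure 5. With real data a different sign convention, such as aligning each eigenvector with that of the previous window, may be more appropriate.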
(3) We might need to do some pre-processing before computing EOF
In our example, we know that there are strong tidal signals in our data. We also know that there are variations with periods of half a year and one year. We are not interested in these variations. In addition, coherency analysis has given us an idea of the frequencies at which wind has a strong influence on ocean currents. Based on this prior knowledge, we applied a band-pass filter to our data before computing EOF. We used a combination of classical numerical filters called Butterworth filters, but there are many other numerical filters. It is also possible to use more modern methods such as wavelet analysis.
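To show what a band-pass step does, here is a minimal sketch. Please note that it is not the Butterworth filter mentioned above: to keep the example dependent on NumPy only, it uses a simple FFT mask, which is a different (and cruder) technique. The function name `fft_bandpass`, the daily sampling and the synthetic periods are all assumptions made for illustration; the 20-day and 150-day limits are the ones quoted earlier on this page.

```python
import numpy as np

def fft_bandpass(x, dt, min_period, max_period):
    """Keep only variations with periods between min_period and max_period.

    x  : 1-D, evenly sampled series
    dt : sampling interval, in the same time unit as the periods
    """
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(len(x), d=dt)        # cycles per time unit
    spectrum = np.fft.rfft(x - x.mean())         # remove the mean first
    keep = (freqs >= 1.0 / max_period) & (freqs <= 1.0 / min_period)
    return np.fft.irfft(spectrum * keep, n=len(x))

# Synthetic daily series: periods of 365 days (unwanted), 50 days (wanted)
# and 5 days (unwanted).
t = np.arange(3650)
wanted = np.sin(2 * np.pi * t / 50)
x = 2 * np.sin(2 * np.pi * t / 365) + wanted + 0.5 * np.sin(2 * np.pi * t / 5)

filtered = fft_bandpass(x, dt=1.0, min_period=20, max_period=150)
```

A sharp FFT mask like this can produce ringing on real, non-periodic data, which is one reason smoother filters such as the Butterworth filter are commonly preferred in practice.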
Another important point here is that a specific external factor might influence our data through several different mechanisms via different routes. The responses caused by the same external factor but by different mechanisms might have different characteristics. For example, certain mechanisms might damp variations of shorter period while others might damp variations of longer period. As a result, it might become difficult to interpret the time series produced by EOF. In our example, wind affects ocean currents on the equator in several different ways. We have a theoretical reason to believe that the mechanism by which wind affects the ocean current near the surface, where the eigenvector is positive, differs from the mechanism by which it affects the ocean current at mid-depths, where the eigenvector is negative. One of the methods we can try in a case like this is to remove some data sets before computing EOF. So, we re-calculated EOF using only the data at depths between 40m and 80m (5 time series data sets). Here, we might say we "filtered" our data based on depth (location). Figure 3b, the result of this re-calculation, shows shorter period variations such as "dual-bump" features more clearly than Figure 3a does.
If we mix time series data sets with different units (speed, temperature and so on), we usually need to adjust their amplitudes unless we use a correlation matrix to compute EOF. This process is called weighting, and it is often done by multiplying each time series data set by a certain constant. We usually remove the average, and often the trend, from each time series data set before computing EOF. Using a correlation matrix to compute EOF is equivalent to dividing each input time series by the square root of its variance (its standard deviation) before computing EOF. By doing so, all the time series data sets have equal importance (weight) in the EOF computation.
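The equivalence between using a correlation matrix and standardizing the input can be checked numerically. The sketch below assumes NumPy and two synthetic variables with hypothetical units and amplitudes.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
common = rng.standard_normal(n)
# Two hypothetical variables with very different units and amplitudes.
speed = 0.2 * common + 0.05 * rng.standard_normal(n)             # e.g. m/s
temperature = 15 + 3.0 * common + 1.0 * rng.standard_normal(n)   # e.g. deg C
data = np.column_stack([speed, temperature])

# Remove the mean, then divide by the square root of the variance.
standardized = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

cov_of_standardized = np.cov(standardized, rowvar=False)
corr_of_raw = np.corrcoef(data, rowvar=False)   # same matrix as the line above

# Without weighting, the variable with the larger numbers dominates mode 1.
eigvals_cov = np.linalg.eigvalsh(np.cov(data, rowvar=False))   # ascending order
eigvals_corr = np.linalg.eigvalsh(corr_of_raw)
dominance = eigvals_cov[-1] / eigvals_cov.sum()
```

Here the unweighted covariance matrix puts nearly all of the variance into a mode that is essentially the temperature record, simply because its numerical values are larger, whereas the correlation matrix treats both variables equally.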
Finally, EOF might not be able to separate variations caused by different factors, especially when they are correlated for whatever reason. In the case of the ocean, we have daily variations caused by tidal motions. Near the surface, we have another daily variation caused by solar heating during the daytime and radiation cooling during the nighttime. The periods of these variations are not exactly the same, but it is quite possible that EOF cannot separate the effects of these two factors.
(4) The result of EOF might be meaningless.
The correlation coefficient has a value ranging from -1 to 1. If the value is -1, the two variables are perfectly correlated but in opposite directions (like a mirror image). If the value is zero, they are not correlated at all. If the value is 1, they are perfectly correlated and vary in the same direction.