A Surprise For Big-Data Analytics
A simple but interesting issue with analyzing high-dimensional data
Peter Landweber, Emanuel Lazar, and Neel Patel are mathematicians. I have never worked with Peter Landweber, but have written papers with Larry and Laura Landweber. Perhaps I can add Peter one day.
Today I want to report on a recent result on the fiber structure of continuous maps.
The paper by Landweber, Lazar, and Patel (LLP) is titled, “On The Fiber Diameter Of Continuous Maps.” Pardon me, but I assume that some of you may not be familiar with the fiber of a map. Fiber has nothing to do with the content of food or diets, for example. Fibers are a basic property of a map.
Their title does not give away any suggestion that their result is relevant to those studying data sets. Indeed even their full abstract only says at the end:
Applications to data analysis are considered.
I just became aware of their result from reading a recent Math Monthly issue. The paper has a number of interesting results—all with some connection to data analytics. I must add that I had not seen it earlier because of a recent move, and the subsequent lack of getting US mail. Moves are disruptive—Bob Floyd used to tell me that “two moves equal a fire”—and I’ve just moved twice. Oh well.
The fiber of a map $f: X \to Y$ at a point $y \in Y$ is the set of points $x \in X$ such that $f(x) = y$. The diameter of a fiber is just what you would expect: the maximum distance between two points in the fiber (the supremum, when no maximum is attained). LLP prove the following theorem. They say they have a “surprisingly short proof” and give earlier sources for it at the end of their paper:
Theorem: Let $f: \mathbb{R}^n \to \mathbb{R}^m$ be a continuous function where $n > m$. Then for any $M > 0$, there exists $y \in \mathbb{R}^m$ whose fiber has diameter greater than $M$.
The following figure from their paper conveys the essence of the proof in the case $m = 1$:
For $m > 1$ one might expect a difficult dimension-based argument. However, they leverage whatever difficult reasoning went into the following theorem by Karol Borsuk and Stanislaw Ulam. We have mentioned both of them multiple times on this blog but never this theorem:
Theorem: Let $f$ be any continuous function from the $m$-sphere $S^m$ to $\mathbb{R}^m$. Then there are antipodal points that give the same value, i.e., some $x$ on the sphere such that $f(x) = f(-x)$.
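The proof then simply observes that $m$-spheres of radius $M$ live inside $\mathbb{R}^n$ for any $n > m$ and arbitrarily large $M$. Restricting $f$ to such a sphere centered at the origin and applying the Borsuk-Ulam theorem gives antipodal points $x$ and $-x$ that belong to the same fiber of $f$ but are $2M$ apart.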
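To make the argument concrete, here is a minimal numerical sketch of the simplest case, a continuous map from the plane to the line. It is not taken from the LLP paper. The function names `antipodal_pair` and `example_map`, and the particular map chosen, are my own illustrative assumptions. The idea: on a circle of radius `radius` centered at the origin, the difference $g(\theta) = f(p_\theta) - f(-p_\theta)$ changes sign between $\theta = 0$ and $\theta = \pi$, so bisection finds a point whose antipode has (numerically) the same image.

```python
import math


def antipodal_pair(f, radius, tol=1e-12):
    """Find p on the circle of the given radius with f(p) ~= f(-p).

    f takes two floats and returns a float; it is assumed continuous.
    """
    def point(theta):
        return (radius * math.cos(theta), radius * math.sin(theta))

    def g(theta):
        x, y = point(theta)
        return f(x, y) - f(-x, -y)

    lo, hi = 0.0, math.pi
    g_lo = g(lo)
    if g_lo == 0.0:
        return point(lo)
    # Since g(theta + pi) = -g(theta), g changes sign on [0, pi].
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0:
            return point(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # sign change is in the upper half
        else:
            hi = mid                # sign change is in the lower half
    return point(0.5 * (lo + hi))


def example_map(x, y):
    """A hypothetical continuous map from the plane to the line."""
    return x + 0.5 * y + math.sin(x * y)


p = antipodal_pair(example_map, radius=1000.0)
q = (-p[0], -p[1])
print("p =", p)
print("f(p) - f(-p) =", example_map(*p) - example_map(*q))
print("distance between p and -p =", math.hypot(p[0] - q[0], p[1] - q[1]))
```

Increasing `radius` pushes the two colliding points as far apart as you like, which is exactly what the theorem promises. In higher dimensions there is no such simple bisection, and that is where the Borsuk-Ulam theorem does the real work.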
What It Means For Data Scientists
Why should we care about this theorem? That’s a good question.
One of the main ideas in analytics is to reduce the dimension of a set of data. If we let the data lie in a Euclidean space, say $\mathbb{R}^n$, then we may wish to map the data down to a space of lower dimension. This yields lots of obvious advantages: the crux is that we can do many computational things on lower-dimensional data that would be too expensive on the original $n$-dimensional space.
The LLP result shows that no matter what the mapping is, as long as it is continuous, there must be points that are far apart in the original space and yet get mapped to exactly the same point in the lower space. This is somewhat annoying: clearly it means there will always be points that the map does not classify correctly.
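For a linear reduction the collision is easy to exhibit by hand, and a small sketch may help fix the idea. The code below is only an illustration of the easy linear special case, not of the general theorem: it assumes a hypothetical 2-by-3 matrix `A` as the reduction from three dimensions to two, and the helper names `apply_map`, `null_direction`, and `colliding_pair` are mine. Any vector that `A` sends to zero can be scaled to move a point as far as we like without changing its image.

```python
import math


def apply_map(A, x):
    """Apply the 2x3 matrix A (a list of two rows) to a vector x in R^3."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in A)


def null_direction(A):
    """Cross product of the two rows of A: a vector that A sends to zero."""
    (a1, a2, a3), (b1, b2, b3) = A
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def colliding_pair(A, x, distance):
    """Return x and a point y at the given distance with A x == A y."""
    v = null_direction(A)
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        # Rows are parallel; the cross product gives no direction here.
        raise ValueError("rows of A are parallel; choose another null vector")
    step = distance / norm
    y = tuple(xi + step * vi for xi, vi in zip(x, v))
    return x, y


# Hypothetical reduction from R^3 to R^2.
A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x, y = colliding_pair(A, (1.0, -2.0, 0.5), distance=1e6)
print("image of x:", apply_map(A, x))
print("image of y:", apply_map(A, y))
print("distance between x and y:", math.dist(x, y))
```

The two images agree up to floating-point rounding even though the inputs are a million units apart. The point of the LLP theorem is that giving up linearity does not help: any continuous reduction has such pairs.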
One of the issues I think raised by this work of LLP is that within areas like big data, people can approach the same problems from many angles. I think that we do not always see results from another area as related to our work. I believe that many people in analytics are probably surprised by this result, and I would guess that they may not have known about it previously. This phenomenon seems to be getting worse as more researchers work on similar areas but come at the problems with different viewpoints.
Can we do a better job at linking different areas of research? Finally, with respect, this seems like a result that could have been proved decades ago. Perhaps one of the great consequences of new areas like big data is to raise questions that were not thought about previously.
[fixed typo R^m, corrected picture of Landweber, added note on sources for main theorem]