A simple but interesting issue with analyzing high-dimensional data

Peter Landweber, Emanuel Lazar, and Neel Patel are mathematicians. I have never worked with Peter Landweber, but have written papers with Larry and Laura Landweber. Perhaps I can add Peter one day.

Today I want to report on a recent result on the fiber structure of continuous maps.

The paper by Landweber, Lazar, and Patel (LLP) is titled, “On The Fiber Diameter Of Continuous Maps.” Pardon me, but I assume that some of you may not be familiar with the fiber of a map. Fiber has nothing to do with the content of food or diets, for example. Fibers are a basic property of a map.

Their title does not give away any suggestion that their result is relevant to those studying data sets. Indeed even their full abstract only says at the end:

${\dots}$ Applications to data analysis are considered.

I just became aware of their result from reading a recent Math Monthly issue. The paper has a number of interesting results—all with some connection to data analytics. I must add that I had not seen it earlier because of a recent move, and the subsequent lack of getting US mail. Moves are disruptive—Bob Floyd used to tell me that “two moves equal a fire”—and I’ve just moved twice. Oh well.

## The Result

The fiber of a map ${f}$ at ${y}$ is the set of points ${x}$ so that ${f(x)=y}$. The diameter of a fiber is just what you would expect: the maximum distance of the points in the fiber. LLP prove this—they say they have a “surprisingly short proof” and give earlier sources for it at the end of their paper:

Theorem: Let ${f \colon \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}}$ be a continuous function where ${n > m}$. Then for any ${M > 0}$, there exists ${y \in \mathbb{R}^m}$ whose fiber has diameter greater than ${M}$.

## The Proof

The following figure from their paper conveys the essence of the proof in the case ${m = 1}$:

For ${m > 1}$ one might expect a difficult dimension-based agument. However, they leverage whatever difficult reasoning went into the following theorem by Karol Borsuk and Stanislaw Ulam. We have mentioned both of them multiple times on this blog but never this theorem:

Theorem: Let ${f}$ be any continuous function from the ${m}$-sphere to ${\mathbb{R}^m}$. Then there are antipodal points that give the same value, i.e., some ${x}$ on the sphere such that ${f(x) = f(-x)}$.

The proof then simply observes that ${m}$-spheres of radius ${M}$ live inside ${\mathbb{R}^n}$ for any ${n > m}$, and arbitrarily large ${M}$. The antipodal points belong to the same fiber of ${f}$ but are ${2M}$ apart.

## What It Means For Data Scientists

One of the main ideas in analytics is to reduce the dimension of a set of data. If we let the data lie in a Euclidean space, say ${\mathbb{R}^{n}}$, then we may wish to map the data down to a space of lower dimension. This yields lots of obvious advantages—the crux is that we can do many computational things on lower-dimensional data that would be too expensive on the original ${n}$-dimensional space.

The LLP result shows that no matter what the mapping is, as long as it is continuous, there must be points that are far apart in the original space and yet get map to the exactly same point in the lower space. This is somewhat annoying: clearly it means there will always be points that the map does not classify correctly.

## Open Problems

One of the issues I think raised by this work on LLP is that within areas like big-data people can work on it from many angles. I think that we do not always see results from another area as related to our work. I believe that many people in analytics are probably surprised by this result, and I would guess that they may have not known about the result previously. This phenomenon seems to be getting worse as more researchers work on similar areas, but come at the problems with different viewpoints.

Can we do a better job at linking different areas of research? Finally, with respect, this seems like a result that could have been proved decades ago? Perhaps one of the great consequences of new areas like big data is to raise questions that were not thought about previously.

[fixed typo R^m, corrected picture of Landweber, added note on sources for main theorem]

August 14, 2016 12:52 pm

This result is in contrast to (but not inconsistent with) the Johnson-Lindenstrauss Lemma. https://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma

August 14, 2016 4:32 pm

Minor typo in theorem statement: should be y \in R^m

• August 14, 2016 9:04 pm

Thanks—made look of R consistent too.

August 14, 2016 11:20 pm

My understanding of DR is that the prerequisite is that the data is really a low-dimensional manifold stuck in a higher-dimensional one, so a global mapping isn’t needed. The parts of fibers outside the data are not an issue.

• August 15, 2016 8:45 am

In practice though, that would mean that one would have to prove that for a given DR mapping, the inputs are not ‘falling into’ this issue. I’m not sure such a proof is possible or easy for an arbitrary mapping.

4. August 15, 2016 2:38 am

I think it would be fair to say that such applications of Borsuk-Ulam have appeared many times before. This question would not be out of place on a mathematics PhD general exam. As one example from the TCS community, see Theorem 6.2 in this SODA paper: http://repository.cmu.edu/cgi/viewcontent.cgi?article=1840&context=compsci

• August 15, 2016 5:57 am

Upon further reflection, “general exam” is not a fair assessment.

August 15, 2016 9:55 am

I don’t see how this would be surprising or problematic for data scientists in any way. As I understand there are two main reasons to map data from a high dimensional space to a low dimensional space. Either you want to identify certain features which are distingushed in the high dimensional space, e.g., you want to map from the space of words used in a web page (with each word being a dimension with a page being assigned the value of #of times word used/#total words on page for that dimension) to a space of topics, or there is a hidden regularity in the high dimensional data which restricts all the points to some lower dimensional manifold in the high dimensional space.

In the first case the identification of points that are far apart in the high dimensional space is exactly what is desired. Two pages which use very different words could have exactly the same topics (or degree of relevance to a list of topics). The whole point of reducing the dimension is to ignore certain kinds of differences judged to be irrelevant for the application.

In the second case the result is irrelevant as one isn’t interested in the behavior of the continuous function on the entire high dimensional space, only on some limited sub-manifold. Indeed, the function need not even be defined on the entire high dimensional space (though in practical applications one would probably want to handle imperfect data by mapping points not on the sub-manifold to the closest point that satisfies).

—-

Indeed, it seems like it would be quite strange if this result wasn’t true. It would basically be saying that, ignoring balls of sufficiently small radius, the low dimensional space could capture behavior in the high dimensional space.

Maybe I’m just unaware and people do/did think this was possible.

• August 16, 2016 6:24 pm

If f was a quasi-continuous function, then could the low dimensional quasi-continuum space completely capture its behavior in the high dimensional one?