We work with a Riemannian manifold \((M, g)\) with \(\nabla\) denoting the Levi-Civita connection.

Vector fields will be denoted \(X, Y, Z, W, \dots\). Most of the time we will abuse notation and also write \(X, Y, Z, W, \dots\) for tangent vectors. We do this because many tensors, such as the curvature tensor, though defined via vector fields, are tensorial: at a fixed point they depend only on the values of the vector fields in question at that point.

In practice what this usually means is that a tensorial expression such as \(\operatorname{Rm}(X, Y) Z\) is a priori only defined on vector fields (it differentiates them, which makes no sense for single tangent vectors) but is tensorial, hence its value at a point \(x\) depends only on \(X, Y, Z\) at the point \(x\). Thus if we extend tangent vectors \(X, Y, Z \in T_x M\) to vector fields defined in a neighbourhood of \(x\), the value of \(\operatorname{Rm}(X, Y) Z\) at \(x\) will be independent of the extension. Rather than continually saying "for an extension", we will just assume whenever necessary that such an extension has been made and denote it by \(X, Y, Z\). At times there will be exceptions to this abuse and they will be clearly stated.

One-forms will be written \(\alpha, \beta, \dots\), and we will follow the same abuse as with vectors of not specifying whether we are working at a point or with a field.

For iterated derivatives, we will write \(\nabla_X \nabla_Y Z\) and at times, if clarity seems particularly important, \(\nabla_X (\nabla_Y Z)\). Note that many authors write \(\nabla_X \nabla_Y Z\) to mean the second covariant derivative of \(Z\) in the directions \(X, Y\). We will not do this and denote the second covariant derivative by \(\nabla^2_{X, Y} Z\) or \(\nabla^2 Z (X, Y)\). The two notions are related by \[ \nabla^2 Z (X, Y) = \nabla_X (\nabla_Y Z) - \nabla Z (\nabla_X Y). \] More details are given in Second Derivative.

In particular, when working in coordinates, or more generally with respect to a local frame, often you will see \(\nabla_i \nabla_j Z\) to mean \(\nabla^2 Z (\partial_i, \partial_j)\). For the most part, we will not work with respect to any local frames, but if the need arises we will write \(\nabla^2_{ij} Z\) for \(\nabla^2 Z(\partial_i, \partial_j)\).

Let \(V\) be a finite dimensional, real vector space. A tensor of type \((p, q)\) is a multi-linear map \[ T : \underbrace{V^{\ast} \times \cdots \times V^{\ast}}_{p \text{ times}} \times \underbrace{V \times \cdots \times V}_{q \text{ times}} \to \mathbb{R}. \] The space of such tensors is written \[ T^p_q V = \bigotimes^p V \otimes \bigotimes^q V^{\ast}. \]

In more general situations, such as infinite dimensional vector spaces or modules over a commutative ring, the definition above is unsatisfactory. In general, the defining property of the tensor product of vector spaces \(V, W\) is the universal property. That is, given vector spaces \(V, W\), there exists a vector space, denoted \(V \otimes W\), and a bilinear map \(\pi : V \times W \to V \otimes W\) such that we have the universal diagram for bilinear maps:

\begin{array}{ccc} V \times W & \xrightarrow{\pi} & V \otimes W \\ & {}_{f}\searrow & \downarrow {}_{\exists ! \bar{f}} \\ & & Z \end{array}

In words, given any bilinear map \(f : V \times W \to Z\), there exists a unique linear map \(\bar{f} : V \otimes W \to Z\) such that \(\bar{f} \circ \pi = f\).

This notion generalises readily to any finite number of factors. In particular, taking \(p\) copies of \(V\) and \(q\) copies of \(V^{\ast}\) we obtain the tensor spaces \(T^p_q V\).

For finite dimensional vector spaces, however, our definition satisfies the universal property and hence agrees with the more general definition.

We won't worry about the universal property much here except to note a couple of points:

  1. To define \(\pi : V^{\ast} \times V^{\ast} \to V^{\ast} \otimes V^{\ast}\) we need to define \(\alpha^1 \otimes \alpha^2 = \pi(\alpha^1, \alpha^2)\) as a bilinear map \(V \times V \to \mathbb{R}\). For this we define \[ \alpha^1 \otimes \alpha^2 (X_1, X_2) = \alpha^1(X_1) \alpha^2(X_2). \] Then we note that the bilinear maps of the form \(\alpha^1 \otimes \alpha^2\) generate \(V^{\ast} \otimes V^{\ast}\) as a vector space - they aren't a basis however - there are too many of them and they are not linearly independent. To get a basis, choose a basis \(\{e_i\}_{i=1}^n\) for \(V\) and let \(\{\theta^j\}\) denote the dual basis defined by \(\theta^j(e_i) = \delta^j_i\). Then for an arbitrary bilinear form \(B\) we may write uniquely \[ B = \sum_{i,j=1}^n B_{ij} \theta^i \otimes \theta^j \] with \(B_{ij} = B(e_i, e_j)\); a worked example is given just after this list.

    Note here that we have used that \(V\) is a finite dimensional vector space! We then obtain that the set of bilinear maps along with the map \(\pi\) satisfies the universal property for \(V^{\ast} \otimes V^{\ast}\).

  2. To define \(\pi : V \times V \to V \otimes V\) we use the isomorphism \(V \simeq (V^{\ast})^{\ast}\) obtained by the map \[ X \in V \mapsto \alpha_X : V^{\ast} \to \mathbb{R} \] with \[ \alpha_X (\beta) = \beta(X). \] That is, given \(X \in V\), we define the element \(\alpha_X \in (V^{\ast})^{\ast}\) and the correspondence \(X \mapsto \alpha_X\) is a vector space isomorphism. Then we may define \(V \otimes V\) as the set of bilinear maps \(V^{\ast} \times V^{\ast} \to \mathbb{R}\) and apply the previous part.

    Note again we used that \(V\) is a finite dimensional vector space!
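
As a quick worked example (with a made-up bilinear form, purely for illustration): take \(V = \mathbb{R}^2\) with standard basis \(e_1, e_2\), dual basis \(\theta^1, \theta^2\), and let \(B(X, Y) = X^1 Y^1 + 2 X^1 Y^2\). Then \[ B_{11} = B(e_1, e_1) = 1, \quad B_{12} = B(e_1, e_2) = 2, \quad B_{21} = B_{22} = 0, \] so \(B = \theta^1 \otimes \theta^1 + 2\, \theta^1 \otimes \theta^2\), and this expression in the basis \(\theta^i \otimes \theta^j\) is unique.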

Note that the expression \(B = B_{ij} \theta^i \otimes \theta^j\) (employing the summation convention) shows that the \(\theta^i \otimes \theta^j\) form a basis for \(V^{\ast} \otimes V^{\ast}\). In general, we may write \(B\) non-uniquely as a sum of elements of the form \(\alpha \otimes \beta\). For example, \[ (\theta^1 + \theta^2) \otimes \theta^3 = \theta^1 \otimes \theta^3 + \theta^2 \otimes \theta^3 \] expresses the same element as different sums of such elements, whereas if we insist on using only the basis elements \(\theta^i \otimes \theta^j\), then we obtain a unique expression. The left hand side just above would then be inadmissible. The situation is similar to how fractions do not have a unique representation \(m/n\) in general, but if we insist that \(m, n\) are co-prime then (up to sign) the expression is unique.

If you're interested, define \(\pi : V^{\ast} \times V \to V^{\ast} \otimes V\) and show the universal property is satisfied. It might help to verify the universal properties more explicitly for \(V^{\ast} \otimes V^{\ast}\) and \(V \otimes V\) first.

If you're interested, formulate and verify the appropriate universal property for \(T^p_q V\). Hint: Replace bilinearity with multilinearity.

Now we have a fundamental operation.

The trace is the unique linear map \(\operatorname{Tr} : V^{\ast} \otimes V \to \mathbb{R}\) determined by \[ (\alpha, X) \in V^{\ast} \times V \mapsto \alpha(X) \in \mathbb{R}. \]

The universal property ensures this map is well defined. We may realise it by writing (non-uniquely!) any tensor \(T \in T^1_1 V\) as a finite sum \[ T = \sum_{i=1}^N \alpha_i \otimes X_i \] and extend the definition by linearity. That this is well defined (i.e. independent of the choice of \(\alpha_i, X_i\)) is guaranteed by the universal property. With respect to this expression for \(T\) we have \[ \operatorname{Tr} (T) = \sum_{i=1}^N \alpha_i(X_i). \]

Alternatively, we may choose a basis \(\{e_i\}\) for \(V\) and write uniquely \[ T = \sum_{ij} T_i^j \theta^i \otimes e_j \] in which case the definition above gives \(\operatorname{Tr}(\theta^i \otimes e_j) = \theta^i(e_j) = \delta^i_j\), and we extend by linearity to \(T\). In this case, the extension is automatically well defined since a linear map is uniquely determined by its values on a basis. With respect to this expression for \(T\) we have \[ \operatorname{Tr} (T) = \sum_{ij} T_i^j \theta^i(e_j) = \sum_{ij} T_i^j \delta^i_j = \sum_i T^i_i. \] That is, the trace defined here, when written with respect to a basis, is the trace of the matrix of \(T\) with respect to that basis.
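
To see the well definedness in action (a made-up example): in \(V = \mathbb{R}^2\), the two decompositions \[ T = (\theta^1 + \theta^2) \otimes e_1 + \theta^2 \otimes (e_2 - e_1) = \theta^1 \otimes e_1 + \theta^2 \otimes e_2 \] describe the same tensor, and both give the same trace: \[ (\theta^1 + \theta^2)(e_1) + \theta^2(e_2 - e_1) = 1 + 1 = 2 = \theta^1(e_1) + \theta^2(e_2). \]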

More generally, given a tensor of type \((p+1, q+1)\), we may choose any upper slot and any lower slot to contract, obtaining a trace \[ \operatorname{Tr} : T^{p+1}_{q+1} V \to T^p_q V. \]

Of course it must be made clear which upper slot is contracted with which lower slot and this will always be explicitly stated. For example, consider a tensor of the form \[ T = \alpha \otimes X \otimes Y. \] Then we have two possible traces \[ \operatorname{Tr}_{12} T = \alpha(X) Y, \quad \operatorname{Tr}_{13} T = \alpha(Y) X. \] Both are linear maps \(V^{\ast} \otimes V \otimes V = T^2_1 V \to T^1_0 V = V\).

Finally, if \(V\) is equipped with an inner-product \(g\), we may use the inner-product to raise and lower indices. That is, the inner-product induces an isomorphism \(V \to V^{\ast}\) by \[ X \in V \mapsto (Y \mapsto g(X, Y)) \in V^{\ast}. \] This isomorphism is often denoted \(\flat : V \to V^{\ast}\) because we lower the \(1\) from the top index to the bottom: \[ \flat : V = T^1_0 V \to V^{\ast} = T^0_1 V. \] Its inverse is denoted \(\sharp : V^{\ast} \to V\), which raises the index. Then we have maps \[ \flat : T^{p+1}_q V \to T^p_{q+1} V \] and \[ \sharp : T^p_{q+1} V \to T^{p+1}_q V \] where, as with the traces, we must specify which index is being acted upon.

As with the trace example above we then have \[ \flat_2 (\alpha \otimes X \otimes Y) = \alpha \otimes \flat(X) \otimes Y, \quad \flat_3 (\alpha \otimes X \otimes Y) = \alpha \otimes X \otimes \flat(Y) \] and \[ \sharp_1 (\alpha \otimes X \otimes Y) = \sharp_1(\alpha) \otimes X \otimes Y. \]

A particularly important example of this is with bilinear forms. Then we have the Riesz representation \[ B(X, Y) = g(B^{\sharp} (X), Y) \] where \(B^{\sharp}\) is the linear transformation \(V \to V\) representing \(B\). Strictly speaking \(B^{\sharp} \in T^1_1 V\), but in fact \(T^1_1 V \simeq \operatorname{Hom} (V, V)\), the latter being the vector space of linear transformations \(V \to V\).

Show that the map \[ \alpha \otimes X \in T^1_1 V \mapsto (Y \mapsto \alpha(Y) X) \in \operatorname{Hom}(V, V) \] is an isomorphism. In terms of traces, \[ \alpha \otimes X (Y) = \operatorname{Tr}_{13} (\alpha \otimes X \otimes Y). \] Hint: you will not be able to show surjectivity directly without using the assumption that \(V\) is finite dimensional. Thus you might either try to show injectivity and conclude surjectivity by looking at dimension, or use a basis.

Verify that for a bilinear form \(B \in T^0_2 V\), we have the relation \[ B(X, Y) = g(B^{\sharp} (X), Y) \] where \(B^{\sharp} = \sharp(B) \in T^1_1 V\) is the musical raising and \(B^{\sharp} (X) = \operatorname{Tr}_{13} (\sharp(B) \otimes X)\). Hint: you only need to prove this for \(B\) of the form \(\alpha \otimes \beta\) and then apply the universal property to extend by linearity to general \(B\). Make sure to check bilinearity with respect to \(\alpha, \beta\)! Alternatively, choose a basis for \(V\), prove the result for \(B = \theta^i \otimes \theta^j\), and once more extend by linearity to general \(B = \sum_{ij} B_{ij} \theta^i \otimes \theta^j\).
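
For reference, here is how the first approach might begin (do try it yourself first!): for \(B = \alpha \otimes \beta\), raising the second index gives \(B^{\sharp}(X) = \alpha(X)\, \beta^{\sharp}\), and then \[ g(B^{\sharp}(X), Y) = \alpha(X)\, g(\beta^{\sharp}, Y) = \alpha(X) \beta(Y) = B(X, Y). \]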

Then we define the trace of a bilinear form \(B\) with respect to the inner-product \(g\): \[ \operatorname{Tr}_g (B) = \operatorname{Tr} B^{\sharp}. \] That is, the trace of \(B\) is the trace of the linear transformation \(B^{\sharp}\) representing \(B\). Note that if we used a different inner-product \(k\) to define \(B^{\sharp}\), then in general \(\operatorname{Tr}_g (B) \ne \operatorname{Tr}_k (B)\). There is no general way to define the trace of a bilinear form; it depends on a choice of inner-product.

Also take caution to note that if \(B\) is not symmetric, there are two distinct choices for \(B^{\sharp}\) depending on which index we raise. For us however, we will mostly apply this construction to symmetric \(B\), in which case it does not matter which index is raised.

Show that if \(B\) is a symmetric bilinear form (i.e. \(B(X, Y) = B(Y, X)\)), then \[ \sharp_1 B = \sharp_2 B. \]

For a manifold \(M\), we have the tangent bundle \(TM\) and cotangent bundle \(T^{\ast} M\), along with the tensor bundles \(T^p_q M = \bigotimes^p TM \otimes \bigotimes^q T^{\ast} M\). A tensor field of type \((p, q)\) is a section \(S \in \Gamma(T^p_q M)\). Thus at each point \(x \in M\), we have a multilinear map \[ S_x : \underbrace{T_x^{\ast} M \times \cdots \times T_x^{\ast} M}_{p \text{ times}} \times \underbrace{T_x M \times \cdots \times T_x M}_{q \text{ times}} \to \mathbb{R}. \] Then if \(\alpha^1, \dots, \alpha^p \in \Gamma(T^{\ast} M)\) and \(X_1, \dots, X_q \in \Gamma(TM)\) we have the smooth function \[ x \mapsto S_x (\alpha^1(x), \dots, \alpha^p(x), X_1(x), \dots, X_q(x)) \] which we abbreviate as \(S (\alpha^1, \dots, \alpha^p, X_1, \dots, X_q)\).

By definition, \(S_x\) is \(\mathbb{R}\)-multilinear for each \(x\). Then in fact \(S\) is \(C^{\infty} (M)\)-multilinear, by which we mean that if \(f\) is a smooth function then

\begin{align*} S (f \alpha^1, \dots, \alpha^p, X_1, \dots, X_q) (x) &= S_x (f(x) \alpha^1(x), \dots, \alpha^p(x), X_1(x), \dots, X_q(x)) \\ &= f(x) S_x(\alpha^1(x), \dots, \alpha^p(x), X_1(x), \dots, X_q(x)) \\ &= [f S(\alpha^1, \dots, \alpha^p, X_1, \dots, X_q)] (x). \end{align*}

In other words, \[ S (f \alpha^1, \dots, \alpha^p, X_1, \dots, X_q) = f S(\alpha^1, \dots, \alpha^p, X_1, \dots, X_q). \] Similarly if we multiply any slot by \(f\).

An \(\mathbb{R}\)-multilinear map \(\Gamma(T^{\ast} M)^p \times \Gamma(TM)^q \to C^{\infty} (M)\) is tensorial if it is \(C^{\infty} (M)\)-multilinear.

For example, for bilinear forms this means that for every \(f \in C^{\infty}(M)\) and every \(X_1, X_2 \in \Gamma(TM)\) we have \[ B(f X_1, X_2) = B(X_1, f X_2) = f B(X_1, X_2). \]

Then we have that tensorial multilinear maps are in fact determined pointwise. That is, the value of \(B(X_1, X_2)\) at a point \(x\) depends only on the values \(X_1(x), X_2(x)\) of \(X_1, X_2\) at \(x\). In other words, if \(Y_1, Y_2\) are vector fields such that \(Y_1(x) = X_1(x)\) and \(Y_2(x) = X_2(x)\), then \[ (B(X_1, X_2)) (x) = (B(Y_1, Y_2))(x). \]

As such, \(B\) determines a section of \(T^0_2 M = T^{\ast} M \otimes T^{\ast} M\). Here \(B_x \in T_x^{\ast} M \otimes T_x^{\ast} M\) is determined, for example, by its values on a basis for \(T_x M\) extended arbitrarily to vector fields. The result is well defined since \(B\) is determined pointwise, hence independent of the extension.

That is, the (infinite dimensional) vector space of tensorial \(\mathbb{R}\)-multilinear maps is precisely the vector space of sections of the corresponding tensor bundle.

Let \(X, Y, Z\) be vector fields. Define a new vector field by \[ \operatorname{Rm}(X, Y) Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X, Y]} Z. \]

Notice that \(\nabla_X \nabla_Y Z\) will include the variation of \(Y\) along \(X\) - namely \(\nabla_X Y\). This is undesirable since we want to measure the curvature of the space itself at each point using \(\operatorname{Rm}\), and this should not depend on how any particular vector field varies. Likewise for \(\nabla_Y \nabla_X Z\). The term \(\nabla_{[X, Y]} Z\) compensates precisely for this undesirable effect.

Another way of expressing this compensation is to say that \(\operatorname{Rm}\) is tensorial in \(X, Y\) so that for any smooth function \(f \in C^{\infty} (M)\) we have \[ \operatorname{Rm}(fX, Y) Z = f \operatorname{Rm}(X, Y) Z = \operatorname{Rm}(X, fY) Z. \]

Using the Leibniz rule for the connection \(\nabla\) and the corresponding rule for the Lie bracket, prove the claimed tensorality in \(X, Y\).
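
For reference, the computation for the first slot runs as follows (a sketch; the other slots are similar). Using \(\nabla_{fX} = f \nabla_X\), the Leibniz rule, and \([fX, Y] = f[X, Y] - (Yf) X\):

\begin{align*} \operatorname{Rm}(fX, Y) Z &= f \nabla_X \nabla_Y Z - \nabla_Y (f \nabla_X Z) - \nabla_{f[X,Y] - (Yf)X} Z \\ &= f \nabla_X \nabla_Y Z - (Yf) \nabla_X Z - f \nabla_Y \nabla_X Z - f \nabla_{[X,Y]} Z + (Yf) \nabla_X Z \\ &= f \operatorname{Rm}(X, Y) Z. \end{align*}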

As a consequence, although as written \(\operatorname{Rm}\) is defined for vector fields, tensorality induces a well defined map on tangent vectors. As mentioned in Basic setup and conventions, we will typically not distinguish between vector fields and tangent vectors when dealing with tensorial expressions. But just this time, let us be very explicit: Let \(X, Y, Z \in T_x M\) be tangent vectors, let \(\bar{X}, \bar{Y}, \bar{Z}\) and \(\tilde{X}, \tilde{Y}, \tilde{Z}\) be vector fields defined on a neighbourhood of \(x\) such that \[ \bar{X} (x) = X, \bar{Y} (x) = Y, \bar{Z} (x) = Z \] \[ \tilde{X} (x) = X, \tilde{Y} (x) = Y, \tilde{Z} (x) = Z. \] Then tensorality implies that \[ \left(\operatorname{Rm}(\bar{X}, \bar{Y}) \bar{Z}\right) (x) = \left(\operatorname{Rm}(\tilde{X}, \tilde{Y}) \tilde{Z}\right) (x). \] Thus we may define unambiguously, \[ \operatorname{Rm}(X, Y) Z = \left(\operatorname{Rm}(\bar{X}, \bar{Y}) \bar{Z}\right) (x) \] where \(\bar{\cdot}\) denotes any arbitrary extension of \(X, Y, Z\). Tensorality then guarantees the result is independent of the extension.

What is rather more surprising, given that \(Z\) is being differentiated twice, is that \(\operatorname{Rm}\) is tensorial in \(Z\) also! This means that \(\operatorname{Rm}\) may be evaluated on tangent vectors \(X, Y, Z\) at a point and thus may be interpreted as giving information (via \(\nabla\), which itself is determined by \(g\)) about \((M, g)\) at a point. This information is in fact a measure of curvature.

One question stands out: Why is \(\nabla_{[X, Y]} Z\) the right correction term? There are a few ways we might answer this question such as "because it works!" and "check in coordinates". The answer we will give here is obtained by interpreting \(\operatorname{Rm}\) as the commutator of second derivatives.

The second derivative of a vector field, in directions \(X, Y\) is defined to be \[ \nabla^2_{X, Y} Z := \nabla_X (\nabla_Y Z) - \nabla Z (\nabla_X Y) = \nabla_X (\nabla_Y Z) - \nabla_{\nabla_X Y} Z. \]

Check that \(\nabla^2_{X, Y} Z\) is tensorial in \(X, Y\).
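
For instance, tensorality in \(Y\) is the following computation (a sketch):

\begin{align*} \nabla^2_{X, fY} Z &= \nabla_X (f \nabla_Y Z) - \nabla_{\nabla_X (fY)} Z \\ &= (Xf) \nabla_Y Z + f \nabla_X \nabla_Y Z - (Xf) \nabla_Y Z - f \nabla_{\nabla_X Y} Z \\ &= f \nabla^2_{X, Y} Z. \end{align*}

Tensorality in \(X\) is immediate since both terms in the definition are already tensorial in \(X\).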

The reason for this definition is that, once again, \(\nabla_X (\nabla_Y Z)\) will include the variation, \(\nabla_X Y\), of \(Y\) along \(X\), so we must subtract it off so that it doesn't contribute to \(\nabla^2 Z\). Essentially the way to understand how to choose what to subtract off is by the product rule. First, for those more comfortable with coordinates, we have \[ \nabla_Y Z = Y^i \partial_i Z^j \partial_j + Y^i Z^j \Gamma_{ij}^k \partial_k. \] This looks pretty good: we are differentiating \(Z\) in the direction \(Y\) and the result depends only on \(Y\), \(Z\) and the first derivatives of \(Z\). Now we apply \(\nabla_X\): \[ \nabla_X \nabla_Y Z = X^{l} \partial_{l} (Y^i \partial_i Z^j) \partial_j + X^{l} Y^i \partial_i Z^j \Gamma^m_{l j} \partial_m + \cdots \] where I got tired of computing this way so I just put \(\cdots\) to indicate there are more terms! The point though is that there are derivatives of \(Y^i\) in there but we really only want to compute the variation of \(Z\). In particular, notice that applying the product rule will give a term \[ X^{l} \partial_{l} Y^i \partial_i Z^j \partial_j \] which we recognise as the first term occurring in \[ \nabla_{\nabla_X Y} Z = X^{l} \partial_{l} Y^i \partial_i Z^j \partial_j + \cdots \]

If one is so inclined, this computation may be fully carried out to verify that the result only depends on the components \(X^i, Y^j, Z^k\) and the first two derivatives of \(Z\): \(\partial_i Z^k, \partial_i \partial_j Z^k\). It's worth doing and doesn't actually take very long. Doing is better than reading, hence we have:

Carry out the computation if you are so inclined.
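
For the record, here is where the computation should land (under the convention \(\nabla_{\partial_i} \partial_j = \Gamma^k_{ij} \partial_k\); do check this yourself):

\[ \nabla^2_{X, Y} Z = X^{l} Y^i \left( \partial_{l} \partial_i Z^k + \partial_{l} (\Gamma^k_{ij} Z^j) + \Gamma^k_{lm} (\partial_i Z^m + \Gamma^m_{ij} Z^j) - \Gamma^m_{li} (\partial_m Z^k + \Gamma^k_{mj} Z^j) \right) \partial_k. \]

Every derivative of \(X\) and \(Y\) has cancelled: the result depends only on \(X^l, Y^i\) at the point, as claimed.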

For comparison, consider the Hessian matrix of a real valued function defined on \(\mathbb{R}^n\): \[ d^2 f (x) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^1 \partial x^1} (x) & \cdots & \frac{\partial^2 f}{\partial x^1 \partial x^n} (x) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x^n \partial x^1} (x) & \cdots & \frac{\partial^2 f}{\partial x^n \partial x^n} (x) \end{pmatrix} \]

This matrix records how \(f\) varies to second order at \(x\). Once this matrix has been computed, second derivatives of \(f\) in directions \(X = (X^1, \dots, X^n)\) and \(Y = (Y^1, \dots, Y^n)\) may be computed as \[ d^2 f (X, Y) = Y^T d^2 f X. \] However, if \(X, Y\) are vector fields, then in general, \[ d^2 f (X, Y) \ne \partial_X (\partial_Y f) \] where \[ \partial_X f = df(X), \] or equivalently \(\partial_X f = X(f)\) with \(X\) acting as a derivation. The problem is of course again the fact that \(Y\) will also be differentiated (here \(D_X Y = X^i \partial_i Y^j \, \partial_j\) denotes the Euclidean directional derivative of the vector field \(Y\)): \[ \partial_X (\partial_Y f) = X^i \partial_i (Y^j \partial_j f) = X^i Y^j \partial_i \partial_j f + X^i \partial_i Y^j \partial_j f = d^2f (X, Y) + df(D_X Y) \] so that \[ d^2 f (X, Y) = \partial_X (\partial_Y f) - df(D_X Y) = \partial_X (\partial_Y f) - \partial_{D_X Y} f. \] Now the point of tensorality is that just from the matrices for \(d^2 f\) and \(df\) at a point \(x\), the second derivative \(\partial_X (\partial_Y f)\) at \(x\) may be computed by linear algebra alone (i.e. matrix multiplication) with no further differentiation required. This is because of tensorality: \(d^2 f(X, Y)\) only depends on the value of \(X, Y\) at the point \(x\) and not in a neighbourhood. In other words, we may pre-compute the matrices \(df\) and \(d^2 f\) once and for all, then apply them to any vectors to compute first and second derivatives. We may also approximate \(f\) to second order at any point without needing to compute any more derivatives.

As a simple comparison, this idea is essentially what a calculator (or computer) uses to compute \(\sin, \cos, \exp\), etc. The Taylor series is calculated once and for all (giving an expression for the coefficients that can be calculated easily, or by storing sufficiently many of the coefficients in a table) and then hard wired into the calculator. Further calculation is by elementary arithmetic operations.

Thus the moral is to compute the maps \(x \mapsto df(x)\) and \(x \mapsto d^2f (x)\) from which any second derivatives may be later computed using linear algebra. This only works by using the tensorial first and second derivatives so we may later work pointwise!

Now the definition of \(d^2 f\) should be compared immediately with the definition of \(\nabla^2 Z\). Formally, it is the same thing just with \(f\) replaced by \(Z\) and \(D\) replaced by \(\nabla\). This is suggestive that we have the correct expression for \(\nabla^2 Z\).

Let us now rephrase the expression for \(\nabla^2 Z\) and see how the tensorality arises.

The first observation is that \(\nabla Z\) is an endomorphism of \(TM\), that is, an element of \[ \operatorname{Hom}(TM, TM) \simeq T^{\ast} M \otimes TM. \] Then we may interpret \(\nabla Z (X) = \nabla_X Z\) in terms of contractions (traces) and tensor products: \[ \nabla Z (X) = \operatorname{Tr} \nabla Z \otimes X \] where the trace is taken by contracting the \(T^{\ast} M\) part of \(\nabla Z\) with \(X\). Notice in particular, for so-called decomposable elements of \(T^{\ast} M \otimes TM\), namely those of the form \(\alpha \otimes X\) with \(\alpha\) a one-form, we have \[ \operatorname{Tr} \alpha \otimes X = \alpha(X). \] Now we'd like to be able to differentiate \(\alpha\). As before, if we differentiate the function \(\alpha(X)\) we will pick up derivatives of both \(\alpha\) and \(X\). So to isolate the derivative of \(\alpha\) we subtract off the derivative of \(X\). Then we make the definition \[ \nabla \alpha (X, Y) = \partial_X (\alpha(Y)) - \alpha(\nabla_X Y). \]

Check this is tensorial in \(X\) and \(Y\).
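
For instance, for the second slot (a sketch): \[ \nabla \alpha (X, fY) = \partial_X (f \alpha(Y)) - \alpha(\nabla_X (fY)) = (Xf) \alpha(Y) + f \partial_X (\alpha(Y)) - (Xf) \alpha(Y) - f \alpha(\nabla_X Y) = f \nabla \alpha (X, Y). \]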

In terms of tensor products and traces we may express the definition as \[ \partial_X (\alpha(Y)) = \partial_X \operatorname{Tr} (\alpha \otimes Y) = \operatorname{Tr} (\nabla_X \alpha) \otimes Y + \operatorname{Tr} \alpha \otimes \nabla_X Y = \nabla_X \alpha (Y) + \alpha(\nabla_X Y). \]

Given a connection \(\nabla\) on \(TM\), together with the flat connection on the trivial bundle \(M \times \mathbb{R}\) (differentiation of functions, \(\partial_X f = X(f)\), with vector fields acting as derivations), we may define a unique connection on \(T^{\ast}M\) by requiring that the resulting three connections commute with traces and satisfy the Leibniz rule for the tensor product.

Now how do we differentiate \(\nabla Z\)? It is an endomorphism, and we may do something similar for endomorphisms. So let \(T\) be an endomorphism, so that \(T(Y)\) is a vector field. Note that for one-forms \(\alpha\) we had that \(\alpha(Y)\) is a function, and we know how to differentiate functions. Well, given \(\nabla\) we also know how to differentiate vector fields, suggesting that we define \[ (\nabla_X T) (Y) = \nabla_X (T(Y)) - T(\nabla_X Y). \] In terms of traces, \[ \nabla_X (T(Y)) = \nabla_X (\operatorname{Tr} T \otimes Y) = \operatorname{Tr} \nabla_X T \otimes Y + \operatorname{Tr} T \otimes \nabla_X Y = \nabla_X T (Y) + T(\nabla_X Y). \] Rearranging gives \[ (\nabla_X T) (Y) = \nabla_X (T(Y)) - T(\nabla_X Y). \]

Check directly that this is tensorial in both \(X\) and \(Y\). Do it both with the final expression and with the identities using traces and tensor products. Think about how requiring that the connection commutes with traces and satisfies the Leibniz product rule for tensor products leads to tensorality.

Then for \(T = \nabla Z\) we finally obtain \[ \nabla^2_{X, Y} Z = \nabla^2 Z (X, Y) = (\nabla_X \nabla Z) (Y) = \nabla_X (\nabla Z(Y)) - \nabla Z(\nabla_X Y) = \nabla_X \nabla_Y Z - \nabla_{\nabla_X Y} Z \] which is tensorial in both \(X\) and \(Y\).

Now that we understand second derivatives, we can express the curvature tensor \(\operatorname{Rm}\) as the commutator of second derivatives: \[ \operatorname{Rm} (X, Y) Z = \nabla^2_{X, Y} Z - \nabla^2_{Y, X} Z. \] This equation is known as the Ricci Identity.

Prove the Ricci Identity. Hint: Use the fact that \(\nabla\) is torsion-free: \(\nabla_X Y - \nabla_Y X = [X, Y]\).
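
For reference, one possible route (a sketch; try it yourself first):

\begin{align*} \nabla^2_{X, Y} Z - \nabla^2_{Y, X} Z &= \nabla_X \nabla_Y Z - \nabla_{\nabla_X Y} Z - \nabla_Y \nabla_X Z + \nabla_{\nabla_Y X} Z \\ &= \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{\nabla_X Y - \nabla_Y X} Z \\ &= \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X, Y]} Z = \operatorname{Rm}(X, Y) Z. \end{align*}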

Sometimes this expression is written \[ [\nabla_X, \nabla_Y] Z = \nabla^2_{X, Y} Z - \nabla^2_{Y, X} Z. \] Be careful with this phrasing: here \([\nabla_X, \nabla_Y] Z\) does not mean \(\nabla_X (\nabla_Y Z) - \nabla_Y (\nabla_X Z)\)! The latter expression is not tensorial.

Define \(\operatorname{Rm}(X, Y)f = \nabla^2_{X, Y} f - \nabla^2_{Y, X} f\). Show that \(\operatorname{Rm} (X, Y) f = 0\). Equivalently, \(\nabla^2 f(X, Y) = \nabla^2 f(Y, X)\). We might then say that \(M \times \mathbb{R} \to M\) is a flat (i.e. not curved!) vector bundle.
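
A sketch, using \(\nabla^2_{X, Y} f = \partial_X \partial_Y f - \partial_{\nabla_X Y} f\) and the torsion-free property once again:

\[ \nabla^2_{X, Y} f - \nabla^2_{Y, X} f = \partial_X \partial_Y f - \partial_Y \partial_X f - \partial_{\nabla_X Y - \nabla_Y X} f = [X, Y] f - \partial_{[X, Y]} f = 0. \]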

Thus the curvature tensor measures the lack of commutativity of second derivatives of vector fields. Put another way, unlike for functions, \(\nabla^2_{X, Y} Z\) need not be symmetric. Instead we have \[ \nabla^2_{X, Y} Z = \nabla^2_{Y, X} Z + \operatorname{Rm} (X, Y) Z. \]

Show that in Euclidean space, \(\nabla^2_{X, Y} Z\) is symmetric in \(X, Y\).
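
A sketch: in Euclidean space with the flat connection \(D_X Y = X(Y^i) \partial_i\) we have \(D_X D_Y Z = X(Y(Z^k)) \partial_k\), hence

\[ D_X D_Y Z - D_Y D_X Z = \left( X(Y(Z^k)) - Y(X(Z^k)) \right) \partial_k = [X, Y](Z^k) \partial_k = D_{[X, Y]} Z, \]

so \(\operatorname{Rm} = 0\) and, by the Ricci Identity, \(\nabla^2_{X, Y} Z\) is symmetric in \(X, Y\).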

Now we observe that since we defined \(\nabla^2 Z\) in a tensorial way, it follows immediately that \(\operatorname{Rm}(X, Y)Z\) is tensorial in \(X, Y\). By defining \(\operatorname{Rm}\) as the commutator of second derivatives, we also obtained the correction term for free.

But still, we have the question: why is \(\operatorname{Rm}\) tensorial in \(Z\)?

Show that \[ \nabla_X \nabla_Y (fZ) - \nabla_Y \nabla_X (fZ) - \nabla_{[X,Y]} (fZ) = f \operatorname{Rm} (X, Y) Z + (\operatorname{Rm} (X, Y) f) Z = f \operatorname{Rm} (X, Y) Z. \] Thus the tensorality in \(Z\) follows since \(M \times \mathbb{R} \to M\) is a flat vector bundle.
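
A sketch of the computation: using the Leibniz rule twice, \[ \nabla_X \nabla_Y (fZ) = (XYf) Z + (Yf) \nabla_X Z + (Xf) \nabla_Y Z + f \nabla_X \nabla_Y Z, \] so antisymmetrising in \(X, Y\) kills the two middle terms and leaves \(([X,Y]f) Z + f(\nabla_X \nabla_Y - \nabla_Y \nabla_X) Z\); subtracting \(\nabla_{[X,Y]}(fZ) = ([X,Y]f) Z + f \nabla_{[X,Y]} Z\) then gives exactly \(f \operatorname{Rm}(X, Y) Z\).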

The curvature tensor has the following fundamental symmetries, where we write \(\operatorname{Rm}(X, Y, Z, W) = g(\operatorname{Rm}(X, Y) Z, W)\) for the fully covariant (lowered) version:

  1. \(\operatorname{Rm} (X, Y) Z = -\operatorname{Rm} (Y, X) Z\) (first interchange anti-symmetry)
  2. \(\operatorname{Rm} (X, Y) Z + \operatorname{Rm} (Y, Z) X + \operatorname{Rm} (Z, X) Y = 0\) (Bianchi identity)
  3. \(\operatorname{Rm} (X, Y, Z, W) = -\operatorname{Rm} (X, Y, W, Z)\) (second interchange anti-symmetry)
  4. \(\operatorname{Rm} (X, Y, Z, W) = \operatorname{Rm} (Z, W, X, Y)\) (pairwise interchange symmetry)

The first symmetry follows directly from the definition of the curvature tensor simply by swapping \(X\) and \(Y\).

The second symmetry follows from the fact that \(\nabla\) is torsion-free, together with the corresponding cyclic identity for vector fields: \[ [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y]] = 0 \quad \text{(Jacobi Identity)}. \]
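
For reference, a sketch: using the definition and the torsion-free property \(\nabla_Y Z - \nabla_Z Y = [Y, Z]\), the cyclic sum telescopes to

\[ \sum_{\text{cyclic}} \operatorname{Rm}(X, Y) Z = \sum_{\text{cyclic}} \left( \nabla_X [Y, Z] - \nabla_{[Y, Z]} X \right) = \sum_{\text{cyclic}} [X, [Y, Z]] = 0 \]

by the Jacobi Identity.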

By using the first two symmetries, the third and fourth can be shown to be equivalent to each other. Either may be proved using the metric compatibility of \(\nabla\) and the first two symmetries.

The Ricci curvature is the trace \[ \operatorname{Ric} (X, Y) = \operatorname{Tr} \left( Z \mapsto \operatorname{Rm} (Z, X) Y \right). \] That is, we fix \(X, Y\) and then trace the resulting linear operator acting on \(Z\). It may also be written \[ \operatorname{Ric} (X, Y) = \operatorname{Tr}_g \operatorname{Rm} (\cdot, X, Y, \cdot). \]

Since tracing is an averaging operation (e.g. it's the sum of the eigenvalues), the Ricci curvature represents an average curvature over two directions.

Using the symmetries of the curvature tensor, show that \(\operatorname{Ric}\) is symmetric.
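
A sketch, computing in an orthonormal basis \(\{e_i\}\):

\[ \operatorname{Ric}(X, Y) = \sum_i \operatorname{Rm}(e_i, X, Y, e_i) = \sum_i \operatorname{Rm}(Y, e_i, e_i, X) = \sum_i \operatorname{Rm}(e_i, Y, X, e_i) = \operatorname{Ric}(Y, X), \]

using the pairwise interchange symmetry and then both interchange anti-symmetries.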

Using the symmetries of the curvature tensor, show that fixing any two vector fields and tracing over the remaining two slots either reproduces the Ricci tensor (up to sign) or vanishes.

The scalar curvature is the trace of the Ricci tensor, \[ \operatorname{R} = \operatorname{Tr}_g \operatorname{Ric}. \] The scalar curvature is a fully averaged curvature over all four directions.
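
In an orthonormal basis \(\{e_i\}\), and anticipating the sectional curvature defined below, this unwinds to (a useful formula to keep in mind):

\[ \operatorname{R} = \sum_{i, j} \operatorname{Rm}(e_i, e_j, e_j, e_i) = \sum_{i \neq j} \operatorname{K}(e_i \wedge e_j). \]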

#+BEGIN_defn The sectional curvature of a two-plane spanned by \(X, Y\) is \[ \operatorname{K} (X \wedge Y) = \frac{1}{|X \wedge Y|_g^2} \operatorname{Rm}(X, Y, Y, X), \] where \(|X \wedge Y|_g^2 = |X|_g^2 |Y|_g^2 - g(X, Y)^2\). At each point, this is the Gauss curvature of the surface swept out by geodesics emanating from the point with initial velocity in the plane spanned by \(X, Y\) (the image of the plane under the exponential map). #+END_defn

Show that the symmetries of the curvature tensor imply that the sectional curvatures determine the curvature tensor. Hint: The basic idea is to note that the symmetries of the curvature tensor imply that \[ \operatorname{Rm} (X \wedge Y, W \wedge Z) = \operatorname{Rm} (X, Y, Z, W) \] determines a symmetric bilinear form on the space of two-planes. This bilinear form is uniquely determined by polarising the corresponding quadratic form \[ Q(X \wedge Y) = \operatorname{Rm} (X \wedge Y, X \wedge Y) = |X \wedge Y|_g^2 K(X \wedge Y). \]