Part 2 - coordinates, vectors and the summation convention

The series' table of contents

The basic object in GR is spacetime. As a mathematical object it is, formally, a differentiable manifold, but for our purposes it is enough to think of it as a set of points called events, which can be described by coordinates. In GR, spacetime is 4-dimensional, which means that we need 4 coordinates: one temporal and three spatial.

The coordinates can be denoted by pretty much anything (like x, y, z, t), but since we will refer to all four of them on multiple occasions, it will be convenient to denote them by numbers. It is pretty standard to denote time by 0, and the spatial coordinates by 1, 2 and 3. The coordinate with number \mu will be written like this: x^\mu (careful: in this case it is not a power!). \mu here is called an index (here: an upper one). By convention, if we mean any of the 4 coordinates, we use a Greek letter as the index; if only the spatial ones are to be considered, we use a letter from the Latin alphabet.
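As a concrete picture of this numbering convention, here is a minimal sketch in plain Python (the variable names are mine, purely for illustration): the four coordinates x^\mu are stored in a list indexed by \mu, with index 0 as time.

```python
# The four spacetime coordinates x^mu of one event, indexed by mu:
# index 0 is the temporal coordinate, indices 1..3 are the spatial ones.
event = [2.0, 1.0, -3.0, 0.5]   # x^0, x^1, x^2, x^3 (illustrative values)

t = event[0]          # the temporal coordinate x^0
spatial = event[1:4]  # the spatial coordinates x^i, i = 1, 2, 3

print(t)        # 2.0
print(spatial)  # [1.0, -3.0, 0.5]
```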

So now we know how to describe points ("places") in spacetime. But in spacetime, as in every space, we have not only places, but also directions. Those are described with vectors. Vectors are written much like points - they are also described by 4 coordinates, denoted v^\mu. In this case they don't denote a place in spacetime, but the proportions of movement along the respective coordinate axes. Let me explain in detail.

Let's imagine an ordinary plane with x and y coordinates. Each vector in this plane will also have x and y coordinates - let's denote them, for example, v_x and v_y. They can be interpreted as a recipe: to move in the direction described by this vector, you need to add v_x to the x coordinate of the point, and v_y to its y coordinate. For example, moving from the point (4,3) by the vector [1,-1] gets us to the point (5,2).
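This "recipe" reading of a vector can be sketched in a few lines of plain Python (the function name `move` and the step parameter `a` are mine, for illustration; `a` lets us take multiple or fractional steps, as discussed below):

```python
# A vector as a "recipe" for moving a point: add a*v_x to x, a*v_y to y.
def move(point, vector, a=1):
    """Move `point` by `a` steps along `vector` (`a` may be fractional)."""
    return tuple(p + a * v for p, v in zip(point, vector))

print(move((4, 3), (1, -1)))         # (5, 2) - one full step
print(move((4, 3), (1, -1), a=3))    # (7, 0) - three steps
print(move((4, 3), (1, -1), a=0.5))  # (4.5, 2.5) - half a step
```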

The direction doesn't end at this one point, though. We can go farther, to (6,1), (7,0), (8,-1), ... We can take such "steps" as many times as we like, but we can also take, for example, half a step, to (4.5, 2.5). The vectors [av_x, av_y] then have the same direction as [v_x, v_y], but different magnitudes. They can also have the same or the opposite sense (the same if a>0, the opposite if a<0).

Given a function defined on our space (on a plane it will be f(x,y)) and a vector, we can ask about the derivative of this function in the direction of that vector - it tells us how fast the function changes when we move in that direction. As it turns out, the derivative in the direction of [v_x, v_y] equals v_x\frac{\partial f}{\partial x} + v_y\frac{\partial f}{\partial y}. In a space where the coordinates are numbered with an index \mu, we can write it this way: \sum\limits_{\mu=0}^n v^\mu \frac{\partial f}{\partial x^\mu}. It is worth noting that if we multiply the vector by some number a, the derivative also gets multiplied by a. This means that the longer the vector, the greater the value of the derivative. Thus we can say that the vector describes not only a direction, but also the velocity with which we move in that direction. The derivative then tells us how fast the function changes when we move with that velocity.

Now a few notation conventions again. First, \frac{\partial f}{\partial x^\mu} is often written \partial_\mu f for the sake of convenience. Here the index is at the bottom - you can remember it this way: when we differentiate with respect to something with an index, the index switches places (from top to bottom and vice versa). This will be important in a while. We then have the expression \sum\limits_{\mu=0}^n v^\mu \partial_\mu f. Expressions with such sums appear in GR so often that Einstein himself decided the summation signs shouldn't be written at all, and introduced the so-called summation convention.
It states that every time an index is repeated in an expression, once as an upper index and once as a lower one, the expression should be summed over all values of that index. This lets us write our expression simply as v^\mu \partial_\mu f. This convention is the main reason for the importance of index positions (there is also another matter, but it is beyond the scope of this article). Summing over a repeated index is called a contraction.
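A numerical sketch of the contracted expression v^\mu \partial_\mu f, in plain Python (the helper names are mine, and the partial derivatives are approximated by central differences rather than computed exactly):

```python
# Central-difference approximation of the partial derivative d_mu f at `point`.
def partial(f, point, mu, h=1e-6):
    up = list(point); up[mu] += h
    down = list(point); down[mu] -= h
    return (f(up) - f(down)) / (2 * h)

# v^mu d_mu f: the sum over the repeated index mu, written out explicitly.
def directional_derivative(f, point, v):
    return sum(v[mu] * partial(f, point, mu) for mu in range(len(v)))

f = lambda x: x[0] ** 2 + 3 * x[1]   # f(x, y) = x^2 + 3y, so d_0 f = 2x, d_1 f = 3

# At (2, 1) along v = (1, -1): 2*2*1 + 3*(-1) = 1
print(directional_derivative(f, (2.0, 1.0), (1.0, -1.0)))
# Doubling the vector doubles the derivative, as noted above:
print(directional_derivative(f, (2.0, 1.0), (2.0, -2.0)))
```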

Let's take a closer look at our directional derivative: \partial_\mu f. These are really several partial derivatives of the function f, one for each value of \mu (in spacetime, four). It looks a bit like a point or a vector, except the index is at the bottom. Such a thing is called a covector.

The derivatives of a function usually depend on the point at which they are being calculated, so the coordinates of this covector will also depend on the point in space. We can think of this as if we had a covector at every point in space, with coordinates equal to the derivatives of the function at that point. We don't have a single covector here then, but a whole covector field. In the same way, when the coordinates of a vector depend on the point in space, we are dealing with a vector field.
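The covector-field picture can be sketched as follows (plain Python, names mine, partials again approximated by central differences): evaluating \partial_\mu f at different points yields different covectors, one attached to each point.

```python
# The covector (d_0 f, d_1 f, ...) at `point`: its components are the
# partial derivatives of f evaluated there, so they vary from point to point.
def grad_covector(f, point, h=1e-6):
    comps = []
    for mu in range(len(point)):
        up = list(point); up[mu] += h
        down = list(point); down[mu] -= h
        comps.append((f(up) - f(down)) / (2 * h))
    return tuple(comps)

f = lambda x: x[0] ** 2 + 3 * x[1]   # d_0 f = 2x, d_1 f = 3

print(grad_covector(f, (2.0, 1.0)))  # approximately (4.0, 3.0)
print(grad_covector(f, (0.0, 5.0)))  # approximately (0.0, 3.0) - a different covector
```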

Often vector and covector fields are just called vectors/covectors. It's not a problem, since single (co)vectors at a single point are almost never considered. So whenever we write just v^\mu \partial_\mu f, we will mean the derivative of f in the direction of v, calculated at every point in space separately.
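Putting the pieces together, here is a sketch (plain Python, with the same central-difference approximation as before; the function names are mine) of contracting a vector field with the covector field \partial_\mu f, point by point:

```python
def partial(f, point, mu, h=1e-6):
    up = list(point); up[mu] += h
    down = list(point); down[mu] -= h
    return (f(up) - f(down)) / (2 * h)

f = lambda x: x[0] * x[1]        # f(x, y) = x*y, so d_mu f = (y, x)
v = lambda x: (x[1], -x[0])      # a vector field: v^mu depends on the point

# v^mu d_mu f evaluated separately at each point; here it equals y^2 - x^2.
for point in [(1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]:
    value = sum(v(point)[mu] * partial(f, point, mu) for mu in range(2))
    print(point, value)   # approximately -1.0, 0.0, 4.0 in turn
```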