So I recently wanted to clarify my thinking about two commonly used error metrics for evaluating regression models: mean absolute error (MAE) and root mean squared error (RMSE). These metrics accomplish similar tasks, but each has its own advantages. Let $\mathcal{E} = (e_1,\ldots,e_n)^T$ be the model errors over the test data $(x_i,y_i)$. MAE is simply the mean of the absolute values of the errors, $\frac{1}{n}\sum_{i=1}^n |e_i|$. RMSE is the square root of the mean of the squared errors, $\sqrt{ \frac{1}{n}\sum_{i=1}^n e_i^2 \,}$.
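Here is a minimal sketch of the two definitions in NumPy. It assumes each error is defined as $e_i = y_i - \hat{y}_i$ (true value minus prediction); the sign convention doesn't matter for either metric. The data values are made up for illustration.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: (1/n) * sum |e_i|, with e_i = y_i - y_hat_i (assumed convention)."""
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(e))

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt((1/n) * sum e_i^2)."""
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(e ** 2))

# Hypothetical test targets and model predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mae(y_true, y_pred))   # 0.5
print(rmse(y_true, y_pred))  # about 0.612
```

Here the errors are $(0.5, -0.5, 0, -1)$, so MAE is $2/4 = 0.5$ and RMSE is $\sqrt{1.5/4} \approx 0.612$.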
Both metrics have some nice properties. Because each one takes the absolute value or the square of every error before averaging, positive and negative errors cannot cancel out. Both metrics are also expressed in the same units as the output variable $y$. So how are they different?
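A quick way to see the "no cancelling" property is to compare both metrics against the plain signed mean of the errors. This sketch uses a hypothetical pair of errors of equal size and opposite sign; if $y$ were measured in meters, all three numbers below would be in meters too.

```python
import numpy as np

# Hypothetical errors: one over-prediction and one under-prediction of equal size.
e = np.array([2.0, -2.0])

mean_error = np.mean(e)           # signed errors cancel out: 0.0
mae = np.mean(np.abs(e))          # absolute values cannot cancel: 2.0
rmse = np.sqrt(np.mean(e ** 2))   # squares cannot cancel either: 2.0
print(mean_error, mae, rmse)      # 0.0 2.0 2.0
```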
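Well, the mean absolute error makes more intuitive sense: it is just the average size of a miss. RMSE takes a little more effort to interpret. RMSE does have an interesting advantage of its own, though: when used as a loss function in a machine learning model, it is easy to take its derivatives (in practice its square, the mean squared error, is often minimized instead, since it has the same minimizer and is differentiable everywhere), whereas the absolute value in MAE has a kink at zero.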
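To make the derivative claim concrete, here are the partial derivatives of each metric with respect to a single error $e_i$, writing $MSE = RMSE^2 = \frac{1}{n}\sum_{i=1}^n e_i^2$ (a name not used elsewhere in this post):

$$\frac{\partial\, MSE}{\partial e_i} = \frac{2}{n}\, e_i, \qquad \frac{\partial\, RMSE}{\partial e_i} = \frac{e_i}{n\, RMSE} \;\; (RMSE > 0), \qquad \frac{\partial\, MAE}{\partial e_i} = \frac{1}{n}\,\mathrm{sign}(e_i) \;\; (e_i \neq 0).$$

The squared-error gradients are proportional to the error itself, so large errors push hardest on the fit, while the MAE gradient has the same magnitude for every error and is undefined where an error is exactly zero.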
To see more precisely how these two metrics compare, we can use a standard pair of inequalities between the $\ell_1$ and $\ell_2$ norms of a vector in $\mathbb{R}^n$: $$\|\mathcal{E}\|_2 \leq \|\mathcal{E}\|_1 \leq \sqrt{n}\|\mathcal{E}\|_2.$$ Since $MAE = \frac{1}{n}\|\mathcal{E}\|_1$ and $RMSE = \frac{1}{\sqrt{n}}\|\mathcal{E}\|_2$, dividing the right-hand inequality by $n$ and the left-hand one by $\sqrt{n}$ gives two inequalities for these metrics: \begin{eqnarray} RMSE &\geq & MAE, \\ RMSE &\leq & \sqrt{n}\, MAE. \end{eqnarray}
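The two bounds can be sanity-checked numerically. This sketch (NumPy, with randomly drawn errors, so the specific vectors are arbitrary) checks both inequalities on many random error vectors and then builds the two extreme cases: every error with the same magnitude makes $RMSE = MAE$, and all of the error concentrated in one point makes $RMSE = \sqrt{n}\, MAE$.

```python
import numpy as np

def mae(e):
    return np.mean(np.abs(e))

def rmse(e):
    return np.sqrt(np.mean(np.square(e)))

rng = np.random.default_rng(0)
n = 50

# Random error vectors: MAE <= RMSE <= sqrt(n) * MAE should hold for every draw.
for _ in range(1000):
    e = rng.standard_normal(n)
    assert mae(e) <= rmse(e) + 1e-12
    assert rmse(e) <= np.sqrt(n) * mae(e) + 1e-12

# Lower bound is tight when every error has the same magnitude (signs don't matter).
equal = 3.0 * rng.choice([-1.0, 1.0], size=n)
print(mae(equal), rmse(equal))  # 3.0 3.0

# Upper bound is tight when all of the error sits in a single point.
spike = np.zeros(n)
spike[0] = 3.0
print(rmse(spike) / mae(spike), np.sqrt(n))  # both about 7.071
```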
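So to summarize, root mean squared error is never smaller than mean absolute error, and the two are equal only when every error has the same magnitude. The second inequality points toward an interesting tendency of RMSE: its ratio to MAE is bounded by $\sqrt{n}$, so on larger test sets RMSE can exceed MAE by a wider margin, especially when a few large errors dominate. Something to be aware of, and not necessarily freak out about. Because errors are squared before averaging, outliers in the data also affect RMSE more strongly than MAE. Chai and Draxler (2014) have a great paper for further reading on this topic: Root mean square error (RMSE) or mean absolute error (MAE)? - Arguments against avoiding RMSE in the literature.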
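Both effects from the summary show up in a small NumPy sketch with made-up errors. The first part compares a set of equal errors with the same set after one error is replaced by a large miss. The second part places a single error of size 10 in test sets of growing size $n$: MAE is $10/n$ and RMSE is $10/\sqrt{n}$, so both shrink, but their ratio grows exactly like $\sqrt{n}$.

```python
import numpy as np

def mae(e):
    return np.mean(np.abs(e))

def rmse(e):
    return np.sqrt(np.mean(np.square(e)))

# Hypothetical errors: identical except for one large miss in the second set.
clean = np.array([1.0, 1.0, 1.0, 1.0])
outlier = np.array([1.0, 1.0, 1.0, 10.0])
print(mae(clean), rmse(clean))      # 1.0 1.0
print(mae(outlier), rmse(outlier))  # 3.25 and about 5.07

# A single error of size 10 in test sets of growing size n.
for n in (4, 16, 64):
    e = np.zeros(n)
    e[0] = 10.0
    print(n, mae(e), rmse(e), rmse(e) / mae(e))  # ratio is 2.0, 4.0, 8.0
```

The outlier roughly triples MAE but multiplies RMSE by about five, which is the "outliers affect RMSE more" point in numbers.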