This article is about the solution to a recent problem from the CCF NOI 2018 Winter Camp, the penultimate stage before the Chinese national team is selected. I believe that the solution contains a few techniques that are rarely known.
Short description of the problem:
You are given three trees \(T_1, T_2, T_3\) with the same number of nodes. Let \(d_1(x,y), d_2(x,y), d_3(x,y)\) denote the distance from \(x\) to \(y\) in each tree. Find a pair of nodes \((x,y)\) such that \(d_1(x,y)+d_2(x,y)+d_3(x,y)\) is maximized. Output that value.
Let’s start by solving the problem for two trees only. Let \(f_1(v)\) denote the distance from node \(1\) to node \(v\) in the first tree. We want to maximize \(f_1(x) + f_1(y) - f_1(\mathrm{LCA}_1(x,y)) + d_2(x,y)\). For each node \(v\) in the second tree, let’s add a node \(v'\) and an edge \((v,v')\) to the second tree, with weight \(f_1(v)\). If we fix \(p = \mathrm{LCA}_1(x,y)\), we have reduced our problem to this: “Given two sets of nodes \(A, B\), merge them and calculate the maximum value of \(\{ d'_2(a,b) \mid a \in A, b \in B \}\), where \(d'_2(a,b)\) denotes the distance between \(a\) and \(b\) in the modified second tree.” It can be proven through a reductio ad absurdum argument, that for each one of these sets, we only need to store a single pair of nodes \((a,b)\) such that \(d'_2(a,b)\) is maximized. This means that merging is done in this way: Let \((a_1, b_1)\) and \((a_2,b_2)\) be the aforementioned pairs. The value of the new pair will be \((a_1, a_2), (a_1, b_1), (a_1, b_2), (a_2, b_1), (a_2, b_2),\) or \((b_1, b_2)\), depending on which pair maximizes \(d'_2(a,b)\). This gives us an \(O(n \times \mathrm{LCA\_Complexity}(n))\) algorithm.
Now, let’s solve another version of the problem: We have three trees, but the second and third trees are path-shaped. Again, let’s fix \(p=\mathrm{LCA}_1(x,y)\). We have to maximize \(f_1(x) + f_1(y) + |f_2(x)-f_2(y)| + |f_3(x)-f_3(y)|\). For each subtree of the first tree, we will maintain four values:
- \(g_{1,1}(x) = \operatorname*{argmax}_v f_1(v) + f_2(v) + f_3(v)\)
- \(g_{1,2}(x) = \operatorname*{argmax}_v f_1(v) + f_2(v) - f_3(v)\)
- \(g_{2,1}(x) = \operatorname*{argmax}_v f_1(v) - f_2(v) + f_3(v)\)
- \(g_{2,2}(x) = \operatorname*{argmax}_v f_1(v) - f_2(v) - f_3(v)\).
Since \(\left\|x\right\| = max(x,-x)\), the best answer will be either equal to \(g_{1,1}(x) + g_{2,2}(y)\) or \(g_{1,2}(x) + g_{2,1}(y)\) for some pair of nodes \((x,y)\). This gives us a simple \(O(n)\) solution if we keep the prefix and suffix maxima of \(g\) for the children of every node.
Unfortunately, we still have one more subtask to solve before we reach the full solution. In this subtask, we are given two trees and a path. Let’s start by introducing the auxiliary tree: Let \(A\) be a subset of nodes of a tree \(T\). Then, the auxiliary tree contains the nodes \(A \cup \{\mathrm{LCA}(x,y) \mid x,y \in A\}\), and every node is connected to its closest ancestor in the original tree with an edge of weight \(d(\mathrm{ancestor}, \mathrm{node})\). The auxiliary tree therefore has \(O(|A|)\) nodes. Now, we can do D&C on the path: Let’s fix its midpoint \(m\). We have to maximize \(d_1(x,y) + d_2(x,y) + d_3(x,m) + d_3(m,y)\), where \(x \leq m\) and \(y \geq m\). If we add an extra vertex \(v'\) to \(T_2\) where the weight of the edge \((v,v')\) is equal to \(d_3(v,m)\) and create the auxiliary trees that only contain the vertices in the range we’re currently processing, we can use the solution to the “two trees” subtask to solve this one too. This gives us an \(O(n \log{n} \times \mathrm{LCA\_Complexity}(n)\) solution, with an albeit difficult implementation.
And finally, we can move on to the complete solution. But first, we have to introduce another technique, called “centroid decomposition on edges”. Here is a simple pseudocode description of this technique:
Obviously, this will work in \(O(n^2)\) in some cases, e.g. a star graph. However, if we have a binary tree, this code works in \(O(nlogn)\). We can make an arbitrary tree binary while preserving distances between nodes using a method similar to the left-child right-sibling conversion. If we have a node with \(k > 2\) children, we are going to attach a single child node to it. This node’s left child will be the former first child of the node, and it will be connected to it with a weight of identical edge. Its right child will be its “sibling” node, whose left child will be the former second child of the node, and so on.
Our full solution is simply a combination of these two techniques. Make the first tree binary, and do “centroid decomposition on edges” on it. Similarly to the previous subtask, add a dummy node \(x'\), such that \(d'2(x, x') = h(x)\), where \(h(x)\) is the distance from node x to the active edge in the first tree. Then, solve the “two trees” subtask. This gives us an \(O(n \log{n} \times \mathrm{LCA\_Complexity}(n))\) solution.
This article was written by f2lk6wf90d, based on our chat on Telegram. Thank you for your awesome work, Dimitris!
If you find any mistakes in this solution, comments are welcomed.