Description
Given sequences $a_{0\sim n-1}, b_{0\sim n-1}, c_{0\sim m-1}, d_{0\sim m-1}$, where $a, c$ are increasing and $b, d$ are decreasing.
Define
$$ w(i,j) = (a_i + c_j) \times (b_i + d_j). $$
There are $q$ queries $(l_k, r_k, L_k, R_k)$, asking for:
$$ \max_{l_k \le i \le r_k} \max_{L_k \le j \le R_k} w(i,j) $$
Limitations
$1 \le n, m, q \le 10^5$
$1 \le a_i, b_i, c_j, d_j \le 10^9$
$0 \le l_k \le r_k \le n-1$
$0 \le L_k \le R_k \le m-1$
Time Limit : $6\text{s}$
Memory Limit : $1024\text{MB}$
Solution
Consider $n, m, q$ to be of the same order.
Obviously $w$ has decision monotonicity. Consider maintaining several $(l, r, x)$ indicating that the optimal decision for $[l, r]$ is $x$. From right to left, after adding a new $x^\prime$, we only need to change a suffix of decisions to $x^\prime$. First, pop all entire segments where $x$ is worse than $x^\prime$; then, in the remaining last segment, binary search for the last position $p$ where $x$ is still better (if there is no such segment, set $p = -1$). Then from $p+1$ onward, $x^\prime$ is better (actually, depending on the order of adding $x^\prime$, the implementation may vary slightly).
Now to handle queries, first partition the second dimension $j$ into blocks of size $B=\sqrt n$.
For partial blocks, use catenary divide-and-conquer on the first dimension $i$. For each layer, sweep from $\text{mid}$ to both sides, adding the swept $i$ and processing queries related to $i$. Since the second dimension has only $O(B)$ points in partial blocks, for each such query we can:
- Binary search to find which decision segment $L$ belongs to.
- Then move a pointer to sequentially compute the optimal decision for each $j \in [L, R]$.
Complexity for this part is $O(n \log^2 n + n \sqrt n)$.
For whole blocks (where a query covers a whole block of $j$'s), process each block independently. Take all $j$ in the current block and compute $f_i$, defined as the maximum $w(i, j)$ for fixed $i$ with $j$ in this block. Then we need to answer queries, using an $O(n)-O(1)$ RMQ data structure, this part runs in $O(n \log n+n \sqrt n)$ total.
Overall time complexity is $O(n \sqrt n + n \log^2 n)$, space complexity $O(n)$. The actual implementation runs faster than the standard solution.