Status: Project canceled / better solution found here
The original question statement is here
Arsh has an R-by-C circuit board of inhomogeneous thickness; each cell has an integer thickness. He wants to cut out a rectangular subcircuit board such that the difference between the thickness of the thickest and the thinnest cell is at most K.
An example:
With K=2, from the following board

```
987689
923438
923438
988873
```

the sub-board

```
2343
2343
```

is a valid subcircuit board.
Please find the largest possible area obtainable from a valid subcircuit board.
R and C range from 1 to 300 (call this bound N), and each thickness is an integer from 0 to 1000.
Of course, this question requires a min and a max computation over a rectangular range, then a check of the difference between them. Since max(X) is the same as -min(-X), we will forget about the max aspect of things…
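A toy check of this duality (my own illustration, not part of the solution):

```python
# max over a list equals the negated min over the negated list
xs = [3, 1, 4, 1, 5, 9, 2, 6]
assert max(xs) == -min(-x for x in xs)
```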
I had experience with another question that required quick range-min lookups over sublists of a list of size N. The brute-force algorithm takes O(N) time per query, iterating over all elements in the range. There is a log-based algorithm to speed this up.
The idea is to precache queries of size 2^t: the answers of two consecutive queries of size 2^(t-1) can be combined to give the answer for one of size 2^t. Iterating over sizes from 2^0 up to t = log(300) takes fewer than 300·log(300) combinations and memory of the same size. A range whose length is not a power of two is then covered by the power-of-two blocks given by the binary decomposition of its length, and their answers are combined.
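Here is a minimal 1D sketch of that precaching and lookup (my own illustration of the idea, not the contest code; the 2D version used later follows the same pattern):

```python
def build_sparse_min(a):
    """table[t][i] = min of a[i : i + 2**t], built by combining two 2**(t-1) blocks."""
    n = len(a)
    table = [list(a)]
    for t in range(1, n.bit_length()):
        half = 1 << (t - 1)
        prev = table[-1]
        table.append([min(prev[i], prev[i + half]) for i in range(n - (1 << t) + 1)])
    return table

def range_min(table, lo, hi):
    """Min of a[lo:hi], covering the range with the power-of-two blocks
    given by the binary decomposition of its length."""
    best = table[0][lo]
    length, t = hi - lo, 0
    while length:
        if length & 1:
            best = min(best, table[t][lo])
            lo += 1 << t
        length >>= 1
        t += 1
    return best

a = [5, 2, 7, 1, 9]
table = build_sparse_min(a)
assert range_min(table, 1, 4) == 1   # min of a[1:4]
```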
However, this problem is a 2D problem. And maybe that is not the same as a simple range-min query over 1D? Is there anything we can borrow?
Well, the two dimensions are not independent: observe in the following case:
```
AB
CD
```

where A, B, C, D are adjacent subcircuits.
We know that:
min(C)>=min(AC)>=min(ABCD)
min(C)>=min(CD)>=min(ABCD)
but it is hard to say anything about min(AC) and min(CD)...
So I get the feeling that the loops in the horizontal direction must always be present… Brute force would take N^2 to decide the two x-boundaries, N^2 for the y-boundaries, and N^2 to iterate over all cells within. That is a whopping N^6 = 729000000000000 iterations…
Well, we can first think about the 2D range-min query. By caching all 2^I-by-2^J rectangles, we can make a 2D version of the 1D range-min algorithm. The maximum number of cached subrectangles needed to cover a query rectangle is log(N) × log(N). So if we iterate over all boundaries with this method, it becomes N^4·log(N)^2 = 8100000000·81 = 656100000000 ~ 10^12, still too big… The precaching takes N^2·log(N)^2, negligible compared to the lookups.
The skeleton would be something like this:
```python
for x_start in range(300):
    for x_end in range(300):
        for y_start in range(300):
            for y_end in range(300):
                doThe2DLog300Lookup(x_start, x_end, y_start, y_end)
```
Then I remembered this video, and I thought about maybe using a two-pointer algorithm… This time the two ends are not the leftmost and rightmost positions but positions 0 and 1: you start off with a unit-height subcircuit in the y direction.
```python
for x_start in range(300):
    for x_end in range(300):
        y_start = 0
        y_end = 1  # not included
        while y_end <= 300:
            res = doThe2DLog300Lookup(x_start, x_end, y_start, y_end)
            if res.valid:
                # too easy to verify with a small subrect, make it bigger!
                y_end += 1
            else:
                # a bit too big...
                y_start += 1
```
This caterpillar movement of `y_start` and `y_end` is so cute :D.
We know for sure that the caterpillar can only move forward, so this while loop takes O(N) steps, giving N^3 lookups in total. The question is: will this caterpillar visit all valid configurations?
Say the optimal config is [A~B]x[C~D]. A and B are covered by brute force. At a certain moment, `y_start` becomes C, by a rightward movement from `y_start = C-1`. If `y_end` is at or to the left of D at that point, the current range is contained in the valid [C~D], so `y_end` will keep advancing until it reaches D, never becoming too big in the process and thus never forcing `y_start` to shift. Is it possible that `y_end` is already to the right of D? In that case, go back to the moment when `y_end` had just become D: `y_start` must have been to the left of C, since `y_start` reaching C happened afterwards! And so `y_start` keeps advancing, because the range stays invalid until it hits C; if it were valid earlier, we would have a valid rectangle strictly larger than the optimum, a violation of optimality!
Now that is a whopping factor-of-N improvement, but still,
```python
for i in range(300*300*300*9*9):
    pass
```
took me like a few minutes…
You can’t really generalize this to 2D, because the step `beGreedier()` could mean either expand `y_end` or expand `x_end`, as in `if stillValid(): beGreedy()`. This kind of binary choice is typical of entering the realm of NP exponential branching…
Or so I thought… It occurred to me that this caterpillar traverses all locally maximal y-ranges for fixed x-ranges. The caterpillar doesn’t care about the achieved size of the rectangle.
Maybe we could define jumps of the caterpillar… If a rectangle of size S has already been discovered, there is nothing to lose by skipping over smaller tries. Nope, that doesn’t really work, because now the caterpillar might find itself stuck in a local minimum, unable to shrink in order to rebound :(
Another idea flashed: if we have no choice but to brute-force the y-ranges, maybe we could use the same philosophy on the x-ranges, but taking the obtained size into account…
Ha I thought about what I call a Fermi Energy Level method:
```python
for sizeGoal, lower, higher in dechotomy(300 * 300):
    x_start, x_end = 0, 1
    while x_end <= 300:
        deltaY = caterpillerOnY(x_start, x_end)
        # returns the best (longest) valid Y interval given an x interval
        size = deltaY * (x_end - x_start)
        if size >= sizeGoal:
            # it was too easy, let's become greedier
            x_end += 1
        else:
            # too bad, we should relax the bruteforce
            x_start += 1
    # after the sweep: call higher() if some x-range reached sizeGoal, lower() otherwise
```
That will work! Total complexity now is log(300*300)*300*300*log(300)^2 = 300*300*18*9*9
```python
for i in range(300*300*18*9*9):
    pass
```
Took 8 seconds! That should do the job!
So I start off by writing a Python iterator that does dichotomy in a general way: besides the probing value, it returns two callable functions that tell the iterator to go up or down; the user should call `higher()` or `lower()` wisely depending on the situation. This code is salvageable for future problems.
```python
import numpy as np
import itertools

# dichotomy over a size goal,
# lower and higher are always returned as signaler functions
def dechotomy(sizeGoal):
    low, high = 0, sizeGoal

    def higher():
        nonlocal low
        low = mid

    def lower():
        nonlocal high
        high = mid

    while True:
        mid = (low + high) // 2
        yield mid, lower, higher
```
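A hypothetical usage sketch (`achievable()` is a stand-in for a real feasibility test such as “does a valid rectangle of at least this area exist?”; the convergence check is mine, since the generator itself never stops, and the bound passed in should be exclusive, i.e. one more than the largest possible answer):

```python
def largest_achievable(limit, achievable):
    best, last = 0, None
    for probe, lower, higher in dechotomy(limit):
        if probe == last:      # the bracket has collapsed onto one value: stop
            break
        last = probe
        if achievable(probe):  # probe is reachable, remember it and aim higher
            best = probe
            higher()
        else:                  # probe is too ambitious, aim lower
            lower()
    return best
```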
Now, we process the inputs. The outermost for loop is just repetition over test cases, of no particular interest except to exercise different edge cases. Please see the original problem (link at the top of this post), as well as the general structure of Google Code Jam questions.
```python
# main contest body
t = int(input())
for ti in range(1, t + 1):
    r, c, k = tuple(map(int, input().split()))
    grid = []
    for ri in range(r):
        grid.append(list(map(int, input().split())))
    grid = np.array(grid, dtype=int)
    # variables now all loaded
```
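For reference, a tiny input that this parsing loop would accept (a toy example of my own, not an official sample): one test case with a 2-by-3 board and K = 2.

```
1
2 3 2
1 2 3
4 5 6
```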
Now that we have loaded the problem into a 2D array, we may begin precaching. By using numpy (backed by C), I hope to gain a bit of speed, although it is not that important here: the bottleneck is the number of queries.
```python
# dynamic programming used to precalculate lookups of size 2^I x 2^J
_ = []
for operation in [np.max, np.min]:
    precache_max_or_min = np.zeros((r, c, 9, 9), dtype=int)
    precache_max_or_min[:, :, 0, 0] = grid
    for i in range(8):
        for j in range(8):
            temp = np.zeros((r, c, 2), dtype=int)
            temp[:, :, 0] = precache_max_or_min[:, :, i, j]
            temp[:, :, 1] = \
                np.roll(precache_max_or_min[:, :, i, j], -2 ** j, axis=1)
            temp = operation(temp, axis=2)
            precache_max_or_min[:, :, i, j + 1] = temp
        temp = np.zeros((r, c, 2, 9), dtype=int)
        temp[:, :, 0, :] = precache_max_or_min[:, :, i, :]
        temp[:, :, 1, :] = \
            np.roll(precache_max_or_min[:, :, i, :], -2 ** i, axis=0)
        temp = operation(temp, axis=2)
        precache_max_or_min[:, :, i + 1, :] = temp
    _.append(precache_max_or_min)
precache_max, precache_min = _
```
The two variables `precache_max`, `precache_min` should now contain the answers to all queries of size 2^I × 2^J. Note that by using roll, the combination wraps around cylindrically; it is up to the caller not to specify a range that runs off the board.
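As a quick sanity check (my own test, not part of the original solution), a precached block that fits entirely on the board should agree with a direct numpy slice:

```python
# compare the precached 4-by-8 block at the top-left corner with brute force
I, J = 2, 3
if r >= 2 ** I and c >= 2 ** J:
    assert precache_max[0, 0, I, J] == grid[:2 ** I, :2 ** J].max()
    assert precache_min[0, 0, I, J] == grid[:2 ** I, :2 ** J].min()
```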
Now we implement a lookup for an arbitrary rectangle, which returns True or False depending on whether the K-limit is exceeded.
```python
# The actual lookup for any size of rectangle
def lookup(xlow, xhigh, ylow, yhigh):
    szx = xhigh - xlow
    szy = yhigh - ylow
    # binary decomposition of the two side lengths
    collectedx = []
    collectedy = []
    while szx > 0:
        collectedx.append(szx % 2)
        szx = szx // 2
    while szy > 0:
        collectedy.append(szy % 2)
        szy = szy // 2
    # (start, level) pairs of the power-of-two blocks tiling the query rectangle
    sublookupx = []
    sublookupy = []
    cursor = xlow
    for xi, x in enumerate(collectedx):
        if x == 1:
            sublookupx.append((cursor, xi))
            cursor += 2 ** xi
    cursor = ylow
    for yi, y in enumerate(collectedy):
        if y == 1:
            sublookupy.append((cursor, yi))
            cursor += 2 ** yi
    minFound, maxFound = grid[xlow, ylow], grid[xlow, ylow]
    for (x, xi), (y, yi) in itertools.product(sublookupx, sublookupy):
        maxFound = max(maxFound, precache_max[x, y, xi, yi])
        minFound = min(minFound, precache_min[x, y, xi, yi])
    return minFound + k >= maxFound
```
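And a quick hypothetical check of the decomposition (again my own, assuming the board is at least 5 by 6): a 5-by-6 query splits as 5 = 4 + 1 and 6 = 4 + 2, so 2 × 2 = 4 precached blocks are combined, and the result should agree with numpy.

```python
if r >= 5 and c >= 6:
    sub = grid[0:5, 0:6]
    assert lookup(0, 5, 0, 6) == (sub.max() - sub.min() <= k)
```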
The algorithm doesn’t work. For each pair `(xlow, xhigh)`, we can indeed find, in about N·log(N)^2 time, the corresponding optimal `(ylow, yhigh)` pair that maximizes the surface. But suppose we suspect, near the end of the dichotomy, that the optimum is some particular size: say we have found a rectangle of the smaller probed size but failed to find one at the larger. Halfway through an x-sweep there is still no way to determine whether to do `xlow += 1` or `xhigh -= 1`.
So we have been having big headaches over the 2D aspect, unable to control a second caterpillar. Actually I was trapped in my own line of thought: we could have simply sorted all pixels into a 1D array by thickness.
```python
# all cells, flattened and sorted by increasing thickness
L = sorted((grid[i, j], i, j) for i in range(r) for j in range(c))
```
I have a hunch that I can then perform the caterpillar algorithm over this 1D array of size R·C (at most 300² = 90000).
Let’s suppose that a valid rectangle [a:b]x[c:d] exists with min, max = m, M. This statement is equivalent to saying that neither the part of the sorted array to the left of m nor the part to the right of M contains any pixel inside that rectangle.
OK I now have an idea for constructing an algo:
```
For all locally maximal caterpillar pairs (I, J)
        such that abs(L[I][0] - L[J][0]) <= k:
    # complexity is now 300^2
    # Find the biggest rectangle that is valid,
    # using dichotomy over the four boundaries
    For all possible 4 boundaries:
        # complexity multiplied by log(300)^4
        check validity
        # multiplied by log(300)^2
```
In total the complexity should be 300^2 · log(300)^4 · log(300)^2 ≈ 90000 · 6561 · 81 ≈ 5·10^10, that’s still too big.
I think the dichotomy is a waste of time. Actually what you can do is follow along the caterpillar. The thinnest pixel newly engulfed by the complement of the caterpillar range can only make an already valid rectangle smaller, and by how much can be answered in O(1) time: just see whether the newly excluded pixel falls inside the rectangle!
The new pixel released from the complement into the caterpillar is more problematic if it lies outside the current rectangle: the rectangle may now grow, in the direction of that pixel, and in the worst case we have to do a dichotomy over two of the four boundaries. That speeds things up by roughly 100×.
What is better: actually it is never both boundaries at the same time, so this speeds it up by another ~5×.
Implementing it, I noticed that a pixel leaving the caterpillar for the complement can break a valid rectangle into two valid rectangles of equal size. Which one should we choose? And the reverse holds too: for a pixel leaving the complement, it is absolutely possible that the biggest rectangle jumps (has no overlap at all with the old one). So once again it is wrong to assume that following the caterpillar would work. Surely there must be a way!?
Let us revisit what we saw: without sorting pixels, we were able to reduce the iteration over one of the two dimensions from 300^2 to 300·log(300), but the other dimension was stuck at 300^2. Then, by flattening the pixels into one sorted dimension, we shrank that part of the complexity from 300^4 to 300^2. This is without taking into account the query time.
Oh god, I just realized something: the number of caterpillar pairs is actually extremely low, since the thicknesses only span 0 to 1000. It means that there are at most about 2000 caterpillar pairs… For each caterpillar pair you have a fixed thickness range, so the question transforms as follows:
```
given a boolean array 300x300 representing "isInRange?"
find the biggest area of a contiguous rectangle that is entirely in range
```
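A sketch of that reformulation for one caterpillar pair (the window lower bound `lo` below is a hypothetical value, just for illustration):

```python
lo = 3                                        # lower end of the thickness window for this pair
in_range = (grid >= lo) & (grid <= lo + k)    # boolean r-by-c array
```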
Actually, by swapping the `min`, `max` operations in the dynamic programming for `and`, you can precache the question “is this the top-left corner of an all-in-range square of size 2^t?”, from which you can construct quick square queries. I believe that some more complicated construction might let you answer the query “what is the biggest rectangle whose top-right corner is you?”, and I believe it can be done in a single log(300) pass…
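A minimal sketch of that and-swap, assuming a boolean array `in_range` as above (the function name is mine, and the cylindrical np.roll convention mirrors the earlier precache; this is an illustration, not a finished solution):

```python
import numpy as np

def precache_squares(in_range, levels=9):
    # square[t][i, j]: is (i, j) the top-left corner of an all-True 2^t x 2^t square?
    # As before, np.roll wraps cylindrically, so the caller must not ask about
    # squares that stick out of the board.
    square = [in_range.astype(bool)]
    for t in range(1, levels):
        s = 2 ** (t - 1)
        prev = square[-1]
        # a 2^t square is four 2^(t-1) squares: here, to the right, below, and below-right
        square.append(prev
                      & np.roll(prev, -s, axis=1)
                      & np.roll(prev, -s, axis=0)
                      & np.roll(np.roll(prev, -s, axis=0), -s, axis=1))
    return square
```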
The overall complexity would be 2000·(300^2·log(300)).
But I think I have already spent far too much thinking time on this question. I should probably just give up and look at the answer…
tags: GKS - competition - python