In the past few weeks, I suddenly recalled of an interesting problem I faced during my days in Competitive Programming world. It was called “Make*me-an+[integer!]“, a problem posted in Internet Problem Solving Contest 2015 problemset (link to problem)
The gist of the problem statement is as such:
Output a list of valid ECMA-262 expression of the number 0 to 1000 (inclusive) using only the characters !, [, ], +, –, and/or *, in which the correctness is determined by the value and the type of the expression itself.
For example, !! evaluates to the number 0 (this is a valid output), and +!!+[+] evaluates to string “10” (this is invalid, as the output type need to be evaluated as type number too).
The problem has two subtasks: the easy subtask is to produce the outputs where each expression uses no more than 200 characters; and the hard subtask limits the expression to use no more than 75 characters.
During the competition itself, I solved the easy subtask, but now I was quite interested in solving the hard subtask because … why not? Hahaha. In total, I spents around 2 weeks (including the many off days in between as I wasn’t in the mood for coding).
So, let’s get into the problem.
What should we do first at this stage? I think it’s pretty easy to start by observing the patterns and thinking for a solution.
- All expressions can be formed using a form of counting (1+1+1+…+1). For example, 3 could be represented as +!!+!!+!! (which means 1+1+1). Pretty simple huh.
- Some expression can be formed using the digit concatenation. For example, 10, can be represented as +[+!!+[+]] (which means concatenate “1” with “0”). It’s now getting more complicated.
So far, using that two observations only, I was able to tackle the easy subtask, producing number expression from 0 to 1000 using no more than 200 characters, and this is my approach during the competition itself. As you might have noticed, this observations did not make use of subtraction nor multiplication at all. So, in this attempt, can we do better?
Since we are allowed to make use of subtraction and multiplication, hence we need to try that out. So, let’s just brute force it.
Another interesting observation that come out is that, after doing subtraction and multiplication, sometimes, it is more efficient to concatenate a 2-digit number and 1-digit number. For example, 591 can be represented as something that means “(60-1)”+”1”.
Tests are fairly easy to write in this problem, because we know the inputs, and we know what are the expected output, also checking the constraints. I also imagine that the function that I would implement takes in a number as an argument, and returns a string expression of that number.
So the tests I write are to loop from 0 to 1000, and for each number, I call the function to get the string expression. For each number and expression:
- I check if the string expression can be evaluated to the number, checking the value and type of this evaluated expression
- I check if the string expression has no more than 75 characters.
This way, I have generated a total of 2002 test cases.
Initially, I was thinking that problem is a Dynamic Programming Coin Change problem, but I was totally wrong and ended up with the solution as follows.
Basically brute force, in multiple passes. Each pass update the current mapping of number to the most efficient expression yet. Reason of running it this way is noted below, in short, is due to a infinite recursion and hence need to find a way to not be trapped into this hole.
So, since I run it in multiple passes, at the initial pass, for each number, I initialize the expression with “1+1+…+1” since I believe strongly that this is the most naive expression and can be further optimized.
On each pass, for each number, I update the expression using this two techniques:
- Generate current number using other numbers
- Optimize current expression
This part is basically a complete search or “brute force”. Methods tried include:
- Try digitizing: Idea is that the expression for 10 can be formed using “1” and “0”
- Try grouped digitizing: Idea is that the expression for 561 can be formed using “56” and “1” or “5” and “61”
- Try subtraction: Idea is that the expression for 99 can be formed using expression of 100 minus 1
- Try multiplication: Idea is that the expression for 6 can be formed using expression of 2 multiply by 3
Then, out of this four candidates of expression (plus the current expression), choose the one that gives the shortest string length and save it at the mapping.
Reason of running it on passes is because if not, generation of expression using subtraction could result in infinite recursion as for number N, it will try to form (N + 1) – 1, hence recursing into (N + 1), but here, (N + 1) hasn’t been formed before, hence it will recurse into (N + 1 + 1), and so on and so on. Running in multiple passes solves this issue, as on each pass, it will just simply use the expression generated on the previous pass.
After generating the expressions, this step is to try out some techniques to shorten the current string without changing the “meaning” (in the sense of how that number is formed)
Methods tried includes:
- Substituting minus sign into the brackets. In the Generation step, the process of expressing 100-3 is something like “1” + “0” + “0” – (1 + 1 +1). Then the idea of this is to write it as “1” + “0” + “0” – 1 – 1 – 1.
- Removal of unnecessary plus sign. In the Generation step, it focuses on generating numbers that evaluates correctly as numbers, that’s why sometimes if the resulting expression is not number, I put an extra plus sign in front to cast the expression to number, but this could led to unnecessary plus sign especially if it appears in front of a bracket expression. For example “5” + “1” ( “(+1+1+1+1)” + “(+1)” ) could be written as “(1+1+1+1)” + “(+1)”.
- Removal of redundant brackets. The process of generating numbers could led to numerous redundant brackets, like “5” + “1” (is actually means “(+1+1+1+1)” + “(+1)”). The idea is to strip it off, so it become something like “+1+1+1+1” + “(+1)” .
Multiple optimization method configurations are tried until it produces satisfactory result. The configuration used in the final program are: method 1, method 2, method 3, method 2, and finally method 3. This is because after running method 3, it could produce a result where there is unnecessary plus sign (and need to run method 2), and vice versa. Hence method 2 and 3 are run twice after one another.
Since this solution runs in multiple passes, I didn’t know how many passes to run to find the sweet spot. In my testing, 3 passes are sufficient to satisfy all test cases (of being a valid number expression and expression length being less than or equal to 75).
This is actually quite a fun project to work on, and could be a bit frustrating, after implementing one after another optimization, yet there are still failing test cases. One of my regrets here is that this program, turns out to be quite messy (in terms of codes, as I initially didn’t expect to run in multiple passes), and it need to run the whole thing first (finding all expressions from 0 to 1000, in multiple passes) before one could use the final expression.
So yeah, that’s the description of the solution of this particular problem. I think it is very abstract, that’s why I invite you to just check the codes out, which “only” consists of 381 lines.