Matching problem
Suppose you have n letters and n addressed envelopes that you need to match. You can find matches by pairing all letters with envelopes and a friend will tell you how many matches there are(not which).
What is the expected number of times you need to pair the letters with envelopes to get all matching? What is the maximum number of pairings you could need to find all matches? What would your strategy be to minimize the above numbers? Do they have the same solutions? For easier discussion let's use n=10 as an example although a general formula would be great. 
If we assume the envelopes are coded 1 to n, and letters are also coded 1 to n, but the pairings of these numbers are unknown, the maximum number of tries you need should be n! because you can simply try all the possible arrangements of the letters while fixing the envelopes.
That's for brute force. If you want something clever, I suggest something like the following:[LIST=1][*]Start with a random arrangement of letters in your fixed envelopes. Ask your magical friend for the number of matches.[*]Then you find your matches by switching two random letters (call them A and B) and comparing your number of matches.[INDENT][LIST][*]a) If the number increases by two, your letters were in each other's places, and are now in the correct places.[*]b) If the number increases by one, one of your letters got to the correct place, and you can find out which one, by switching one of your letters (A, B) with a randomly chosen letter C. Say we switch B and C.[INDENT][LIST][*]I. If the number increases by two, B and C were in each other's places, and thus the correct one was A. A, B, and C are now in the correct places.[*]II. If the number increases by one, A was the correct one, and one of B or C got to the correct place. You can find out which one by repeating step 2b for letters B and C.[*]III. If the number stays the same, A was the correct one, and neither B nor C is in the correct place. Repeat step 2.[*]IV. If the number lowers by one, B was the correct one, and you can return it and repeat step 2.[*]V. If the number lowers by two, B was the correct one, C was also in the correct place, and you can return them, then repeat step 2.[/LIST][/INDENT][*]c) If the number stays the same, neither one of your letters got to the correct place. Repeat step 2.[*]d) If the number lowers by one, one of your letters was already in the correct place. You can find out which one by switching them back and applying the same procedure as in step 2b.[*]e) If the number decreases by two, your letters were already in the correct places. Switch them back, and repeat step 2.[/LIST][/INDENT][/LIST] When you are certain that some letter is in its envelope, leave it there and never move it until you're done. That should do it. I am not sure if it's the best way to do it, but it should be a fairly good way to do it. As for the expected number of times you bother your friend, I don't know, and it's probable that someone else here will find a better algorithm or the number before I do either. 
I will demonstrate what I think is the worstcase scenario for n=5, using a bit of strategy in choosing the switches, such that I will try to not place the same letter twice to where I know it doesn't belong.
([x] denotes that letter is for sure in its correct envelope) Begin Step 1 > 2, 4, 5, 1, 3 Friend > 0 (call 1) Step 2 > 2, 3, 5, 1, 4 (34 switch) Friend > 0 (call 2) Step 2 > 5, 3, 2, 1, 4 (25 switch) Friend > 0 (call 3) Step 2 > 5, 1, 2, 3, 4 (13 switch) Friend > 0 (call 4) Step 2 > 5, 1, 4, 3, 2 (24 switch) Friend > 0 (call 5) Step 2 > 3, 1, 4, 5, 2 (35 switch) Friend > 0 (call 6) Step 2 > 3, 1, 4, 2, 5 (25 switch) Friend > 1 (call 7) Step 2 > 3, 5, 4, 2, 1 (15 switch) Friend > 0 (call 8) Step 2 > 3, 1, 4, 2, [5] (rollback) Step 2 > 3, 4, 1, 2, [5] (14 switch) Friend > 1 (call 9) Step 2 > 1, 4, 3, 2, [5] (13 switch) Friend > 3 (call 10) Step 2 > [1], 2, [3], 4, [5] (24 switch) At this point, you don't even need to call your friend, because you are certain you got it correct  If your correctness number is n2, you know you just have to switch the two remaining letters you didn't yet mark as correct. That is 10 calls for n=5 in the worst case. If someone finds a mistake in my example, please, point it out. 
4 Attachment(s)
My approach would be to formulate the problem as a 01 linear program.
We will have variables [TEX]$x_{ij} \in \{0,1\}$[/TEX] which are 1 if letter [TEX]$i$[/TEX] matches with envelope [TEX]$j$[/TEX] and 0, otherwise. Clearly, every assignment of letters to envelopes would fulfill the following constraints: [TEX]$\sum_{i\in I} x_{ij} = 1 \: \forall j \in J$[/TEX] [TEX]$\sum_{j\in J} x_{ij} = 1 \: \forall i \in I$[/TEX] If we solve this problem we will get a unique assignment and we can receive the number of correct guesses, which we can denote as [TEX]$c$[/TEX]. With that, we can add one new constraint every time this hint is given. Assuming our previous solution is given as a set of [TEX]$ij$[/TEX] pairs, denoted as [TEX]$S$[/TEX], this additional constraint can be expressed as follows: [TEX]$\sum_{ij \in S} x_{ij} == c$[/TEX] This constraint essentially says that the correct solution includes [TEX]$c$[/TEX] of the matchings in [TEX]$S$[/TEX]. I believe this is the strategy that uses the given information the most efficient way possible and minimizes the number of attempts needed to uncover all of the correct pairs. Unlike using only the information from the last (or maybe last two "rounds") it combines all knowledge it retrieves in this process. It is pretty hard to calculate the expected number of "rounds", but a simulation can help to quantify it. I have implemented this quickly in Python to get an idea. I have added a sample run to illustrate how this method choses the pairs in each round. It is clearly not what a human would do or in any way similar to the previous answer, but seems to be quite efficient. I'm attaching a graph showing the frequencies (+cumulative frequencies) of the problem being "solved" after a given number of rounds. It can be seen that, if we have 10 letters and 10 envelopes, the expected number of iterations seems to be somewhere around 10. In my experiment (2000 random experiments), the highest number of tries I needed was 15 (just once), but the most common number seems to be 10 (473 times out of 2000 experiments). I am also attaching the Python code (simulate.txt  seems like I can't upload .py files) in case anyone wants to experiment with it and adjust it to chose a different strategy. Out of curiosity, I have also done a quick experiment and have adjusted the number of letter/ envelope pairs to 20. Obviously, there are a lot more possible assignments. Nevertheless, the expected numbers seems to be somewhere around 27, which is still quite decent. 
I am looking at the code, but I still can't get how the heck does it decide what's best to continue with... How do the constraints work and how do they affect the next "prediction"?

2 Attachment(s)
I like your approach, Viliam. I would be really curious to see how it performs in a Monte Carlo simulation, but it isn't as easy to implement as my method.
You can even improve the performance a bit if you keep track of pairings that are wrong. For example, in step 2c, you learn about two numbers that are wrong and it would never make sense to switch either of them back again. In the example that you provided, you are doing one move that could be avoided: You already know (from the very first step) that 4 can't be in the second position, but you are doing a swap later in the process that brings it back into that position. By the way, after 8 steps, the solution is actually clear if you combine all of the knowledge you have gained in the process. To show this, I have created a 5x5 grid representing all possible pairs of letters/ envelopes. Whenever your "friend" tells you that a solution has 0 correct choices, you cross out a possible solution. After the first 8 steps, you got the 0correct scenario 7 times, so the grid is quite full already (see attached screenshot). Based on this, you can already deduce the correct solution for letter 4 and 5 and don't need to continue trying with those. Furthermore, you have had one solution that gave you 1 correct position: 3, 1, 4, 2, 5 Since from all the 0correct scenarios you know that the solution will be x, x, x, 4, 5, you can now deduce that 3, 1, 4, 2 is also incorrect. That crosses out one more square and tells you that the solution will be x, x, 3, 4, 5. Furthermore, for position 1, the candidates are either 1 or 4, but 4 is already taken. From that, the whole solution is know. Nevertheless, a pretty good strategy, but you can combine it with additional logic reasoning to reduce the number of steps. 
[QUOTE=Viliam Furik;579808]I am looking at the code, but I still can't get how the heck does it decide what's best to continue with... How do the constraints work and how do they affect the next "prediction"?[/QUOTE]
I'm not familiar with docplex which seems to be what the code uses to generate the specific guesses, but it seems like it probably just chooses some arbitrary permutation which has the given number of matches with every previous try  if your first try got 4 matches and second 5, then on third try it'll try some permutation that has 4 choices in common with the first and 5 with second. 
[QUOTE=Viliam Furik;579808]I am looking at the code, but I still can't get how the heck does it decide what's best to continue with... How do the constraints work and how do they affect the next "prediction"?[/QUOTE]
Viliam, not sure if you are familiar with linear programming/ constraint programming, but it might be a bit confusing at first and requires a slightly different way of thinking. This is certainly not the easiest example to get started with. The whole idea behind my approach is to describe the correct solution in terms of the knowledge we gain in every step. The knowledge you gain in one single step isn't valuable by itself, but the knowledge you gain in multiple steps combined, tells you a lot about the solution. Think of it as [B]ruling out[/B] solutions, rather than moving towards a better solution at every step. At the beginning, the possible solution space is [TEX]$n!$[/TEX]. With every iteration and every constraint that we add, we cut off some of the solutions and therefore reduce the solution space, until we are left with only one single unique solution. Let's look at a much easier example with n=3; let's assume the correct solution is [1, 2, 3] If our first guess is [3,1,2], the number of correct guesses is 0. Based on this, we can not only rule out [3,1,2], but also [3,2,1], [1,3,2] and [2,3,1] (because each of them has one position in common with a solution we know is completely wrong), and that's exactly what adding that constraint would do. About choosing the next solution: as I said, all that the constraints do is to reduce the solution space from [TEX]$n!$[/TEX] to a smaller number of solutions. So after adding a constraint, we just let the MIP solver chose one of the remaining solutions in the solution space at random. The whole idea isn't necessarily to chose a better solution in every iteration, so "predict the next solution" isn't quite the right wording. A common problem that is solved with a binary integer program is Sudoku. It is certainly much easier to understand than this. If you are curious about the method I am using here, this would probably be a good starting point: [url]https://www.mathworks.com/help/optim/ug/sudokupuzzlesproblembased.html[/url] 
[QUOTE=uau;579812]I'm not familiar with docplex which seems to be what the code uses to generate the specific guesses, but it seems like it probably just chooses some arbitrary permutation which has the given number of matches with every previous try  if your first try got 4 matches and second 5, then on third try it'll try some permutation that has 4 choices in common with the first and 5 with second.[/QUOTE]
Yes, you are completely right. And I realize this is a much more intuitive explanation than my attempt :bow: Think of docplex as a black box. There are plenty of other libraries I could have used: CBC, Gurobi, ORTools, one of many SAT solvers etc. 
Trying something that could be the correct answer given all previous ones (which I believe is what Walter's code does) is not always the optimal answer, either for average or worstcase behavior (at least worst case for remaining tries in a particular situation, I don't show this to necessarily affect the global worst case):
Suppose that n = 1000000. Your first random guess gets 999998 matches right (quite lucky). Now, if you keep trying guesses that could be the correct answer, they can only swap one pair from your initial guess. Since you can only change two places at once, you'd expect to need around 146000 tries to even try changing one of the incorrect positions (you'd try changing twice that number of places, and 50% chance that both the incorrect positions are outside that set). Some kind of bisection strategy should be able to do much better than that. 
[QUOTE=uau;579816]Trying something that could be the correct answer given all previous ones (which I believe is what Walter's code does) is not always the optimal answer, either for average or worstcase behavior (at least worst case for remaining tries in a particular situation, I don't show this to necessarily affect the global worst case):
Suppose that n = 1000000. Your first random guess gets 999998 matches right (quite lucky). Now, if you keep trying guesses that could be the correct answer, they can only swap one pair from your initial guess. Since you can only change two places at once, you'd expect to need around 146000 tries to even try changing one of the incorrect positions (you'd try changing twice that number of places, and 50% chance that both the incorrect positions are outside that set). Some kind of bisection strategy should be able to do much better than that.[/QUOTE] Interesting case! Would definitely be curious to see thoughts on how to handle this kind of situation better. I am not so sure. isn't 9999998 correct matches just as bad as only 2 correct matches in terms of information value? I did a few additional tests. Unfortunately it isn't practical to try my method with such a high n because the matrix gets too big, but if I use n = 100 and force exactly this kind of situation, I am on average only using about 30 tries. My feeling is telling me that that's pretty decent compared to what you would expect as an average runtime. So, with n = 1M, it is true that my approach would probably use over 100k tries in this worst case scenario (it needs to "touch" each wrong position at least once). However, what would the average case scenario look like? My gut feeling is telling me it would be somewhere in that range too. But might completely off on that estimate. It is pretty hard to imagine this game with such a lot of letters and envelopes... 
All times are UTC. The time now is 06:24. 
Powered by vBulletin® Version 3.8.11
Copyright ©2000  2021, Jelsoft Enterprises Ltd.