So I was talking to a friend of mine about some interesting job interview questions.  He mentioned a favorite of his:

You are writing an application which continuously receives integer values between 1 and 10,000.  Your job is, once the input stream ceases, to output each integer received in ascending order.  You don't know how many there will be, and you can only use a page (4KB) of memory.  What do you do?

So there is a pretty straightforward and simple method using the whole page as a giant bitmap.

I came up with a different solution.  This is still a little flawed, but the idea is pretty solid. Also, the code is pretty ugly... It was a proof of concept and so I just hacked away.

``````#include <stdio.h>
#define ARRAYSIZE 2000
#define OVERLAYCOUNT 32

// So we have them all stored in an array of short ints with 2000 items.
short x[ARRAYSIZE];

// Here's the data we are going to be adding in. &nbsp;
// Can be changed around, grown, shrunk
void populateX(){
int values = {
2,    3454, 6666, 1666, 1234,
1532, 532,  1235, 6543, 3434,
2432, 555,  232,  1555, 6432,
6789, 8767, 6766, 8766, 12
};
int i;
for(i = 0; i < 20; i++){
}
}

// When we receive a value, we xor it into it's bucket.
x[z%ARRAYSIZE] ^=z;
}

int main(){
populateX();
// each slot will be a xor of up to 5 numbers of
// the form i + 2000j
short i,j,k;
int imax = 10000/ARRAYSIZE;
for(i = 0; i < imax ; i++){
for(j = 0; j < ARRAYSIZE; j++){
short valueToCheck = x[j];
int a;
int l;
// walk through all 32 combinations of 5 xors,
//figure out which one it is

// this is a bit encoding of which numbers
// could be stored there.
for(l = 0; l < OVERLAYCOUNT; l++){
//printf("\tTrying %d\n",l);
short guessValues = 0;
// see if the least bit is set. If so,
// xor the new value calculated with
// our guess. increment a ( the multiplier)
// and shift b to get access to the next bit;

a = 0;
int b = l; // make a copy we can tweak
do {
if(b & 1){
short newValue =j + ARRAYSIZE*a;
guessValues = guessValues ^ newValue;
}
a++;
} while (b>>=1);
// if the generated value from this version of
// the bitmapping is equal to the value,
// we know which 5 numbers were entered.
if(guessValues == valueToCheck){
break;
}
}
int n = 0;
// In order to preserve ascending order, only printout
//the i'th lowest bit, if set
l>>=i;
if(l &amp; 1){
printf("%d\n",j + ARRAYSIZE*i);
}
}
}
}
``````

Output (newlines added for clarity):

``````2
12
232
532
555

1234
1235
1532
1555
1666

2432
3434
3454
6432
6543

6666
6766
6789
8766
8767
``````

#### Other Thoughts

Here I used the full 4KB to hold 2000 short ints. The bitmapping method uses 1250 bytes. Theoretically I could tighten this down to ~600 bytes by setting the overlay count to 2^32 and the array size to be 300 (at 2 bytes for a short int);

#### Known issues

It doesn't handle duplicates.

It is a little buggy, due to certain scenarios where a^b^c == e^d^f. This could probably be fixed by an additional check before printout. The check could be guessValue^b^c == a, as a^b^c^b^c == a and e^d^f^b^c !- a.

Interesting solution to say the least.

Thoughts?

#### Edit (years later)

I realize this is a (crappy) bloom filter. At the time I had no idea and am pretty proud of independently creating such a data structure.