10,000 ints in 1 page of memory (4KB)

So I was talking to a friend of mine about some interesting job interview questions.  He mentioned a favorite of his:

You are writing an application which continuously receives integer values between 1 and 10,000.  Your job is, once the input stream ceases, to output each integer received in ascending order.  You don't know how many there will be, and you can only use a page (4KB) of memory.  What do you do?

So there is a pretty straightforward and simple method using the whole page as a giant bitmap.

I came up with a different solution.  This is still a little flawed, but the idea is pretty solid. Also, the code is pretty ugly... It was a proof of concept and so I just hacked away.

#include <stdio.h>
#define ARRAYSIZE 2000
#define OVERLAYCOUNT 32

// So we have them all stored in an array of short ints with 2000 items.
short x[ARRAYSIZE];  
void addToX(int z);

// Here's the data we are going to be adding in. &nbsp;
// Can be changed around, grown, shrunk
void populateX(){  
  int values[20] = {
    2,    3454, 6666, 1666, 1234,
    1532, 532,  1235, 6543, 3434,
    2432, 555,  232,  1555, 6432,
    6789, 8767, 6766, 8766, 12
  };
  int i;
  for(i = 0; i < 20; i++){
    addToX(values[i]);
  }
}

// When we receive a value, we xor it into it's bucket.
void addToX(int z){  
  x[z%ARRAYSIZE] ^=z;
}

int main(){  
  populateX();
  // each slot will be a xor of up to 5 numbers of 
  // the form i + 2000j
  short i,j,k;
  int imax = 10000/ARRAYSIZE;
  for(i = 0; i < imax ; i++){
    for(j = 0; j < ARRAYSIZE; j++){
      short valueToCheck = x[j];
      int a;
      int l;
      // walk through all 32 combinations of 5 xors, 
      //figure out which one it is

      // this is a bit encoding of which numbers 
      // could be stored there.
      for(l = 0; l < OVERLAYCOUNT; l++){
        //printf("\tTrying %d\n",l);
        short guessValues = 0;
        // see if the least bit is set. If so, 
        // xor the new value calculated with
        // our guess. increment a ( the multiplier)
        // and shift b to get access to the next bit;

        a = 0;
        int b = l; // make a copy we can tweak
        do {
          if(b & 1){
            short newValue =j + ARRAYSIZE*a;
            guessValues = guessValues ^ newValue;
          }
          a++; 
        } while (b>>=1);
        // if the generated value from this version of
        // the bitmapping is equal to the value,
        // we know which 5 numbers were entered.
        if(guessValues == valueToCheck){
          break;
        }
      }
      int n = 0;
      // In order to preserve ascending order, only printout 
      //the i'th lowest bit, if set
      l>>=i;
      if(l &amp; 1){
        printf("%d\n",j + ARRAYSIZE*i);
      }
    }
  }
}

Output (newlines added for clarity):

2  
12  
232  
532  
555

1234  
1235  
1532  
1555  
1666

2432  
3434  
3454  
6432  
6543

6666  
6766  
6789  
8766  
8767  

Other Thoughts

Here I used the full 4KB to hold 2000 short ints. The bitmapping method uses 1250 bytes. Theoretically I could tighten this down to ~600 bytes by setting the overlay count to 2^32 and the array size to be 300 (at 2 bytes for a short int);

Known issues

It doesn't handle duplicates.

It is a little buggy, due to certain scenarios where a^b^c == e^d^f. This could probably be fixed by an additional check before printout. The check could be guessValue^b^c == a, as a^b^c^b^c == a and e^d^f^b^c !- a.

Interesting solution to say the least.

Thoughts?

Edit (years later)

I realize this is a (crappy) bloom filter. At the time I had no idea and am pretty proud of independently creating such a data structure.

comments powered by Disqus