Saturday, May 8, 2010

Fascinating world of NoSql

The word nosql has been around the tech corner for few months (may be years now..). I recently had a chance to explore some of them. I would say I am pretty impressed with the way things are progressing in the nosql world.
NoSql is a innovative idea to replace traditional RDBMS with disk bases storage mechanism. One big example will be Google's BigTable. Online giants like facebook uses nosql database as its backend storage.

There are several types of NoSql databases, some of them are meant to store documents, some of them for key value pair. Most of them follow JSon notation(Javascript Object Notation) to represent the data.

I explored MongoDB and redis. Yet to explore Cassandra. I really likes MongoDB for its easiness to use, configure and efficiency. Redis is a lightening fast written in pure C, but I was able to crash it in a Solaris 10 box after 10 million insertions with limited memory. I could not really explore much on MongoDB as it is not supported in Solaris Sparc platform.

Basically, the Json data gets converted to Binary data(BSon) in MongoDB and stored into disk. You can create tables as and when needed. The storage file will be created upon the first insert. Redis, on the other hand, holds the complete data in a single file. It supports upto 16 databases on single installation, but it can increased by a configuration parameter.
The disk updation is done on a single thread so the integrity is maintained.

Most of the NoSql databases comes with map reduce method for storage and retrieval. You can run more than one number of nodes and server in a clustered environment and take full advantage of cloud computing using NoSql. Cassandra is a very popular one in Java world, as it is completely written in Java. But I doubt about the performance. However, I will be playing around it as and when I get a chance.

NoSql has a growing chance in the agile programming world. I hope more and more innovations and stable products in this area. It is just a matter of corporates getting into this world to get rid of heavy license cost of traditional RDBMS.

Redis Home Page:
http://code.google.com/p/redis/

MongoDB:
http://www.mongodb.org/

Saturday, March 6, 2010

Schema spy

SchemaSpy is one cool open source tool I explored recently. I found it really useful. This tool can be used to analyze your database. It creates diagrams which depicts the table relations using dot binary in Graphviz. You can invoke this tool using the db server name, user name, passwd and schema name as arguments. It gives you a complete xml representation of your schema, a text file which has the insertion order and another text file with deletion order of tables in your schema. It also creates html pages with relational diagrams, anomalies, utility tables etc in your schema. The good news is it works well with almost all popular databases, since it uses DatabaseMetaData information from the Connection object. It is completely written in Java.

Thumbs up for this software !!! Go ahead and use it.. url is

http://schemaspy.sourceforge.net/

Sunday, February 21, 2010

Homomorphic Encryption feature of RSA algorithm

RSA algorithm is homomorphic in mulitiplication. What do you mean by that? Well we all know that RSA is a reversible encryption algorithm which is widely used, especially with ssh. It is a bit slow encryption algorithm as the logic is based on two probable prime number calculations. Let me give a flow of how RSA works:
PlainText ->Key-> Encryption
Now, the encrypted text is what being transmitted across. Decryption is done as follows:
Encryption->Key->Decryption
There is public and private keys for RSA. I am not going in detail about it, but I can share the source code I used to prove that RSA is homomorphic in multiplication.
Homomorphic property of RSA states that if you multiply the encrypted value by n and decrypt with the same key, you will get the result as original value * n.

For example, if the plain text is 10 and if you multiply the encrypted text with 2 and then decrypt, you will get 20.
There is a catch here: A mere multiplication will not work. The multiplication should be done with the encrypted value of n.
ie, Let us take plain text as x
E(x) denotes the encrypted value of x
D(x) denotes the decrypted value of x.

Then,

D ( E(x) * E(n) ) = x*n

But D ( E(x) * n ) != x * n

Here is the source code for your reference:

import java.math.BigInteger;
import java.security.SecureRandom;
   

public class RSAHomomorphic {
  private final static BigInteger one = new BigInteger("1");
  private final static SecureRandom random = new SecureRandom();

  private BigInteger privateKey;
  private BigInteger publicKey;
  private BigInteger modulus;

  // generate an N-bit (roughly) public and private key
  RSAHomomorphic(int N) {
  BigInteger p = BigInteger.probablePrime(N/2, random);
  BigInteger q = BigInteger.probablePrime(N/2, random);
  BigInteger phi = (p.subtract(one)).multiply(q.subtract(one));

  modulus = p.multiply(q);  
  publicKey = new BigInteger("65537"); // common value in practice = 2^16 + 1
  privateKey = publicKey.modInverse(phi);
  }


  BigInteger encrypt(BigInteger message) {
  return message.modPow(publicKey, modulus);
  }

  BigInteger decrypt(BigInteger encrypted) {
  return encrypted.modPow(privateKey, modulus);
  }

  public String toString() {
  String s = "";
  s += "public = " + publicKey + "\n";
  s += "private = " + privateKey + "\n";
  s += "modulus = " + modulus;
  return s;
  }
 
  public static void main(String[] args) {
 
  RSAHomomorphic key = new RSAHomomorphic(100);
  System.out.println(key);
 
  BigInteger x1 = new BigInteger("100");
  BigInteger x2 = new BigInteger("2");

  BigInteger enc_x1 = key.encrypt(x1);
  BigInteger enc_x2 = key.encrypt(x2);
  BigInteger dec_x1 = key.decrypt(enc_x1);
  BigInteger dec_x2 = key.decrypt(enc_x2);
   
 
  BigInteger homomorphic = enc_x1.multiply(enc_x2);
  BigInteger dec_h = key.decrypt(homomorphic);
   
  System.out.println("x1 = " + x1);
  System.out.println("x2 = " + x2);
   
  System.out.println("E ( x1 ) = " + enc_x1);
  System.out.println("E ( x2 ) = " + enc_x2);
  System.out.println("E ( x1 ) * E ( x2 ) = " + homomorphic);
  System.out.println("D ( E ( x1 ) * E ( x2 ) ) = " + dec_h);
  }
}

The output would be as:

$ java RSAHomomorphic 
public = 65537
private = 237497666793168261168967449473
modulus = 895820695748138208858686014249
x1 = 100
x2 = 2
E ( x1 ) = 867737439426455424754448444175
E ( x2 ) = 274813293643034418407848031465
E ( x1 ) * E ( x2 ) = 238465783746157286286789717700937490114748844108605695966375
D ( E ( x1 ) * E ( x2 ) ) = 200


Sorting million records with minimum memory

I encountered this problem of sorting a million records of same size on a unix box. My unix box is often overloaded with lot of processes, so I had to do it with a less amount of user and process memory. As usual, the time for implementation is quite less. So following is the method I adopted:

1. Read the input file which the complete input values, each line has one big number. Basically, each line is a big integer representation of a certain AES output value. Big Integer is created with the open source GMP library.
2. Read the first 4 digits and append the line to a file named based on the first 4 digits. For instance, if the the line is 8923523345012545454646,
It is written out to a file called 8923.ref. Any other number starting with the same 4 digits will be appended to the same file.
3. Repeat this process until every line is processed. This might created utmost 10K files. I wrote this code in C, so to make the file writing fast, a character array of large size is used to minimize the number of file writes.
4. Sort each and every file using any sorting method. If the items are of fixed size, sort utility in unix can be used.
5. You are only left with a cat operation of all files in the ascending order to a single file now.

This may not be a very efficient way, but very quick and flexible and requires very less memory. Choosing the file name based on n first digits should be done by analysing the input data. You may want to reduce or increase the number of digits based on the input data you have.

Start

Hi,This blog is for all the IT professionals around the world who has a passion for programming. I would like to keep posting about the technical tips that I have which might somebody else in this world. After all, programming is more about searching.Sorry, I haven't introduced me yet, I am a humble software professional based in Hyderabad, India. I like to dream about better design and ideas. Hope this blog would be of some help to people. Please feel free to contribute.