Malware phenotypes are the observable characteristics of an individual malware sample, such as specific patterns in its code, strings, headers, API calls, and imported modules. Given a piece of unknown malware, the proposed system quickly decomposes it into known components or malware families based on these observable characteristics. Phenotypes give malware reverse engineers natural and intuitive information for distinguishing one malware family from another. The research proposed under this direction aims to develop a scalable, practical malware phenotype retrieval system for security analysts. The problem is challenging because of the time constraints and the number of malware samples collected per day. In large organizations or security companies, this number can exceed 300,000 unique samples per day. An accurate matching system, a scalable and efficient index, and optimized storage requirements are the keys to a practical and successful system for this problem.
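
To make the retrieval idea concrete, the following is a minimal sketch of matching an unknown sample against known samples by their phenotype features. It assumes that each sample has already been reduced to a set of feature strings, and it uses an inverted index with Jaccard similarity as a stand-in for the matching step. The class name `PhenotypeIndex`, the feature prefixes (`api:`, `str:`, `imp:`), and the family labels are hypothetical and chosen only for illustration; they are not the proposed system's design.

```python
from collections import defaultdict


class PhenotypeIndex:
    """Toy inverted index: feature string -> ids of known samples containing it.

    Illustrative only. Jaccard similarity is an assumed scoring function,
    not the matching method of the proposed system.
    """

    def __init__(self):
        self.postings = defaultdict(set)  # feature -> sample ids
        self.features = {}                # sample id -> frozenset of features
        self.family = {}                  # sample id -> family label

    def add(self, sample_id, family, features):
        feats = frozenset(features)
        self.features[sample_id] = feats
        self.family[sample_id] = family
        for feature in feats:
            self.postings[feature].add(sample_id)

    def query(self, features, top_k=3):
        """Return up to top_k (similarity, sample_id, family) tuples, best first."""
        query = frozenset(features)
        # Count shared features, touching only samples that share at least one.
        overlap = defaultdict(int)
        for feature in query:
            for sample_id in self.postings.get(feature, ()):
                overlap[sample_id] += 1
        scored = []
        for sample_id, shared in overlap.items():
            union = len(query) + len(self.features[sample_id]) - shared
            scored.append((shared / union, sample_id, self.family[sample_id]))
        # Highest similarity first; ties broken by sample id for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return scored[:top_k]


# Hypothetical known samples and an unknown sample to decompose.
index = PhenotypeIndex()
index.add("s1", "FamilyA", {"api:CreateRemoteThread", "api:WriteProcessMemory",
                            "str:cmd.exe", "imp:kernel32.dll"})
index.add("s2", "FamilyB", {"api:InternetOpenA", "str:http://",
                            "imp:wininet.dll", "imp:kernel32.dll"})

unknown = {"api:CreateRemoteThread", "api:WriteProcessMemory",
           "imp:kernel32.dll", "str:http://"}
results = index.query(unknown)
# s1 shares 3 of 5 distinct features with the unknown sample (Jaccard 0.6),
# so FamilyA ranks first; s2 shares 2 of 6.
```

Because the index only visits samples that share at least one feature with the query, lookup cost depends on the length of the posting lists rather than on the total number of stored samples. At the volumes mentioned above, exact set comparison would likely need to be replaced by an approximate scheme; this sketch does not attempt that.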