Analyzing the executable with PE-bear shows that it imports only three DLLs: kernel32.dll, user32.dll, and gdi32.dll. Such a small import table suggests that the executable is either packed or uses some technique to obfuscate its API calls. Opening the executable in IDA quickly reveals that it relies on API hashing.
Resolving API Hashes
There are two functions for hash computation. The first calculates the hash of a DLL name, while the second computes the hash of a function name.
It is important to note that not all the values stored in the hash arrays are the actual hashes of DLLs or functions. Many of them are the result of XORing the original hash with a specific number. Now, let’s write a few Python scripts to extract the function names associated with those hashes.
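    import pefile
    import yaml

    # `system32_path` (the System32 directory, with a trailing separator) and
    # `dlls` (the DLL file names to process) are assumed to be defined beforehand.
    with open("exports.yaml", "w") as f:
        for dll in dlls:
            dllPath = system32_path + dll
            pe = pefile.PE(dllPath)
            dllExports = pe.DIRECTORY_ENTRY_EXPORT.symbols
            fnnames = []
            for symbol in dllExports:
                try:
                    # Ordinal-only exports have no name to decode and are skipped
                    fnname = symbol.name.decode("utf-8")
                    fnnames.append(fnname)
                except Exception:
                    continue
            # Each dump appends one "dll: [names]" mapping to the same YAML document
            yaml.safe_dump({dll: fnnames}, f)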
Once the exports.yaml file has been generated, we can proceed to write a script that reads function names from it, calculates function hashes, and resolves all the functions accordingly.
    import idc
    import ida_name
    import yaml

    def ror(value, count, bits):
        # Rotate-right helper (assumed definition)
        mask = (1 << bits) - 1
        value &= mask
        return ((value >> count) | (value << (bits - count))) & mask

    def compute_dll_hash(dllname, num):
        dllname = [ord(i) for i in dllname]
        dllname.append(0)  # the terminating NUL is hashed as well
        for c in dllname:
            if c >= 65 and c <= 90:  # fold 'A'-'Z' to lowercase
                c += 32
            num = (c + ror(num, 13, 32)) & 0xFFFFFFFF
        return num

    def compute_fn_hash(fnname, num):
        fnname = [ord(i) for i in fnname]
        fnname.append(0)  # the terminating NUL is hashed as well
        for c in fnname:
            num = (c + ror(num, 13, 32)) & 0xFFFFFFFF
        return num

    # Location of all arrays containing the hashes
    hashTables = [0x405AFC, 0x405BBC, 0x405C80, 0x405CDC, 0x405D14, 0x405D50, 0x405D64,
                  0x405D84, 0x405DB0, 0x405DC0, 0x405DCC, 0x405DE4, 0x405E14, 0x405E2C]

    # Location where the resolved addresses of functions will be stored
    outputTables = [0x4112AC, 0x411368, 0x411428, 0x411480, 0x4114B4, 0x4114EC, 0x4114FC,
                    0x411518, 0x411540, 0x41154C, 0x411554, 0x411568, 0x411594, 0x4115A8]

    # In each array of hashes, the first entry stores the hash of the DLL, XORed with
    # the value 0x22065FED. The function hashes begin from the second element.
    xorval_1 = 0x22065FED

    # The last element of each array of hashes is 0x0CCCCCCCC
    end = 0x0CCCCCCCC

    with open("exports.yaml", "rb") as f:
        data = yaml.safe_load(f.read())

    dllNames = list(data.keys())
    storedDllHashes = [idc.get_wide_dword(arr) for arr in hashTables]

    for dllName in dllNames:
        print(f"Trying {dllName}")
        xoredHash = compute_dll_hash(dllName, 0) ^ xorval_1
        if xoredHash in storedDllHashes:
            index = storedDllHashes.index(xoredHash)
            itr = hashTables[index] + 4     # first function hash
            itr2 = outputTables[index] + 4  # matching slot for the resolved address
            dllfns = data[dllName]
            actualDllhash = xoredHash ^ xorval_1
            # Function hashes are seeded with the hash of their DLL
            dllfnHashes = [compute_fn_hash(fnname, actualDllhash) for fnname in dllfns]
            ida_name.set_name(outputTables[index], dllName)
            ida_name.set_name(hashTables[index], f"{dllName}_hashes")
            while idc.get_wide_dword(itr) != end:
                currentHash = idc.get_wide_dword(itr) ^ xorval_1
                if currentHash in dllfnHashes:
                    ind = dllfnHashes.index(currentHash)
                    ida_name.set_name(itr2, f"fn_{dllfns[ind]}")
                itr += 4
                itr2 += 4

    print("All functions resolved")
Running this script in IDA gives us the following results:
Likewise, we can write a script that, given a hash, simply prints the corresponding API name.
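    import yaml

    def ror(value, count, bits):
        # Rotate-right helper (assumed definition)
        mask = (1 << bits) - 1
        value &= mask
        return ((value >> count) | (value << (bits - count))) & mask

    def compute_dll_hash(dllname, num):
        dllname = [ord(i) for i in dllname]
        dllname.append(0)
        for c in dllname:
            if c >= 65 and c <= 90:  # fold 'A'-'Z' to lowercase
                c += 32
            num = (c + ror(num, 13, 32)) & 0xFFFFFFFF
        return num

    def compute_fn_hash(fnname, num):
        fnname = [ord(i) for i in fnname]
        fnname.append(0)
        for c in fnname:
            num = (c + ror(num, 13, 32)) & 0xFFFFFFFF
        return num

    def get_function_name_from_hash(hashValue, xorVal=0):
        with open("exports.yaml", "rb") as f:
            data = yaml.safe_load(f.read())
        dllNames = list(data.keys())
        for dllName in dllNames:
            fns = data[dllName]
            dllHash = compute_dll_hash(dllName, 0)
            fnhashes = [compute_fn_hash(fn, dllHash) for fn in fns]
            # Remove the XOR mask before comparing against the computed hashes
            if (hashValue ^ xorVal) in fnhashes:
                return fns[fnhashes.index(hashValue ^ xorVal)]
        return ""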
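To see the whole scheme work without IDA or exports.yaml, here is a minimal, self-contained sketch of the lookup. The tiny export table and the names `ror13_hash` and `build_lookup` are hypothetical and exist only for this illustration. The XOR mask is the value quoted earlier, and the 32-bit wraparound is an assumption about how the sample's arithmetic behaves.

```python
def ror(value, count, bits=32):
    """Rotate `value` right by `count` bits within a `bits`-wide word."""
    mask = (1 << bits) - 1
    value &= mask
    return ((value >> count) | (value << (bits - count))) & mask


def ror13_hash(name, seed=0, lowercase=False):
    """Additive ROR-13 hash over the name plus its terminating NUL byte."""
    h = seed
    for c in list(name.encode("ascii")) + [0]:
        if lowercase and 65 <= c <= 90:
            c += 32
        h = (c + ror(h, 13)) & 0xFFFFFFFF
    return h


XOR_KEY = 0x22065FED  # mask quoted in the write-up

# Hypothetical miniature export table, used only for this demonstration.
exports = {"kernel32.dll": ["CreateFileW", "VirtualAlloc", "Sleep"]}


def build_lookup(exports, xor_key):
    """Map every masked function hash to its (dll, function) pair."""
    table = {}
    for dll, functions in exports.items():
        dll_hash = ror13_hash(dll, 0, lowercase=True)
        for fn in functions:
            table[ror13_hash(fn, dll_hash) ^ xor_key] = (dll, fn)
    return table


lookup = build_lookup(exports, XOR_KEY)

# Simulate a stored value: the DLL name is case-folded, the function name is not.
dll_seed = ror13_hash("KERNEL32.DLL", 0, lowercase=True)
stored = ror13_hash("VirtualAlloc", dll_seed) ^ XOR_KEY
print(lookup[stored])
```

Building the dictionary once makes every later lookup a constant-time operation, which helps when many hashes have to be resolved against the exports of a full System32 directory.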
The third condition checks whether the current process token belongs to the builtin administrators group or not.
If these conditions are met, the malware continues its execution; otherwise, it attempts a UAC bypass.
UAC Bypass
The malware decrypts a string by XORing specific values with 0x22065FED, which yields dllhost.exe, the executable that hosts the COM Surrogate process. It then calls LdrEnumerateLoadedModules and passes the string Elevation:Administrator!new:{3E5FC7F9-9A51-4367-9063-A120244FBEC7} to CoGetObject. This elevation moniker requests an elevated instance of the COM object with that CLSID (CMSTPLUA), a well-known UAC bypass technique.
Normal Execution without UAC Bypass
The malware runs a string decryption routine and passes the decrypted values to Windows registry APIs. Let’s write a code snippet that automates the string decryption.
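    import binascii

    def decrypt_string(buffer, xorKey):
        answer = ""
        # XOR every encrypted dword with the key
        buffer = [i ^ xorKey for i in buffer]
        for item in buffer:
            try:
                hexDigits = hex(item)[2:]
                if len(hexDigits) % 2:
                    # unhexlify needs an even number of hex digits
                    hexDigits = "0" + hexDigits
                # The bytes are stored little-endian, so reverse each decoded chunk
                val = binascii.unhexlify(hexDigits).decode("utf-8")[::-1]
                answer += val
            except Exception:
                pass
        return answer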
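A quick way to check such a routine is a round trip on a known string. The sketch below is self-contained and uses `struct` rather than hex conversion. The helper `encrypt_string` is hypothetical (it is simply the inverse operation), and packing the text as ASCII into little-endian dwords is an assumption made for the demonstration; the sample may store some strings as wide characters.

```python
import struct

XOR_KEY = 0x22065FED  # key quoted in the write-up


def encrypt_string(text, xor_key):
    """Hypothetical inverse: pack ASCII text into XOR-masked little-endian dwords."""
    raw = text.encode("ascii")
    raw += b"\x00" * (-len(raw) % 4)  # pad to a whole number of dwords
    return [d ^ xor_key for (d,) in struct.iter_unpack("<I", raw)]


def decrypt_dwords(dwords, xor_key):
    """XOR each dword with the key and reassemble the little-endian bytes."""
    raw = b"".join(struct.pack("<I", d ^ xor_key) for d in dwords)
    return raw.rstrip(b"\x00").decode("ascii")


blob = encrypt_string("dllhost.exe", XOR_KEY)
print([hex(d) for d in blob])
print(decrypt_dwords(blob, XOR_KEY))  # dllhost.exe
```

Because XOR is its own inverse, the same key both masks and unmasks the dwords, so any string recovered from the binary can be re-encrypted and compared against the raw values as a sanity check.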
It reads the MachineGuid value from the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Cryptography, hashes it three times, reverses the byte order, and encodes the result with base64. In the resulting base64 string, it replaces + with x, / with i, and = with z. It then decrypts the string %s.README.txt, calls swprintf to substitute the base64 string for the format specifier, and computes the hash of the resulting string <base64Data>.README.txt. This appears to be the name of the ransom note in which the malware will write its message for the victim.
Next, it checks whether the current user is LocalSystem. If so, it uses the current token; otherwise, it performs the following steps:
It calls the API function NtQuerySystemInformation to retrieve system information.
It iterates through all processes in the system, verifying if the hash of some process name matches the value 0x3EB272E6. If such process is found, it returns its process id.
To work on as many machines as possible, the malware most likely targets a process that is present on every Windows system. Let’s write a script that finds a process name whose hash matches this value.
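    def ror(value, count, bits):
        # Rotate-right helper (assumed definition)
        mask = (1 << bits) - 1
        value &= mask
        return ((value >> count) | (value << (bits - count))) & mask

    def compute_hash(name, num):
        name = [ord(i) for i in name]
        name.append(0)
        for c in name:
            if c >= 65 and c <= 90:  # fold 'A'-'Z' to lowercase
                c += 32
            num = (c + ror(num, 13, 32)) & 0xFFFFFFFF
        return num

    # Assumed candidate list; extend it with any other process names of interest
    process_names = ["explorer.exe", "svchost.exe", "lsass.exe", "winlogon.exe", "services.exe"]

    for name in process_names:
        if compute_hash(name, 0) == 0x3EB272E6:
            print(name)
            break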
Running this script yields the process name explorer.exe.
The malware calls NtOpenProcess to get a handle to explorer.exe and passes the process handle to NtOpenProcessToken to retrieve the token. It then calls NtDuplicateToken to duplicate that token. If the malware fails to obtain the duplicated token, it falls back to another series of steps to get a token.
It calls the API function NtQuerySystemInformation to retrieve system information.
It iterates through all processes in the system, verifying if the hash of some process name matches the value 0xB7E02438. If such process is found, it returns its process id.
Running the script we wrote earlier against common process names shows that this hash belongs to svchost.exe. The malware retrieves the process ID of svchost.exe and passes it to NtOpenProcess with the PROCESS_ALL_ACCESS flag to obtain a handle to the process. It then checks whether svchost.exe is running as a 64-bit process. If so, it calls a function at offset 0x64cc from the image base.
The malware allocates memory of size 513 bytes using RtlAllocateHeap, and copies a global buffer into it.
Shellcode Extraction
Using PE-bear, we notice that the offset 0x12000 corresponds to the virtual address of the .rsrc section.
The first dword stored in the .rsrc section is a seed and the second dword denotes the size of the encrypted data stored there.
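The layout described above can be parsed with a few lines of Python. This is a sketch under one assumption that the text does not state explicitly: the encrypted payload is taken to start immediately after the two header dwords. The function name `parse_rsrc_header` and the sample buffer are hypothetical.

```python
import struct


def parse_rsrc_header(data):
    """Split a raw .rsrc buffer into (seed, size, encrypted payload)."""
    if len(data) < 8:
        raise ValueError("buffer too short for the two-dword header")
    # Both header fields are little-endian 32-bit values
    seed, size = struct.unpack_from("<II", data, 0)
    payload = data[8:8 + size]  # assumption: payload follows the header directly
    if len(payload) != size:
        raise ValueError("declared size exceeds the available data")
    return seed, size, payload


# Synthetic buffer for illustration only: seed, size, 4 payload bytes, padding.
sample = struct.pack("<II", 0xFFCAA1EA, 4) + b"\xde\xad\xbe\xef" + b"\x00" * 8
seed, size, payload = parse_rsrc_header(sample)
print(hex(seed), size, payload.hex())  # 0xffcaa1ea 4 deadbeef
```

Validating the declared size before slicing avoids silently decrypting a truncated buffer when the section data is shorter than expected.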
    int __stdcall keygen(unsigned int a1, int *seed)
    {
      int v2; // edx
The seed is modified on each call to this function, so it returns a different value every time. The malware then XORs each byte of the encrypted data with the corresponding byte of the generated key and decompresses the result with the aPLib decompression algorithm. Let’s write a script to decrypt this encrypted buffer. The initial value of the seed is the first dword of the .rsrc section, i.e. 0xffcaa1ea.
    import struct
    from pefile import PE

    # `file` (path to the sample) and `zipcrypto_lcg` (a Python port of the keygen
    # routine above) are assumed to be defined elsewhere in the script.

    def read_rsrc_section():
        pe = PE(file)
        sections = pe.sections
        for section in sections:
            # Section names are NUL-padded byte strings
            if b".rsrc" in section.Name:
                desiredSection = section
                break
        return desiredSection.get_data()

    def get_dwords_from_byte_buffer(byteBuffer):
        dwordsBuf = []
        # Stop at len - 3 so the final complete dword is included
        for i in range(0, len(byteBuffer) - 3, 4):
            data = byteBuffer[i:i + 4]
            dwordsBuf.append(int.from_bytes(data, "little"))
        return dwordsBuf

    def generate_key(seed, bufLen):
        key = b''
        newSeed = seed
        # Every call yields four key bytes and an updated seed
        for i in range(0, bufLen, 4):
            value, newSeed = zipcrypto_lcg(seed, newSeed)
            key += struct.pack("<I", value)
        return key

    def decrypt_buffer(buffer, keystream, size):
        decrypted = []
        for i in range(size):
            decrypted.append(buffer[i] ^ keystream[i])
        return bytes(decrypted)
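Since the body of the keygen routine is not reproduced here, the following self-contained sketch only illustrates the shape of the decryption: a generator that updates its state on every call, a keystream built four bytes at a time, and a byte-wise XOR. The generator `toy_lcg` is a hypothetical stand-in and is not the sample's algorithm; the helper names are likewise made up for this example.

```python
import struct


def toy_lcg(state):
    """Hypothetical stand-in generator; NOT the sample's keygen routine."""
    state = (state * 1103515245 + 12345) & 0xFFFFFFFF
    return state, state  # (key dword, next state)


def generate_keystream(seed, length, step=toy_lcg):
    """Produce `length` key bytes, four at a time, updating the state each call."""
    key = bytearray()
    state = seed
    while len(key) < length:
        value, state = step(state)
        key += struct.pack("<I", value)
    return bytes(key[:length])


def xor_bytes(data, keystream):
    """XOR two byte strings position by position."""
    return bytes(a ^ b for a, b in zip(data, keystream))


plaintext = b"example configuration blob"
ks = generate_keystream(0xFFCAA1EA, len(plaintext))
ciphertext = xor_bytes(plaintext, ks)
print(xor_bytes(ciphertext, ks) == plaintext)  # True
```

Passing the generator as the `step` parameter keeps the keystream logic separate from the generator itself, so a faithful port of the keygen function can be swapped in once it has been reversed.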
We can see that the decompressed data contains various base64 blobs separated by null bytes. Continuing our analysis, we quickly figure out that the malware reads and decodes these base64 encoded chunks. Let’s enhance the existing script by incorporating a function that enables the decoding of the decompressed data.