Task details

What is the SHA256 hash of the file?

So, you’ve downloaded your first malware, grabbed a disassembler and a bunch of cool utilities. But what now? Where do you start your analysis? Different people have different approaches, techniques, tactics, weird quirks and so on - you get what I mean. It also depends on what exactly you want to achieve - some people reverse to build detections for antivirus products, some to extract an encryption algorithm and reuse it later for decryption, and some are just fucking around, like me. I think the first and easiest step is getting the file’s hash.

But what do the letters and numbers in a hash actually mean? For our purposes, you can treat the hash as a unique identifier of the file. There are different ways to use it - if you’re working on an antivirus product (hey, and if you actually are - why the fuck are you even reading this?), you could block files by their hashes. Also, this file might have already been analyzed by someone (security researchers) or something - just google the hash and maybe you won’t even have to reverse the file yourself. People also use hashes to write detection rules, but 99.9% of the time, please don’t: change a single byte of the file and the hash changes with it.

Also, keep in mind that there are different hashing algorithms: MD5, SHA-1, SHA-256.
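If you don’t have a hashing utility at hand, a few lines of Python will do. This is just a minimal sketch using the standard hashlib module; the function name file_hashes and the chunk size are my own choices, not something from the task.

```python
import hashlib

def file_hashes(path, chunk_size=65536):
    """Return MD5, SHA-1 and SHA-256 hex digests of a file.

    The file is read in chunks so large samples don't have to fit in memory.
    """
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

Call it as file_hashes("sample.bin") (the path is a placeholder) and you get all three digests from a single pass over the file.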

Name the DLL whose functions are used by the malware to exfiltrate the stolen data. format: <dll_name.dll>

You’ve got the hash, so it’s time to dive into the actual internal analysis. I think at this stage you can get by with just a disassembler - Ghidra / IDA / etc, but you can also use other utilities for initial analysis like pe-bear, DiE, CFF Explorer, and so on. I won’t go into details here about dynamic imports, anti-analysis methods, etc. (but you’ll definitely run into them, including in the next tasks). Let’s assume the import table is fully available and we know that in advance.

To avoid reinventing a billion functions from scratch over and over again, you can write a dynamic library (https://learn.microsoft.com/en-us/troubleshoot/windows-client/setup-upgrade-and-drivers/dynamic-link-library) that exports ready-made functions to any program that wants to import them. That’s what a DLL is, and that’s basically how the Windows API works. But it would be kind of insane to put everything into one massive DLL exporting more than 10,000 functions; that’s why they’re split logically into different libraries, usually with intuitive names.

For example, if you see bcrypt.dll - the Windows Cryptographic Primitives Library - you can infer that it’s used for encryption/decryption of some data. Maybe it’s ransomware, maybe it’s encrypting transmitted data, maybe it’s trying to locally decrypt Chrome browser passwords. It’s not exact, but it’s definitely useful info early on.

Or ws2_32.dll — the Windows Socket 2.0 32-Bit DLL, which is used for working with sockets to send or receive data over the network.

By looking at the Imports in the program you’re analyzing, you can see what functions and libraries it uses, which already gives you some ideas. The import table is also a good indicator for some packers, or rather, its absence is: a missing import table, or one with suspiciously few functions, often means the file is packed. But that’s a topic for another task.

It’s also worth mentioning the relationship between hashes and the import table - namely, the imphash, which is a hash computed over the import table (the imported library and function names, in order). Samples built from the same code tend to share it, so sometimes you can use it to look for similar samples.
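To make the imphash idea less abstract, here’s a simplified sketch of how it’s computed: lowercase every "library.function" pair, drop the library’s extension, join the pairs with commas in import-table order, and take the MD5. The function name imphash_sketch and its input format are my own; real implementations (pefile’s get_imphash(), for example) parse the PE file themselves and also resolve imports by ordinal, which this sketch ignores.

```python
import hashlib

def imphash_sketch(imports):
    """Simplified imphash over (dll_name, function_name) pairs.

    `imports` must be in import-table order; ordinal imports are not handled.
    """
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        # Strip the usual library extensions so "KERNEL32.dll" -> "kernel32".
        for ext in (".dll", ".ocx", ".sys"):
            if lib.endswith(ext):
                lib = lib[: -len(ext)]
                break
        parts.append(f"{lib}.{func.lower()}")
    return hashlib.md5(",".join(parts).encode("ascii")).hexdigest()
```

Because the order of the imports is part of the input, two builds of the same source usually match, while a sample with the same functions imported in a different order does not.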

What is the address of the main function’s entry point? format: <0x4....>

Initial recon is done, so it’s time to dig into the actual code. Personally, I use IDA Free (sometimes Ghidra for firmware). After disassembling, we land at the start and immediately spot some interesting variables — ___security_cookie, and some intriguing function calls like __SEH_prolog4 and SEH_405333.

This is the first mistake all beginners make: jumping straight into the first function and assuming it’s part of the author’s logic. The key point is: not everything you see in the code was actually written by the program’s author (in most cases). The number and size of functions, as well as the size of the file, often depend on the programming language, the compiler, and its settings. Sure, this is the program’s entry point, but it’s not the start of the author’s actual logic. That’s why it’s important to identify the main function (though you don't have to do it right away; you can take the reverse approach, but this way is usually easier). There are many ways to do it, but these resources do a much better job of explaining than I could: video, paper

Even when you find the main function, not all functions inside it are written by the author or relevant to your research. In C, though, you rarely see such functions (this is a C program; you can check the language with DiE or pe-bear, for example).

This is a pretty important topic since you’ll end up dealing not just with this language but also with Golang, Rust, etc. — all of which look pretty specific and have a lot of internal helper functions (like error handling) and complex branches. But with time, as you recognize the patterns of this “irrelevant” (for us) code, you’ll spend less and less time researching it and will filter it out quickly.

In which country will the malware not work?

A common story: why would “good guys” want to hack/steal/encrypt their own kind? For that, they might check the system language — this is relevant for everything from miners to ransomware. Usually these aren’t targeted attacks, and this very often happens through pirated software.

Pay special attention to the constants that are passed into WinAPI functions. For example, if you wrote a program that passes PROCESS_CREATE_PROCESS as an argument, you won’t see the string PROCESS_CREATE_PROCESS in the disassembler but its actual value, 0x0080, because the parameter is a DWORD (put simply and very roughly, WORD, DWORD and QWORD are just numbers of different sizes). You can always check what types of arguments a function accepts on the Microsoft website, which I’ll link in the next paragraph.
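Going the other way, from the number back to the name, is just a table lookup. Here’s a small sketch for process access rights; the values are a subset of the ones documented for OpenProcess, and the helper name decode_flags is made up for this example.

```python
# Subset of the process access rights documented for OpenProcess.
PROCESS_ACCESS_RIGHTS = {
    0x0001: "PROCESS_TERMINATE",
    0x0002: "PROCESS_CREATE_THREAD",
    0x0008: "PROCESS_VM_OPERATION",
    0x0010: "PROCESS_VM_READ",
    0x0020: "PROCESS_VM_WRITE",
    0x0040: "PROCESS_DUP_HANDLE",
    0x0080: "PROCESS_CREATE_PROCESS",
    0x0400: "PROCESS_QUERY_INFORMATION",
    0x1000: "PROCESS_QUERY_LIMITED_INFORMATION",
}

def decode_flags(value, table=PROCESS_ACCESS_RIGHTS):
    """Split a bitmask into known flag names plus any bits the table lacks."""
    names = []
    leftover = value
    for bit, name in sorted(table.items()):
        if value & bit:
            names.append(name)
            leftover &= ~bit  # clear the bit we just explained
    return names, leftover
```

Flags are often OR-ed together, so a single number can decode to several names; whatever ends up in leftover is a bit you still need to look up in the docs.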

I recommend that, at first, you google every WinAPI function called (in IDA Free, for example, they’re highlighted in purple). Your go-to guide

Alternatively, just google the function name and follow the link to Microsoft. Here’s an example function description

This link will help you out with the current question

IMPORTANT! DO NOT FORGET THAT NUMBERS CAN BE WRITTEN IN DIFFERENT NUMERAL SYSTEMS (HEX OR DEC)
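A quick sketch of what that means in practice: the same value can show up as 0x409, as 409h (the suffix style you’ll often see in disassembly listings), or as 1033 in documentation. The helper parse_number is hypothetical, and the example value is just the language identifier for English (United States), not the answer to this task.

```python
def parse_number(text):
    """Parse '0x409', '409h' or '1033' into an int.

    A bare number without prefix or suffix is treated as decimal, which is an
    assumption: always check which base your tool is displaying.
    """
    s = text.strip().lower()
    if s.startswith("0x"):
        return int(s, 16)
    if s.endswith("h"):
        return int(s[:-1], 16)
    return int(s, 10)

# The same constant written three ways.
same_value = [parse_number(t) for t in ("0x409", "0409h", "1033")]
```

If a constant from the disassembler doesn’t match anything in the documentation, convert it to the other base before deciding it’s something exotic.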

The stealer mimics legitimate software - provide the name of the file it imitates.

Classic stuff: mimicry can take a lot of forms - a person can create a file, service, registry key, etc., with a made-up name/avatar/info. I could list examples for ages, but here are just a few:

- Aoqin Dragon has used fake icons including antivirus and external drives to disguise malicious payloads.
- APT28 has renamed the WinRAR utility to avoid detection.
- Bisonal dropped a decoy payload with a .jpg extension that contained a malicious Visual Basic script.
- Winter Vivern created specially-crafted documents mimicking legitimate government or similar documents during phishing campaigns.

There are tons of techniques - you can check them out here and look at the subtechniques for T1036.

One of the more interesting, and kind of amusing, techniques to me is RTLO (Right-to-Left Override). Try it yourself: click, click
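If you’d rather see RTLO in code than click around, here’s a small sketch. The file name is invented, and approx_display is only a crude stand-in for what a bidi-aware file manager does: it reverses everything after the override character, which is enough to show the trick.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
# Unicode bidirectional control characters worth flagging in file names.
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

def has_bidi_control(name):
    """True if the file name contains any bidi control character."""
    return any(ch in BIDI_CONTROLS for ch in name)

def real_extension(name):
    """Extension as the OS sees it: whatever follows the last dot."""
    return name.rsplit(".", 1)[-1].lower() if "." in name else ""

def approx_display(name):
    """Rough approximation of the rendering: text after RLO appears reversed."""
    head, sep, tail = name.partition(RLO)
    return head + tail[::-1] if sep else name

spoofed = "invoice_" + RLO + "fdp.exe"  # hypothetical file name
```

Here approx_display(spoofed) gives "invoice_exe.pdf", while real_extension(spoofed) is still "exe", and that mismatch is the whole point of the technique.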

But masquerading isn’t always used; sometimes it’s much simpler—exploit a vulnerability, get in, fuck shit up, leave a mess of indicators everywhere in the system, then don’t give a fuck what happens next.

The stealer implements a persistence mechanism. What is the registry key path where it persists? format: <HKLM\...\...>

Just like masquerading, persistence is one of the most important parts (again, for certain types of malware: spyware, stealers, etc.). There are tons of these techniques, and at this point, pretty much every Windows component has its own way of establishing persistence. It ranges from creating a service to placing implants in boot sectors, aka bootkits.

Read more here

I recommend focusing on file, registry, service and task operations.
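For the registry part, it helps to know a few of the classic autorun locations by heart. The sketch below checks a registry path against a handful of well-known ones; the list is nowhere near complete, the helper names are mine, and it says nothing about which key this particular sample uses.

```python
HIVE_ALIASES = {
    "hkey_local_machine": "hklm",
    "hkey_current_user": "hkcu",
}

# A few well-known persistence locations (lowercase, without the hive).
AUTORUN_SUBKEYS = (
    r"software\microsoft\windows\currentversion\run",
    r"software\microsoft\windows\currentversion\runonce",
    r"software\microsoft\windows nt\currentversion\winlogon",
    r"system\currentcontrolset\services",
)

def normalize(path):
    """Lowercase the path and shorten the hive name (HKEY_LOCAL_MACHINE -> hklm)."""
    hive, _, rest = path.strip().strip("\\").partition("\\")
    hive = HIVE_ALIASES.get(hive.lower(), hive.lower())
    return hive + "\\" + rest.lower()

def is_persistence_key(path):
    """True if the path is one of the listed keys or lies below one of them."""
    _, _, subkey = normalize(path).partition("\\")
    return any(subkey == k or subkey.startswith(k + "\\") for k in AUTORUN_SUBKEYS)
```

When you spot a registry API call in the disassembly, feed the key string it receives into a check like this, or simply compare it by eye against the persistence techniques documented by MITRE.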

The archive’s name is generated based on system information. Which system parameter is used?

System information is used everywhere, and there are tons of ways it can be applied - you can literally learn everything about a computer

https://attack.mitre.org/techniques/T1082/

A miner, for example, might check the CPU and GPU models. Here’s a real-world miner that’s publicly available and often spread as “legit” software

For anti-VM techniques, ransomware can look up the names of system components, since real machines and virtual machines often have very different device names (such as virtual network adapters). You’ll see more of this in later tasks.

In such cases, registry operations are common, as well as launching various Windows built-in tools; in the attack context, these are called LOLBINs. One example of a LOLBIN is whoami, which shows the current user and their groups. These applications aren’t limited to just collecting information, but that’s another topic...

I recommend monitoring the registry and process launches.

Which process is killed by a malware before stealing data?

This approach is often used to evade security mechanisms, sometimes along with stopping and deleting services, though nowadays it’s not the most effective technique, since EDR and AV solutions can deal with this vector - they can simply protect their own processes right from the kernel, using a driver.

The most common function chain for doing this looks like:

- CreateToolhelp32Snapshot ->
- Process32First ->
- Process32Next ->
- some comparison function ->
- OpenProcess ->
- TerminateProcess
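You can check for this chain right from the import list. The sketch below takes a list of imported function names (however you got them: IDA, pe-bear, a script) and reports which links of the chain are missing. The helper names are made up, and the comparison function is left out on purpose, since it can be anything from lstrcmpi to a hand-rolled loop.

```python
# WinAPI functions from the chain above, minus the comparison step.
KILL_CHAIN = (
    "CreateToolhelp32Snapshot",
    "Process32First",
    "Process32Next",
    "OpenProcess",
    "TerminateProcess",
)

def missing_kill_chain_apis(imported_functions):
    """Return the chain functions that are NOT imported.

    Accepts the plain name or its A/W variant (e.g. Process32FirstW), but not
    longer names that merely share a prefix (e.g. OpenProcessToken).
    """
    names = set(imported_functions)
    return [api for api in KILL_CHAIN
            if not ({api, api + "A", api + "W"} & names)]

def looks_like_process_killer(imported_functions):
    """True if every function of the chain is present in the imports."""
    return not missing_kill_chain_apis(imported_functions)
```

A full match only tells you the sample is able to enumerate and kill processes; to learn which process it goes after, follow the xrefs to the comparison step and look at the string being compared.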

Also, try to answer for yourself: why was this process killed? Here’s a hint - it’s pretty much the same reason ransomware would kill certain processes.

Which application is the stealer targeting?

There are a ton of types of data to steal - email accounts, game launcher accounts, messengers, photos, browser history, credit cards, etc.
But all stealers that grab data from applications share something in common: they all work with files (and sometimes the registry and process memory as well). Most often, the session data, passwords, personal info, etc. of third-party apps are stored in a specific Windows folder - AppData. Here’s a very simple stealer example that reads Google passwords from this folder. So even if you don’t see any strings directly mentioning application names, but notice that the AppData folder appears somewhere (or there’s some code trying to get its path), in most cases that points to one of two outcomes:

- the malware wants to persist somewhere inconspicuous (though honestly, it’s fucking conspicuous)
- the malware wants to steal your data

Quick tip: every disassembler allows you to look for symbols and strings, so I recommend checking out the Strings view (in IDA, it’s under SubViews)

Or you can use the utilities mentioned earlier. Microsoft, or more specifically their badass employee Mark Russinovich, AKA MICROSOFT BIG BOY, also provides a strings utility

P.S. I highly recommend checking out Mark Russinovich’s work - all of his tools are extremely valuable and widely used by analysts.
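There’s no magic in a strings tool, by the way: it just looks for runs of printable characters. Here’s a minimal sketch that handles both plain ASCII and the UTF-16LE "wide" strings Windows programs love. The function name and the minimum length of 4 are my own choices, and real tools handle more encodings than this.

```python
import re

def extract_strings(data, min_len=4):
    """Return printable ASCII and UTF-16LE strings found in raw bytes."""
    # Runs of printable ASCII characters.
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Same characters, each followed by a zero byte (UTF-16LE).
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in utf16_re.finditer(data)]
    return found
```

Run it over open(path, "rb").read() and filter the result for things like "AppData", file names or URLs. If you only search for ASCII, you’ll miss every wide string in the binary.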

Which program is used to create the archive?

The LOLBIN topic was mentioned, but let’s go into a bit more detail.

Imagine a situation where the victim’s computer has an antivirus that can wipe out malware in a second. Let’s say a stealer is trying to exfiltrate collected data - the antivirus would simply block it. But what if the data is sent not by the malware, but by a standard and widely used utility like curl, which is used everywhere for data transfer? Of course, that utility won’t get deleted - after all, it’s legit software. Here, what matters isn’t just what is being used, but how it’s being used.

And that’s not the only win for this technique; you also leave fewer fingerprints and write less code by relying on existing utilities.

But in our case, simplicity is actually the point. The program discussed here is a very important subject for antivirus researchers.

To which domain are the stolen data sent?

Domains, IP addresses, and URLs are some of the most important indicators you can extract from a program or traffic, and they have a wide range of uses:

- Suricata Rules - for identifying network packets/requests and more

- Used by governments (or anyone authorized) to block access to a resource

- You can track which hosts in a domain were compromised. Forensics specialists use this all the time
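Pulling these indicators out of a pile of strings is mostly a regex job. The sketch below is a rough take on it: the patterns are deliberately simple, the NOT_TLDS filter is a small hand-picked set to keep things like kernel32.dll from being reported as domains, and the sample values in the usage note are reserved documentation addresses, not real indicators.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IPV4_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE)
# File extensions that would otherwise look like top-level domains.
NOT_TLDS = {"dll", "exe", "sys", "dat", "tmp"}

def extract_iocs(strings):
    """Collect URLs, domains and IPv4 addresses from an iterable of strings."""
    iocs = {"urls": set(), "domains": set(), "ips": set()}
    for s in strings:
        for url in URL_RE.findall(s):
            iocs["urls"].add(url)
            try:
                host = urlparse(url).hostname or ""
            except ValueError:  # malformed URL, keep it but skip the host
                continue
            if IPV4_RE.fullmatch(host):
                iocs["ips"].add(host)
            elif host:
                iocs["domains"].add(host)
        # Look for bare IPs and domains outside of the URLs already handled.
        rest = URL_RE.sub(" ", s)
        iocs["ips"].update(IPV4_RE.findall(rest))
        for domain in DOMAIN_RE.findall(rest):
            if domain.rsplit(".", 1)[-1].lower() not in NOT_TLDS:
                iocs["domains"].add(domain.lower())
    return iocs
```

For example, feeding it ["POST http://upload.example.com/gate.php", "kernel32.dll", "192.0.2.10:8080"] yields one URL, the domain upload.example.com and the IP 192.0.2.10. Expect false positives on real samples; treat the output as a list of leads, not as verified indicators.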

Surprisingly enough, you’ll almost always find this kind of information near network-related functions (fuck really?...)

Tip: to quickly find where a function is called in IDA, just open the IMPORTS view, find the function you’re interested in, and hit something like jump to xref or simply press X


M101 | Dumb Stealer


Warning before downloading