It is common knowledge that Android apps are distributed in the form of Android Application Package files, or APK files for short. However, most people look at an APK as a black box and have no idea how it's created or what's inside it. Even most app developers have only a superficial understanding of an APK's anatomy. In integrated development environments, such as Android Studio, all it takes is a single click to transform an Android project into an APK.
In this tutorial, we are going to dissect an Android app. In other words, we are going to open its APK and take a look at its contents. Additionally, because an APK is a binary file only meant to be read by machines, I'm also going to introduce you to a few tools you can use to translate its contents to a more human-readable form.
To follow along, you need:
- the latest version of the Android SDK
- an Android device or emulator running Android 4.4 or higher
1. Why Look Inside an APK?
Many people do it out of sheer curiosity. Others enjoy the ability to directly access the images, sounds, and other assets of their favorite games or apps. There are, however, more important reasons why you would want to look inside an APK.
If you've just started learning Android app development, there's a lot you can learn by looking inside APK files of popular apps or apps created by professionals. For example, by looking at the XML layout files of an app that looks good on multiple screen sizes, you can improve your own layout creation skills.
Apps downloaded from untrusted sources may contain malicious code. If you are already a skilled app developer, by disassembling such apps, you can look at their code to get a better understanding of what they are really doing under the hood.
2. How Is an APK Created?
There isn't much you can learn from an APK without a basic understanding of how it is created. In fact, the most important tools used to dissect an APK are also the tools used to create one.
Android projects are primarily composed of Java source code, XML layouts, XML metadata, and assets, such as images, videos, and sounds. Before the Android operating system can use all those files, they need to be converted into a form it understands. This conversion involves a lot of intermediate tasks, which are usually referred to as the Android build process. The final output of the build process is an APK or Android Application Package.
In Android Studio projects, the Android Plugin for Gradle handles all the intermediary tasks of the build process.
One of the first important tasks is the generation of a file called R.java. This is the file that allows developers to easily access the project's layout and drawable resources in their Java code using numeric constants. To generate the file, a tool called aapt, which is short for Android Asset Packaging Tool, is used. The tool also converts all the XML resources, along with the project's manifest file, into a binary format.
All the Java files, including R.java, are then converted to class files using the Java compiler. As you might already know, class files consist of bytecode, which can be interpreted by a Java runtime engine. However, Android uses a special type of runtime called Android runtime (ART), which is optimized for mobile devices. Therefore, once all the class files have been generated, a tool called dx is used to translate the bytecode to Dalvik bytecode, a format that ART understands.
Once the resources and Java files have been processed, they are placed inside an archive file that is very similar to a JAR file. The archive file is then signed, using a private key that belongs to the app developer. These two operations are performed by the Gradle plugin without using any external tools. The developer's key, however, is obtained from a keystore managed by keytool.
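With the original JAR-style signing scheme, the signature ends up as ordinary entries under META-INF inside the archive. The sketch below, written in Python, picks those entries out of a list of entry names. The function name and the sample names are hypothetical, and newer APK signature schemes keep their data outside the ZIP entries, so an empty result does not prove that an APK is unsigned.

```python
# Suffixes used by JAR-style signature files inside META-INF/.
SIGNATURE_SUFFIXES = (".SF", ".RSA", ".DSA", ".EC")


def find_signature_entries(entry_names):
    """Return the META-INF entries that belong to a JAR-style signature."""
    found = []
    for name in entry_names:
        upper = name.upper()
        if not upper.startswith("META-INF/"):
            continue
        if upper == "META-INF/MANIFEST.MF" or upper.endswith(SIGNATURE_SUFFIXES):
            found.append(name)
    return found


# Hypothetical entry names, for illustration only.
sample_entries = [
    "AndroidManifest.xml",
    "classes.dex",
    "META-INF/MANIFEST.MF",
    "META-INF/CERT.SF",
    "META-INF/CERT.RSA",
    "res/layout/input.xml",
]
print(find_signature_entries(sample_entries))
```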
Finally, a few optimizations are made to the archive file using the zipalign tool to make sure that the memory the app consumes while running is kept to a minimum. At this point, the archive file is a valid APK, which can be used by the Android operating system.
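zipalign works by making the data of every uncompressed entry start at an offset that is a multiple of four bytes. Here is a rough way to check that property in Python. The helper names are hypothetical, and the archive is a tiny stand-in built in memory, not a real APK.

```python
import io
import struct
import zipfile


def stored_entry_offsets(zip_bytes):
    """Map each uncompressed entry name to the offset where its data starts."""
    offsets = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.compress_type != zipfile.ZIP_STORED:
                continue  # only uncompressed entries are aligned
            # A local file header has 30 fixed bytes; its last two fields
            # are the lengths of the file name and of the extra field.
            start = info.header_offset
            name_len, extra_len = struct.unpack_from("<HH", zip_bytes, start + 26)
            offsets[info.filename] = start + 30 + name_len + extra_len
    return offsets


def is_aligned(zip_bytes, boundary=4):
    """True if every uncompressed entry starts on a multiple of `boundary`."""
    return all(off % boundary == 0 for off in stored_entry_offsets(zip_bytes).values())


def build_sample_archive(entry_name):
    """Build a one-entry, uncompressed ZIP in memory (a stand-in for an APK)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        archive.writestr(entry_name, b"data")
    return buffer.getvalue()


print(is_aligned(build_sample_archive("ab")))     # data starts at 30 + 2 = 32
print(is_aligned(build_sample_archive("a.bin")))  # data starts at 30 + 5 = 35
```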
3. Analyzing the Contents of an APK
Now that you understand how APK files are created and used, let's open one and take a look at its contents. In this tutorial, we use the APK of an app called Sample Soft Keyboard, which comes pre-installed on the Android emulator. However, if you prefer to use a physical device, you can just as easily use the APK of any app you've installed on it.
Step 1: Transferring the APK to a Computer
To examine the contents of the APK, you must first transfer it from the emulator to your computer. Before you do so, you need to know the package name and absolute path of the APK. Use adb to open a shell session on your emulator.
adb shell
Once you see the shell prompt, use the pm list command to list the package names of all the installed apps.
pm list packages
The package name of the app we are interested in is com.example.android.softkeyboard. You should be able to see it in the list. By passing the package name to the pm path command, you can determine the absolute path of the APK.
pm path com.example.android.softkeyboard
The output of the above command looks like this:
package:/data/app/SoftKeyboard/SoftKeyboard.apk
Now that you know its path, you can exit the shell and transfer the APK to your computer using the adb pull command. The command below transfers it to your computer's /tmp directory:
adb pull /data/app/SoftKeyboard/SoftKeyboard.apk /tmp
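pm path prefixes every path it prints with package:, which is why the adb pull command above uses only the part after the prefix. If you want to script this step, a small helper can strip the prefix for you. The sketch below is in Python. The function name is hypothetical, and the sample string stands in for real pm path output.

```python
def apk_paths_from_pm_output(output):
    """Extract APK paths from the text printed by `pm path <package>`."""
    prefix = "package:"
    paths = []
    for line in output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            paths.append(line[len(prefix):])
    return paths


# Stand-in for the text printed by the pm path command.
sample_output = "package:/data/app/SoftKeyboard/SoftKeyboard.apk\n"
print(apk_paths_from_pm_output(sample_output))
```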
Step 2: Extracting the Contents of the APK
Earlier in this tutorial, you learned that an APK is nothing but a compressed archive file. This means that you can use your operating system's default archive manager to extract its contents. If you're using Windows, you might first have to change the extension of the file from .apk to .zip. After extracting the contents of the APK, you should be able to see the files inside the APK.
If you are an app developer, a lot of the files in the APK should look familiar. However, apart from the images in the res folder, the files are in a format you can't work with without the help of a few tools.
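Because an APK is an ordinary ZIP archive, you can also list its entries from a script instead of an archive manager. The following Python sketch uses the standard zipfile module. The function names are hypothetical, and the archive it reads is a small stand-in built in memory whose entry names merely mimic those of a typical APK.

```python
import io
import zipfile


def list_apk_entries(apk_file):
    """Return (name, uncompressed size) for every entry in an APK."""
    with zipfile.ZipFile(apk_file) as apk:
        return [(info.filename, info.file_size) for info in apk.infolist()]


def build_stand_in_apk():
    """Build a ZIP in memory with placeholder entries (not a real APK)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as apk:
        apk.writestr("AndroidManifest.xml", b"\x00" * 16)
        apk.writestr("classes.dex", b"\x00" * 112)
        apk.writestr("resources.arsc", b"\x00" * 8)
        apk.writestr("res/layout/input.xml", b"\x00" * 4)
    buffer.seek(0)
    return buffer


for name, size in list_apk_entries(build_stand_in_apk()):
    print(f"{size:>6}  {name}")
```

To try it on the file you pulled earlier, pass its path instead, for example list_apk_entries("/tmp/SoftKeyboard.apk").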
Step 3: Deciphering Binary XML Files
The Android SDK includes all the tools you need to analyze the contents of an APK. You learned earlier that aapt is used to package XML resources during the build process. It can also be used to read a lot of information from an APK.
For example, you can use its dump xmltree option to read the contents of any binary XML file in the APK. Here's how you can read a layout file called res/layout/input.xml:
aapt dump xmltree /tmp/SoftKeyboard.apk res/layout/input.xml
The output should look something like this:
N: android=http://schemas.android.com/apk/res/android
  E: com.example.android.softkeyboard.LatinKeyboardView (line=21)
    A: android:id(0x010100d0)=@0x7f080000
    A: android:layout_width(0x010100f4)=(type 0x10)0xffffffff
    A: android:layout_height(0x010100f5)=(type 0x10)0xfffffffe
    A: android:layout_alignParentBottom(0x0101018e)=(type 0x12)0xffffffff
It's not XML, but, thanks to the indentation and labels like N for namespace, E for element, and A for attribute, you should be able to read it.
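The typed values in the attribute lines can be decoded by hand. Type 0x10 is a decimal integer and type 0x12 is a boolean, and for layout_width and layout_height the integers -1 and -2 stand for match_parent and wrap_content. The Python sketch below applies those rules to the attribute lines from the output. The function name and the regular expression are assumptions made for illustration, and only these two value types are handled.

```python
import re

# Matches lines such as:
#   A: android:layout_width(0x010100f4)=(type 0x10)0xffffffff
ATTRIBUTE = re.compile(
    r"A: (?P<name>[\w:]+)\(0x[0-9a-f]+\)="
    r"\(type 0x(?P<type>[0-9a-f]+)\)0x(?P<data>[0-9a-f]+)"
)

TYPE_INT_DEC = 0x10
TYPE_INT_BOOLEAN = 0x12
LAYOUT_SIZES = {-1: "match_parent", -2: "wrap_content"}


def decode_attribute(line):
    """Return (name, decoded value), or None if the line has no typed value."""
    match = ATTRIBUTE.search(line)
    if match is None:
        return None  # e.g. reference attributes like android:id=@0x7f080000
    name = match.group("name")
    value_type = int(match.group("type"), 16)
    data = int(match.group("data"), 16)
    if value_type == TYPE_INT_DEC:
        # Reinterpret the 32-bit value as a signed integer.
        signed = data - (1 << 32) if data & 0x80000000 else data
        if name in ("android:layout_width", "android:layout_height"):
            return name, LAYOUT_SIZES.get(signed, signed)
        return name, signed
    if value_type == TYPE_INT_BOOLEAN:
        return name, data != 0
    return name, hex(data)  # other types are left undecoded


sample_lines = [
    "A: android:id(0x010100d0)=@0x7f080000",
    "A: android:layout_width(0x010100f4)=(type 0x10)0xffffffff",
    "A: android:layout_height(0x010100f5)=(type 0x10)0xfffffffe",
    "A: android:layout_alignParentBottom(0x0101018e)=(type 0x12)0xffffffff",
]
for line in sample_lines:
    print(decode_attribute(line))
```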
Step 4: Deciphering Strings
In the previous step, you saw that the deciphered XML has hexadecimal numbers instead of strings. The numbers prefixed with @, such as @0x7f080000, are references to entries in a file called resources.arsc, which represents the resource table of the app.
You can use the dump resources option of aapt to view the resource table. Here's how:
aapt dump --values resources /tmp/SoftKeyboard.apk
From the output of the command, you can determine the exact values of the strings used in the app. Here's the entry for one of the hexadecimal numbers in the XML:
resource 0x7f080000 com.example.android.softkeyboard:id/keyboard: t=0x12 d=0x00000000 (s=0x0008 r=0x00)
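A resource ID packs three fields into 32 bits: the package ID in the top byte (0x7f for the app's own resources), the type ID in the next byte, and the entry index in the low 16 bits. The Python sketch below splits an ID into those parts and pulls the readable name out of an entry line like the one above. The function names and the regular expression are assumptions made for illustration.

```python
import re


def split_resource_id(resource_id):
    """Split a 32-bit resource ID into (package ID, type ID, entry index)."""
    package_id = (resource_id >> 24) & 0xFF
    type_id = (resource_id >> 16) & 0xFF
    entry_index = resource_id & 0xFFFF
    return package_id, type_id, entry_index


# Matches the start of a line such as:
#   resource 0x7f080000 com.example.android.softkeyboard:id/keyboard: ...
ENTRY = re.compile(r"resource (0x[0-9a-f]{8}) ([\w.]+):(\w+)/(\w+):")


def parse_resource_entry(line):
    """Return (resource ID, package, type name, entry name), or None."""
    match = ENTRY.search(line)
    if match is None:
        return None
    resource_id, package, type_name, entry_name = match.groups()
    return int(resource_id, 16), package, type_name, entry_name


sample_line = (
    "resource 0x7f080000 com.example.android.softkeyboard:id/keyboard: "
    "t=0x12 d=0x00000000 (s=0x0008 r=0x00)"
)
print(parse_resource_entry(sample_line))
print([hex(part) for part in split_resource_id(0x7F080000)])
```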
Step 5: Disassembling Dalvik Bytecode
The most important file in the APK is classes.dex. This is the file that is used by the Android runtime while running the app. It contains the Dalvik bytecode generated during the build process.
By disassembling this file, you can get information about the Java classes used in the app. To do so, you can use a tool called dexdump. With the following command, you can redirect the output of dexdump to a file that can be opened by any text editor.
dexdump -d /tmp/classes.dex > /tmp/classes.dasm
If you open classes.dasm, you're going to see that it has hundreds of lines of low-level code that looks like this:
Needless to say, understanding it is very hard. Thankfully, you can change the output format of dexdump to XML using the -l option. With the following command, you can redirect its output to a file that you can open in a browser.
dexdump -d -l xml /tmp/classes.dex > /tmp/classes.xml
The amount of information available in the XML format is less, but it gives you a fair idea about the Java classes, methods, and fields present in the app.
In this tutorial, you learned how an APK is created and what it contains. You also learned how to use the tools available in the Android SDK to decipher the contents of APK files. There isn't a lot of documentation about these tools, but, since they are open source, you can try reading their extensively commented source code to learn more about them.
If you are looking for something easier to work with, you can try using popular third-party tools like dex2jar, which converts DEX files to JAR files that standard Java tools can read, or JADX, a decompiler that can generate Java code.