Review of so library size optimization in Android

1. Background

The size of an application's installation package affects download time, installation time, disk usage and more. Reducing it therefore improves both the user experience and the download conversion rate. An Android installation package is in fact a zip file, a compressed archive made up mainly of dex, assets, resources, so and other file types. Common package size optimizations in the industry currently fall into the following categories:

Optimization for dex, such as Proguard, DebugItem deletion of dex, bytecode optimization, etc.;
Optimization for resources, such as AndResGuard, webp optimization, etc.;
Optimization for assets, such as compression, dynamic delivery, etc.;
Optimization for so, using the same measures as for assets, plus removing debugging symbols, etc.

With the widespread adoption of technologies such as dynamic delivery and on-device intelligence, so files still account for a large share of the installation package even after the above optimizations have been applied. We therefore began to look at whether this part could be reduced further.

After a period of research, analysis and verification, we worked out a solution that further reduces the size of the so in the application installation package by 30% to 60%. It consists of a series of purely technical optimizations that are minimally intrusive to business code and can be rolled out quickly through simple configuration. It is currently deployed in the production Meituan App. To explain not only what to do but also why it works, this article starts with the so file format and analyzes which parts can be optimized based on that format.

2. so file format analysis

An so is a dynamic library and is essentially an ELF (Executable and Linkable Format) file. Its internal structure can be viewed in two ways: the linking view and the execution view. The linking view treats the body of the so as a combination of multiple sections; it reflects how the so is assembled and is the compile-and-link perspective. The execution view treats the body of the so as a combination of multiple segments; it tells the dynamic linker how to load and execute the so and is the runtime perspective. Since so optimization is mostly concerned with compilation and linking, and a segment usually contains multiple sections (that is, the linking view breaks the so down at a finer granularity), we only discuss the linking view here.

You can view the list of all sections of an so file through the readelf -S command. Refer to the ELF file format description. Here is a brief introduction to the sections involved in this article:

.text: stores compiled machine instructions. Most functions of C/C++ code are stored here after compilation. There are only machine instructions here, no information such as strings.
.data: stores some readable and writable variables whose initial value is not zero.
.bss: stores some readable and writable variables with an initial value of zero or uninitialized. This section only indicates the memory size required for runtime and does not occupy the size of the so file.
.rodata: stores some read-only constants.
.dynsym: Dynamic symbol table, which gives information about the symbols provided by the so (exported symbols) and the symbols that depend on the outside (imported symbols).
.dynstr: String pool, different strings are separated by ‘\0’ for use by .dynsym and other parts.
.gnu.hash and .hash: two types of hash tables for fast lookup of exported symbols or all symbols in .dynsym.
.gnu.version, .gnu.version_d, .gnu.version_r: These three sections specify the version of each symbol in the dynamic symbol table. .gnu.version is an array with the same number of elements as the dynamic symbol table has symbols, that is, its elements correspond one-to-one with the symbols in the dynamic symbol table. Each element is of type Elfxx_Half and is an index indicating the version of the corresponding symbol. .gnu.version_d describes the versions of all symbols defined by this so and is indexed by .gnu.version. .gnu.version_r describes the versions of all symbols that this so depends on and is also indexed by .gnu.version. Because different symbols may share the same version, this index structure reduces the size of the so file.
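To make the section descriptions above more concrete, here is a minimal C++ sketch of where a toolchain such as GCC or Clang would typically place a few definitions. All names (kGreeting, g_counter, g_buffer, next_id, greeting_length) are hypothetical, and the exact placement can vary with the compiler, flags and target.

```cpp
#include <cstddef>

// Read-only constant data: typically placed in .rodata.
const char kGreeting[] = "hello";

// Read-write variable with a non-zero initial value: typically placed in .data.
int g_counter = 41;

// Zero-initialized read-write storage: typically placed in .bss, which
// reserves memory at load time but takes no space in the so file itself.
char g_buffer[4096];

// The machine instructions of this function: typically placed in .text.
int next_id() {
    return ++g_counter;
}

// Uses the string constant; returns its length without the trailing '\0'.
std::size_t greeting_length() {
    return sizeof(kGreeting) - 1;
}
```

Building this into an so and running readelf -S on it would be one way to check how a particular toolchain actually lays these out.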

Before optimizing, we need to have a clear understanding of these sections and the relationship between them. The following figure more intuitively shows the relationship between the various sections in so (only the sections involved in this article are drawn here):

Figure 1 Schematic diagram of so file structure

With the figure above in mind, let us look at the structure of the so file from another angle. Imagine that we put all function implementations in .text, and that the instructions in .text read the data in .rodata and read or modify the data in .data and .bss. It might seem that this content is all an so needs. But how do these functions get executed? Loading the functions and data into memory is not enough; they only take effect when they are actually executed.

We know that if we want to execute a function, we just need to jump to its address. So how does the external caller (module outside the so) know the address of the function it wants to call? This involves a function ID issue: the external caller gives the ID of the function that needs to be called, and the dynamic linker (Linker) finds the address of the target function based on the ID and informs the external caller. Therefore, the so file also needs a structure to store the “ID-address” mapping relationship. This structure is all the exported symbols of the dynamic symbol table.

In the actual implementation of the dynamic symbol table, the ID is a string, so all exported symbols of the dynamic symbol table together form a “string-address” mapping table. Once the caller has obtained the address of the target function, it prepares the parameters and jumps to that address, and the function is executed. Conversely, the current so may need to call functions in other so (such as read and write in libc.so). The imported symbols of the dynamic symbol table record these functions, and before the functions in the so are executed, the dynamic linker fills in the addresses of the target functions at the corresponding locations for the so to use. The dynamic symbol table is therefore the “bridge” connecting the current so with the external environment: exported symbols are for external use, and imported symbols declare the external symbols that the so needs (Note: symbols in .dynsym can also represent variables and other types, which work similarly to functions and are not described here).
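The “string-address” mapping can be pictured with a toy model. The sketch below is only an illustration of the idea, not how a real dynamic linker is implemented (a real linker works on .dynsym, .dynstr and the hash tables); the table, the functions add and sub, and the lookup helper are all hypothetical.

```cpp
#include <cstring>

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

using BinaryFn = int (*)(int, int);

// One row of the toy export table: symbol name -> function address.
struct ExportEntry {
    const char* name;
    BinaryFn address;
};

static const ExportEntry kExports[] = {
    {"add", &add},
    {"sub", &sub},
};

// Conceptually what happens when a caller asks for a symbol by name:
// find the name and hand back the address (nullptr if it is not exported).
BinaryFn lookup(const char* name) {
    for (const ExportEntry& entry : kExports) {
        if (std::strcmp(entry.name, name) == 0) {
            return entry.address;
        }
    }
    return nullptr;
}
```

A caller would then obtain the address with lookup("add") and call through the returned pointer, which mirrors the "get the address, prepare the parameters, jump" sequence described above.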

With this understanding of the so file structure, we can now analyze which parts of an so can be optimized.

3. Analysis of what can be optimized in so

Before discussing what else can be optimized, let us first look at the strip optimization (removing debugging information and the symbol table) that the Android build tool (Android Gradle Plugin, hereinafter AGP) already applies to so files. When AGP builds an so, it first generates an so with debugging information and a symbol table (the task is named externalNativeBuildRelease), then strips that so (the task is named stripReleaseDebugSymbols), and finally packages the stripped so into the apk or aar.

Strip optimization deletes the debugging information and the symbol table from the input so. The symbol table mentioned here is different from the “dynamic symbol table” discussed above. Its section name is usually .symtab, and it usually contains all the symbols of the dynamic symbol table plus many additional ones. Debugging information, as the name suggests, is information used to debug the so. It consists mainly of sections whose names begin with .debug_. These sections establish the mapping between each instruction of the so and the source code (that is, for each instruction in the so, the corresponding source file name, line number and other information can be found). It is called strip optimization because it actually invokes the strip command provided by the NDK (with the parameter --strip-unneeded).

Note: Why does AGP first build an so with debugging information and a symbol table, instead of building the final so directly (adding the -s parameter produces an so without debugging information and symbol table)? The reason is that the so with debugging information and symbol table is needed to restore crash call stacks. A stripped so runs normally, but when it crashes, the only thing guaranteed for each stack frame of the crash call stack is the position of the corresponding instruction within the so, not necessarily a symbol. When troubleshooting a crash, however, we want to know where in the source code the so crashed. With the unstripped so, each stack frame can be restored to its source file name, line number, function name and so on, which makes troubleshooting much easier. So although the so with debugging information and symbol table is not packaged into the final apk, it is very important for troubleshooting.

With strip optimization turned on, AGP can greatly reduce the size of so, in some cases by more than ten times. Taking a test so as an example, the final so is 14 KB, while the corresponding so with debugging information and symbol table is 136 KB. Note, however, that if AGP cannot find the corresponding strip command, it packages the so with debugging information and symbol table directly into the apk or aar, and the build does not fail. For example, when the strip command for the armeabi architecture is missing, the following message is shown:
Unable to strip library 'XXX.so' due to missing strip tool for ABI 'ARMEABI'. Packaging it as is.

In addition to the above-mentioned optimizations made by the Android build tool for so volume by default, what other optimizations can we do? First, clarify our optimization principles:

For content that must be retained, try to reduce the space it occupies;
Delete content that does not need to be retained.

Based on the above principles, so can be further optimized from the following three aspects:

Simplify the dynamic symbol table: As mentioned above, the dynamic symbol table is the “bridge” between the so and the outside world, and its exported symbols are the interface that the so exposes. Which interfaces must be exposed? In Android, most so are used to implement Java native methods. For this kind of so, it is enough that the application can obtain the address of the function corresponding to each Java native method at run time. There are two ways to achieve this: one is to register the Java native methods dynamically with RegisterNatives, and the other is to define Java_*** style functions and export their symbols, as described in the JNI specification. The RegisterNatives approach detects method signature mismatches early and reduces the number of exported symbols, and it is the approach recommended by Google. In the best case, therefore, only two symbols need to be exported: JNI_OnLoad (in which RegisterNatives is called to register the Java native methods) and JNI_OnUnload (which can do some cleanup work). If you do not want to rewrite the project code, you can export the Java_*** style symbols instead. Apart from this kind of so, the remaining so are usually ones that other so in the application depend on dynamically. For such an so, you need to determine which of its symbols are used by the so that depend on it, and keep only those symbols. In addition, a symbol table entry should be distinguished from its implementation body. The symbol table entry is the corresponding Elfxx_Sym item in the dynamic symbol table (see the figure above), while the implementation body is the actual entity in .text, .data, .bss, .rodata or another section. When a symbol table entry is deleted, its implementation body does not necessarily have to be deleted.
Combined with the so file structure diagram above, the reduction in so size from deleting one symbol table entry can be estimated as: symbol name string length + 1 + Elfxx_Sym + Elfxx_Half + Elfxx_Word.
Remove useless code: Real projects contain code that is never used in the Release version (historical code, code used only for testing, and so on). Such code is called DeadCode. According to the analysis above, only code that is directly or indirectly referenced by the exported symbols of the dynamic symbol table needs to be retained; all the rest is DeadCode and can be deleted (Note: in fact, code involved in special sections such as .init_array must also be retained). The potential gain from removing useless code is relatively large.
Optimize instruction length: The instructions that implement a given piece of functionality are not fixed. The compiler may be able to accomplish the same thing with fewer instructions. Since instructions are the main component of an so, the potential gain from optimizing this part is also relatively large.
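The per-entry estimate given for the dynamic symbol table (symbol name string length + 1 + Elfxx_Sym + Elfxx_Half + Elfxx_Word) can be worked through in a few lines. The sketch below hard-codes the structure sizes defined by the ELF format (Elf32_Sym is 16 bytes, Elf64_Sym is 24 bytes, Elfxx_Half is 2 bytes, Elfxx_Word is 4 bytes). The function name estimated_saving is hypothetical, and reading the Elfxx_Word term as one hash table entry is an interpretation of the formula rather than something the text states.

```cpp
#include <cstddef>
#include <cstring>

// Fixed sizes of the ELF types involved, in bytes.
constexpr std::size_t kElf32SymSize = 16;  // sizeof(Elf32_Sym)
constexpr std::size_t kElf64SymSize = 24;  // sizeof(Elf64_Sym)
constexpr std::size_t kElfHalfSize  = 2;   // Elfxx_Half: one .gnu.version entry
constexpr std::size_t kElfWordSize  = 4;   // Elfxx_Word: assumed to be one hash table entry

// Estimated bytes saved by deleting one dynamic symbol table entry:
// the name plus its '\0' terminator in .dynstr, one Elfxx_Sym in .dynsym,
// one Elfxx_Half and one Elfxx_Word.
std::size_t estimated_saving(const char* symbol_name, bool is_64bit) {
    const std::size_t sym_size = is_64bit ? kElf64SymSize : kElf32SymSize;
    return std::strlen(symbol_name) + 1 + sym_size + kElfHalfSize + kElfWordSize;
}
```

For a symbol named funD this gives 4 + 1 + 24 + 2 + 4 = 35 bytes in a 64-bit so and 4 + 1 + 16 + 2 + 4 = 27 bytes in a 32-bit so. Long C++ mangled names make the first term dominate, which is one reason trimming exported symbols pays off.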

The parts of so that can be optimized are shown in the figure below (deletable parts are marked with a red background, and the part that can be optimized is .text). funC, value2, value3 and value6 are used by parts that must be retained, so their implementation bodies must be kept and only their symbol table entries can be deleted. For funD, value1, value4 and value5, both the symbol table entries and the implementation bodies can be deleted (note: because the implementation body of value4 is in .bss, which does not actually occupy space in the so file, deleting the implementation body of value4 does not reduce the size of the so).

Figure 2 Parts of so that can be optimized

After determining what can be optimized in so, we also need to consider the timing of optimization: should we directly modify the so file, or control its generation process? Considering the risk and difficulty of directly modifying the so file, it is obviously safer to control the generation process of so. In order to control the generation process of so, we first briefly introduce the generation process of so:

Figure 3 So file generation process

As shown in the figure above, the so generation process can be divided into four stages:

Preprocessing: Expand the include header file to the actual file content and replace the macro definition.
Compile: Compile the preprocessed files into assembly code.
Assembly: Assemble the assembly code into an object file, which contains machine instructions (in most cases; see the LTO section below) and data, as well as other necessary information.
Link: Link all input object files and static libraries (.a files) into the so file.

It can be seen that the output generated by the preprocessing and assembly stages for specific inputs is basically fixed, and the optimization space is small. Therefore, our optimization plan is mainly to optimize the compilation and linking stages.

4. Introduction to optimization plan

We investigated the available techniques for controlling the final so size, verified their effects, and distilled a general, practical solution.

4.1 Simplified dynamic symbol table

Use visibility and attribute to control symbol visibility

Global symbol visibility can be controlled by passing -fvisibility=VALUE to the compiler. VALUE often takes the values default and hidden:

default: Unless symbol visibility is specifically specified for a variable or function, all symbols are in the dynamic symbol table. This is also the default value when -fvisibility is not used.
hidden: Unless symbol visibility is specifically specified for a variable or function, all symbols are invisible in the dynamic symbol table.

How to configure a CMake project:
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fvisibility=hidden")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fvisibility=hidden")

How to configure an ndk-build project:
LOCAL_CFLAGS += -fvisibility=hidden

On the other hand, for a single variable or function, its symbol visibility can be specified through attributes. The example is as follows:
__attribute__((visibility("hidden")))
int hiddenInt=3;

Its common values are also default and hidden, which have similar meanings to the visibility method and will not be described again here.

Visibility specified through the attribute takes priority over visibility specified through -fvisibility. In effect, -fvisibility is a global switch and the attribute is a per-symbol switch. Combining the two gives control over the visibility of every symbol in the source code.

It should be noted that the above two methods can only control whether the variable or function exists in the dynamic symbol table (that is, whether to delete its dynamic symbol table entry), but will not delete its implementation.
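As an illustration of combining the two switches, the sketch below assumes the file is compiled with -fvisibility=hidden and marks only one function for export. The macro MY_EXPORT and the function names are hypothetical; hiddenInt is borrowed from the example above.

```cpp
// Hypothetical helper macro: marks the few symbols that must stay exported
// when the whole project is compiled with -fvisibility=hidden.
#define MY_EXPORT __attribute__((visibility("default")))

// Not marked: under -fvisibility=hidden this function gets no dynamic symbol
// table entry, but its implementation is still compiled and can be called
// from other source files of the same so.
int internal_scale(int value) {
    return value * 3;
}

// Explicitly hidden, regardless of the global -fvisibility setting.
__attribute__((visibility("hidden"))) int hiddenInt = 3;

// Explicitly exported: the attribute overrides the global hidden default.
// extern "C" keeps the symbol name unmangled for external callers.
extern "C" MY_EXPORT int exported_compute(int value) {
    return internal_scale(value) + hiddenInt;
}
```

With this layout the source itself documents the public interface: anything without MY_EXPORT is internal by default.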

Use the static keyword to control symbol visibility

In C/C++, the static keyword has different meanings in different contexts. When static is used to mean “this function or variable is only visible in this file”, the function or variable does not appear in the dynamic symbol table. Again, only its dynamic symbol table entry is removed, not its implementation. The static keyword acts as a stronger form of hidden: a function or variable declared static is visible only to the current file during compilation, whereas one declared hidden is merely absent from the dynamic symbol table and is still visible to other files during compilation. Using static to declare a function or variable that is “visible only in this file” is a good habit in project development, but static is not recommended as a general tool for controlling symbol visibility, because it cannot be applied to a function or variable that must be visible in multiple files.
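The difference between static and hidden can be seen side by side in a small sketch. The function names below are hypothetical, and the comments describe the behavior expected from GCC or Clang when the file is built into an so.

```cpp
// File-local helper: 'static' gives it internal linkage, so it never reaches
// the dynamic symbol table and cannot be called from other source files.
static int clamp_to_byte(int value) {
    if (value < 0) return 0;
    if (value > 255) return 255;
    return value;
}

// Hidden helper: also absent from the dynamic symbol table, but other source
// files of the same so can still declare and call it.
__attribute__((visibility("hidden"))) int blend(int a, int b) {
    return clamp_to_byte((a + b) / 2);
}
```

Here clamp_to_byte is a natural fit for static because only this file needs it, while blend needs hidden (or a version script) because it is shared across files inside the so.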

Use exclude libs to remove symbols from static libraries

The above visibility method, attribute method and static keyword all control the visibility of symbols in the project source code, but cannot control whether the symbols in the dependent static library exist in the final so. exclude libs is used to control whether the symbols in the dependent static library are visible. It is a parameter passed to the linker, which can make the symbols of the dependent static library not exist in the dynamic symbol table. Similarly, only symbol table entries can be deleted, and the implementation will still exist in the generated so file.

How to configure the CMake project:
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--exclude-libs,ALL") # Do not export any symbols from static libraries
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--exclude-libs,libabc.a") # Do not export the symbols of libabc.a

How to configure the ndk-build project:
LOCAL_LDFLAGS += -Wl,--exclude-libs,ALL # Do not export any symbols from static libraries
LOCAL_LDFLAGS += -Wl,--exclude-libs,libabc.a # Do not export the symbols of libabc.a

Use version script to control symbol visibility

The version script is specified through a parameter passed to the linker and determines which symbols the dynamic library exports and the versions of those symbols. It affects the contents of .gnu.version and .gnu.version_d described in the “so file format analysis” section above. Here we only use its ability to specify all exported symbols (that is, we use an empty string as the symbol version name). To enable a version script, first write a text file that specifies which symbols the dynamic library exports. For example (only the usedFun function is exported):
{
global:usedFun;
local:*;
};

Then pass the path of this file to the linker (assuming the file is named version_script.txt).

How to configure the CMake project:
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/version_script.txt") # version_script.txt is in the same directory as the current CMakeLists.txt

How to configure the ndk-build project:
LOCAL_LDFLAGS += -Wl,--version-script=${LOCAL_PATH}/version_script.txt # version_script.txt is in the same directory as the current Android.mk

It seems that version script explicitly specifies the symbols that need to be retained. If you control whether each symbol is exported through visibility combined with attribute, the effect of version script can also be achieved, but the version script method has some additional benefits:

The version script method can control whether the symbols of the static library compiled into so are exported. Neither the visibility nor attribute methods can do this.
The visibility method combined with the attribute method requires each symbol that needs to be exported to be marked in the source code, which is very complicated for projects that export a lot of symbols. Version script puts the symbols that need to be exported together in a unified way, making it intuitive and convenient to view and modify. It is also very friendly to projects that export a lot of symbols.
Version script supports wildcards, * represents 0 or more characters, and ? represents a single character. For example, my*; represents all symbols starting with my. With wildcard support, configuring version scripts will be more convenient.
There is also a very special point. The version script method can delete some symbols such as __bss_start (this is the symbol added by the linker by default).

To sum up, the version script method is better than the visibility combined with attribute method. At the same time, using the version script method, there is no need to use the exclude libs method to control whether symbols in dependent static libraries are exported.

4.2 Remove useless code

Enable LTO

LTO is the abbreviation of Link Time Optimization, that is, optimization performed at link time. LTO can detect DeadCode and delete it when linking object files, thereby reducing the size of the build output. An example of DeadCode: if an if condition is always false, the code block under that condition can be removed. Furthermore, functions called only from the removed block may themselves become DeadCode and can be removed too. The reason this optimization is possible at the link stage is that much information cannot be determined at the compile stage, where only partial information is available and some optimizations cannot be performed. At link time most of this information is known, which amounts to having a global view, so additional optimizations become possible. Both GCC and Clang support LTO. An object file compiled in LTO mode no longer stores instructions for a specific machine, but a machine-independent intermediate representation (GCC uses GIMPLE bytecode, and Clang uses LLVM IR bitcode).
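The DeadCode pattern described above looks roughly like the following sketch. Everything is kept in one file so that it stays runnable, which means an ordinary compiler could already prune it; LTO matters when the always-false condition, the caller and the callee live in different object files. All names are hypothetical.

```cpp
// Hypothetical build-time switch that is always false in Release builds.
constexpr bool kVerboseChecks = false;

// Only reachable from the branch below that is never taken, so once that
// branch is removed this function becomes DeadCode as well.
int expensive_self_check(int value) {
    int checksum = 0;
    for (int i = 0; i < value; ++i) {
        checksum += i % 7;
    }
    return checksum;
}

int process(int value) {
    if (kVerboseChecks) {  // always false: this block can be removed
        return expensive_self_check(value);
    }
    return value * 2;
}
```

After the optimizer removes the dead branch in process, nothing references expensive_self_check any more, so its code can be dropped from the final so provided it is not exported.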

How to configure the CMake project:
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -flto")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -flto")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -O3 -flto")

How to configure the ndk-build project:
LOCAL_CFLAGS += -flto
LOCAL_LDFLAGS += -O3 -flto

There are a few things to note when using LTO:

If you use Clang, LTO must be enabled in both compilation parameters and link parameters, otherwise there will be a problem that the file format cannot be recognized (this problem existed before NDK22). If you use GCC, you only need to enable LTO in the compilation parameters.
If the project depends on a static library, you can use LTO to recompile the static library. Then when compiling the dynamic library, the DeadCode in the static library can be removed, thereby reducing the size of the final so.
In our tests with Clang, the linker must also be given a non-zero optimization level for LTO to actually take effect. In those tests (NDK r16b), O1 gave a poor result, while O2 and O3 gave similar results.
Since more analysis and calculations are required, the link time will increase significantly after LTO is turned on.

Enable GC sections

This is the parameter passed to the linker. GC is Garbage Collection, which is to recycle useless sections. Note that the section here does not refer to the section in the final so, but the section in the object file used as input to the linker.

A brief introduction to the object file: an object file (extension .o) is also an ELF file and is therefore also made up of sections, but it contains only the content of its corresponding source file. Functions are placed in .text-style sections, readable and writable variables in .data-style sections, and so on. The linker merges sections of the same type from all input object files to assemble the final so file.

The GC sections parameter tells the linker to keep only the sections that are directly or indirectly referenced by dynamic symbols (plus .init_array and the like) and to remove all other, useless sections, which reduces the size of the final so. There is one more issue to consider when turning on GC sections: by default the compiler puts all functions into the same section and all data with the same characteristics into the same section. If a section contains both parts that should be deleted and parts that must be retained, the entire section is retained. We therefore need to make the sections of the object files finer-grained, which requires two additional compilation parameters, -fdata-sections and -ffunction-sections. They tell the compiler to place each variable and each function in its own section, which avoids the problem. In fact, Android adds -fdata-sections and -ffunction-sections automatically when compiling object files; they are listed here to highlight what they do.
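The sketch below illustrates which pieces of a source file the linker would be expected to keep or discard under these parameters. It assumes that none of the functions is exported by default (for example because a version script or -fvisibility=hidden is in use); all names are hypothetical.

```cpp
// Flag set by the initializer below.
static int g_registry_ready = 0;

// Runs when the library (or program) is loaded. A pointer to it is stored in
// .init_array, so the linker keeps it even though no exported symbol
// references it.
__attribute__((constructor)) static void init_registry() {
    g_registry_ready = 1;
}

// Referenced by nothing: with -ffunction-sections it gets its own input
// section, which --gc-sections can discard as long as it is not exported.
int unused_legacy_helper(int value) {
    return value - 1;
}

// Assumed to be reachable from an exported symbol, so its section is kept.
int registry_ready() {
    return g_registry_ready;
}
```

Without -ffunction-sections, unused_legacy_helper and registry_ready would share one .text input section, and keeping the latter would force the linker to keep the former as well.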

How to configure the CMake project:
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fdata-sections -ffunction-sections")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fdata-sections -ffunction-sections")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -Wl,--gc-sections")

How to configure the ndk-build project:
LOCAL_CFLAGS += -fdata-sections -ffunction-sections
LOCAL_LDFLAGS += -Wl,--gc-sections

4.3 Optimize instruction length

Use Oz/Os optimization level

The compiler determines the optimization level of compilation based on the input -Ox parameter, where O0 indicates that optimization is not enabled (this is mainly for ease of debugging and faster compilation speed). From O1 to O3, the degree of optimization becomes stronger and stronger. Both Clang and GCC provide an optimization level for Os, which is relatively close to O2, but optimizes the volume of the generated product. Clang also provides the Oz optimization level, which can further optimize the product volume based on Os.

To sum up, if the compiler is Clang, Oz optimization can be turned on; if the compiler is GCC, only Os optimization is available (Note: starting from r13, the default compiler of the NDK changed from GCC to Clang, and GCC was officially removed in r18. “GCC does not support Oz” here refers to GCC 4.9, the last GCC version used by Android). Compared with O3, Oz/Os optimizes for product size and may cost some performance. Therefore, if the project originally used O3, decide whether to switch to Os/Oz based on actual test results and performance requirements; if the project did not use O3, Os/Oz can be adopted directly.

How to configure the CMake project (if using GCC, change Oz to Os):
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Oz")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Oz")

How to configure the ndk-build project (if using GCC, change Oz to Os):
LOCAL_CFLAGS += -Oz

4.4 Other measures

Disable C++ exception mechanism

If the C++ exception mechanism (such as try...catch, etc.) is not used in the project, you can reduce the size of so by disabling the C++ exception mechanism.

How to configure the CMake project:
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-exceptions")

ndk-build disables the C++ exception mechanism by default, so there is no need to disable it explicitly (if an existing project has explicitly enabled the C++ exception mechanism, it presumably needs it, so confirm carefully before disabling it).
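Code that is meant to build with -fno-exceptions has to report failures without throw and try...catch. One common alternative, shown here only as a minimal sketch, is to return a status and write the result through an output parameter. The function parse_port and its valid range are hypothetical.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical parser that reports failure through its return value instead
// of throwing, so it also compiles with -fno-exceptions.
bool parse_port(const char* text, int* out_port) {
    if (text == nullptr || *text == '\0') {
        return false;
    }
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    // Reject overflow, trailing characters and out-of-range values.
    if (errno != 0 || *end != '\0' || value < 1 || value > 65535) {
        return false;
    }
    *out_port = static_cast<int>(value);
    return true;
}
```

Callers check the boolean result instead of wrapping the call in try...catch, which is the kind of change to look for when confirming that a project can safely disable exceptions.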

Disable C++’s RTTI mechanism

If the project does not use C++’s RTTI mechanism (such as typeid and dynamic_cast, etc.), you can reduce the size of so by disabling C++’s RTTI.

How to configure the CMake project:
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fno-rtti")

ndk-build disables the C++ RTTI mechanism by default, so there is no need to disable it explicitly (if an existing project has explicitly enabled RTTI, it presumably needs it, so confirm carefully before disabling it).
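When a project relies on dynamic_cast only for a few downcasts, one possible way to make it buildable with -fno-rtti is to let the base class carry its own type tag. The hierarchy below is hypothetical and is only a sketch of that idea, not a recommendation for every code base.

```cpp
// Hypothetical class hierarchy that carries its own type tag, so downcasts
// need neither dynamic_cast nor typeid and the code builds with -fno-rtti.
enum class ShapeKind { Circle, Square };

struct Shape {
    explicit Shape(ShapeKind k) : kind(k) {}
    virtual ~Shape() = default;
    const ShapeKind kind;
};

struct Circle : Shape {
    explicit Circle(int r) : Shape(ShapeKind::Circle), radius(r) {}
    int radius;
};

struct Square : Shape {
    explicit Square(int s) : Shape(ShapeKind::Square), side(s) {}
    int side;
};

// Returns the Circle behind the pointer, or nullptr for any other shape.
const Circle* as_circle(const Shape* shape) {
    return shape->kind == ShapeKind::Circle
               ? static_cast<const Circle*>(shape)
               : nullptr;
}
```

Virtual functions keep working without RTTI, so only typeid and dynamic_cast call sites need this kind of replacement.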

Merge so

The above are optimizations for a single so. After optimizing each so individually, you can also consider merging so to reduce the total size further. Specifically, when some so in the installation package are dynamically depended on by only one other so, they can be merged with it into a single so. For example, if liba.so and libb.so are dynamically depended on only by libx.so, the three can be merged into a new libx.so. Merging so has the following benefits:

Some dynamic symbol table items can be deleted to reduce the total size of so. Specifically, you can delete all exported symbols in the dynamic symbol tables of liba.so and libb.so, as well as symbols imported from liba.so and libb.so in the dynamic symbol table of libx.so.
You can delete some PLT entries and GOT entries to reduce the total volume of so. Specifically, you can delete the PLT entries and GOT entries related to liba.so and libb.so in libx.so.
Can reduce optimization workload. If so is not merged, when optimizing the volume of liba.so and libb.so, you need to determine which symbols libx.so depends on before optimizing them. This is no longer necessary after merging so. The linker will automatically analyze the reference relationship and retain the corresponding content of all symbols used.
Since the linker has more complete contextual information about the exported symbols of the original liba.so and libb.so, LTO optimization can also achieve better results.

The merging of so can be achieved at the compilation level without modifying the project source code.

Extract multiple so common dependent libraries

“Merge so” above is to reduce the total number of so, but here it is to increase the total number of so. When multiple so’s statically rely on the same library, you can consider extracting this library into a separate so, and the original several so’s will dynamically depend on this so. For example, both liba.so and libb.so statically depend on libx.a, which can be optimized so that both liba.so and libb.so dynamically depend on libx.so. Extracting multiple so’s common dependent libraries can merge the same code in different so’s, thereby reducing the total so’s volume.

A typical example is the libc++ library: if multiple so all statically depend on libc++, they can be changed to dynamically depend on libc++_shared.so instead.

4.5 Integrated general solution

Based on the analysis above, we can assemble a general optimization configuration that ordinary projects can use. Configuration for CMake projects (if using GCC, change -Oz to -Os):

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Oz -flto -fdata-sections -ffunction-sections")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Oz -flto -fdata-sections -ffunction-sections")
# version_script.txt is in the same directory as the current CMakeLists.txt
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} -O3 -flto -Wl,--gc-sections -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/version_script.txt")

Configuration for ndk-build projects (if using GCC, change -Oz to -Os):

LOCAL_CFLAGS += -Oz -flto -fdata-sections -ffunction-sections
# version_script.txt is in the same directory as the current Android.mk
LOCAL_LDFLAGS += -O3 -flto -Wl,--gc-sections -Wl,--version-script=${LOCAL_PATH}/version_script.txt

A common configuration of version_script.txt is shown below. You can add further exported symbols that need to be retained according to the actual situation:

{
    global: JNI_OnLoad; JNI_OnUnload; Java_*;
    local: *;
};

Note: Because the version script specifies every symbol that needs to be exported, the visibility, attribute, static keyword and exclude-libs methods are no longer needed to control exported symbols. Whether to disable C++ exceptions and RTTI, merge so, or extract libraries shared by multiple so depends on the specific project; these measures are not universally applicable.
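To make the effect of the sample version_script.txt concrete, the following is a minimal C++ sketch that classifies raw symbol names the way the script's global patterns would. It is only an approximation of the linker's glob matching using POSIX fnmatch; the names kGlobalPatterns and stays_exported are hypothetical and introduced here for illustration.

```cpp
#include <fnmatch.h>
#include <string>
#include <vector>

// Patterns taken from the "global:" part of the sample version_script.txt.
static const std::vector<std::string> kGlobalPatterns = {
    "JNI_OnLoad", "JNI_OnUnload", "Java_*"};

// Returns true if the raw (still mangled) symbol name matches one of the
// global patterns, i.e. it would stay in the exported symbols. Everything
// else falls under "local: *" and is no longer exported.
// Assumption: fnmatch with no flags approximates the linker's glob rules.
bool stays_exported(const std::string& symbol) {
    for (const std::string& pattern : kGlobalPatterns) {
        if (fnmatch(pattern.c_str(), symbol.c_str(), 0) == 0) {
            return true;
        }
    }
    return false;
}
```

Under this sketch, JNI entry points such as Java_com_example_MainActivity_stringFromJNI stay exported, while a mangled C++ symbol such as _ZN7MyClass5startEi is treated as local unless it is added to the script.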

At this point, we have summarized a set of feasible so volume optimization solutions. But in engineering practice, there are still some problems to be solved.

5. Engineering Practice

Supporting multiple build tools

Many Meituan businesses use so, and they use different build tools. In addition to the common CMake and ndk-build mentioned above, some projects use other build tools such as Make, Automake, Ninja, GYP, and GN. Each build tool applies the so optimization solution in a different way, and for large projects in particular the configuration is complex.

For these reasons, having each business configure the optimization solution on its own would consume considerable labor, and the resulting configuration might not take effect. To reduce configuration costs, speed up adoption of the optimization solution, and ensure that the configuration is effective and correct, we added unified support for so optimization to the build platform (covering projects that use any build tool). A business only needs a simple configuration to enable size optimization of its so.

Notes on configuring exported symbols

There are two points to note:

If some symbols of a so are looked up by other so through dlsym, these symbols must also be kept among the so's exported symbols (otherwise a runtime exception will occur).
When writing version_script.txt, pay attention to name mangling in languages such as C++: you cannot simply fill in the function name. Name mangling encodes a function's namespace (if any), class name (if any), parameter types and so on into the final symbol, and it is also the basis for function overloading in C++. There are two ways to add C++ functions to the exported symbols. The first is to look at the exported symbol table of the unoptimized so, find the mangled symbol of the target function, and fill it into version_script.txt. For example, consider a class MyClass:

class MyClass {
public:
    void start(int arg);
    void stop();
};

To determine the real symbol of the start function, run the command shown in Figure 4 on the unoptimized libexample.so. Because the function name is part of the mangled C++ symbol, you can use grep to narrow the search:

Figure 4 Find the real symbol of the start function

You can see that the real symbol of the start function is _ZN7MyClass5startEi. If you want to export this function, just fill in _ZN7MyClass5startEi in the corresponding position of version_script.txt.

The second way is to use the extern syntax in version_script.txt, as follows:

{
    global:
        extern "C++" {
            MyClass::start*;
            "MyClass::stop()";
        };
    local: *;
};

The configuration above exports the start and stop functions of MyClass. The principle is that, at link time, the linker demangles each symbol (that is, restores the mangled symbol to a readable form) and then matches it against the entries in extern "C++". If the symbol matches any entry, it is retained. The matching rules are: entries in double quotes cannot use wildcards and must match the entire string exactly (for example, the stop entry would fail to match if there were an extra space between the parentheses); entries without double quotes can use wildcards (such as the start entry).
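The matching rules just described can be sketched in C++ as follows. This is an illustration under stated assumptions, not the linker's actual implementation: it demangles with abi::__cxa_demangle from the GCC/Clang C++ ABI header, uses POSIX fnmatch as a stand-in for the linker's wildcard matching, and the names CxxEntry, demangle and is_kept are hypothetical.

```cpp
#include <cxxabi.h>
#include <fnmatch.h>
#include <cstdlib>
#include <string>
#include <vector>

// One entry of an extern "C++" block (hypothetical helper type).
struct CxxEntry {
    std::string text;
    bool quoted;  // quoted entries require an exact match, no wildcards
};

// Restores a mangled symbol to its readable form.
// Returns an empty string if the name cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* readable =
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return "";
    }
    std::string result(readable);
    std::free(readable);  // the buffer is allocated with malloc
    return result;
}

// Returns true if the demangled symbol matches any entry:
// exact comparison for quoted entries, glob matching otherwise.
bool is_kept(const std::string& mangled,
             const std::vector<CxxEntry>& entries) {
    const std::string readable = demangle(mangled);
    if (readable.empty()) {
        return false;
    }
    for (const CxxEntry& entry : entries) {
        const bool match =
            entry.quoted
                ? (readable == entry.text)
                : (fnmatch(entry.text.c_str(), readable.c_str(), 0) == 0);
        if (match) {
            return true;
        }
    }
    return false;
}
```

With the entries {"MyClass::start*", unquoted} and {"MyClass::stop()", quoted}, the symbol _ZN7MyClass5startEi demangles to MyClass::start(int) and is kept through the wildcard, while a quoted entry written as "MyClass::stop( )" would no longer match MyClass::stop().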

View the exported symbols of the optimized so

After a business optimizes its so, it needs to check which exported symbols remain in the final so file to verify that the optimization worked as expected. On both Mac and Linux, the following command shows which exported symbols a so retains:

nm -D --defined-only xxx.so

For example:

Figure 5 nm command to view the exported symbols of the so file

As can be seen, there are two exported symbols of libexample.so: JNI_OnLoad and Java_com_example_MainActivity_stringFromJNI.

Parsing the crash stack

The optimization solution in this article removes unnecessary exported dynamic symbols. If a crash occurs, can the crash stack still be parsed? The answer is yes: the parsing results of the crash stack are not affected at all.

As mentioned in the “So Optimizable Content Analysis” section, the standard way to analyze so crashes is to parse online crashes using a so that contains debugging information and the symbol table (this is also how Google parses so crashes). The optimization solution in this article does not modify the debugging information or the symbol table, so you can use the so with debugging information and the symbol table to fully restore the crash stack and resolve the source file, line number, function name and other information for each stack frame. After a business compiles the release version of a so, it can upload the corresponding so with debugging information and symbol table to the crash platform.

6. Solution benefits

Optimizing so directly reduces both the installation package size and the local storage space occupied after installation. The size of the benefit depends on specifics such as the amount of redundant code and the number of exported symbols in the original so. The following compares the installation package size occupied by several so before and after optimization:

so	size before optimization	size after optimization	optimization percentage
A library	4.49 MB	3.28 MB	27.02%
B library	995.82 KB	728.38 KB	26.86%
C library	312.05 KB	153.81 KB	50.71%
D library	505.57 KB	321.75 KB	36.36%
E library	309.89 KB	157.08 KB	49.31%
F library	88.59 KB	62.93 KB	28.97%
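The optimization percentage column appears to be the relative size reduction. As a worked check (an inference from the table values, not a formula stated by the authors), the B library row gives:

```latex
\[
\text{optimization percentage}
  = \frac{S_{\text{before}} - S_{\text{after}}}{S_{\text{before}}} \times 100\%
  = \frac{995.82 - 728.38}{995.82} \times 100\% \approx 26.86\%
\]
```

Rows reported in MB, such as the A library, differ slightly when recomputed from the rounded figures (about 26.95% versus the listed 27.02%), presumably because the table was calculated from unrounded sizes.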

The following compares the local storage space occupied by the same so before and after optimization:

so	size before optimization	size after optimization	optimization percentage
A library	10.67 MB	7.04 MB	34.05%
B library	2.35 MB	1.61 MB	31.46%
C library	898.14 KB	386.31 KB	56.99%
D library	1.30 MB	771.47 KB	41.88%
E library	890.13 KB	398.30 KB	55.25%
F library	230.30 KB	146.06 KB	36.58%

7. Summary and follow-up plans

Optimizing the size of so can not only reduce the size of the installation package, but also achieve the following benefits:

Removing a large number of unnecessary exported symbols improves the security of so.
Because sections that occupy memory at runtime, such as .data, .bss and .text, become smaller, the memory the application occupies at runtime can also be reduced.
If the optimization reduces the number of external symbols that a so depends on, the so can also load faster.

We have made the following plans for our follow-up work:

Improve compilation speed. Because LTO, gc-sections and similar options increase build time, we plan to investigate ThinLTO and other approaches to speed up compilation.
Show in detail the reasons for retaining each function/data.
Further improve the platform’s ability to optimize so.