Rizin
unix-like reverse engineering framework and cli tools
This file is aimed at developers who want to work on the Rizin code base.
This repository supports Doxygen documentation generation. Running doxygen in the root of the repository autodetects the Doxyfile and generates HTML documentation in doc/doxygen/html/index.html.
If you're contributing new code or updating existing code, you should use Doxygen C-style comments to improve the documentation and comments in the code. See the Doxygen Manual for more info. Example usage can be found here.
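As a rough illustration of a Doxygen C-style comment, here is a minimal sketch. The function sketch_sum_bytes is hypothetical and not part of Rizin; only the comment layout (\brief, \param, \return) is the point.

```c
#include <stddef.h>
#include <stdint.h>

/**
 * \brief Sums the bytes of a buffer (hypothetical helper, for illustration only).
 *
 * \param buf Buffer to read from; must not be NULL when \p len is non-zero
 * \param len Number of bytes available in \p buf
 * \return The sum of all bytes, wrapping around at 32 bits
 */
uint32_t sketch_sum_bytes(const uint8_t *buf, size_t len) {
	uint32_t sum = 0;
	for (size_t i = 0; i < len; i++) {
		sum += buf[i];
	}
	return sum;
}
```

The comment sits directly above the function it documents, so Doxygen can attach it to the right symbol.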
In order to improve the documentation and help newcomers, documenting code is mandatory.
You should add or update the documentation of:
Exceptions:
If you have not updated the documentation, explain why. E.g.: Bug fix did not change the general behavior of the function. No documentation update needed.
In order to contribute with patches or plugins, we encourage you to use the same coding style as the rest of the code base.
dev is up-to-date and your branch is up-to-date with dev):
Use rz_return_* functions to check preconditions that are caused by programmers' errors. Please note the difference between conditions that should never happen, which are handled through rz_return_* functions, and conditions that can happen at runtime (e.g. malloc() returns NULL, input coming from the user, etc.), which should be handled in the usual way through if-else.
Use rz_warn_if_reached() macros to emit a runtime warning if the code path is reached. This is often useful in the default case of a switch statement:
static inline functions to make them more readable:
The structure of the C files in Rizin must be like this:
Many functions in Rizin return int instead of an enum type because enum values can't be OR'ed together without breaking their use in a switch statement, and SWIG can't handle such values.
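Returning to the rz_return_* and rz_warn_if_reached() conventions described earlier, the difference between a programmer error and a runtime condition can be sketched as follows. The two macro definitions are simplified stand-ins so that the example compiles without the Rizin headers; the real macros may behave differently. The enum and the function sketch_format are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins (assumption): the real macros come from the Rizin headers. */
#ifndef rz_return_val_if_fail
#define rz_return_val_if_fail(expr, val) \
	do { \
		if (!(expr)) { \
			fprintf(stderr, "%s: assertion '%s' failed\n", __func__, #expr); \
			return (val); \
		} \
	} while (0)
#endif
#ifndef rz_warn_if_reached
#define rz_warn_if_reached() \
	fprintf(stderr, "%s:%d: code should not be reached\n", __FILE__, __LINE__)
#endif

/* Hypothetical enum, for illustration only. */
typedef enum { SKETCH_MODE_HEX, SKETCH_MODE_DEC } SketchMode;

/* Returns a newly allocated string, or NULL on failure. */
char *sketch_format(const char *prefix, unsigned int n, SketchMode mode) {
	/* Programmer error: callers must never pass NULL here. */
	rz_return_val_if_fail(prefix, NULL);

	size_t size = strlen(prefix) + 32;
	char *out = malloc(size);
	if (!out) {
		/* Runtime condition: handled with a plain if. */
		return NULL;
	}
	switch (mode) {
	case SKETCH_MODE_HEX:
		snprintf(out, size, "%s0x%x", prefix, n);
		break;
	case SKETCH_MODE_DEC:
		snprintf(out, size, "%s%u", prefix, n);
		break;
	default:
		/* Unknown mode: warn and fall back to an empty string. */
		rz_warn_if_reached();
		out[0] = '\0';
		break;
	}
	return out;
}
```

The NULL check on prefix guards against a bug in the calling code, while the malloc() check guards against something that can legitimately happen at runtime.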
#!/bin/sh
[[, $'...' etc.
As hackers, we need to be aware of endianness.
Endianness can become a problem when you try to process buffers or streams of bytes and store intermediate values as integers with width larger than a single byte.
It can seem very easy to write the following code:
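A minimal sketch of the tempting pattern, using the standard uint8_t/uint32_t types rather than Rizin's own. It uses memcpy as a well-defined stand-in for the pointer cast `*(uint32_t *)buf`; the result depends on host byte order in exactly the same way. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Anti-pattern: reinterprets four bytes as a 32-bit integer in HOST byte order. */
uint32_t sketch_read_u32_host_order(const uint8_t *buf) {
	uint32_t value;
	/* Same effect as `*(uint32_t *)buf`, minus the alignment and aliasing hazards. */
	memcpy(&value, buf, sizeof(value));
	return value;
}
```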
... and then continue to use "value" in the code to represent the opcode.
This needs to be avoided!
Why? What is actually happening?
When you cast the opcode stream to an unsigned int, the compiler uses the endianness of the host to interpret the bytes and stores the result in host endianness. This leads to very unportable code, because if you compile on a machine with a different endianness, the value stored in "value" might be 0x40302010 instead of 0x10203040.
Use bitshifts and OR instructions to interpret bytes in a known endian. Instead of casting streams of bytes to larger width integers, do the following:
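A minimal sketch of such a read, interpreting the bytes as little-endian. It uses standard C types instead of Rizin's, and the function name is hypothetical. Each byte is widened before shifting so the shift never touches the sign bit of an int.

```c
#include <stdint.h>

/* Interprets four bytes as a little-endian 32-bit value on any host. */
uint32_t sketch_read_le32(const uint8_t *buf) {
	return (uint32_t)buf[0] |
		((uint32_t)buf[1] << 8) |
		((uint32_t)buf[2] << 16) |
		((uint32_t)buf[3] << 24);
}
```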
or if you prefer the other endian:
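The big-endian counterpart, sketched under the same assumptions (standard C types, hypothetical function name), simply reverses which byte receives which shift:

```c
#include <stdint.h>

/* Interprets four bytes as a big-endian 32-bit value on any host. */
uint32_t sketch_read_be32(const uint8_t *buf) {
	return (uint32_t)buf[3] |
		((uint32_t)buf[2] << 8) |
		((uint32_t)buf[1] << 16) |
		((uint32_t)buf[0] << 24);
}
```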
This is much better because you actually know which endian your bytes are stored in within the integer value, REGARDLESS of the host endian of the machine.
Rizin now uses helper functions to interpret all byte streams in a known endian.
Please use these at all times, e.g.:
There are a number of helper functions for 64, 32, 16, and 8 bit reads and writes.
(Note that 8 bit reads are equivalent to casting a single byte of the buffer to a ut8 value, i.e. endianness is irrelevant.)
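To illustrate the idea behind such helpers without depending on the Rizin headers, here is a small sketch of a read and a write helper that take the endianness as a flag. The names sketch_read_ble16 and sketch_write_ble32 are hypothetical stand-ins; in Rizin code, the helpers provided by Rizin should be used instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reads a 16-bit value in the requested endianness, regardless of the host. */
uint16_t sketch_read_ble16(const uint8_t *buf, bool big_endian) {
	if (big_endian) {
		return (uint16_t)(((uint16_t)buf[0] << 8) | buf[1]);
	}
	return (uint16_t)(((uint16_t)buf[1] << 8) | buf[0]);
}

/* Writes a 32-bit value in the requested endianness, regardless of the host. */
void sketch_write_ble32(uint8_t *buf, uint32_t val, bool big_endian) {
	for (int i = 0; i < 4; i++) {
		/* Big-endian stores the most significant byte first. */
		int shift = big_endian ? (24 - 8 * i) : (8 * i);
		buf[i] = (uint8_t)(val >> shift);
	}
}
```

Passing the endianness as an argument lets one parser handle both byte orders of a file format without duplicating code.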
When accessing an RzBuffer *buffer, there are also helpers like rz_buf_read_bleXX()/rz_buf_write_bleXX(), rz_buf_read_bleXX_at()/rz_buf_write_bleXX_at(), and rz_buf_read_bleXX_offset()/rz_buf_write_bleXX_offset(). In addition, there are corresponding little-endian-only and big-endian-only functions like rz_buf_read_leXX()/rz_buf_read_beXX(), rz_buf_read_leXX_at()/rz_buf_read_beXX_at(), rz_buf_read_leXX_offset()/rz_buf_read_beXX_offset(), and the corresponding writing functions.
Due to the various differences between platforms and compilers, Rizin has a special helper macro, RZ_PACKED(). It is advised to use this macro instead of the non-portable #pragma pack or __attribute__((packed)). To wrap code inside it, you just need to write:
or in case of typedef:
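Both forms can be sketched as follows. The RZ_PACKED definition here is a stand-in so the example compiles on its own; in Rizin code the macro comes from the Rizin headers, and its exact definition is an assumption. The struct names are hypothetical.

```c
#include <stdint.h>

/* Stand-in definition (assumption), used only if the real macro is absent. */
#ifndef RZ_PACKED
#if defined(_MSC_VER)
#define RZ_PACKED(decl) __pragma(pack(push, 1)) decl __pragma(pack(pop))
#else
#define RZ_PACKED(decl) decl __attribute__((__packed__))
#endif
#endif

/* Plain struct form. */
RZ_PACKED(struct sketch_header {
	uint8_t tag;
	uint32_t size;
});

/* typedef form: the alias goes after the macro's closing parenthesis. */
RZ_PACKED(typedef struct sketch_entry_t {
	uint8_t kind;
	uint16_t value;
})
SketchEntry;
```

With packing applied, the structs carry no padding: the first occupies 5 bytes and the second 3.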
The Rizin code base is modularized into different libraries that are found in the librz/ directory. The binrz/ directory contains the programs which use the libraries.
Hint: To find both the declaration and definition of a function named func_name, you can use the following git grep command:
Since many places in Rizin output JSON, a special API was created: PJ, which stands for "Print Json". It allows creating nested JSON structures with a simple and short API. The full API reference is available in librz/include/rz_util/rz_pj.h.
Here is a short example of how we usually use PJ:
It will produce the following output:
Rizin is trying to comply with the Software Package Data Exchange® (SPDX®), an open standard for clearly communicating the licenses and copyrights of software, among other things. All files in the repository should have either a header specifying the applicable copyright and license or an entry in the .reuse/dep5 file. All pieces of code copied from other projects should have a license/copyright entry as well.
In particular, the SPDX header may look like:
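As a sketch, such a header consists of two comment lines, conventionally placed at the very top of the file before any includes. The year, author, e-mail address, and license identifier below are placeholders, and the function is a hypothetical file body that only shows where the header sits.

```c
// SPDX-FileCopyrightText: 2024 Jane Doe <jane.doe@example.com>
// SPDX-License-Identifier: LGPL-3.0-only

#include <stddef.h>

/* Hypothetical file body, only here to show that the header comes first. */
size_t sketch_count_nonzero(const unsigned char *buf, size_t len) {
	size_t n = 0;
	for (size_t i = 0; i < len; i++) {
		if (buf[i]) {
			n++;
		}
	}
	return n;
}
```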
You can use the REUSE Software to check the compliance of the project and get the licenses/copyright of each file.
In Rizin code there are some conventions to help developers use pointers more safely, which are defined in librz/include/rz_types.h:
Most of them are easy to understand, and you can see a brief explanation in the comments. But RZ_OWN and RZ_BORROW may be a little tricky for new developers.
Sometimes it may not be immediately clear whether the object you are getting from a function should be freed or not. Rizin uses RZ_OWN and RZ_BORROW to indicate pointer ownership, so you don't have to read complicated function definitions to know whether you should free an object or not.
You can use the two modifiers in two places, with the following meanings:
On a function's return value:
RZ_OWN: the ownership of the returned object is transferred to the caller. The caller owns the object, so it must free it (or ensure that something else frees it).
RZ_BORROW: the ownership of the returned object is not transferred. The caller can use the object, but it does not own it, so it should not free it.
On a function parameter:
RZ_OWN: the ownership of the passed argument is transferred to the callee. The callee now owns the object and it is its duty to free it (or ensure that something else frees it). In any case, the caller should no longer care about freeing the passed object.
RZ_BORROW: the ownership of the passed argument is not transferred to the callee, which can use it but should not free it. After calling this function, the caller still owns the passed object and should ensure that it is freed at some point.
Examples:
const char *, the caller should not free it because of the const. So specifying RZ_BORROW in this case is probably redundant.
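The ownership rules above can be sketched with a small self-contained example. The RZ_OWN and RZ_BORROW definitions here are stand-ins that expand to nothing, on the assumption that the annotations act purely as documentation. SketchItem and its functions are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins (assumption): documentation-only annotations. */
#ifndef RZ_OWN
#define RZ_OWN
#endif
#ifndef RZ_BORROW
#define RZ_BORROW
#endif

typedef struct {
	char *name;
} SketchItem;

/* Takes ownership of name; the caller owns the returned item. */
RZ_OWN SketchItem *sketch_item_new(RZ_OWN char *name) {
	SketchItem *it = malloc(sizeof(SketchItem));
	if (!it) {
		free(name); /* ownership was transferred, so the callee cleans up */
		return NULL;
	}
	it->name = name;
	return it;
}

/* The returned pointer still belongs to the item; the caller must not free it. */
RZ_BORROW char *sketch_item_name(RZ_BORROW SketchItem *it) {
	return it->name;
}

/* Takes ownership of the item and releases everything it owns. */
void sketch_item_free(RZ_OWN SketchItem *it) {
	if (!it) {
		return;
	}
	free(it->name);
	free(it);
}
```

A caller allocates a name, hands it to sketch_item_new() and stops tracking it, may read it back through sketch_item_name() without freeing it, and finally releases everything with a single sketch_item_free() call.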