Rizin
unix-like reverse engineering framework and cli tools
DEVELOPERS

This file is aimed at developers who want to work on the Rizin code base.

Documentation

This repository supports Doxygen documentation generation. Running doxygen in the root of the repository autodetects the Doxyfile and generates HTML documentation in doc/doxygen/html/index.html

If you are contributing new code or updating existing code, use Doxygen C-style comments to improve the documentation and comments in the code. See the Doxygen Manual for more info. Example usage:

static int findMinMax(RzList *maps, ut64 *min, ut64 *max, int skip, int width);
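
A declaration like the one above would normally carry a Doxygen comment block. The sketch below shows one possible layout on a self-contained function. The function `find_min_max`, its parameters, and its behavior are hypothetical and exist only to illustrate the comment format; `\brief`, `\param`, `\p`, and `\return` are standard Doxygen commands.

```c
#include <stdbool.h>
#include <stddef.h>

/**
 * \brief Find the smallest and largest value in an array.
 *
 * \param values Array to scan; must not be NULL
 * \param count Number of elements in \p values
 * \param min Output: smallest value found
 * \param max Output: largest value found
 * \return true on success, false if an argument is NULL or the array is empty
 */
static bool find_min_max(const int *values, size_t count, int *min, int *max) {
	if (!values || !min || !max || count == 0) {
		return false;
	}
	*min = *max = values[0];
	for (size_t i = 1; i < count; i++) {
		if (values[i] < *min) {
			*min = values[i];
		}
		if (values[i] > *max) {
			*max = values[i];
		}
	}
	return true;
}
```

The comment sits directly above the function it documents, so Doxygen can attach it without any extra markup.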

In order to improve the documentation and help newcomers, documenting code is mandatory.

You should add or update the documentation of:

  • code written by you.
  • existing Rizin code you changed.

Exceptions:

  • Trivial changes.

If you have not updated the documentation, explain why. E.g.: Bug fix did not change the general behavior of the function. No documentation update needed.

Code style

C

In order to contribute with patches or plugins, we encourage you to use the same coding style as the rest of the code base.

  • Use git-clang-format 13 to format your code. If clang-format-13 is not available on your Debian-based distribution, you can install it from https://apt.llvm.org/. You should invoke it as below (after making sure that your local copy of dev is up-to-date and your branch is up-to-date with dev):
git-clang-format-13 --extensions c,cpp,h,hpp,inc --style file dev
  • Lines should be at most 100 chars. A tab is considered as 8 chars. If it makes things more readable, you can use more than 100 characters, but this should be the exception, not the rule.
  • Always use braces for if and while.
  • In general, don't use goto. The goto statement only comes in handy when a function exits from multiple locations and some common work such as cleanup has to be done. If there is no cleanup needed, then just return directly.
  • Choose label names which say what the goto does or why the goto exists. An example of a good name could be "out_buffer:" if the goto frees "buffer". Avoid using GW-BASIC names like "err1:" and "err2:".
  • Use rz_return_* functions to check preconditions that are caused by programmers' errors. Please note the difference between conditions that should never happen, and that are handled through rz_return_* functions, and conditions that can happen at runtime (e.g. malloc() returns NULL, input coming from user, etc.), and should be handled in the usual way through if-else.
int check(RzCore *c, int a, int b) {
	rz_return_val_if_fail(a >= 0 && b >= 1, false);
	if (a == 0) {
		/* do something */
		...
	}
	... /* do something else */
}
  • Use the rz_warn_if_reached() macro to emit a runtime warning when a code path that should not be reached is executed. It is often useful in the default case of a switch statement:
switch(something) {
case EXPECTED_CASE1:
	...
	break;
case EXPECTED_CASE2:
	...
	break;
default:
	rz_warn_if_reached();
	break;
}
  • Split long conditional expressions into small static inline functions to make them more readable:
+static inline bool inRange(RzBreakpointItem *b, ut64 addr) {
+	return (addr >= b->addr && addr < (b->addr + b->size));
+}
+
+static inline bool matchProt(RzBreakpointItem *b, int rwx) {
+	return (!rwx || (rwx & b->rwx));
+}
+
 RZ_API RzBreakpointItem *rz_bp_get_in(RzBreakpoint *bp, ut64 addr, int rwx) {
 	RzBreakpointItem *b;
 	RzListIter *iter;
 	rz_list_foreach (bp->bps, iter, b) {
-		if (addr >= b->addr && addr < (b->addr+b->size) && \
-			(!rwx || rwx&b->rwx))
+		if (inRange(b, addr) && matchProt(b, rwx)) {
 			return b;
+		}
 	}
 	return NULL;
 }
  • Structure in the C files

The structure of the C files in Rizin must be like this:

// SPDX-License-Identifier: LGPL-3.0-only
/* Copyright ... */ ## copyright
#include <rz_core.h> ## includes
static int globals ## const, define, global variables
static void helper(void) {} ## static functions
RZ_IPI void internal(void) {} ## internal apis (used only inside the library)
RZ_API void public(void) {} ## public apis starting with constructor/destructor
  • Why return int vs enum

Many functions in Rizin return int instead of an enum type because enum values can't be OR'ed together: doing so breaks their use within a switch statement, and SWIG can't handle it.

rz_core_wrap.cxx:28612:60: error: assigning to 'RzRegisterType' from incompatible type 'long'
arg2 = static_cast< long >(val2); if (arg1) (arg1)->type = arg2; resultobj = SWIG_Py_Void(); return resultobj; fail:
^ ~~~~
rz_core_wrap.cxx:32103:61: error: assigning to 'RzDebugReasonType' from incompatible type 'int'
arg2 = static_cast< int >(val2); if (arg1) (arg1)->type = arg2; resultobj = SWIG_Py_Void(); return resultobj; fail:
^ ~~~~
3 warnings and 2 errors generated.
  • Do not use assert.h; use rz_util/rz_assert.h instead.
  • You can use export RZ_DEBUG_ASSERT=1 to set a breakpoint when hitting an assert.
  • Function names should be explicit enough that no comment is needed to explain what the function does when it is seen elsewhere in the code.
  • Use the RZ_API define to mark exportable (public) methods, and only for module APIs.
  • All other functions must be static, to avoid polluting the global namespace.
  • Avoid using global variables, they are evil.
  • Do not write ultra-large functions: split them into several smaller ones or simplify the algorithm. Only externally copied code that is not going to be maintained (GNU code, external disassemblers, etc.) can be accepted in this form.
  • Use the Rizin types instead of the ones in <stdint.h>, which are known to cause some portability issues. So, instead of uint8_t, use ut8, etc. As a bonus, they are shorter to write.
  • Never use %lld or %llx; they are not portable. Always use the PFMT64x macros. Those are similar to the ones in GLib. See all macros in librz/include/rz_types.h.
  • Never use the offsetof() macro; it is not supported by some compilers. Use rz_offsetof() instead.
  • Add a single space after the // when writing inline comments:
int sum = 0; // set sum to 0
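
The goto guidance in the list above (return directly when nothing needs cleanup, otherwise jump to a label named after what it releases) can be sketched as follows. The function `hex_decode` is hypothetical and uses only the C standard library; it is not part of Rizin.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decodes a hex string such as "10ff" into a new buffer. */
static unsigned char *hex_decode(const char *hex, size_t *out_len) {
	unsigned char *result = NULL;
	size_t len = strlen(hex);
	if (len % 2 != 0) {
		return NULL; // nothing allocated yet, so return directly
	}
	unsigned char *buffer = malloc(len / 2 + 1);
	if (!buffer) {
		return NULL;
	}
	for (size_t i = 0; i < len / 2; i++) {
		unsigned int byte = 0;
		if (!isxdigit((unsigned char)hex[2 * i]) || !isxdigit((unsigned char)hex[2 * i + 1])) {
			goto out_buffer; // invalid digit: free the buffer and fail
		}
		if (sscanf(hex + 2 * i, "%2x", &byte) != 1) {
			goto out_buffer;
		}
		buffer[i] = (unsigned char)byte;
	}
	*out_len = len / 2;
	result = buffer;
	buffer = NULL; // ownership moved to result, so the cleanup below frees nothing
out_buffer:
	free(buffer);
	return result;
}
```

The label name out_buffer says what the jump target releases, and the early exits that have nothing to release simply return.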

Shell Scripts

  • Use #!/bin/sh
  • Do not use bashisms such as [[ or $'...'.
  • Use our shellcheck.sh script to check for problems and for bashisms

Manage Endianness

As hackers, we need to be aware of endianness.

Endianness can become a problem when you try to process buffers or streams of bytes and store intermediate values as integers with width larger than a single byte.

It can seem very easy to write the following code:

ut8 opcode[4] = { 0x10, 0x20, 0x30, 0x40 };
ut32 value = *(ut32*)opcode;

... and then continue to use "value" in the code to represent the opcode.

This needs to be avoided!

Why? What is actually happening?

When you cast the opcode stream to an unsigned int, the compiler uses the endianness of the host to interpret the bytes and stores the result in host endianness. This leads to very unportable code, because if you compile on a machine with a different endianness, the value stored in "value" might be 0x40302010 instead of 0x10203040.

Solution

Use bitshifts and OR instructions to interpret bytes in a known endian. Instead of casting streams of bytes to larger width integers, do the following:

ut8 opcode[4] = { 0x10, 0x20, 0x30, 0x40 };
ut32 value = opcode[0] | opcode[1] << 8 | opcode[2] << 16 | opcode[3] << 24;

or if you prefer the other endian:

ut32 value = opcode[3] | opcode[2] << 8 | opcode[1] << 16 | opcode[0] << 24;

This is much better because you actually know which endian your bytes are stored in within the integer value, REGARDLESS of the host endian of the machine.
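
As a self-contained illustration of the shift-and-OR approach, the sketch below wraps both byte orders in small functions. The names `read_le32_sketch` and `read_be32_sketch` are hypothetical, and the typedefs are local stand-ins for the Rizin types so the snippet compiles on its own. Each byte is cast to a 32-bit unsigned type before shifting, which keeps the shift by 24 well defined for bytes of 0x80 and above.

```c
#include <stdint.h>

/* Local stand-ins for the Rizin types, so the sketch needs no Rizin headers. */
typedef uint8_t ut8;
typedef uint32_t ut32;

/* Interpret 4 bytes as little-endian, regardless of the host endianness. */
static ut32 read_le32_sketch(const ut8 *p) {
	return (ut32)p[0] | ((ut32)p[1] << 8) | ((ut32)p[2] << 16) | ((ut32)p[3] << 24);
}

/* Interpret 4 bytes as big-endian, regardless of the host endianness. */
static ut32 read_be32_sketch(const ut8 *p) {
	return (ut32)p[3] | ((ut32)p[2] << 8) | ((ut32)p[1] << 16) | ((ut32)p[0] << 24);
}
```

For the bytes { 0x10, 0x20, 0x30, 0x40 }, the little-endian reader yields 0x40302010 and the big-endian reader yields 0x10203040 on any host.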

Endian helper functions

Rizin now uses helper functions to interpret all byte streams in a known endian.

Please use these at all times, e.g.:

val32 = rz_read_be32(buffer); // reads 4 bytes from a stream in BE
val32 = rz_read_le32(buffer); // reads 4 bytes from a stream in LE
val32 = rz_read_ble32(buffer, isbig); // reads 4 bytes from a stream:
                                      // if isbig is true, reads in BE
                                      // otherwise reads in LE

There are a number of helper functions for 64, 32, 16, and 8 bit reads and writes.

(Note that 8 bit reads are equivalent to casting a single byte of the buffer to a ut8 value, i.e. endianness is irrelevant.)
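
To show what a runtime-selected read and the matching write might do for another width, here is a minimal 16-bit sketch. The functions `read_ble16_sketch` and `write_ble16_sketch` are hypothetical re-implementations written for this example, not the Rizin helpers themselves, and the typedefs are local stand-ins for the Rizin types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the Rizin types, so the sketch needs no Rizin headers. */
typedef uint8_t ut8;
typedef uint16_t ut16;

/* Read 2 bytes in big-endian order if big_endian is true, else little-endian. */
static ut16 read_ble16_sketch(const ut8 *p, bool big_endian) {
	return big_endian ? (ut16)((p[0] << 8) | p[1]) : (ut16)((p[1] << 8) | p[0]);
}

/* Write v as 2 bytes in big-endian order if big_endian is true, else little-endian. */
static void write_ble16_sketch(ut8 *p, ut16 v, bool big_endian) {
	if (big_endian) {
		p[0] = (ut8)(v >> 8);
		p[1] = (ut8)(v & 0xff);
	} else {
		p[0] = (ut8)(v & 0xff);
		p[1] = (ut8)(v >> 8);
	}
}
```

Writing and reading with the same flag round-trips the value on any host, because the byte order is fixed by the code and not by the machine.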

When accessing data through an RzBuffer *buffer, there are also helpers such as rz_buf_read_bleXX()/rz_buf_write_bleXX(), rz_buf_read_bleXX_at()/rz_buf_write_bleXX_at(), and rz_buf_read_bleXX_offset()/rz_buf_write_bleXX_offset(). In addition, there are corresponding little-endian-only and big-endian-only functions such as rz_buf_read_leXX()/rz_buf_read_beXX(), rz_buf_read_leXX_at()/rz_buf_read_beXX_at(), rz_buf_read_leXX_offset()/rz_buf_read_beXX_offset(), and the matching write functions.

Packed structures

Because of the differences between platforms and compilers, Rizin has a special helper macro, RZ_PACKED(). Use this macro instead of the non-portable #pragma pack or __attribute__((packed)). To wrap a declaration in it, write:

RZ_PACKED(union mystruct {
	int a;
	char b;
})

or in case of typedef:

RZ_PACKED(typedef struct mystruct {
	int a;
	char b;
})
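
The effect of packing can be seen by comparing the size of a plain and a packed structure. The macro `PACKED_SKETCH` below is a hypothetical stand-in written for this example and assumes a GCC- or Clang-compatible compiler; in Rizin code, use RZ_PACKED() itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for RZ_PACKED(); assumes a GCC- or Clang-compatible compiler. */
#define PACKED_SKETCH(x) x __attribute__((packed))

/* Without packing, the compiler may insert padding after `tag`. */
struct plain_hdr {
	uint8_t tag;
	uint32_t length;
};

/* With packing, the members are laid out back to back: 1 + 4 bytes. */
PACKED_SKETCH(struct packed_hdr {
	uint8_t tag;
	uint32_t length;
});
```

Here sizeof(struct packed_hdr) is 5, while sizeof(struct plain_hdr) depends on the platform's alignment rules and is commonly 8.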

Modules

The Rizin code base is modularized into different libraries, which are found in the librz/ directory. The binrz/ directory contains the programs that use the libraries.

Hint: To find both the declaration and definition of a function named func_name, you can use the following git grep command:

git grep -nWG "^[^[:blank:]].*func_name("

JSON

Since many places in Rizin output JSON, a dedicated API was created: PJ, which stands for "Print Json". It lets you build nested JSON structures with a simple and short API. The full API reference is available in librz/include/rz_util/rz_pj.h.

Here is a short example of how we usually use PJ:

PJ *pj = NULL;
if (mode == RZ_OUTPUT_MODE_JSON) {
	pj = pj_new(); // creates a new instance of the API
	if (!pj) {
		return false;
	}
}
// ... some other logic
// Creating the JSON structure
if (mode == RZ_OUTPUT_MODE_JSON) {
	pj_o(pj); // opens a JSON object
	pj_ki(pj, "id", some->id); // creates an element like "id": 6
	pj_ks(pj, "name", some->name); // creates an element like "name": "bla"
	pj_end(pj); // closes the JSON object
}
// ... some other logic
// Printing the JSON on the screen
if (mode == RZ_OUTPUT_MODE_JSON) {
	rz_cons_println(pj_string(pj));
	pj_free(pj); // free the instance of the API
}

It will produce the following output:

{"id":6,"name":"bla"}

Licenses

Rizin is trying to comply with the Software Package Data Exchange® (SPDX®), an open standard for communicating, among other things, the licenses and copyrights of a piece of software in a clear way. Every file in the repository should have either a header specifying the copyright and license that apply or an entry in the .reuse/dep5 file. All code copied from other projects should have a license/copyright entry as well.

In particular, the SPDX header may look like:

// SPDX-FileCopyrightText: 2021 RizinOrg <info@rizin.re>
// SPDX-License-Identifier: LGPL-3.0-only

You can use the REUSE Software to check the compliance of the project and get the licenses/copyright of each file.

Custom Pointer Modifiers

In Rizin code there are some conventions to help developers use pointers more safely, which are defined in librz/include/rz_types.h:

#define RZ_IN /* do not use, implicit */
#define RZ_OUT /* parameter is written, not read */
#define RZ_INOUT /* parameter is read and written */
#define RZ_OWN /* pointer ownership is transferred */
#define RZ_BORROW /* pointer ownership is not transferred, it must not be freed by the receiver */
#define RZ_NONNULL /* pointer can not be null */
#define RZ_NULLABLE /* pointer can be null */
#define RZ_DEPRECATE /* should not be used in new code and should/will be removed in the future */

Most of them are easy to understand, and the comments give a brief explanation. RZ_OWN and RZ_BORROW, however, may be a little tricky for new developers.

Sometimes it may not be immediately clear whether the object you are getting from a function shall be freed or not. Rizin uses RZ_OWN and RZ_BORROW to indicate pointer ownership so you don't have to read complicated function definitions to know whether they should still free objects or not.

You can use the two modifiers in two places and their explanations are as below:

  • before the return type of function
    • RZ_OWN: the ownership of the returned object is transferred to the caller. The caller owns the object, so it must free it (or ensure that something else frees it).
    • RZ_BORROW: the ownership of the returned object is not transferred. The caller can use the object, but it does not own it, so it should not free it.
  • before the parameter of function
    • RZ_OWN: the ownership of the passed argument is transferred to the callee. The callee now owns the object and it is its duty to free it (or ensure that something else frees it). In any case, the caller should not care anymore about freeing that passed object.
    • RZ_BORROW: the ownership of the passed argument is not transferred to the callee, which can use it but it should not free it. After calling this function, the caller still owns the passed object and it should ensure that at some point it is freed.

Examples:

RZ_OWN MyString *capitalize_str(RZ_BORROW char *s) {
	MyString *m = RZ_NEW(MyString);
	m->s = strdup(s);
	capitalize(m->s);
	return m;
}
int main() {
	char *s = strdup("Hello World");
	MyString *m = capitalize_str(s);
	// s was RZ_BORROW, so main still needs to free it
	free(s);
	// ... use m ....
	// m was RZ_OWN, so main now has to free it
	my_string_free(m);
}

RZ_BORROW MyString *capitalize_str(RZ_BORROW MyFile *f, RZ_OWN char *s) {
	MyString *m = RZ_NEW(MyString);
	m->s = s;
	capitalize(m->s);
	f->m = m;
	return m;
}
int main() {
	char *s = strdup("Hello World");
	MyFile *f = create_my_file();
	MyString *m = capitalize_str(f, s);
	// s was RZ_OWN, so main does not need to free it. s is now owned by `m`
	// ... use m ....
	// m was RZ_BORROW, so main is just borrowing it from `f`, and it does not have to free it.
	my_file_free(f);
	// f was created by main and never transferred to anything else, so main needs to free it.
}
  • You should use these modifiers consistently in both function definition and declaration.
  • You should use these modifiers when and only when it makes sense. For example, if your function returns const char *, the caller should not free it because of the const. So specifying RZ_BORROW in this case is probably redundant.
  • Since they only serve as hints to developers and impose no compile-time restrictions, there is no good way to check whether you have used them correctly.
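
The ownership conventions above can also be tried in a self-contained form. In the sketch below, the empty RZ_OWN and RZ_BORROW definitions are local stand-ins so that it compiles without Rizin headers, and the `Label` type with its `label_*` functions is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Local stand-ins so the sketch compiles without Rizin headers. */
#define RZ_OWN
#define RZ_BORROW

/* Hypothetical type that owns its text. */
typedef struct {
	char *text;
} Label;

/* Copies `text`: the caller keeps its string and owns the returned Label. */
static RZ_OWN Label *label_new(RZ_BORROW const char *text) {
	Label *l = malloc(sizeof(Label));
	if (!l) {
		return NULL;
	}
	size_t n = strlen(text) + 1;
	l->text = malloc(n);
	if (!l->text) {
		free(l);
		return NULL;
	}
	memcpy(l->text, text, n);
	return l;
}

/* Takes over `text`: the Label frees it later, so the caller must not. */
static void label_set_text(RZ_BORROW Label *l, RZ_OWN char *text) {
	free(l->text);
	l->text = text;
}

/* Hands out the internal string without transferring ownership. */
static RZ_BORROW char *label_text(RZ_BORROW Label *l) {
	return l->text;
}

/* Consumes the Label together with the text it owns. */
static void label_free(RZ_OWN Label *l) {
	if (!l) {
		return;
	}
	free(l->text);
	free(l);
}
```

A caller that passes a heap string to label_set_text() gives it away, while a string passed to label_new() stays with the caller; label_free() releases everything the Label owns.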