The Thread Local Cache (Tcache) was introduced in glibc 2.26 in order to perform optimizations. The introduction of tcache improved the performance of glibc and also increased the exploit primitives. glibc 2.26, 2.27 and 2.28 are highly vulnerable to double free. Not only a double free, one can easily execute multiple frees on a single chunk in these versions of glibc.

Why tcache?

There is a limit on the number of arenas. When this limit is reached, new threads must share their arenas with other threads. Thus, when a thread needs memory, it may have to wait for other threads sharing the same arena to complete their own requests which causes a slowdown. Tcache was introduced to combat this slowdown. Each thread gets its own tcache which can be considered as a mini arena. When a chunk is freed, it is linked into the thread’s tcache instead of being linked into the thread’s arena. Whenever new requests in the tcache size range will be made, they will be first serviced from the tcache. In single threaded programs, tcache is allocated at the start of the heap. In multithreaded programs, tcache may be allocated anywhere on the heap.

Working of tcache

Tcaches are allocated inside the heap, just like regular chunks.


#ifndef INTERNAL_SIZE_T
# define INTERNAL_SIZE_T size_t
#endif

#define SIZE_SZ (sizeof (INTERNAL_SIZE_T))

#define MALLOC_ALIGNMENT (2 * SIZE_SZ < __alignof__ (long double) \
                          ? __alignof__ (long double) : 2 * SIZE_SZ)
#define MALLOC_ALIGN_MASK (MALLOC_ALIGNMENT - 1)

#define MIN_CHUNK_SIZE  (offsetof(struct malloc_chunk, fd_nextsize))

/* The smallest size we can malloc is an aligned minimal chunk */
#define MINSIZE  \
  (unsigned long)(((MIN_CHUNK_SIZE+MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK))


# define TCACHE_MAX_BINS 64
# define TCACHE_FILL_COUNT 7
# define MAX_TCACHE_SIZE tidx2usize (TCACHE_MAX_BINS-1)
# define tidx2usize(idx)        (((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)

The maximum number of tcache bins for a thread is 64 and a particular tcache bin can store at most TCACHE_FILL_COUNT number of chunks i.e. 7. Also, the MAX_TCACHE_SIZE is calculated by using the formula mentioned above. Size of a chunk may vary from 24 to 1032 bytes for 64 bit and 12 to 516 bytes for 32 bits

MALLOC_ALIGNMENT is the minimum alignment for malloc’ed chunks. It must be a power of two at least 2 * SIZE_SZ, even on machines for which smaller alignments would suffice. It may be defined as larger than this though. Note however that code and data structures are optimized for the case of 8-byte alignment.

typedef struct tcache_entry
{
  struct tcache_entry *next;
} tcache_entry;

typedef struct tcache_perthread_struct
{
   char counts[TCACHE_MAX_BINS];
   tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

Each thread has its own tcache_perthread_struct. A tcache_perthread_struct is made up of two regions: counts and entries. Each entry stores the head of a singly linked, non circular, LIFO type list of free chunks. These lists hold free chunks of size ranging from 0x20 to 0x410. One important thing which makes tcache different from fastbins is that the tcache references the chunks from the starting address of user data rather than the chunk metadata. The number of chunks in a particular bin are stored in counts array. The first field denotes the 0x20 bin, second field denotes the 0x30 bin and so on….

Let’s say, the first 8 bytes of counts are: 0x0707060504030201. This means that the 0x20 tcache bin contains one free chunk, the 0x30 bin contains two, the 0x40 bin contains three, the 0x50 bin contains four free chunks, and so on… For libc 2.26 to 2.29, the size of a count field is one byte. But, for libc versions>= 2.30, the size of a single count field is 2 bytes.

typedef struct tcache_perthread_struct
{
  uint16_t counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;

Note: tcache is given higher priority than an arena. Suppose a 0x40 fastbin as well as a 0x40 tcache bin contains a chunk. New calls to malloc in the size range of 0x40 will be first serviced from the tcache rather than the fastbin. Also, chunks will be first inserted into the tcache if the tcache bin of that size is not full. Various fields related to the tcache can be examined using pwndbg by using the command mp (this is a struct).

{
  trim_threshold = 131072,
  top_pad = 131072,
  mmap_threshold = 131072,
  arena_test = 8,
  arena_max = 0,
  n_mmaps = 0,
  n_mmaps_max = 65536,
  max_n_mmaps = 0,
  no_dyn_threshold = 0,
  mmapped_mem = 0,
  max_mmapped_mem = 0,
  sbrk_base = 0x2008000 "",
  tcache_bins = 64,
  tcache_max_bytes = 1032,
  tcache_count = 7,
  tcache_unsorted_limit = 0
}