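3.8 Building Custom Disassembly Tools

TODO: port to Python

8.1 Why Write a Custom Disassembly Pass?

8.1.1 A Case for Custom Disassembly: Obfuscated Code

8.1.2 Other Reasons to Write a Custom Disassembler

8.2 Introduction to Capstone

8.2.1 Installing Capstone

8.2.2 Linear Disassembly with Capstone

8.2.3 Exploring the Capstone C API

8.2.4 Recursive Disassembly with Capstone

8.3 Implementing a ROP Gadget Scanner

8.3.1 ROP Overview

8.3.2 Finding ROP Gadgets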

Chapter 8 builds disassemblers on top of Capstone.

Why build one

To have a disassembler tailored to your own purpose.

Off-the-shelf disassemblers are not built for special-purpose tasks.

8.1 Why Write a Custom Disassembly Pass?

Widely used disassemblers such as IDA and objdump offer limited extensibility for advanced automated analysis and are a poor fit for custom work.

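8.1.1 A Case for Custom Disassembly: Obfuscated Code

A custom disassembler is useful for analyzing obfuscated code, hand-modified binaries, and binaries extracted from memory dumps or firmware.

Example 1. Overlapping instructions

https://www.notion.so/6-1acf0dfd1e9c40e4969436ff939c4b4e

This is the same idea as the overlapping instructions used in the Chapter 6 exercise.

Ordinary disassemblers assume that instructions do not overlap, so they emit a single listing in which each byte belongs to at most one instruction.

Why instructions can overlap

x86 instructions have variable length, so decoding can start at any byte offset.

Listing 8-1 shows this.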

binary@binary-VirtualBox:~/code/chapter8$ ls
basic_capstone_linear.cc  basic_capstone_recursive.cc  capstone_gadget_finder.cc  Makefile  overlapping_bb.c
binary@binary-VirtualBox:~/code/chapter8$ make overlapping_bb
gcc -o overlapping_bb overlapping_bb.c
binary@binary-VirtualBox:~/code/chapter8$ ls
basic_capstone_linear.cc     capstone_gadget_finder.cc  overlapping_bb
basic_capstone_recursive.cc  Makefile                   overlapping_bb.c
binary@binary-VirtualBox:~/code/chapter8$ objdump -M intel --start-address=0x4005f6 -d overlapping_bb
00000000004005f6 <overlapping>:
  4005f6:       55                      push   rbp
  4005f7:       48 89 e5                mov    rbp,rsp
  4005fa:       89 7d ec                mov    DWORD PTR [rbp-0x14],edi
  4005fd:       c7 45 fc 00 00 00 00    mov    DWORD PTR [rbp-0x4],0x0
  400604:       8b 45 ec                mov    eax,DWORD PTR [rbp-0x14]
  400607:       83 f8 00                cmp    eax,0x0
  40060a:       0f 85 02 00 00 00       jne    400612 <overlapping+0x1c>
  400610:       83 f0 04                xor    eax,0x4
  400613:       04 90                   add    al,0x90
  400615:       89 45 fc                mov    DWORD PTR [rbp-0x4],eax
  400618:       8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
  40061b:       5d                      pop    rbp
  40061c:       c3                      ret

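The jne at 0x40060a branches on the result of the cmp at 0x400607, but its target looks odd.

It jumps to 0x400612, yet the instruction boundaries in the listing are at 0x400610 and 0x400613, so 0x400612 falls in the middle of an instruction.

Disassemble from 0x400612 to check.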

binary@binary-VirtualBox:~/code/chapter8$ objdump -M intel --start-address=0x400612 -d overlapping_bb
0000000000400612 <overlapping+0x1c>:
  400612:       04 04                   add    al,0x4
  400614:       90                      nop
  400615:       89 45 fc                mov    DWORD PTR [rbp-0x4],eax
  400618:       8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
  40061b:       5d                      pop    rbp
  40061c:       c3                      ret

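Of the bytes at 0x400610 (83 f0 04) and 0x400613 (04 90), the first two (83 f0) are skipped when execution enters at 0x400612. The last byte of the xor (04) pairs with the first byte of the next instruction (04) to form 04 04 (add al,0x4), and the remaining 0x90 decodes as nop.

So the fall-through path executes xor eax,0x4; add al,0x90, while the jump path executes add al,0x4; nop.

With overlapping-instruction obfuscation like this, an ordinary disassembler shows only one of the two interpretations at a time,

so building a dedicated disassembler can pay off in the long run.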
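
To make the two decodings concrete, here is a minimal sketch in C++ that walks the five bytes 83 f0 04 04 90 from two different start offsets. The decoder is a toy: toy_decode and toy_disasm_from are hypothetical helpers written for this note, and they recognize only the three encodings that occur in the listing (xor eax,imm8, add al,imm8, and nop). A real tool would use Capstone instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Format a byte the way objdump prints immediates (0x4, 0x90).
static std::string hex8(uint8_t v)
{
  char buf[8];
  std::snprintf(buf, sizeof(buf), "0x%x", static_cast<unsigned>(v));
  return buf;
}

// Toy decoder (illustration only): knows just three x86 encodings.
// Returns the instruction length in bytes, or 0 if it cannot decode.
static size_t toy_decode(const std::vector<uint8_t> &code, size_t off,
                         std::string *text)
{
  uint8_t op = code[off];
  if(op == 0x90) {                                   // 90: nop
    *text = "nop";
    return 1;
  }
  if(op == 0x04 && off + 1 < code.size()) {          // 04 ib: add al, imm8
    *text = "add al," + hex8(code[off + 1]);
    return 2;
  }
  if(op == 0x83 && off + 2 < code.size()
     && code[off + 1] == 0xf0) {                     // 83 f0 ib: xor eax, imm8
    *text = "xor eax," + hex8(code[off + 2]);
    return 3;
  }
  return 0;
}

// Decode sequentially from byte offset `start` until the buffer ends
// or an unknown encoding is hit.
std::vector<std::string> toy_disasm_from(const std::vector<uint8_t> &code,
                                         size_t start)
{
  std::vector<std::string> out;
  size_t off = start;
  while(off < code.size()) {
    std::string text;
    size_t len = toy_decode(code, off, &text);
    if(len == 0) break;
    out.push_back(text);
    off += len;
  }
  return out;
}
```

With code = {0x83, 0xf0, 0x04, 0x04, 0x90}, toy_disasm_from(code, 0) follows the fall-through path and toy_disasm_from(code, 2) follows the jne target, so the same bytes yield two different instruction sequences.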

Overlapping code in glibc 2.22

7b05a: cmp          DWORD PTR fs:0x18,0x0
7b063: je           7b066
7b065: lock cmpxchg QWORD PTR [rip+0x3230fa],rcx

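The cmpxchg executes either way; the cmp result only decides whether it executes with the lock prefix. When the je is taken, execution enters at 0x7b066, one byte past the lock prefix at 0x7b065.

Note that fs:0x18 is not a TEB field (the Thread Environment Block is a Windows structure). On x86-64 Linux, fs points to glibc's thread control block, and offset 0x18 appears to hold the multiple_threads flag, which is 0 while the process has only one thread, so the lock prefix is skipped when no locking is needed.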

8.1.2 Other Reasons to Write a Custom Disassembler

A disassembler that starts analysis from multiple entry points

ROP gadget scanning

Finding every possible code sequence within a given range

Excluding certain code paths

Hybrid analysis

Work efficiency

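8.2 Introduction to Capstone

Capstone is a disassembly framework designed to offer a simple API.

Supports x86, x86-64, ARM, and MIPS, among other architectures

Bindings for C/C++ and Python

Runs on Windows, Linux, and macOS

With it you can recover details of each instruction, such as its opcode, mnemonic, class (group), and the registers it reads or writes.

mnemonic
A symbolic name that makes an instruction easy for a human to read, for example mov or push.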

8.2.1 Installing Capstone

Omitted.

8.2.2 Linear Disassembly with Capstone

At a high level, Capstone performs the following conversion.

buffer of code bytes > disassembled instructions

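The most basic use of Capstone is linear disassembly: reading every byte in the .text section and converting it into human-readable instructions (mnemonics and operands).

A single call to cs_disasm performs the disassembly itself.

Initialization (cs_open) and cleanup (cs_free, cs_close) are separate calls made before and after cs_disasm.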

/* Linearly disassemble a given binary using Capstone. */

#include <stdio.h>
#include <string>

#include <capstone/capstone.h>

#include "../inc/loader.h"

int
disasm(Binary *bin)
{
  csh dis;
  cs_insn *insns;
  Section *text;
  size_t n;

  text = bin->get_text_section();
  if(!text) {
    fprintf(stderr, "Nothing to disassemble\n");
    return 0;
  }

  if(cs_open(CS_ARCH_X86, CS_MODE_64, &dis) != CS_ERR_OK) {
    fprintf(stderr, "Failed to open Capstone\n");
    return -1;
  }

  n = cs_disasm(dis, text->bytes, text->size, text->vma, 0, &insns);
  if(n <= 0) {
    fprintf(stderr, "Disassembly error: %s\n", cs_strerror(cs_errno(dis)));
    return -1;
  }

  for(size_t i = 0; i < n; i++) {
    printf("0x%016jx: ", insns[i].address);
    for(size_t j = 0; j < 16; j++) {
      if(j < insns[i].size) printf("%02x ", insns[i].bytes[j]);
      else printf("   ");
    }
    printf("%-12s %s\n", insns[i].mnemonic, insns[i].op_str);
  }

  cs_free(insns, n);
  cs_close(&dis);

  return 0;
}

int
main(int argc, char *argv[])
{
  Binary bin;
  std::string fname;

  if(argc < 2) {
    printf("Usage: %s <binary>\n", argv[0]); 
    return 1;
  }

  fname.assign(argv[1]);
  if(load_binary(fname, &bin, Binary::BIN_TYPE_AUTO) < 0) {
    return 1;
  }

  if(disasm(&bin) < 0) {
    return 1;
  }

  unload_binary(&bin);

  return 0;
}

Start with the main function.

int
main(int argc, char *argv[])
{
  Binary bin;
  std::string fname;

  if(argc < 2) {
    printf("Usage: %s <binary>\n", argv[0]); // print usage when no binary path is given
    return 1;
  }

  fname.assign(argv[1]);
  if(load_binary(fname, &bin, Binary::BIN_TYPE_AUTO) < 0) {
    return 1;
  } // return 1 if load_binary fails

  if(disasm(&bin) < 0) {
    return 1;
  } // return 1 if disasm fails

  unload_binary(&bin); // release the loaded binary

  return 0;
}

Below is the disasm function.

int
disasm(Binary *bin)
{
  csh dis;
  cs_insn *insns;
  Section *text;
  size_t n;

  text = bin->get_text_section(); // look up the .text section
  if(!text) {
    fprintf(stderr, "Nothing to disassemble\n");
    return 0;
  } // no .text section: nothing to do, return 0

  if(cs_open(CS_ARCH_X86, CS_MODE_64, &dis) != CS_ERR_OK) {
    fprintf(stderr, "Failed to open Capstone\n");
    return -1;
  } // open a Capstone handle (dis) for x86 in 64-bit mode

  n = cs_disasm(dis, text->bytes, text->size, text->vma, 0, &insns);
  if(n <= 0) {
    fprintf(stderr, "Disassembly error: %s\n", cs_strerror(cs_errno(dis)));
    return -1;
  } // disassemble all of .text; insns receives an array of cs_insn, n is the instruction count

  for(size_t i = 0; i < n; i++) {
    printf("0x%016jx: ", insns[i].address);
    for(size_t j = 0; j < 16; j++) {
      if(j < insns[i].size) printf("%02x ", insns[i].bytes[j]);
      else printf("   ");
    }
    printf("%-12s %s\n", insns[i].mnemonic, insns[i].op_str);
  } // print address, raw bytes, mnemonic, and operands of each instruction

  cs_free(insns, n); // free the instruction array
  cs_close(&dis);    // close the Capstone handle

  return 0;
}

This is the cs_insn structure used above.

typedef struct cs_insn {
  unsigned int id;          // architecture-specific instruction ID (e.g. X86_INS_NOP)
  uint64_t     address;
  uint16_t     size;
  uint8_t      bytes[16];
  char         mnemonic[32];
  char         op_str[160];
  cs_detail    *detail;
} cs_insn;

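The id value identifies the instruction type. The constants are architecture-specific (X86_INS_* on x86), so handling code can dispatch on id instead of comparing mnemonic strings.

switch(insn->id) {
  case X86_INS_NOP:
    // handle nop
    break;
  case X86_INS_CALL:
    // handle call
    break;
  default:
    break;
}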

I disassembled /bin/ls with the binary compiled from this source.

binary@binary-VirtualBox:~/code/chapter8$ ./basic_capstone_linear /bin/ls | head -n 10
0x0000000000402a00: 41 57                                           push         r15
0x0000000000402a02: 41 56                                           push         r14
0x0000000000402a04: 41 55                                           push         r13
0x0000000000402a06: 41 54                                           push         r12
0x0000000000402a08: 55                                              push         rbp
0x0000000000402a09: 53                                              push         rbx
0x0000000000402a0a: 89 fb                                           mov          ebx, edi
0x0000000000402a0c: 48 89 f5                                        mov          rbp, rsi
0x0000000000402a0f: 48 81 ec 88 03 00 00                            sub          rsp, 0x388
0x0000000000402a16: 48 8b 3e                                        mov          rdi, qword ptr [rsi]

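8.2.3 Exploring the Capstone C API

The header files are the best reference, starting with capstone.h.

capstone.h
Documents the Capstone API functions in comments.
Defines the architecture-independent data structures such as cs_insn,
and the values of enum types such as cs_arch, cs_mode, and cs_err.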

x86.h
Contains the definitions specific to the x86 and x86-64 architectures.

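8.2.4 Recursive Disassembly with Capstone

Capstone's detailed disassembly mode provides

Which registers an instruction accesses

The type and value of each operand

The kind of operation (instruction group)

As covered in Chapter 6, recursive disassembly starts at known entry points and follows control flow from there (the approach IDA takes).

Advantages

Because it follows control flow, it produces a more accurate listing than linear disassembly and can cope with overlapping instructions.

Disadvantages

Targets of indirect calls and indirect jumps through a register cannot be resolved statically, so code reachable only through them is missed.

Below is the source of the recursive disassembler.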

/* Recursively disassemble a given binary using Capstone. */

#include <stdio.h>
#include <queue>
#include <map>
#include <string>

#include <capstone/capstone.h>

#include "../inc/loader.h"

void
print_ins(cs_insn *ins)
{
  printf("0x%016jx: ", ins->address);
  for(size_t i = 0; i < 16; i++) {
    if(i < ins->size) printf("%02x ", ins->bytes[i]); // print the instruction's raw bytes
    else printf("   "); // pad to 16 byte columns
  }
  printf("%-12s %s\n", ins->mnemonic, ins->op_str); // mnemonic and operand string
}

bool
is_cs_cflow_group(uint8_t g)
{
  return (g == CS_GRP_JUMP) || (g == CS_GRP_CALL)
          || (g == CS_GRP_RET) || (g == CS_GRP_IRET);
} // true for the jump, call, return, and interrupt-return groups

bool
is_cs_cflow_ins(cs_insn *ins)
{
  for(size_t i = 0; i < ins->detail->groups_count; i++) {
    if(is_cs_cflow_group(ins->detail->groups[i])) {
      return true; // the instruction belongs to a control-flow group
    }
  }
  return false;
}

bool
is_cs_unconditional_cflow_ins(cs_insn *ins)
{
  switch(ins->id) {
  case X86_INS_JMP:
  case X86_INS_LJMP:  // far jump
  case X86_INS_RET:
  case X86_INS_RETF:  // far return
  case X86_INS_RETFQ: // far return, 64-bit operand size
    return true;
  default:
    return false;
  }
}

uint64_t
get_cs_ins_immediate_target(cs_insn *ins)
{
  cs_x86_op *cs_op;

  for(size_t i = 0; i < ins->detail->groups_count; i++) { // for each group of the instruction
    if(is_cs_cflow_group(ins->detail->groups[i])) { // only control-flow instructions have targets
      for(size_t j = 0; j < ins->detail->x86.op_count; j++) {
        cs_op = &ins->detail->x86.operands[j]; // inspect each operand
        if(cs_op->type == X86_OP_IMM) { // immediate operand: a direct call/jump target
          return cs_op->imm; // return the target address
        }
      }
    }
  }

  return 0;
}

int
disasm(Binary *bin)
{
  csh dis;
  cs_insn *cs_ins;
  Section *text;
  size_t n;
  const uint8_t *pc;
  uint64_t addr, offset, target;
  std::queue<uint64_t> Q;
  std::map<uint64_t, bool> seen;

  text = bin->get_text_section();
  if(!text) {
    fprintf(stderr, "Nothing to disassemble\n");
    return 0;
  } // look up the .text section

  if(cs_open(CS_ARCH_X86, CS_MODE_64, &dis) != CS_ERR_OK) {
    fprintf(stderr, "Failed to open Capstone\n");
    return -1;
  } // open a Capstone handle for x86-64
  cs_option(dis, CS_OPT_DETAIL, CS_OPT_ON); // enable detailed mode (groups, operands)

  cs_ins = cs_malloc(dis);
  if(!cs_ins) {
    fprintf(stderr, "Out of memory\n");
    cs_close(&dis);
    return -1;
  } // allocate one reusable instruction buffer for cs_disasm_iter

  addr = bin->entry;
  if(text->contains(addr)) Q.push(addr); // queue the entry point if it lies in .text
  printf("entry point: 0x%016jx\n", addr);

  for(auto &sym: bin->symbols) {
    if(sym.type == Symbol::SYM_TYPE_FUNC
       && text->contains(sym.addr)) { // queue every function symbol that lies in .text
      Q.push(sym.addr);
      printf("function symbol: 0x%016jx\n", sym.addr);
    }
  }

  while(!Q.empty()) {
    addr = Q.front();
    Q.pop(); // take the oldest queued address (FIFO)
    if(seen[addr]) { // already disassembled: skip
      printf("ignoring addr 0x%016jx (already seen)\n", addr);
      continue;
    }

    offset = addr - text->vma;
    pc     = text->bytes + offset; // pointer to the bytes to decode
    n      = text->size - offset;  // bytes left until the end of .text; the loop below decides where to stop
    while(cs_disasm_iter(dis, &pc, &n, &addr, cs_ins)) { // decode one instruction per iteration
      if(cs_ins->id == X86_INS_INVALID || cs_ins->size == 0) {
        break; // cannot decode any further
      }

      seen[cs_ins->address] = true; // mark this address as disassembled
      print_ins(cs_ins);

      if(is_cs_cflow_ins(cs_ins)) { // control-flow instruction
        target = get_cs_ins_immediate_target(cs_ins); // direct call/jump target, or 0
        if(target && !seen[target] && text->contains(target)) {
          Q.push(target); // queue a direct target in .text that has not been seen yet
          printf("  -> new target: 0x%016jx\n", target);
        }
        if(is_cs_unconditional_cflow_ins(cs_ins)) { // jmp or ret: no fall-through, so stop here
          break;
        }
      } else if(cs_ins->id == X86_INS_HLT) break; // hlt also ends the block
    }
    printf("----------\n");
  }
  cs_free(cs_ins, 1);
  cs_close(&dis);

  return 0;
}

int
main(int argc, char *argv[]) // same as in the linear disassembler
{
  Binary bin;
  std::string fname;

  if(argc < 2) {
    printf("Usage: %s <binary>\n", argv[0]);
    return 1;
  }

  fname.assign(argv[1]);
  if(load_binary(fname, &bin, Binary::BIN_TYPE_AUTO) < 0) {
    return 1;
  }

  if(disasm(&bin) < 0) {
    return 1;
  }

  unload_binary(&bin);

  return 0;
}

Output after compiling and running it:

entry point: 0x0000000000400500
function symbol: 0x0000000000400530
function symbol: 0x0000000000400570
function symbol: 0x00000000004005b0
function symbol: 0x00000000004005d0
function symbol: 0x00000000004006f0
function symbol: 0x0000000000400680
function symbol: 0x0000000000400500
function symbol: 0x000000000040061d
function symbol: 0x00000000004005f6
0x0000000000400500: 31 ed                                           xor          ebp, ebp
0x0000000000400502: 49 89 d1                                        mov          r9, rdx
0x0000000000400505: 5e                                              pop          rsi
0x0000000000400506: 48 89 e2                                        mov          rdx, rsp
0x0000000000400509: 48 83 e4 f0                                     and          rsp, 0xfffffffffffffff0
0x000000000040050d: 50                                              push         rax
0x000000000040050e: 54                                              push         rsp
0x000000000040050f: 49 c7 c0 f0 06 40 00                            mov          r8, 0x4006f0
0x0000000000400516: 48 c7 c1 80 06 40 00                            mov          rcx, 0x400680
0x000000000040051d: 48 c7 c7 1d 06 40 00                            mov          rdi, 0x40061d
0x0000000000400524: e8 87 ff ff ff                                  call         0x4004b0
0x0000000000400529: f4                                              hlt
----------
0x0000000000400530: b8 57 10 60 00                                  mov          eax, 0x601057
0x0000000000400535: 55                                              push         rbp
0x0000000000400536: 48 2d 50 10 60 00                               sub          rax, 0x601050
0x000000000040053c: 48 83 f8 0e                                     cmp          rax, 0xe
0x0000000000400540: 48 89 e5                                        mov          rbp, rsp
0x0000000000400543: 76 1b                                           jbe          0x400560
  -> new target: 0x0000000000400560
0x0000000000400545: b8 00 00 00 00                                  mov          eax, 0
0x000000000040054a: 48 85 c0                                        test         rax, rax
0x000000000040054d: 74 11                                           je           0x400560
  -> new target: 0x0000000000400560
0x000000000040054f: 5d                                              pop          rbp
0x0000000000400550: bf 50 10 60 00                                  mov          edi, 0x601050
0x0000000000400555: ff e0                                           jmp          rax
----------

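As the output shows, disassembly of a block stops at hlt, a direct branch adds its target to the queue, and an indirect jump (jmp rax) adds no target and ends the block, after which the next queued address is processed. The call to 0x4004b0 adds no target because that address lies outside .text.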
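
The queue-driven traversal can be seen in isolation with a toy model. The sketch below replaces Capstone with a hand-built map from address to instruction, so ToyIns, Kind, and toy_recursive_disasm are all hypothetical names that do not appear in the original source. Only the control logic mirrors disasm above: a FIFO queue of start addresses, a seen set, queuing of direct targets, and a stop at unconditional control flow or hlt.

```cpp
#include <cstdint>
#include <map>
#include <queue>
#include <set>
#include <vector>

// Hypothetical instruction model used instead of cs_insn.
enum class Kind { Plain, CondJump, Call, Jump, IndirectJump, Ret, Hlt };

struct ToyIns {
  uint64_t size;    // instruction length in bytes
  Kind     kind;    // what sort of instruction this is
  uint64_t target;  // direct target address, or 0 if there is none
};

// Returns the addresses of all visited instructions in visiting order.
std::vector<uint64_t>
toy_recursive_disasm(const std::map<uint64_t, ToyIns> &prog,
                     const std::vector<uint64_t> &entries)
{
  std::vector<uint64_t> order;
  std::set<uint64_t> seen;
  std::queue<uint64_t> Q;

  for(uint64_t e : entries) Q.push(e);

  while(!Q.empty()) {
    uint64_t addr = Q.front();
    Q.pop();
    if(seen.count(addr)) continue;            // block already handled

    for(;;) {
      auto it = prog.find(addr);
      if(it == prog.end()) break;             // nothing decodable here
      const ToyIns &ins = it->second;

      seen.insert(addr);
      order.push_back(addr);

      bool cflow = ins.kind != Kind::Plain && ins.kind != Kind::Hlt;
      if(cflow) {
        // Queue direct targets that exist and were not visited yet.
        if(ins.target && !seen.count(ins.target) && prog.count(ins.target)) {
          Q.push(ins.target);
        }
        // Unconditional transfers have no fall-through path.
        if(ins.kind == Kind::Jump || ins.kind == Kind::IndirectJump
           || ins.kind == Kind::Ret) {
          break;
        }
      } else if(ins.kind == Kind::Hlt) {
        break;
      }
      addr += ins.size;                       // fall through to the next one
    }
  }
  return order;
}
```

An indirect jump carries target 0 in this model, so it ends the block without queuing anything, which is the same blind spot the real tool shows at jmp rax.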

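8.3 Implementing a ROP Gadget Scanner

This section builds a tool that finds the gadgets (short code fragments) needed for a ROP attack.

8.3.1 ROP Overview

Stack buffer overflow

Find a vulnerability

Inject shellcode into a buffer

Use the stack overflow to hijack control flow (redirect it to the shellcode)

DEP (Data Execution Prevention) blocks this by making the injected data non-executable

ret2libc

Instead of executing injected shellcode, redirect control flow to code that already exists in the binary or its libraries

ROP (return-oriented programming) generalizes this and similar techniques

Each gadget ends in a return instruction and consists of a few instructions that perform a basic operation, such as an arithmetic addition or a logical comparison.

Chaining these fragments lets the attacker build whatever functionality they want.

In a ROP program, the gadget addresses are laid out in order on the stack, and every gadget ends in ret so that it transfers control to the next gadget.

The ROP program starts when an initial ret instruction executes and jumps to the address of the first gadget.

Example ROP chain (figure):

At &g1 a constant is popped into eax, and at &g2 eax is added to esi.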
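
The two-gadget chain can be simulated without any real exploit. The following is a minimal sketch under stated assumptions: ToyCpu, run_toy_chain, and the gadget addresses G1 and G2 are hypothetical, and each gadget body is a C++ lambda standing in for the machine code before its ret. The loop models what ret does: pop the next address off the stack and continue there.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// Hypothetical machine state: two registers and a stack.
struct ToyCpu {
  uint64_t eax = 0;
  uint64_t esi = 0;
  std::vector<uint64_t> stack;   // back() is the top of the stack

  uint64_t pop()
  {
    if(stack.empty()) return 0;  // toy behavior: an empty stack yields 0
    uint64_t v = stack.back();
    stack.pop_back();
    return v;
  }
};

// Made-up gadget addresses for illustration.
const uint64_t G1 = 0x1000;  // pop eax; ret
const uint64_t G2 = 0x2000;  // add esi, eax; ret

// payload[0] is the word the first ret pops, payload[1] the next, etc.
ToyCpu run_toy_chain(const std::vector<uint64_t> &payload, uint64_t esi_init)
{
  std::map<uint64_t, std::function<void(ToyCpu &)>> gadgets = {
    {G1, [](ToyCpu &c) { c.eax = c.pop(); }},   // constant comes off the stack
    {G2, [](ToyCpu &c) { c.esi += c.eax; }},
  };

  ToyCpu cpu;
  cpu.esi = esi_init;
  // Reverse the payload so that payload[0] ends up on top of the stack.
  cpu.stack.assign(payload.rbegin(), payload.rend());

  while(!cpu.stack.empty()) {
    uint64_t ret_addr = cpu.pop();             // the effect of ret
    auto it = gadgets.find(ret_addr);
    if(it == gadgets.end()) break;             // not a gadget: stop the toy
    it->second(cpu);                           // run the gadget body
  }
  return cpu;
}
```

With the payload {G1, 5, G2} and esi starting at 10, the first ret enters G1, which pops 5 into eax; G1's ret then enters G2, which adds eax to esi.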

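8.3.2 Finding ROP Gadgets

The example code (capstone_gadget_finder.cc) implements a program that searches for ROP gadgets.

Pick suitable gadgets from its output and combine them to exploit the binary.

The logic is:

Find ret instructions first

From each ret, search backward for gadgets of up to a maximum length that end at that ret

With this approach only the bytes just before each ret need to be examined.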
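
The search strategy reduces to two small steps that can be sketched without Capstone. In the sketch below, find_ret_offsets and candidate_starts are hypothetical helper names; the constants 5 (instructions per gadget) and 15 (maximum x86 instruction length in bytes) are the ones used in the gadget finder source. Decoding each candidate start and checking that it lands exactly on the ret is left to the real code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Step 1: every 0xc3 byte is a potential ret, whether or not it sits
// on an instruction boundary of the normal disassembly.
std::vector<size_t> find_ret_offsets(const std::vector<uint8_t> &bytes)
{
  std::vector<size_t> roots;
  for(size_t i = 0; i < bytes.size(); i++) {
    if(bytes[i] == 0xc3) roots.push_back(i);
  }
  return roots;
}

// Step 2: list the start offsets worth trying for one root, nearest
// first. A gadget of at most max_gadget_len instructions, each at most
// max_ins_bytes long, cannot start further back than their product.
std::vector<size_t> candidate_starts(size_t root,
                                     size_t max_gadget_len = 5,
                                     size_t max_ins_bytes = 15)
{
  std::vector<size_t> starts;
  size_t window = max_gadget_len * max_ins_bytes;
  size_t lowest = root > window ? root - window : 0;  // clamp at offset 0
  for(size_t a = root; a-- > lowest; ) {
    starts.push_back(a);
  }
  return starts;
}
```

For a root at offset 100 this yields the 75 offsets from 99 down to 25, which is why the scanner only ever looks at a short window in front of each ret.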

/* Find ROP gadgets in a given binary using Capstone. */

#include <stdio.h>
#include <map>
#include <vector>
#include <string>

#include <capstone/capstone.h>

#include "../inc/loader.h"

bool
is_cs_cflow_group(uint8_t g)
{
  return (g == CS_GRP_JUMP) || (g == CS_GRP_CALL)
          || (g == CS_GRP_RET) || (g == CS_GRP_IRET);
}

bool
is_cs_cflow_ins(cs_insn *ins)
{
  for(size_t i = 0; i < ins->detail->groups_count; i++) {
    if(is_cs_cflow_group(ins->detail->groups[i])) {
      return true;
    }
  }

  return false;
}

bool
is_cs_ret_ins(cs_insn *ins)
{
  switch(ins->id) {
  case X86_INS_RET:
    return true;
  default:
    return false;
  }
}

int
find_gadgets_at_root(Section *text, uint64_t root,
                     std::map<std::string, std::vector<uint64_t> > *gadgets,
                     csh dis)
{
  size_t n, len;
  const uint8_t *pc;
  uint64_t offset, addr;
  std::string gadget_str;
  cs_insn *cs_ins;

  const size_t max_gadget_len    = 5; /* instructions */
  const size_t x86_max_ins_bytes = 15; // an x86 instruction is at most 15 bytes long
  const uint64_t root_offset     = max_gadget_len*x86_max_ins_bytes;

  cs_ins = cs_malloc(dis);
  if(!cs_ins) {
    fprintf(stderr, "Out of memory\n");
    return -1;
  }

  for(uint64_t a = root-1;
               text->contains(a) && a >= root-root_offset;
               a--) {
    addr   = a;
    offset = addr - text->vma;
    pc     = text->bytes + offset;
    n      = text->size - offset;
    len    = 0;
    gadget_str = "";
    while(cs_disasm_iter(dis, &pc, &n, &addr, cs_ins)) {
      if(cs_ins->id == X86_INS_INVALID || cs_ins->size == 0) {
        break; // invalid instruction or zero size: abandon this start address
      } else if(cs_ins->address > root) {
        break; // decoded past the root ret without landing on it
      } else if(is_cs_cflow_ins(cs_ins) && !is_cs_ret_ins(cs_ins)) {
        break; // control flow other than ret would leave the gadget early
      } else if(++len > max_gadget_len) {
        break; // more than max_gadget_len instructions
      }

      gadget_str += std::string(cs_ins->mnemonic)
                    + " " + std::string(cs_ins->op_str); // append this instruction's text

      if(cs_ins->address == root) { // reached only when decoding from a lines up exactly with the ret
        (*gadgets)[gadget_str].push_back(a); // record the gadget's start address
        break;
      }

      gadget_str += "; ";
    }
  }

  cs_free(cs_ins, 1);

  return 0;
}

int
find_gadgets(Binary *bin)
{
  csh dis;
  Section *text;
  std::map<std::string, std::vector<uint64_t> > gadgets;

  const uint8_t x86_opc_ret = 0xc3;

  text = bin->get_text_section();
  if(!text) {
    fprintf(stderr, "Nothing to disassemble\n");
    return 0;
  }

  if(cs_open(CS_ARCH_X86, CS_MODE_64, &dis) != CS_ERR_OK) {
    fprintf(stderr, "Failed to open Capstone\n");
    return -1;
  }
  cs_option(dis, CS_OPT_DETAIL, CS_OPT_ON);

  for(size_t i = 0; i < text->size; i++) {
    if(text->bytes[i] == x86_opc_ret) { // 0xc3 byte: a possible ret, aligned or not
      if(find_gadgets_at_root(text, text->vma+i, &gadgets, dis) < 0) { // search backward from this root
        break;
      }
    }
  }

  for(auto &kv: gadgets) { // one line per distinct gadget string
    printf("%s\t[ ", kv.first.c_str()); // gadget instructions
    for(auto addr: kv.second) {
      printf("0x%jx ", addr); // every address where the gadget starts
    }
    printf("]\n");
  }

  cs_close(&dis);

  return 0;
}

int
main(int argc, char *argv[])
{
  Binary bin;
  std::string fname;

  if(argc < 2) {
    printf("Usage: %s <binary>\n", argv[0]);
    return 1;
  }

  fname.assign(argv[1]);
  if(load_binary(fname, &bin, Binary::BIN_TYPE_AUTO) < 0) {
    return 1;
  }

  if(find_gadgets(&bin) < 0) {
    return 1;
  }

  unload_binary(&bin);

  return 0;
}

Compiling and running this source gives:

binary@binary-VirtualBox:~/code/chapter8$ ./capstone_gadget_finder /bin/ls | head -n 10
adc al, 0xb8; add dword ptr [rax], eax; add byte ptr [rax], al; ret     [ 0x412f00 ]
adc bl, al; nop ; nop word ptr cs:[rax + rax]; mov rax, qword ptr [rdi + 0x18]; ret     [ 0x40b133 ]
adc byte ptr [r11 + 9], sil; shl rax, 4; add rax, qword ptr [rbx]; pop rbx; ret         [ 0x40ae40 ]
adc byte ptr [r8], r8b; ret     [ 0x40b5ac ]
adc byte ptr [rax + 0x39], cl; push rdi; or byte ptr [rdi - 0x46], dh; mov rax, rcx; ret        [ 0x40b4bd ]
adc byte ptr [rax - 0x77], cl; ret      [ 0x40eb10 ]
adc byte ptr [rax], al; ret     [ 0x40b5ad ]
adc byte ptr [rbp - 0x14], dh; xor eax, eax; ret        [ 0x412f12 ]
adc byte ptr [rbx + 0x5d], bl; pop r12; pop r13; pop r14; ret   [ 0x411881 ]
adc byte ptr [rbx + 0x5d], bl; pop r12; ret     [ 0x40bb1f 0x40be0a ]

Each line lists one gadget followed by the addresses where it starts.