JVM Bytecode Basics – Exocoetus

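ClassFile {
  u4                magic;                                 magic number (0xCAFEBABE, identifies the file as a class file)
  u2                minor_version;                         minor version number
  u2                major_version;                         major version number
  u2                constant_pool_count;                   number of constant pool entries plus one
  cp_info           constant_pool[constant_pool_count-1];  constant pool (literals and symbolic references, e.g. strings and larger numeric constants)
  u2                access_flags;                          class access flags (e.g. ACC_PUBLIC, ACC_FINAL)
  u2                this_class;                            constant pool index of this class
  u2                super_class;                           constant pool index of the superclass
  u2                interfaces_count;
  u2                interfaces[interfaces_count];          constant pool indexes of the implemented interfaces
  u2                fields_count;
  field_info        fields[fields_count];                  field table
  u2                methods_count;
  method_info       methods[methods_count];                method table
  u2                attributes_count;
  attribute_info    attributes[attributes_count];          attribute table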
}
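
To make the layout concrete, here is a minimal sketch that reads only the first three fields (magic, minor_version, major_version) from a class file. The class name ClassFileHeader and the method readHeader are made up for this illustration, and the sketch assumes that the class file of java.lang.Object can be loaded as a resource through the system class loader.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ClassFileHeader {
    /** Returns {magic, minor_version, major_version} of the named class file resource. */
    static int[] readHeader(String resourceName) {
        InputStream raw = ClassLoader.getSystemResourceAsStream(resourceName);
        if (raw == null) {
            throw new IllegalArgumentException("class file not found: " + resourceName);
        }
        // DataInputStream reads multi-byte values big-endian, like the class file format.
        try (DataInputStream in = new DataInputStream(raw)) {
            int magic = in.readInt();            // u4 magic
            int minor = in.readUnsignedShort();  // u2 minor_version
            int major = in.readUnsignedShort();  // u2 major_version
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader("java/lang/Object.class");
        System.out.printf("magic=0x%08X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

The major version printed depends on the JDK that runs the sketch; the magic number is the same for every class file.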

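Constant pool entry structure

In the figure above, tag identifies the type of a constant entry. A field whose name ends in index is a constant pool index that points to another entry in the constant pool.

Field table and method table structure

In the exception_table shown above, start_pc, end_pc, and handler_pc are all indexes into the code byte array.

start_pc and end_pc mark where the range of bytecode covered by the exception handler begins and ends; the range is half-open, [start_pc, end_pc). handler_pc is the position in the code byte array where the handler starts, that is, where execution continues once the exception has been caught. catch_type is the exception type that the handler catches.

When an exception is thrown while the JVM is executing bytecode in the range [start_pc, end_pc) of a method, and the exception is an instance of the class named by catch_type or of one of its subclasses, execution jumps to handler_pc in the code byte array.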
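
The lookup rule can be sketched in a few lines of Java. This is a simplified model, not JVM code: the Entry class, findHandler, and the sample table are hypothetical, and a null catchType stands in for the catch-all entry that javap prints as "any". Entries are searched in table order, and the first match wins.

```java
import java.util.Arrays;
import java.util.List;

final class ExceptionTableLookup {
    /** One exception_table row; catchType == null models the catch-all "any" entry. */
    static final class Entry {
        final int startPc;
        final int endPc;
        final int handlerPc;
        final Class<? extends Throwable> catchType;

        Entry(int startPc, int endPc, int handlerPc, Class<? extends Throwable> catchType) {
            this.startPc = startPc;
            this.endPc = endPc;
            this.handlerPc = handlerPc;
            this.catchType = catchType;
        }
    }

    /** Returns handler_pc of the first matching entry, or -1 if no handler in this method applies. */
    static int findHandler(List<Entry> table, int pc, Throwable thrown) {
        for (Entry e : table) {
            boolean covered = e.startPc <= pc && pc < e.endPc;  // [start_pc, end_pc)
            boolean typeMatches = e.catchType == null || e.catchType.isInstance(thrown);
            if (covered && typeMatches) {
                return e.handlerPc;
            }
        }
        return -1;
    }

    /** A made-up table: RuntimeException is handled at 7, everything else at 20. */
    static List<Entry> sampleTable() {
        return Arrays.asList(
                new Entry(0, 4, 7, RuntimeException.class),
                new Entry(0, 4, 20, null));
    }

    public static void main(String[] args) {
        System.out.println(findHandler(sampleTable(), 1, new IllegalStateException()));
    }
}
```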

For a worked Code_attribute example, see p. 29 of the book.

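Bytecode instructions

Bytecode instruction concepts

A bytecode instruction consists of an opcode and optional operands.

<opcode> [<operand1>, <operand2>]

Example: bipush 100
Pushes the int constant 100 onto the top of the operand stack.

Bytecode uses big-endian byte order (high-order byte first, low-order byte last).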
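
As a small illustration of the opcode/operand layout and the byte order, the sketch below decodes just two instructions from a raw byte array. OperandDecoding and decode are hypothetical names for this example; the opcode values 0x10 (bipush) and 0x11 (sipush) are the ones defined by the JVM instruction set.

```java
final class OperandDecoding {
    static final int BIPUSH = 0x10; // followed by one signed byte
    static final int SIPUSH = 0x11; // followed by a signed 16-bit value, high byte first

    /** Decodes the instruction at pc; only bipush and sipush are covered by this sketch. */
    static String decode(byte[] code, int pc) {
        int opcode = code[pc] & 0xFF;
        switch (opcode) {
            case BIPUSH: {
                return "bipush " + code[pc + 1];
            }
            case SIPUSH: {
                // Big-endian: the first operand byte is the high-order byte.
                int value = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
                return "sipush " + value;
            }
            default:
                throw new IllegalArgumentException("opcode not covered by this sketch: " + opcode);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {0x10, 0x64}, 0));
        System.out.println(decode(new byte[] {0x11, 0x01, 0x2C}, 0));
    }
}
```

The bytes 0x10 0x64 decode to bipush 100, and 0x11 0x01 0x2C decode to sipush 300, because 0x012C is read with the high byte first.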

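Broad categories of bytecode instructions:

Load and store instructions (iload loads an int from the local variable table onto the operand stack)
Control transfer instructions (the conditional branch ifeq)
Object operations (new creates a class instance)
Method invocation (invokevirtual calls an instance method of an object)
Arithmetic and type conversion (the addition instruction iadd)
Thread synchronization (monitorenter and monitorexit implement synchronized semantics)
Exception handling (athrow explicitly throws an exception)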

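Load and store instructions

These fall into three groups: load instructions, store instructions, and constant-loading instructions.

Load instructions push a variable from the local variable table onto the operand stack, e.g. iload, lload, fload, dload, aload.

Store instructions pop the value on top of the stack and store it into the local variable table, e.g. istore, lstore, fstore, dstore, astore.

Constant loading: the const and push families push a constant encoded in the instruction itself onto the operand stack, while the ldc family loads a constant from the constant pool onto the operand stack. For example, ldc #10 pushes the constant at constant pool index 10 onto the operand stack.

To keep the bytecode compact, the instruction used for an int constant depends on the range of its value n:
n ∈ [-1, 5] -> iconst_<n> (iconst_m1 for -1); opcode only, 1 byte
n ∈ [-128, 127] -> bipush n; opcode plus operand, 2 bytes
n ∈ [-32768, 32767] -> sipush n; opcode plus operand, 3 bytes
any other n -> ldc, e.g. ldc #i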
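
The selection rule above can be written as a small function. This is only a sketch of the rule as stated here, not compiler code; IntConstantInstruction and select are names invented for the example, and the constant pool index for ldc is left out.

```java
final class IntConstantInstruction {
    /** Returns the mnemonic for pushing the int constant n, following the ranges above. */
    static String select(int n) {
        if (n == -1) {
            return "iconst_m1";
        }
        if (n >= 0 && n <= 5) {
            return "iconst_" + n;
        }
        if (n >= Byte.MIN_VALUE && n <= Byte.MAX_VALUE) {
            return "bipush " + n;
        }
        if (n >= Short.MIN_VALUE && n <= Short.MAX_VALUE) {
            return "sipush " + n;
        }
        return "ldc"; // the value lives in the constant pool
    }

    public static void main(String[] args) {
        for (int n : new int[] {-1, 5, 6, 128, 32768}) {
            System.out.println(n + " -> " + select(n));
        }
    }
}
```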

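aconst_null ----- push null
iconst_m1  ------ push int -1
iconst_<n> ------ push int n (0 to 5)
lconst_<n> ------ push long n (0 to 1)
fconst_<n> ------ push float n (0 to 2)
dconst_<n> ------ push double n (0 to 1)
bipush ---------- push an int in -128 to 127
sipush ---------- push an int in -32768 to 32767
ldc ------------- push an int/float/String constant from the constant pool (1-byte index, so only indexes up to 255)
ldc_w ----------- same as ldc, with a 2-byte index that can address the whole constant pool
ldc2_w ---------- push a long/double constant from the constant pool
<T>load --------- push the T-typed variable at the given local variable index {i, l, f, d, a (reference)}
<T>load_<n> ----- push the T-typed variable at local variable index n (0 to 3) {i, l, f, d, a}
<T>aload -------- push the T-typed element at the given index of an array {i, l, f, d, a, b, c, s}
<T>store -------- pop a T-typed value and store it at the given local variable index {i, l, f, d, a}
<T>store_<n> ---- pop a T-typed value and store it at local variable index n (0 to 3) {i, l, f, d, a}
<T>astore ------- pop a T-typed value and store it at the given index of an array {i, l, f, d, a, b, c, s}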

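Operand stack instructions

pop ------------- pop the top value
pop2 ------------ pop one long/double value, or two values of any other type
dup ------------- duplicate the top value and push the copy
dup_x1 ---------- duplicate the top value and insert the copy beneath the second value from the top
dup2 ------------ duplicate the top two values (or one long/double) and push the copies
swap ------------ swap the top two values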

...

Arithmetic and type conversion instructions

+  --------- add {i, l, f, d}
-  --------- sub {i, l, f, d}
/  --------- div {i, l, f, d}
*  --------- mul {i, l, f, d}
%  --------- rem {i, l, f, d}
negate(-)--- neg {i, l, f, d}
&  --------- and {i, l}
|  --------- or  {i, l}
^  --------- xor {i, l}

from \ to   int   long   float   double   byte   char   short
int         /     i2l    i2f     i2d      i2b    i2c    i2s
long        l2i   /      l2f     l2d      /      /      /
float       f2i   f2l    /       f2d      /      /      /
double      d2i   d2l    d2f     /        /      /      /

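Although boolean, char, byte, and short are distinct types in Java, the JVM computes with all of them as int.

When values of different types are mixed in an operation, the value of the narrower type is automatically converted to the type with the larger range. This is called a widening conversion. (Widening from int or long to float, or from long to double, can still lose precision.)

Conversely, converting from a larger-range type to a smaller-range type is a narrowing conversion, which may lose information, e.g. long -> int or double -> float.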

byte  -->  short  -->  int  -->  long  -->  float  -->  double
------------------------- widening -------------------------> 

byte  <--  short  <--  int  <--  long  <--  float  <--  double
<------------------------ narrowing ------------------------
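
A few concrete conversions make the difference visible. The class and method names below are hypothetical; each method body is a single cast, which javac compiles to the conversion instruction named in the comment.

```java
final class ConversionDemo {
    static int longToInt(long v) { return (int) v; }      // l2i: keeps the low 32 bits
    static byte intToByte(int v) { return (byte) v; }     // i2b: keeps the low 8 bits, sign-extended
    static float intToFloat(int v) { return v; }          // i2f: implicit widening, may round
    static int doubleToInt(double v) { return (int) v; }  // d2i: saturates, NaN becomes 0

    public static void main(String[] args) {
        System.out.println(longToInt(4294967297L));   // 2^32 + 1 truncates to 1
        System.out.println(intToByte(200));           // -56
        System.out.println(intToFloat(16777217));     // rounds to 1.6777216E7
        System.out.println(doubleToInt(1e20));        // Integer.MAX_VALUE
    }
}
```

Note that intToFloat is a widening conversion and still loses the last bit, because a float has only 24 bits of significand.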

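Control transfer instructions

/**
 * Conditional branches
 */
// Compare the int a on top of the stack with 0; jump if
ifeq ---- a == 0
ifne ---- a != 0
iflt ---- a < 0
ifle ---- a <= 0
ifgt ---- a > 0
ifge ---- a >= 0

// Compare the top two ints on the stack (a pushed first, then b); jump if
if_icmpeq ---- a == b
if_icmpne ---- a != b
if_icmplt ---- a < b
if_icmple ---- a <= b
if_icmpgt ---- a > b
if_icmpge ---- a >= b

// Compare the top two references on the stack; jump if
if_acmpeq ---- a == b
if_acmpne ---- a != b

// Unconditional jump
goto


/**
 * Compound conditional branches
 */
tableswitch ---- switch jump, used when the case values are dense
lookupswitch --- switch jump, used when the case values are sparse

/**
 * Unconditional branches
 */
goto / goto_w / jsr / jsr_w / ret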

Example

public int isPositive(int n) {
  if (n > 0) {
    return 1;
  } else {
    return 0;
  }
}

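0: iload_1    push the int in local variable slot 1 (n)
1: ifle 6     pop it; if it is less than or equal to 0, jump to 6
4: iconst_1   push the constant 1
5: ireturn    pop it and return; the call ends
6: iconst_0   push the constant 0
7: ireturn    pop it and return; the call ends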

How a for loop works in bytecode

The book gives an example that shows how a for loop is implemented:

public int sum(int[] numbers) {
  int sum = 0;
  for (int number : numbers) {
    sum += number;
  }
  return sum;
}

// Bytecode
0: iconst_0
1: istore_2
2: aload_1
3: astore_3
4: aload_3
5: arraylength
6: istore  4
8: iconst_0
9: istore  5
11: iload  5
13: iload  4
15: if_icmpge  35
18: aload_3
19: iload  5
21: iaload
22: istore  6
24: iload_2
25: iload  6
27: iadd
28: istore_2
29: iinc  5, 1
32: goto  11
35: iload_2
36: ireturn

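The book then traces the method with the concrete int array [10, 20, 30] as the argument. To show that flow more directly, I turned it into PPT slides, shown below:

The operand stack is at the top left, and the local variable table is at the bottom.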
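
Read back into Java, the bytecode of sum corresponds roughly to an index-based loop over a copy of the array reference. The sketch below is my own hand-written equivalent, with the bytecode offsets it mirrors in the comments; the names ForEachDesugared, array, length, and index are made up, and because the method here is static, its slot numbers would differ from the instance method in the listing.

```java
final class ForEachDesugared {
    static int sum(int[] numbers) {
        int sum = 0;                                    // 0-1:   iconst_0, istore_2
        int[] array = numbers;                          // 2-3:   aload_1, astore_3
        int length = array.length;                      // 4-6:   arraylength, istore 4
        for (int index = 0; index < length; index++) {  // 8-15:  if_icmpge 35; 29-32: iinc, goto 11
            int number = array[index];                  // 18-22: iaload, istore 6
            sum += number;                              // 24-28: iadd, istore_2
        }
        return sum;                                     // 35-36: iload_2, ireturn
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {10, 20, 30})); // prints 60
    }
}
```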

switch-case

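The compiler uses two instructions, tableswitch and lookupswitch, to compile switch statements.

The compiler first analyzes the case values. When they are clustered closely together it uses tableswitch, which works like counting sort (not radix sort) or a precomputed lookup table: it trades space for time. For example: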

int chooseNear(int i) {
  switch (i) {
    case 100: return 0;
    case 101: return 1;
    case 104: return 4;
    default: return -1;
  }
}

// Bytecode
0: iload_1
1: tableswitch {
    100: 36
    101: 38
    102: 42
    103: 42
    104: 40
    default: 42
}
42: iconst_m1
43: ireturn

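Because the gaps between the case values are filled in to form a contiguous range, the lookup takes O(1) time.

Correspondingly, lookupswitch handles sparse case values: the keys are stored in sorted order, so the target can be found with a binary search in O(log n) time.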
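
The two lookup strategies can be modeled in plain Java. This is a sketch of the idea, not of how any particular JVM implements the instructions. The jump table reuses the targets from the chooseNear listing; the sparse keys and their targets in the lookupswitch half are invented for the example.

```java
import java.util.Arrays;

final class SwitchLookup {
    // tableswitch model for chooseNear: low = 100, high = 104, gaps point at the default target.
    static final int LOW = 100;
    static final int[] JUMP_TABLE = {36, 38, 42, 42, 40};
    static final int DEFAULT_TARGET = 42;

    /** O(1): a range check followed by an array index. */
    static int tableSwitch(int key) {
        if (key < LOW || key > LOW + JUMP_TABLE.length - 1) {
            return DEFAULT_TARGET;
        }
        return JUMP_TABLE[key - LOW];
    }

    // lookupswitch model: hypothetical sparse keys (sorted) with hypothetical targets.
    static final int[] KEYS = {1, 10, 100};
    static final int[] TARGETS = {36, 38, 40};

    /** O(log n): binary search over the sorted keys. */
    static int lookupSwitch(int key) {
        int position = Arrays.binarySearch(KEYS, key);
        return position >= 0 ? TARGETS[position] : DEFAULT_TARGET;
    }

    public static void main(String[] args) {
        System.out.println(tableSwitch(102));  // a filled-in gap, jumps to the default target
        System.out.println(lookupSwitch(10));
    }
}
```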

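The above applies to integer cases. For String cases, the compiler first compares hash codes and then compares the string values to resolve collisions. (This is the same principle used by most hash-based data structures in Java.)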
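
A hand-written approximation of that scheme is shown below: one switch on hashCode() with equals() checks inside, followed by a second switch on a small index. The exact code javac generates may differ; the class and method names are hypothetical. The strings "Aa" and "BB" are chosen because they share the hash code 2112, so the collision path is exercised.

```java
final class StringSwitchDesugared {
    /** The ordinary String switch. */
    static int viaSwitch(String s) {
        switch (s) {
            case "Aa": return 1;
            case "BB": return 2;
            default:   return -1;
        }
    }

    /** Roughly the same logic written out by hand. */
    static int byHand(String s) {
        int index = -1;
        switch (s.hashCode()) {
            case 2112: // "Aa".hashCode() == "BB".hashCode() == 2112
                if (s.equals("Aa")) {
                    index = 0;
                } else if (s.equals("BB")) {
                    index = 1;
                }
                break;
            default:
                break;
        }
        switch (index) {
            case 0:  return 1;
            case 1:  return 2;
            default: return -1;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Aa", "BB", "C#", "other"}) {
            System.out.println(s + " -> " + viaSwitch(s) + " / " + byHand(s));
        }
    }
}
```

"C#" also hashes to 2112 but equals neither case label, so both versions fall through to the default.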

++i and i++

Example code:

public static void foo() {
  int i = 0;
  for (int j = 0; j < 50; j++){
    i = i++; 
  }
  System.out.println(i);
}

// Bytecode
...
10: iload_0
11: iinc  0, 1
14: istore_0
...

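The iinc at offset 11 increments the local variable directly, but its result is then overwritten by the istore_0 at offset 14. The value that iload_0 pushed onto the operand stack does not change between iload_0 and istore_0, so the old value is written back. As a result, i = i++ has no effect and i keeps its value.

Replacing i++ with ++i changes the bytecode as follows: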

...
10: iinc  0, 1
13: iload_0
14: istore_0
...

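Here the local variable is incremented first, so the load at offset 13 and the store at offset 14 just write the already incremented value back and are redundant.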
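
Both variants are easy to check by running them. The sketch below returns i instead of printing it; the class and method names are made up for the example.

```java
final class IncrementDemo {
    /** i = i++ : the old value is written back, so i never changes. */
    static int postIncrementAssign() {
        int i = 0;
        for (int j = 0; j < 50; j++) {
            i = i++;
        }
        return i;
    }

    /** i = ++i : the increment happens before the load, so i grows by one per iteration. */
    static int preIncrementAssign() {
        int i = 0;
        for (int j = 0; j < 50; j++) {
            i = ++i;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(postIncrementAssign()); // prints 0
        System.out.println(preIncrementAssign());  // prints 50
    }
}
```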

Example

public static void bar() {
  int i = 0;
  i = i++ + ++i;
  System.out.println(i);
}

// Bytecode
0: iconst_0
1: istore_0
2: iload_0
3: iinc  0, 1
6: iinc  0, 1
9: iload_0
10: iadd
11: istore_0

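Here the value of slot 0 (0) is loaded first, slot 0 is then incremented twice, its new value (2) is loaded, the two loaded values are added, and the sum (2) is stored back into slot 0.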
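
The same trace can be replayed in Java, one statement per bytecode instruction. barStepByStep is my own transcription of the listing, with left and right standing for the two values on the operand stack; it is compared against the original expression.

```java
final class MixedIncrementTrace {
    /** The original expression. */
    static int bar() {
        int i = 0;
        i = i++ + ++i;
        return i;
    }

    /** The bytecode of bar, replayed one instruction at a time. */
    static int barStepByStep() {
        int slot0 = 0;         // 0-1:   iconst_0, istore_0
        int left = slot0;      // 2:     iload_0 pushes 0
        slot0 += 1;            // 3:     iinc 0, 1
        slot0 += 1;            // 6:     iinc 0, 1
        int right = slot0;     // 9:     iload_0 pushes 2
        slot0 = left + right;  // 10-11: iadd, istore_0
        return slot0;
    }

    public static void main(String[] args) {
        System.out.println(bar() + " " + barStepByStep()); // prints "2 2"
    }
}
```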

try-catch-finally

Example:

public class TryCatchFinallyDemo {
  public void foo() {
    try {
      tryItOut();
    } catch (MyException e) {
      handleException(e);
    }
  }
}

// Bytecode
0: aload_0
1: invokevirtual #2      // Method tryItOut:()V
4: goto  13
7: astore_1
8: aload_0
9: aload_1
10: invokevirtual #4     // Method handleException:(Ljava/lang/Exception;)V
13: return
Exception table:
  from    to    target    type
  0       4     7         Class MyException

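The Exception table shows that the instructions in the range [0, 4) are covered. Inside that range, invokevirtual #2 calls tryItOut(). If no exception is thrown, execution jumps to offset 13 and returns. If a MyException is thrown, execution jumps to target 7, the instruction at offset 7, where the exception object is stored into local variable 1; this and the exception object are then pushed onto the stack and handleException is called.

With multiple catch blocks, every catch block except the last ends with a goto to the return instruction, and one entry with its own target (the offset to jump to) is added to the Exception table per catch block, as follows: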

0: aload_0
1: invokevirtual #2      // Method tryItOut:()V
4: goto  22
7: astore_1
8: aload_0
9: aload_1
10: invokevirtual #4     // Method handleException1:(Ljava/lang/Exception;)V
13: goto  22

16: astore_1
17: aload_0
18: aload_1
19: invokevirtual #8     // Method handleException2:(Ljava/lang/Exception;)V

22: return
Exception table:
  from    to    target    type
  0       4     7         Class MyException1
  0       4     16        Class MyException2

For finally, the Java compiler copies the finally block and inserts its contents before every normal and exceptional exit of the try and catch blocks.

public void foo() {
  try {
    tryItOut();
  } catch (MyException e) {
    handleException(e);
  } finally {
    handleFinally();
  }
}

0: aload_0
1: invokevirtual #2      // Method tryItOut:()V
// finally
4: aload_0
5: invokevirtual #9      // Method handleFinally:()V
8: goto  31

11: astore_1
12: aload_0
13: aload_1
14: invokevirtual #4     // Method handleException:(Ljava/lang/Exception;)V
//finally
17: aload_0
18: invokevirtual #9     // Method handleFinally:()V
21: goto  31

24: astore_2
25: aload_0
26: invokevirtual #9     // Method handleFinally:()V
29: aload_2
30: athrow
31: return
Exception table:
  from    to    target    type
  0       4     11        Class MyException
  0       4     24        any
  11      17    24        any

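To guarantee that finally runs, a copy of the finally block is inserted at the end of the normal path, before the jump to return (offsets 4 to 5). If a MyException is thrown, execution jumps to the catch handler at offset 11, which also ends with a copy of the finally block (offsets 17 to 18). So that finally still runs when the try block throws some other exception, or when handleException itself throws, the ranges [0, 4) and [11, 17) are also covered by catch-all (any) entries that jump to a third copy at offset 24, which runs the finally block and then rethrows the exception. In other words, the finally block intercepts every exit.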
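
The three exits can be observed from Java by recording which blocks run. In this sketch IllegalStateException plays the role of MyException, and the outer try/catch exists only to capture the rethrown exception; all names are hypothetical.

```java
final class FinallyOrderDemo {
    /** Runs try/catch/finally and returns the order in which the blocks executed. */
    static String run(RuntimeException toThrow) {
        StringBuilder trace = new StringBuilder();
        try {
            try {
                trace.append("try ");
                if (toThrow != null) {
                    throw toThrow;
                }
            } catch (IllegalStateException e) {
                trace.append("catch ");
            } finally {
                trace.append("finally ");
            }
        } catch (RuntimeException e) {
            trace.append("rethrown"); // the exception escaped through the third finally copy
        }
        return trace.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run(null));                            // try finally
        System.out.println(run(new IllegalStateException()));     // try catch finally
        System.out.println(run(new IllegalArgumentException()));  // try finally rethrown
    }
}
```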

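Therefore, if the finally block contains a return, that return runs before any of these exits and determines the result, even if the try and catch blocks contain their own return statements. For example: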

public int foo() {
  try {
    int a = 1 / 0;
    return 0;
  } catch (Exception e) {
    int b = 1 / 0;
    return 1;
  } finally {
    return 2;
  }
}

// returns 2

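If finally only modifies the value instead of returning, the result does not change: before the copied finally code runs, the return value is saved in a temporary local variable, and the modification does not touch that temporary. The return instruction then loads the temporary and returns it. So although finally intercepts every exit, it cannot be used to change the value being returned just before the return; see pp. 69-71 of the book for details.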
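
A short check of this behavior, with hypothetical names. The second method is an addition of mine rather than something from the book: the saved temporary holds a reference, so reassigning the variable in finally has no effect, but mutating the object it points to is still visible to the caller.

```java
final class FinallyReturnValueDemo {
    /** The value 1 is saved before finally runs; assigning x = 2 does not change it. */
    static int primitive() {
        int x = 1;
        try {
            return x;
        } finally {
            x = 2;
        }
    }

    /** The saved temporary is a reference, so a mutation of the object is visible. */
    static StringBuilder mutableObject() {
        StringBuilder sb = new StringBuilder("a");
        try {
            return sb;
        } finally {
            sb.append("b");
        }
    }

    public static void main(String[] args) {
        System.out.println(primitive());      // prints 1
        System.out.println(mutableObject());  // prints ab
    }
}
```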

try-with-resources

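Because of these finally semantics, an exception thrown in the try block is easily swallowed by an exception thrown in the finally block when using try-finally.

// If both write and close throw, only the exception from close is propagated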
public static void foo() throws IOException {
  FileOutputStream in = null;
  try {
    in = new FileOutputStream("test.txt");
    in.write(1);
  } finally {
    if (in != null) {
      in.close();
    }
  }
}

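For this reason, Java 7 added the addSuppressed method to the Throwable class. It records the suppressed exception on the primary one, so that no exception is lost.

// Rewritten (roughly what try-with-resources expands to)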
public static void foo() throws IOException {
  FileOutputStream in = null;
  Exception exception = null;
  try {
    in = new FileOutputStream("test.txt");
    in.write(1);
  } catch (Exception e) {
    exception = e;
    throw e;
  } finally {
    if (in != null) {
      if (exception != null) {
        try {
          in.close();
        } catch (Exception e) {
          exception.addSuppressed(e);
        }
      } else {
        in.close();
      }
    }
  }
}
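
With try-with-resources the same bookkeeping is done by the compiler. The sketch below uses a made-up resource whose write and close both fail, and then inspects the caught exception: the exception from the body is the primary one, and the exception from close is attached to it as suppressed.

```java
final class SuppressedDemo {
    /** A hypothetical resource whose write and close both throw. */
    static final class FailingResource implements AutoCloseable {
        void write() {
            throw new IllegalStateException("write failed");
        }

        @Override
        public void close() {
            throw new IllegalStateException("close failed");
        }
    }

    /** Returns the exception that escapes the try-with-resources statement. */
    static Throwable run() {
        try (FailingResource resource = new FailingResource()) {
            resource.write();
            return null;
        } catch (IllegalStateException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        Throwable primary = run();
        System.out.println(primary.getMessage());                     // write failed
        System.out.println(primary.getSuppressed()[0].getMessage());  // close failed
    }
}
```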

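Object-related bytecode instructions

The <init> method

<init> is the object initialization method: constructors, non-static field initializers, and instance initializer blocks are all compiled into it.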

public class Initializer {
  private int a = 10;
  public Initializer() {
    int c = 30;
  }
  {
    int b = 20;
  }
}

// Bytecode
public Initializer();
descriptor: ()V
flags: ACC_PUBLIC
Code:
  stack=2, locals=2, args_size=1
    0: aload_0
    1: invokespecial #1    // Method java/lang/Object."<init>":()V
    4: aload_0
    5: bipush  10
    7: putfield #2
    10: bipush  20
    12: istore_1
    13: bipush  30
    15: istore_1
    16: return

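Java syntax allows field initializers and initializer blocks to be written outside the constructor, but after compilation they all end up in the <init> method.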
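
The resulting execution order can be observed by logging each step. In this sketch (all names hypothetical) the field initializer and the initializer block run in textual order, and both run before the constructor body, even though the block is written after the constructor.

```java
import java.util.ArrayList;
import java.util.List;

final class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    private final int a = record("field initializer", 10);

    InitOrderDemo() {
        record("constructor body", 30);
    }

    {
        record("initializer block", 20);
    }

    private static int record(String step, int value) {
        LOG.add(step);
        return value;
    }

    /** Creates one instance and returns the order in which its initialization steps ran. */
    static List<String> order() {
        LOG.clear();
        new InitOrderDemo();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(order()); // [field initializer, initializer block, constructor body]
    }
}
```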

new

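When we create an object with new in Java, the bytecode contains three instructions:

0: new #2                  // class  the new instruction creates the instance and pushes a reference to it
3: dup                     // invokespecial consumes (pops) the reference on top of the stack, so duplicate it first
4: invokespecial #3        // Method ."<init>":()V calls the initialization method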

<clinit>

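Static initializer blocks and static field initializers are all compiled into the <clinit> method, the static initialization method of the class. (My guess is that static state is managed as a unit, so it gets its own separately compiled initializer.)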

public class Initializer {
  private static int a = 0;
  static {
    System.out.println("static");
  }
}

// Bytecode
static {};
descriptor: ()V
flags: ACC_STATIC
Code:
  stack=2, locals=0, args_size=0
    0: iconst_0
    1: putstatic #2
    4: getstatic #3
    7: ldc #4
    9: invokevirtual #5
    12: return

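Unlike <init>, which is called explicitly with invokespecial, <clinit> is never called directly by a bytecode instruction. The JVM runs it when the class is initialized, which is triggered, for example, when one of the four instructions new, getstatic, putstatic, or invokestatic first refers to the class.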
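
The lazy trigger can be observed with a nested class whose static initializer sets a flag. This is a sketch with hypothetical names: the observation is taken once, while ClinitDemo itself is being initialized, so it does not depend on how often it is read afterwards. Lazy.value is deliberately not final, so reading it compiles to a getstatic instead of being inlined as a constant.

```java
final class ClinitDemo {
    static boolean lazyInitialized; // set by Lazy's static initializer

    static final class Lazy {
        static int value = 42;      // non-final, so access uses getstatic

        static {
            lazyInitialized = true; // this block is compiled into Lazy.<clinit>
        }
    }

    /** Captured once, during ClinitDemo's own static initialization. */
    static final String FIRST_OBSERVATION = observe();

    private static String observe() {
        boolean before = lazyInitialized;
        int value = Lazy.value;     // first use of Lazy: the JVM runs Lazy.<clinit> here
        boolean after = lazyInitialized;
        return "before=" + before + " after=" + after + " value=" + value;
    }

    public static void main(String[] args) {
        System.out.println(FIRST_OBSERVATION); // before=false after=true value=42
    }
}
```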