Hadoop Source Code Analysis: The Writable Class - CFANZ Programming Community

`Hadoop` Source Code Analysis: The `Writable` Class

1. Source code

package org.apache.hadoop.io;

import java.io.DataOutput;
import java.io.DataInput;
import java.io.IOException;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Writable {
...
}

2. Methods in detail

The write() method

/**
 * Serialize the fields of this object to out.
 *
 * @param out DataOutput to serialize this object into.
 * @throws IOException
 */
void write(DataOutput out) throws IOException;

The readFields() method

/**
 * @param in DataInput to deserialize this object from.
 * @throws IOException may be thrown while reading
 */
void readFields(DataInput in) throws IOException;

The Javadoc for readFields() reads:

Deserialize the fields of this object from in. For efficiency, implementations should attempt to re-use storage in the existing object where possible.

The Javadoc for the Writable interface itself reads:

A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput.

Any key or value type in the Hadoop Map-Reduce framework implements this interface.

Implementations typically implement a static read(DataInput) method which constructs a new instance, calls readFields(DataInput) and returns the instance.

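In other words, readFields() deserializes the object's fields from the input stream in, and for efficiency an implementation should reuse the storage of the existing object wherever it can.
A Writable is a serializable object that implements a simple, efficient serialization protocol on top of DataInput and DataOutput.
Every key and value type in the Hadoop MapReduce framework must implement this interface.
A typical implementation also provides a static read(DataInput) method that constructs a new instance, calls readFields(DataInput) on it, and returns that instance.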
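The advice about re-using storage can be illustrated with a small sketch: one object is allocated once and refilled by readFields() for every record in a stream. This is only an illustration under stated assumptions. The `Writable` interface below is a local stand-in that mirrors the two methods of org.apache.hadoop.io.Writable so the code compiles without Hadoop on the classpath, and `Reading`, `ReuseDemo`, and `writeThenSum` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in (assumption) mirroring the two methods of
// org.apache.hadoop.io.Writable, so the sketch compiles without Hadoop.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record type, used only for this illustration.
class Reading implements Writable {
    int sensorId;
    long value;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(sensorId);
        out.writeLong(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Overwrites the existing fields instead of allocating a new object.
        sensorId = in.readInt();
        value = in.readLong();
    }
}

class ReuseDemo {
    // Writes `count` records, reads them all back into ONE instance,
    // and returns the sum of the values that were read.
    static long writeThenSum(int count) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            Reading r = new Reading();
            for (int i = 0; i < count; i++) {
                r.sensorId = i;
                r.value = 10L * i;
                r.write(out);
            }
        }

        long sum = 0;
        Reading reused = new Reading(); // allocated once, refilled per record
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            for (int i = 0; i < count; i++) {
                reused.readFields(in);
                sum += reused.value;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeThenSum(4)); // 0 + 10 + 20 + 30 = 60
    }
}
```

Because readFields() overwrites every field, nothing from the previous record leaks into the next one. An implementation with collections or buffers would have to clear them explicitly before refilling.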

3. Example

The Writable Javadoc gives the following example:

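import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class MyWritable implements Writable {
    // Some data
    private int counter;
    private long timestamp;

    // Serialize the fields in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Read the fields back in the same order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Convenience factory: construct a new instance and populate it from in.
    public static MyWritable read(DataInput in) throws IOException {
        MyWritable w = new MyWritable();
        w.readFields(in);
        return w;
    }
}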
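To see the static read(DataInput) pattern in a full round trip, the following sketch serializes an object to a byte array and rebuilds a copy from it. It rests on assumptions: `Writable` here is a local stand-in with the same two methods as the Hadoop interface, so the code runs without Hadoop, and `Event`, `RoundTripDemo`, and `toBytes` are hypothetical names. The constructor and getters are additions so that the values can be set and inspected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local stand-in (assumption) for org.apache.hadoop.io.Writable.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical type with the same two fields as MyWritable.
class Event implements Writable {
    private int counter;
    private long timestamp;

    Event() { }                       // needed by read(), which fills the fields later

    Event(int counter, long timestamp) {
        this.counter = counter;
        this.timestamp = timestamp;
    }

    int getCounter() { return counter; }
    long getTimestamp() { return timestamp; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Static factory: new instance, readFields(), return it.
    static Event read(DataInput in) throws IOException {
        Event e = new Event();
        e.readFields(in);
        return e;
    }
}

class RoundTripDemo {
    static byte[] toBytes(Writable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = toBytes(new Event(7, 1_700_000_000_000L));
        System.out.println(bytes.length); // 12: a 4-byte int plus an 8-byte long

        Event copy = Event.read(new DataInputStream(new ByteArrayInputStream(bytes)));
        System.out.println(copy.getCounter() + " " + copy.getTimestamp());
    }
}
```

The serialized form carries no field names or type tags, so write() and readFields() must agree on the order and types of the fields.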

4. Implementing classes

4.1 `IntWritable`

Class description

A WritableComparable for ints.

In other words, IntWritable is the WritableComparable type for Java's primitive type int.

Constructors

IntWritable() 
IntWritable(int value)

The first is a no-argument constructor; the second takes a single int argument.

The set() method

public void set(int value)
    Set the value of this IntWritable.

The get() method

public int get()
    Return the value of this IntWritable.

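… Because IntWritable implements the WritableComparable interface, it also provides other methods such as equals(), readFields(), and write(). They are not covered one by one here.

Code example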
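To make the constructors, set()/get(), and the comparison methods more concrete, here is a minimal sketch of how such an int wrapper could be written. It is a simplified stand-in, not Hadoop's actual IntWritable source: `SimpleIntWritable` and `SimpleIntWritableDemo` are hypothetical names, and the class implements java.lang.Comparable directly instead of Hadoop's WritableComparable so that it compiles without Hadoop.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Simplified stand-in (assumption), NOT Hadoop's IntWritable source.
class SimpleIntWritable implements Comparable<SimpleIntWritable> {
    private int value;

    SimpleIntWritable() { }                     // value defaults to 0

    SimpleIntWritable(int value) { set(value); }

    public void set(int value) { this.value = value; }

    public int get() { return value; }

    public void write(DataOutput out) throws IOException {
        out.writeInt(value);                    // 4 bytes
    }

    public void readFields(DataInput in) throws IOException {
        value = in.readInt();
    }

    @Override
    public int compareTo(SimpleIntWritable other) {
        return Integer.compare(value, other.value);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SimpleIntWritable
                && ((SimpleIntWritable) o).value == value;
    }

    @Override
    public int hashCode() { return value; }

    @Override
    public String toString() { return Integer.toString(value); }
}

class SimpleIntWritableDemo {
    public static void main(String[] args) {
        SimpleIntWritable a = new SimpleIntWritable();
        a.set(163);
        SimpleIntWritable b = new SimpleIntWritable(200);

        System.out.println(a.get());                              // 163
        System.out.println(a.compareTo(b) < 0);                   // true
        System.out.println(a.equals(new SimpleIntWritable(163))); // true
    }
}
```

The mutable set() method is what allows one instance to be reused across many records instead of allocating a new wrapper per value.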

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

import java.io.*;

public class WritableDemo {
    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }

    // Populates writable from bytes; the input array is returned unchanged.
    public static byte[] deserialize(Writable writable, byte[] bytes) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        DataInputStream dataIn = new DataInputStream(in);
        writable.readFields(dataIn);
        dataIn.close();
        return bytes;
    }

    public static void main(String[] args) throws IOException {

        // Step 1: serialize the IntWritable object into the byte array bytes.
        IntWritable inW = new IntWritable(163);
        byte[] bytes = serialize(inW);
        System.out.println("bytes.length: "+bytes.length);

        // Step 2: use those bytes as the input stream and deserialize them into outW.
        IntWritable outW = new IntWritable();
        deserialize(outW, bytes);
        System.out.println(outW.get());
    }
}

The serialize() helper method wraps a java.io.ByteArrayOutputStream in a java.io.DataOutputStream (an implementation of java.io.DataOutput) to capture the bytes in the serialized stream.

The DataOutputStream class is described as follows:
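The captured bytes can also be inspected directly. The sketch below assumes, as the 4-byte length printed by the demo suggests, that the value is written with DataOutput.writeInt. It therefore uses only the JDK and prints the big-endian hex form of the same int. `IntBytesDemo` and `intToHex` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class IntBytesDemo {
    // Writes one int with DataOutputStream.writeInt and returns the bytes as hex.
    static String intToHex(int value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : buffer.toByteArray()) {
            hex.append(String.format("%02x", b & 0xff)); // mask to avoid sign extension
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(intToHex(163)); // 000000a3
    }
}
```

writeInt emits the high byte first, so 163 (0xA3) appears as three zero bytes followed by a3.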

A data output stream lets an application write primitive Java data types to an output stream in a portable way. An application can then use a data input stream to read the data back in.

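In other words, a data output stream lets an application write Java primitive types to an output stream in a portable way, and the application can then use a data input stream to read that data back.

DataOutputStream has a counterpart, the DataInputStream class: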

A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream.
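The pairing of the two stream classes can be shown with a short JDK-only sketch: several primitive values and a string are written with DataOutputStream and read back with DataInputStream. `DataStreamDemo` and `roundTrip` are hypothetical names chosen for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamDemo {
    // Writes a few values, reads them back, and returns them joined by spaces.
    static String roundTrip() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(163);
            out.writeLong(42L);
            out.writeBoolean(true);
            out.writeUTF("hadoop");
        }

        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            // Values must be read in the same order and with the same types.
            int i = in.readInt();
            long l = in.readLong();
            boolean b = in.readBoolean();
            String s = in.readUTF();
            return i + " " + l + " " + b + " " + s;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip()); // 163 42 true hadoop
    }
}
```

The stream stores raw values only, so the reader has to know the sequence of types in advance. A Writable's write() and readFields() pair encodes exactly that knowledge.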