Hadoop Source Code Analysis: The Writable Class

西曲风 2022-01-28

1. Source code

package org.apache.hadoop.io;

import java.io.DataOutput;
import java.io.DataInput;
import java.io.IOException;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Writable {
...
}

2. Method details

  • The write() method
// Serialize the fields of this object to out.
// @param out DataOutput to serialize this object into.
// @throws IOException
void write(DataOutput out) throws IOException;
  • The readFields() method
// Deserialize the fields of this object from in.
// @param in DataInput to deserialize this object from.
// @throws IOException (the read may fail)
void readFields(DataInput in) throws IOException;


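The Javadoc for readFields() reads:

Deserialize the fields of this object from in. For efficiency, implementations should attempt to re-use storage in the existing object where possible.

The Javadoc for the Writable interface itself reads:

A serializable object which implements a simple, efficient, serialization protocol, based on DataInput and DataOutput.

Any key or value type in the Hadoop Map-Reduce framework implements this interface.

Implementations typically implement a static read(DataInput) method which constructs a new instance, calls readFields(DataInput) and returns the instance.

In short:

  • readFields() deserializes this object's fields from in (the input stream); for efficiency, an implementation should reuse the existing object's storage wherever it can.
  • A Writable is a serializable object that implements a simple, efficient serialization protocol built on DataInput and DataOutput.
  • Every key and value type in the Hadoop MapReduce framework must implement this interface.
  • A typical implementation also provides a static read(DataInput) method that constructs a new instance, calls readFields(DataInput) on it, and returns that instance.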

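3. Example

  • Example
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class MyWritable implements Writable {
    // Some data
    private int counter;
    private long timestamp;

    // Write the fields in a fixed order.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(counter);
        out.writeLong(timestamp);
    }

    // Read the fields back in the same order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        counter = in.readInt();
        timestamp = in.readLong();
    }

    // Convenience factory: construct a new instance and fill it from the stream.
    public static MyWritable read(DataInput in) throws IOException {
        MyWritable w = new MyWritable();
        w.readFields(in);
        return w;
    }
}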
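As a hedged illustration of the "re-use storage" advice, the sketch below reads several records into a single instance instead of allocating one object per record. It does not depend on Hadoop: `WritableLike` is a hypothetical local stand-in with the same two methods as org.apache.hadoop.io.Writable, and `Event` and `ReuseDemo` are hypothetical names, with `Event` carrying the same two fields as MyWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class ReuseDemo {
    // Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods),
    // declared locally so the sketch compiles without Hadoop on the classpath.
    interface WritableLike {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical record type with the same fields as MyWritable.
    static class Event implements WritableLike {
        int counter;
        long timestamp;

        Event() {}

        Event(int counter, long timestamp) {
            this.counter = counter;
            this.timestamp = timestamp;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(counter);
            out.writeLong(timestamp);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Overwrites the fields in place; no new object is allocated.
            counter = in.readInt();
            timestamp = in.readLong();
        }
    }

    // Writes the records back to back, then reads them all into ONE reused instance.
    static long sumCounters(Event[] events) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Event e : events) {
                e.write(out);
            }
        }

        long sum = 0;
        Event reused = new Event();
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            for (int i = 0; i < events.length; i++) {
                reused.readFields(in); // same object, new contents
                sum += reused.counter;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Event[] events = { new Event(1, 100L), new Event(2, 200L), new Event(3, 300L) };
        System.out.println(sumCounters(events)); // prints 6
    }
}
```

The trade-off is that the reused object is only valid until the next readFields() call, so a caller that needs to keep a record must copy it first.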

4. Implementing classes

4.1 IntWritable
  • Class description

A WritableComparable for ints.

That is, IntWritable is the WritableComparable type for Java's primitive int.

  • Constructors
IntWritable()
IntWritable(int value)

The first is a no-argument constructor; the second takes a single int argument.

  • The set() method
public void set(int value)
Set the value of this IntWritable.
  • The get() method
public int get()
Return the value of this IntWritable.

… Because IntWritable implements the WritableComparable interface, it also has other methods such as equals(), compareTo(), readFields() and write(). They are not covered one by one here.

  • Class code example
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

public class WritableDemo {
    // Serialize a Writable into a byte array.
    public static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }

    // Fill an existing Writable from a byte array.
    public static void deserialize(Writable writable, byte[] bytes) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        DataInputStream dataIn = new DataInputStream(in);
        writable.readFields(dataIn);
        dataIn.close();
    }

    public static void main(String[] args) throws IOException {
        // Step 1: serialize the IntWritable object into bytes.
        IntWritable inW = new IntWritable(163);
        byte[] bytes = serialize(inW);
        System.out.println("bytes.length: " + bytes.length); // an int takes 4 bytes

        // Step 2: use those bytes as the input stream and deserialize them into outW.
        IntWritable outW = new IntWritable();
        deserialize(outW, bytes);
        System.out.println(outW.get()); // prints 163
    }
}

The serialize() helper wraps a java.io.ByteArrayOutputStream in a java.io.DataOutputStream (an implementation of java.io.DataOutput) to capture the bytes in the serialized stream.
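To make the captured bytes concrete, here is a small sketch that uses only java.io. It assumes that IntWritable.write() emits its value through DataOutput.writeInt(), which is consistent with the 4-byte length printed by the demo; `IntBytesDemo`, `toBytes` and `toHex` are hypothetical names introduced for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class IntBytesDemo {
    // Captures the bytes that DataOutputStream.writeInt produces for a value.
    static byte[] toBytes(int value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value);
        }
        return buffer.toByteArray();
    }

    // Formats bytes as lowercase hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // writeInt emits four bytes, high byte first (big-endian).
        System.out.println(toHex(toBytes(163))); // prints 000000a3
    }
}
```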


The DataOutputStream class is described as follows:


A data output stream lets an application write primitive Java data types to an output stream in a portable way. An application can then use a data input stream to read the data back in.


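In other words, a data output stream writes Java primitive types to an output stream in a portable form, and a data input stream can later read that data back.

Since there is a DataOutputStream class, there is also a DataInputStream class: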


A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application uses a data output stream to write data that can later be read by a data input stream.
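The pairing of the two stream classes can be sketched with a plain java.io round trip. The class and method names (`DataStreamRoundTrip`, `roundTrip`) and the sample values are hypothetical; the point shown is that reads must mirror the writes in type and order, because the stream itself carries no field names or type tags.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamRoundTrip {
    // Writes four values, reads them back, and returns them joined by commas.
    static String roundTrip(int i, long l, boolean b, String s) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(i);
            out.writeLong(l);
            out.writeBoolean(b);
            out.writeUTF(s);
        }

        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            // Reads must mirror the writes: same types, same order.
            int i2 = in.readInt();
            long l2 = in.readLong();
            boolean b2 = in.readBoolean();
            String s2 = in.readUTF();
            return i2 + "," + l2 + "," + b2 + "," + s2;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(163, 1643328000000L, true, "hadoop"));
        // prints 163,1643328000000,true,hadoop
    }
}
```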


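5. Summary


  • The purpose of Writable is to convert an object to its serialized form and back again.
  • Note that in any MapReduce code, a custom type that Hadoop does not already provide can only be written and read if it implements the Writable interface and both of its methods. Otherwise the job fails with an error. For an example of the error, see my blog 【】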
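As a rough sketch of what "implementing both methods" looks like for a custom type with variable-length fields, the example below writes a length prefix before an array so that readFields() knows how many elements to read. It runs without Hadoop: `WritableLike` is a hypothetical local stand-in for org.apache.hadoop.io.Writable, and `Scores`, `CustomTypeSketch` and `copyViaBytes` are hypothetical names; in a real job the class would implement the Hadoop interface instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

class CustomTypeSketch {
    // Hypothetical stand-in for org.apache.hadoop.io.Writable (same two methods).
    interface WritableLike {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical custom type: a name plus a variable-length array of values.
    static class Scores implements WritableLike {
        String name = "";
        int[] values = new int[0];

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(values.length); // length prefix for the array
            for (int v : values) {
                out.writeInt(v);
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            name = in.readUTF();
            int n = in.readInt(); // read the length prefix first
            values = new int[n];
            for (int i = 0; i < n; i++) {
                values[i] = in.readInt();
            }
        }
    }

    // Serializes src to bytes and deserializes those bytes into a fresh instance.
    static Scores copyViaBytes(Scores src) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            src.write(out);
        }
        Scores copy = new Scores();
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            copy.readFields(in);
        }
        return copy;
    }

    public static void main(String[] args) throws IOException {
        Scores s = new Scores();
        s.name = "alice";
        s.values = new int[] { 90, 85 };
        Scores c = copyViaBytes(s);
        System.out.println(c.name + " " + Arrays.toString(c.values)); // prints alice [90, 85]
    }
}
```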

