1. Create a Maven project
 First, create a Maven project. For the detailed steps, see the article "Creating a Maven Project".
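 2. Edit the pom file
 Here we use Spark 2.4.5 and Scala 2.12, so the Kafka integration artifact to pick is spark-streaming-kafka-0-10_2.12 (the _2.12 suffix is the Scala version the artifact was built for).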
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>2.4.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.12</artifactId>
            <version>2.4.5</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.12</artifactId>
            <version>2.4.5</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <!-- This plugin compiles the Scala sources into class files -->
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <!-- Bind the plugin to Maven's compile and test-compile phases -->
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.0.0</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
 
That is everything the pom file needs.
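Every Spark artifact in the pom must carry the same Scala suffix (_2.12 here) as the Scala version the project is built with. As a quick sanity check, here is a minimal sketch, assuming a Scala 2.x project: it prints the version of the scala-library jar on the classpath and the suffix it implies. The object name ScalaSuffixCheck and the helper binarySuffix are hypothetical names chosen for this example.

```scala
import scala.util.Properties

object ScalaSuffixCheck {
  // Hypothetical helper: keeps only major.minor, e.g. "2.12.10" -> "_2.12".
  // Assumes a Scala 2.x version string (Scala 3 uses just "_3").
  def binarySuffix(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString("_", ".", "")

  def main(args: Array[String]): Unit = {
    // Version of the scala-library jar on the classpath, e.g. "2.12.10"
    val v = Properties.versionNumberString
    println(s"Scala version on the classpath: $v")
    println(s"Spark artifacts should end with: ${binarySuffix(v)}")
  }
}
```

Running it from IDEA after the dependencies are downloaded should report a 2.12.x version, because spark-core_2.12 pulls in a matching scala-library.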
3. Start Kafka
 Once the dependencies have been downloaded, we can start writing code. First, start Kafka on the Linux machine.
# Start ZooKeeper first
zkServer.sh start
# Then start Kafka
kafka-server-start.sh config/server.properties
 
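When starting Kafka, check the path to the configuration file: config/server.properties is a relative path, so run the command from the Kafka installation directory or pass the full path.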
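Create a topic
 Create a topic named test with a single replica and a single partition. (The --zookeeper flag works on older Kafka releases; from Kafka 2.2 you can pass --bootstrap-server master:9092 instead, and Kafka 3.0 removed --zookeeper.)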
kafka-topics.sh --create --zookeeper master:2181 --replication-factor 1 --partitions 1 --topic test
 
Start a console producer:
kafka-console-producer.sh --broker-list master:9092 --topic test
 
Use your own VM's hostname or IP address here; mine is master.
 4. Create the Spark Streaming consumer
 Next, create a Scala file in IDEA to act as the consumer:
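When the consumer runs from IDEA on another machine, that machine must be able to resolve master and reach the broker port 9092. The check below is a sketch that uses only the JDK (java.net.InetAddress and java.net.Socket); the object name BrokerCheck and the 3-second timeout are my own choices, not part of the original setup.

```scala
import java.io.IOException
import java.net.{InetAddress, InetSocketAddress, Socket}
import scala.util.Try

object BrokerCheck {
  // IP address that `host` resolves to, or None if resolution fails
  // (for example, no hosts-file entry for "master").
  def resolve(host: String): Option[String] =
    Try(InetAddress.getByName(host).getHostAddress).toOption

  // True if a TCP connection to host:port succeeds within timeoutMs.
  def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Covers unresolved hosts, refused connections and timeouts
      case _: IOException => false
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val host = "master" // replace with your VM's hostname or IP
    println(s"$host resolves to: ${resolve(host).getOrElse("<unresolved>")}")
    println(s"Broker port 9092 reachable: ${canConnect(host, 9092)}")
  }
}
```

If the hostname does not resolve, add the hosts-file mapping mentioned in the consumer code's comments, or use the IP address directly.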
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}
import org.apache.spark.streaming.{Seconds, StreamingContext}
object KafkaSparkStreamingConsumer {
    def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[*]").setAppName("NetworkWordCount")
        val ssc = new StreamingContext(conf, Seconds(5))
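        // "master" is the Linux hostname. If you keep it and run directly from IDEA,
        // add a mapping from the hostname to the Linux IP in C:\Windows\System32\drivers\etc\hosts,
        // or simply use the Linux IP address here instead.
        val brokers = "master:9092"
        // Must match the topic created with kafka-topics.sh
        val topics = Array("test")
        val kafkaParams = Map[String, Object](
            "bootstrap.servers" -> brokers,
            "key.deserializer" -> classOf[StringDeserializer],
            "value.deserializer" -> classOf[StringDeserializer],
            "group.id" -> "group1",
            // If this group has no committed offset, start from the newest messages
            "auto.offset.reset" -> "latest",
            // Disable Kafka's periodic automatic offset commits
            "enable.auto.commit" -> (false: java.lang.Boolean)
        )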
        val dstream: InputDStream[ConsumerRecord[String, String]] = KafkaUtils
            .createDirectStream[String, String](ssc,
                LocationStrategies.PreferConsistent,
                ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
            )
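        dstream
            // Print each raw message and keep only its value
            .map((record: ConsumerRecord[String, String]) => {
                println(record.value())
                record.value()
            })
            .flatMap(_.split(" "))   // split each message into words
            .map((_, 1))             // pair each word with a count of 1
            .reduceByKey(_ + _)      // sum the counts per word within this batch
            .print()                 // print the first 10 results of each batch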
        ssc.start()
        ssc.awaitTermination()
    }
}
 
KafkaUtils.createDirectStream
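 Creates a DStream using the direct approach. The direct approach was introduced with Spark's Kafka 0.8 integration; the 0.10 integration used here changes the API again. Its arguments are explained below: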
LocationStrategies.PreferConsistent:
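 Distributes the topic's partitions evenly across the available executors, so each executor consumes roughly the same number of partitions. This is the right choice in most cases; LocationStrategies also offers PreferBrokers (for executors running on the same hosts as the Kafka brokers) and PreferFixed (an explicit partition-to-host mapping).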
ConsumerStrategies.Subscribe[String, String](topics, kafkaParams):
 Specifies the topics the consumer subscribes to and the Kafka consumer configuration.
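If you would rather pass the broker list and group id in from the command line than hard-code them, the parameter map handed to Subscribe can come from a small helper. This is a sketch: the names KafkaParamsBuilder and consumerParams are hypothetical, and the deserializers are given as fully qualified class-name strings, which the Kafka consumer accepts as an alternative to classOf[StringDeserializer] (this also keeps the snippet free of Kafka imports).

```scala
object KafkaParamsBuilder {
  // Hypothetical helper producing the same settings as the consumer above.
  def consumerParams(brokers: String, groupId: String): Map[String, Object] =
    Map[String, Object](
      "bootstrap.servers" -> brokers,
      // Kafka also accepts deserializers as class-name strings
      "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
      "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
      "group.id" -> groupId,
      // If the group has no committed offset, start from the newest messages
      "auto.offset.reset" -> "latest",
      // Disable Kafka's periodic automatic offset commits
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )
}
```

In the consumer, this would replace the inline map with something like `val kafkaParams = KafkaParamsBuilder.consumerParams(args(0), args(1))`.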
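Run the program, type a few words into the console producer, and the word counts for each 5-second batch are printed in the IDEA console.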
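To make the per-batch logic concrete, the plain-Scala sketch below reproduces the map → flatMap → map → reduceByKey chain on an ordinary collection, with no Spark involved. The object name BatchWordCount and the sample messages are made up for illustration, and groupBy followed by a sum stands in for reduceByKey.

```scala
object BatchWordCount {
  // Same steps as the DStream chain, applied to one batch of record values.
  def countWords(values: Seq[String]): Map[String, Int] =
    values
      .flatMap(_.split(" "))   // split each message into words
      .map((_, 1))             // pair every word with a count of 1
      .groupBy(_._1)           // group the pairs by word (reduceByKey stand-in)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Hypothetical messages typed into the console producer during one batch
    val batch = Seq("hello spark", "hello kafka")
    println(countWords(batch))
  }
}
```

Note that reduceByKey on a DStream counts words within each batch only; the counts printed every 5 seconds are not cumulative.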