Hadoop 2.6.0 Learning Notes (4): Analysis of TextInputFormat and RecordReader

Lu Chunli's work notes. Who says a programmer can't have a bit of literary flair?

The simplest MapReduce program

package com.lucl.hadoop.mapreduce;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MiniMRDriver extends Configured implements Tool {
    public static void main(String[] args) {
        try {
            ToolRunner.run(new MiniMRDriver(), args);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(this.getConf(), this.getClass().getSimpleName());
        job.setJarByClass(MiniMRDriver.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : 1;
    }
}

View the input data for the MapReduce job

[hadoop@nnode code]$ hdfs dfs -text /data/HTTP_SITE_FLOW.log
Video website           15      1527
Information Security    20      3156
Site Statistics         24      6960
Search Engine           28      3659
Site Statistics         3       1938
Integrated Portal       15      1938
Search Engine           21      9531
Search Engine           63      11058
[hadoop@nnode code]$

Package and run the MapReduce program

[hadoop@nnode code]$ hadoop jar MiniMR.jar /data/HTTP_SITE_FLOW.log /201511302119
15/11/30 21:19:47 INFO client.RMProxy: Connecting to ResourceManager at nnode/192.168.137.117:8032
15/11/30 21:19:48 INFO input.FileInputFormat: Total input paths to process : 1
15/11/30 21:19:48 INFO mapreduce.JobSubmitter: number of splits:1
15/11/30 21:19:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1448889273221_0001
15/11/30 21:19:50 INFO impl.YarnClientImpl: Submitted application application_1448889273221_0001
15/11/30 21:19:50 INFO mapreduce.Job: The url to track the job: http://nnode:8088/proxy/application_1448889273221_0001/
15/11/30 21:19:50 INFO mapreduce.Job: Running job: job_1448889273221_0001
15/11/30 21:20:26 INFO mapreduce.Job: Job job_1448889273221_0001 running in uber mode : false
15/11/30 21:20:26 INFO mapreduce.Job:  map 0% reduce 0%
15/11/30 21:20:59 INFO mapreduce.Job:  map 100% reduce 0%
15/11/30 21:21:30 INFO mapreduce.Job:  map 100% reduce 100%
15/11/30 21:21:31 INFO mapreduce.Job: Job job_1448889273221_0001 completed successfully
15/11/30 21:21:31 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=254
                FILE: Number of bytes written=213863
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=277
                HDFS: Number of bytes written=194
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=30256
                Total time spent by all reduces in occupied slots (ms)=27787
                Total time spent by all map tasks (ms)=30256
                Total time spent by all reduce tasks (ms)=27787
                Total vcore-seconds taken by all map tasks=30256
                Total vcore-seconds taken by all reduce tasks=27787
                Total megabyte-seconds taken by all map tasks=30982144
                Total megabyte-seconds taken by all reduce tasks=28453888
        Map-Reduce Framework
                Map input records=8
                Map output records=8
                Map output bytes=232
                Map output materialized bytes=254
                Input split bytes=103
                Combine input records=0
                Combine output records=0
                Reduce input groups=8
                Reduce shuffle bytes=254
                Reduce input records=8
                Reduce output records=8
                Spilled Records=16
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=182
                CPU time spent (ms)=2000
                Physical memory (bytes) snapshot=305459200
                Virtual memory (bytes) snapshot=1697824768
                Total committed heap usage (bytes)=136450048
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=174
        File Output Format Counters
                Bytes Written=194
[hadoop@nnode code]$

View the output result

[hadoop@nnode code]$ hdfs dfs -ls /201511302119
Found 2 items
-rw-r--r--   2 hadoop hadoop          0 2015-11-30 21:21 /201511302119/_SUCCESS
-rw-r--r--   2 hadoop hadoop        194 2015-11-30 21:21 /201511302119/part-r-00000
[hadoop@nnode code]$ hdfs dfs -text /201511302119/part-r-00000
0       Video website 15 1527
22      Information Security 20 3156
44      Site Statistics 24 6960
66      Search Engine 28 3659
88      Site Statistics 3 1938
109     Integrated Portal 15 1938
131     Search Engine 21 9531
153     Search Engine 63 11058
[hadoop@nnode code]$

No Mapper or Reducer class is specified here; only the input path and the output path are set through FileInputFormat and FileOutputFormat. After the job finishes, each line's byte offset and content have been written to the specified output path.
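Why the output is just "offset plus line": when no classes are set, Hadoop 2.x falls back to the stock Mapper and Reducer, both of which simply pass every record through unchanged. The following is an abridged paraphrase of what those stock classes do (a sketch for illustration, not code from this post):

// Identity behaviour of the default org.apache.hadoop.mapreduce.Mapper and Reducer
// (abridged paraphrase of the stock Hadoop 2.x classes).
public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    protected void map(KEYIN key, VALUEIN value, Context context)
            throws IOException, InterruptedException {
        context.write((KEYOUT) key, (VALUEOUT) value);   // pass the record through unchanged
    }
}

public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
            throws IOException, InterruptedException {
        for (VALUEIN value : values) {
            context.write((KEYOUT) key, (VALUEOUT) value);   // emit each value as-is
        }
    }
}

So the (offset, line) pairs produced by the input format travel through the job untouched and land in part-r-00000 exactly as shown above.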

The default implementation of FileInputFormat is TextInputFormat, which is designed for plain-text data: it treats carriage-return/line-feed characters as the end-of-line marker, the key is the byte offset of the line within the file, and the value is the content of that line.
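To make those types concrete, here is a hypothetical mapper (not part of the original post) whose only purpose is to show what TextInputFormat feeds into the map phase:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative only: echoes the (byte offset, line) pairs produced by TextInputFormat.
public class OffsetEchoMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // For HTTP_SITE_FLOW.log the first call would see offset=0 and
        // line="Video website 15 1527"; each later call carries that line's start offset.
        context.write(offset, line);
    }
}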

TextInputFormat itself is defined (abridged) as follows:

public class TextInputFormat extends FileInputFormat<LongWritable, Text> {

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split,
            TaskAttemptContext context) {
        // ... omitted: the record delimiter bytes are read from the configuration
        return new LineRecordReader(recordDelimiterBytes);
    }

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // ... omitted: decides whether the file can be split
        //     (plain text yes; non-splittable compressed files no)
    }
}
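The recordDelimiterBytes passed to LineRecordReader above come from the job configuration: createRecordReader() reads the standard Hadoop property textinputformat.record.delimiter, so the record separator can be changed without writing a custom InputFormat. A minimal sketch (the property name is standard; the "||" value is only an illustrative choice, and the fragment belongs in a driver such as MiniMRDriver):

// Override the record delimiter used by TextInputFormat / LineRecordReader.
Configuration conf = new Configuration();
conf.set("textinputformat.record.delimiter", "||");   // records now end at "||" instead of \n
Job job = Job.getInstance(conf, "custom-delimiter-demo");
job.setInputFormatClass(TextInputFormat.class);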

In the Job, the input format can be set explicitly with public void setInputFormatClass(Class<? extends InputFormat> cls); when it is not set, TextInputFormat is used by default.
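For example, inside MiniMRDriver.run() the call would look like this (purely illustrative, since TextInputFormat is already the default; KeyValueTextInputFormat is mentioned only as an alternative FileInputFormat subclass):

// Explicitly choosing the input format on the Job:
job.setInputFormatClass(TextInputFormat.class);

// Swapping in org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat instead
// would make the key the text before the first tab rather than the byte offset:
// job.setInputFormatClass(KeyValueTextInputFormat.class);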
