MapReduce Programming Example

Analysis of MapReduce

MapReduce is a computing framework: it takes an input, processes it according to the computation model the framework defines, and produces an output, and that output is the result we need.

What we have to learn are the rules of this computation model. A MapReduce job runs in two phases, the map phase and the reduce phase, and each phase takes key-value pairs as input and produces key-value pairs as output. What the programmer has to do is define the functions of these two phases: the map function and the reduce function.
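To make that contract concrete, here is a minimal sketch, separate from the deduplication example developed below, assuming the org.apache.hadoop.mapreduce API and the default line-oriented input (byte offset as key, line as value): a mapper that emits (line, 1) for every input line, and a reducer that sums the counts grouped under each line. The class names are illustrative only.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: input (byte offset, line), output (line, 1)
class LineCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(value, ONE); // one intermediate pair per input line
    }
}

// Reducer: input (line, [1, 1, ...]) after the shuffle, output (line, total count)
class LineCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}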

MapReduce Programming Example 1: Data Deduplication

"Data deduplication" is mainly to grasp and use parallelization ideas to meaningfully filter data. Counting the number of types of data on a large data set, calculating the number of visits from the website log, and so on, seemingly complex tasks involve data deduplication. Let's go into the MapReduce programming of this example.

1.1 Example Description

Deduplicate the data in the data file. Each line in the data file is a single piece of data.

The sample input is as follows:

1) file1:

[Figure: contents of file1]

2) file2:

[Figure: contents of file2]

The sample output looks like this:

[Figure: deduplicated output]

1.2 Design Ideas

The ultimate goal of data deduplication is for data that appears more than once in the original input to appear only once in the output file. The natural idea is to hand all records of the same data item to a single reduce task, which outputs the item once no matter how many times it occurred. Concretely, the reduce input should use the data itself as the key and should not care about the value-list: when reduce receives a (key, value-list) pair, it simply copies the key to the output key and sets the output value to empty.

In the MapReduce flow, the (key, value) pairs output by map are aggregated into (key, value-list) pairs by the shuffle and then passed to reduce. Working backwards from the reduce input designed above, the output key of map should be the data itself, and the value can be arbitrary. Pushing back one step further: in this example each data item is one line of an input file, so the task of the map phase is, after Hadoop's default job input format has read the file (delivering each line as the value), to set that value as the output key and emit it directly, with an arbitrary output value. The map results reach reduce after the shuffle. The reduce phase does not care how many values each key carries; it simply copies the input key into the output key and emits it, with the output value set to empty.
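As a concrete, hypothetical trace of this design (the line "hello hadoop" below is invented for illustration, not taken from the sample files), a line that appears once in file1 and once in file2 moves through the job like this:

    map (file1):  ("hello hadoop", "")          output key = the line, value = empty
    map (file2):  ("hello hadoop", "")
    shuffle:      ("hello hadoop", ["", ""])    all occurrences grouped under one key
    reduce:       ("hello hadoop", "")          the key is written exactly once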

1.3 Program Code

The program code is as follows:

package com.hebut.mr;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Dedup {

    // map copies the input value (one line of the file) into the output key and emits it directly
    public static class Map extends Mapper<Object, Text, Text, Text> {

        private static Text line = new Text(); // each line of data

        // Implement the map function
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            line = value;
            context.write(line, new Text(""));
        }
    }

    // reduce copies the input key into the output key and emits it directly
    public static class Reduce extends Reducer<Text, Text, Text, Text> {

        // Implement the reduce function
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // This line is important: it points the job at the JobTracker
        conf.set("mapred.job.tracker", "192.168.1.2:9001");

        String[] ioArgs = new String[]{"dedup_in", "dedup_out"};
        String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Data Deduplication <in> <out>");
            System.exit(2);
        }

        Job job = new Job(conf, "Data Deduplication");
        job.setJarByClass(Dedup.class);

        // Set the Map, Combine and Reduce processing classes
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        // Set the output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
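To submit the job outside Eclipse, one common approach (a sketch; the jar name dedup.jar is hypothetical) is to package the class into a jar and, once the input data from the next section is in place, launch it with the hadoop command:

    hadoop jar dedup.jar com.hebut.mr.Dedup

Note that because the code passes the internally built ioArgs array to GenericOptionsParser, the input and output paths are effectively fixed to "dedup_in" and "dedup_out" under the user's HDFS home directory; any paths given on the command line are ignored.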

1.4 Code Results

1) Prepare test data

Create the input folder "dedup_in" under the "/user/hadoop" directory via "DFS Locations" in Eclipse (note: "dedup_out" does not need to be created). Figure 1.4-1 shows the folder after it has been successfully created.

[Figure 1.4-1: the dedup_in folder created under /user/hadoop]

Then create two txt files locally and upload them to the "/user/hadoop/dedup_in" folder via Eclipse. The contents of the two files are the same as those in the example description. Figure 1.4-2 shows the folder after the upload has succeeded.

Viewing "Master.Hadoop" from SecureCRT can also confirm the two files we uploaded.

[Figure 1.4-2: the two uploaded files in /user/hadoop/dedup_in]

View the contents of the two files as shown in Figure 1.4-3:

[Figure 1.4-3: contents of the two input files]

2) View the running results

At this point, right-click the "/user/hadoop" folder in Eclipse's "DFS Locations" and refresh; a "dedup_out" folder appears, containing three files. Double-click the "part-r-00000" file and its contents are displayed in the middle pane of Eclipse, as shown in Figure 1.4-4.

[Figure 1.4-4: contents of part-r-00000, the deduplicated output]

At this point, you can compare the result with the output we expected in the example description.
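The same check can be done from the command line; each distinct line from the two input files should appear exactly once:

    hadoop fs -cat /user/hadoop/dedup_out/part-r-00000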
