COMP5434 (Fall 2019) Big Data Computing

 


 

Individual Assignment 2       Due Date: 10:00am, 2nd December, 2019

Please submit your assignment in Blackboard
and follow our requirements in Section 2.

1. Problem statement

A sample input file is given below. Each line corresponds to a point-of-interest (POI) and contains a keyword followed by the coordinate values x and y (separated by white space).

park 3 5

lake 2 3

mall 1 4

park 2 4

lake 9 8

mall 2 7
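
As an illustration of the input format, one line can be split into its three fields as sketched below. The class name PoiLine and its parse method are hypothetical and not part of the assignment; the sketch assumes that a keyword contains no white space and that both coordinates parse as doubles.

```java
// Hypothetical helper for illustration; not a required part of the submission.
final class PoiLine {
    final String keyword;
    final double x;
    final double y;

    private PoiLine(String keyword, double x, double y) {
        this.keyword = keyword;
        this.x = x;
        this.y = y;
    }

    // Splits on any run of white space, so tabs and repeated spaces are accepted.
    static PoiLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 'keyword x y' but got: " + line);
        }
        return new PoiLine(parts[0],
                Double.parseDouble(parts[1]),
                Double.parseDouble(parts[2]));
    }
}
```

For example, PoiLine.parse("park 3 5") yields the keyword "park" with x = 3.0 and y = 5.0.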

We measure the distance between two points p1=(x1,y1) and p2=(x2,y2) by:

$$\mathrm{dist}(p_1, p_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
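
The distance measure above is the ordinary Euclidean distance. A minimal Java sketch of it follows; the class name Distance and the method name dist are chosen here for illustration only.

```java
// Illustrative helper; the names are assumptions, not required by the assignment.
final class Distance {
    // Euclidean distance between (x1, y1) and (x2, y2).
    static double dist(double x1, double y1, double x2, double y2) {
        double dx = x1 - x2;
        double dy = y1 - y2;
        return Math.sqrt(dx * dx + dy * dy);
    }
}
```

For the points (3,5) and (2,4), this gives the square root of 2, roughly 1.414.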

Each keyword k is associated with a group G(k) of points.

[Example] The group of “park” contains two points: (3,5) and (2,4).

There are 2 questions in this programming assignment.
You should write a MapReduce program to solve each of them.

Question Q1: Find the centroid (i.e., the mean position of points) of each group.

[Example]

Input: the sample input above

Output:

lake  5.5  5.5

mall  1.5  5.5

park  2.5  4.5
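
The logic behind Q1 can be sketched in plain Java, without the Hadoop API. In this sketch, groupByKeyword stands in for the map and shuffle phases (emit each point under its keyword, then collect points with the same keyword), and centroid plays the role of the reduce step. All class and method names are assumptions made for illustration; an actual submission would express the same steps with Hadoop's Mapper and Reducer classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the Q1 logic; not a Hadoop job.
final class CentroidSketch {
    // Stand-in for map + shuffle: collect the points of each keyword.
    // A TreeMap keeps the keywords in sorted order.
    static Map<String, List<double[]>> groupByKeyword(List<String> lines) {
        Map<String, List<double[]>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            double[] point = {
                Double.parseDouble(parts[1]), Double.parseDouble(parts[2])
            };
            groups.computeIfAbsent(parts[0], k -> new ArrayList<>()).add(point);
        }
        return groups;
    }

    // Stand-in for reduce: the mean position of one non-empty group.
    static double[] centroid(List<double[]> points) {
        double sumX = 0.0;
        double sumY = 0.0;
        for (double[] p : points) {
            sumX += p[0];
            sumY += p[1];
        }
        return new double[] { sumX / points.size(), sumY / points.size() };
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "park 3 5", "lake 2 3", "mall 1 4",
                "park 2 4", "lake 9 8", "mall 2 7");
        for (Map.Entry<String, List<double[]>> e : groupByKeyword(lines).entrySet()) {
            double[] c = centroid(e.getValue());
            System.out.println(e.getKey() + "  " + c[0] + "  " + c[1]);
        }
    }
}
```

If a combiner is used, note that a mean of partial means is not the overall mean in general; carrying partial sums together with a count avoids that problem.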


Question Q2: Find the diameter (i.e., the maximum distance between any two points inside a group) of each group.

[Example]

Input: the sample input above

Output:

lake  8.602

mall  3.162

park  1.414
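
One straightforward way to compute a group's diameter is to compare every pair of points in the group, as in the plain-Java sketch below. The class and method names are hypothetical. The sketch assumes that all points of a group fit in memory, and it returns 0.0 for a group with fewer than two points, which is a choice made here and not something the assignment specifies.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the per-group work for Q2; not a Hadoop job.
final class DiameterSketch {
    // Brute-force maximum pairwise distance, O(n^2) in the group size.
    static double diameter(List<double[]> points) {
        double best = 0.0;
        for (int i = 0; i < points.size(); i++) {
            for (int j = i + 1; j < points.size(); j++) {
                double dx = points.get(i)[0] - points.get(j)[0];
                double dy = points.get(i)[1] - points.get(j)[1];
                best = Math.max(best, Math.sqrt(dx * dx + dy * dy));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The two "lake" points from the sample input.
        List<double[]> lake = Arrays.asList(
                new double[] { 2, 3 }, new double[] { 9, 8 });
        System.out.println("lake  " + diameter(lake));
    }
}
```

Because the maximum is taken over pairs, the reduce step needs to see all points of a group before it can produce the answer.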

2. Requirements

  1. Although MapReduce supports multiple languages, you should use Java (Java 8) for the implementation in this assignment.
  2. Your submission should be organized as follows:

<YourStudentID> // your folder name, [Example] 19001234g

— Q1.java              // source file for question 1

— Q1.jar                // jar file for question 1, compiled and archived from Q1.java

— Q2.java              // source file for question 2

— Q2.jar                // jar file for question 2, compiled and archived from Q2.java

  3. Archive the above structure as <YourStudentID>.zip and submit this .zip file on Blackboard. [Example] 19001234g.zip
  4. Make sure that you can compile your source files and run them in pseudo-distributed mode on the latest Hadoop version (i.e., Hadoop 3.2.1).
  5. Your jar files should be directly runnable on a Linux platform with the following calls:

bin/hadoop jar Q1.jar Q1 <input path> <output path>

bin/hadoop jar Q2.jar Q2 <input path> <output path>

  6. Your output should preserve double precision.
  7. You should use only one MapReduce round to solve each sub-question.
  8. [Hint] You may use the Ubuntu image we provided for this assignment.

-Google drive:

https://drive.google.com/file/d/1lMqmTAj2sC2gVqkVWW-MDUR24vv-a3Si/view?usp=sharing

-The Y drive in COMP Lab: Y:\Subject\COMP5434
       Note: These files will expire on November 7!

3. Grading criteria

20 marks will be given if your program can be compiled.

-for each .java file, 10 marks

80 marks will be given if your program is correct. We will test the correctness of your program by using 8 test cases (4 for each sub-question).

-For each test case, 10 marks

Note that this is an individual assignment. Plagiarism will result in 0 marks!
