Commit a10e337: Data Processing Workshop (#16)

Note: this repository was archived by the owner on Jan 14, 2025 and is now read-only.

Adds five modules for participants to learn about processing data with serverless architectures.

- Update gulpfile lint task to ignore vendored files
- Make lint task the default gulp task
- Add eslint dependencies to package.json
- Fix eslint warnings in WebApplication/gulpfile.js

1 parent 154f0d7 commit a10e337 (69 files changed: +3055 -13 lines)

# Module 1: File Processing

In this module you'll use Amazon Simple Storage Service (S3), AWS Lambda, and Amazon DynamoDB to process data from JSON files. Objects created in the Amazon S3 bucket will trigger an AWS Lambda function to process the new file. The Lambda function will read the data and insert records into an Amazon DynamoDB table.

## Architecture Overview

<kbd>![Architecture](../images/file-processing-architecture.png)</kbd>

Our producer is a sensor attached to a unicorn, Shadowfax, who is currently taking a passenger on a Wild Ryde. This sensor aggregates readings every minute, including the distance the unicorn traveled and the maximum and minimum magic point and hit point readings from the previous minute. These readings are stored in [data files][data/shadowfax-2016-02-12.json] which are uploaded daily to Amazon S3.

The Amazon S3 bucket has an [event notification][event-notifications] configured to trigger the AWS Lambda function, which will retrieve the file, process it, and populate the Amazon DynamoDB table.

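When a new object lands in the bucket, S3 invokes the function with an event describing that object. A minimal sketch of pulling the bucket and key out of such an event (this is the standard S3 notification shape; the sample bucket and key names below are illustrative):

```javascript
// Extract the bucket name and object key from an S3 event notification.
// Object keys arrive URL-encoded (spaces become '+'), so decode them first.
function getBucketAndKey(event) {
  const record = event.Records[0];
  return {
    bucket: record.s3.bucket.name,
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
  };
}

// Illustrative event fragment in the shape S3 sends to Lambda:
const sampleEvent = {
  Records: [
    { s3: { bucket: { name: 'wildrydes-uploads-yourname' },
            object: { key: 'shadowfax-2016-02-12.json' } } },
  ],
};

console.log(getBucketAndKey(sampleEvent));
```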
## Implementation Instructions

### 1. Create an Amazon S3 bucket

Use the console or CLI to create an S3 bucket. Keep in mind that your bucket's name must be globally unique. We recommend a name like `wildrydes-uploads-yourname`.

<details>
<summary><strong>Step-by-step instructions (expand for details)</strong></summary><p>

1. From the AWS Console click **Services** then select **S3** under Storage.

1. Click **+ Create Bucket**.

1. Provide a globally unique name for your bucket such as `wildrydes-uploads-yourname`.

1. Select a region for your bucket.

    <kbd>![Create bucket screenshot](../images/file-processing-s3-bucket.png)</kbd>

1. Use the default values and click **Next** through the rest of the sections, then click **Create Bucket** on the review section.

</p></details>

### 2. Create an Amazon DynamoDB Table

Use the Amazon DynamoDB console to create a new DynamoDB table. Call your table `UnicornSensorData` and give it a **Partition key** called `Name` of type **String** and a **Sort key** called `StatusTime` of type **Number**. Use the defaults for all other settings.

After you've created the table, note its Amazon Resource Name (ARN) for use in the next section.

<details>
<summary><strong>Step-by-step instructions (expand for details)</strong></summary><p>

1. From the AWS Management Console, choose **Services** then select **DynamoDB** under Databases.

1. Choose **Create table**.

1. Enter `UnicornSensorData` for the **Table name**.

1. Enter `Name` for the **Partition key** and select **String** for the key type.

1. Tick the **Add sort key** checkbox. Enter `StatusTime` for the **Sort key** and select **Number** for the key type.

1. Check the **Use default settings** box and choose **Create**.

    <kbd>![Create table screenshot](../images/file-processing-dynamodb-create.png)</kbd>

1. Scroll to the bottom of the Overview section of your new table and note the **ARN**. You will use it in the next section.

</p></details>

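Each reading from the data file will become one item in this table, keyed by `Name` (String) and `StatusTime` (Number). If you were writing items with the low-level DynamoDB API, each attribute would carry a type tag. A sketch of that mapping; every field other than the two keys is an illustrative assumption, not taken from the actual data file:

```javascript
// Map one parsed sensor reading onto DynamoDB's low-level attribute format:
// 'Name' is the String partition key, 'StatusTime' the Number sort key, and
// numeric values must be serialized as strings inside an { N: ... } wrapper.
function toDynamoItem(reading) {
  const item = {};
  Object.keys(reading).forEach(function (attr) {
    const value = reading[attr];
    item[attr] = (typeof value === 'number') ? { N: String(value) } : { S: String(value) };
  });
  return item;
}

// Hypothetical reading; fields beyond Name and StatusTime are assumptions.
console.log(toDynamoItem({ Name: 'Shadowfax', StatusTime: 1455306060, Distance: 27 }));
```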
### 3. Create an IAM role for your Lambda function

Use the IAM console to create a new role. Give it a name like `WildRydesFileProcessorRole` and select AWS Lambda for the role type. Attach the managed policy called `AWSLambdaBasicExecutionRole` to this role to grant your function permission to log to Amazon CloudWatch Logs.

You'll need to grant this role permission to access both the S3 bucket and the Amazon DynamoDB table created in the previous sections:

- Create an inline policy allowing the role access to the `dynamodb:BatchWriteItem` action for the Amazon DynamoDB table you created in the previous section.

- Create an inline policy allowing the role access to the `s3:GetObject` action for the S3 bucket you created in the first section.

<details>
<summary><strong>Step-by-step instructions (expand for details)</strong></summary><p>

1. From the AWS Console, click on **Services** and then select **IAM** in the Security, Identity & Compliance section.

1. Select **Roles** from the left navigation and then click **Create new role**.

1. Select **AWS Lambda** for the role type from **AWS Service Role**.

    **Note:** Selecting a role type automatically creates a trust policy for your role that allows AWS services to assume this role on your behalf. If you were creating this role using the CLI, AWS CloudFormation, or another mechanism, you would specify a trust policy directly.

1. Begin typing `AWSLambdaBasicExecutionRole` in the **Filter** text box and check the box next to that policy.

1. Click **Next Step**.

1. Enter `WildRydesFileProcessorRole` for the **Role Name**.

1. Click **Create role**.

1. Type `WildRydesFileProcessorRole` into the filter box on the Roles page and click the role you just created.

1. On the Permissions tab, expand the **Inline Policies** section and click the link to create a new inline policy.

    <kbd>![Inline policies screenshot](../images/file-processing-policies.png)</kbd>

1. Ensure **Policy Generator** is selected and click **Select**.

1. Select **Amazon DynamoDB** from the **AWS Service** dropdown.

1. Select **BatchWriteItem** from the Actions list.

1. Type the ARN of the DynamoDB table you created in the previous section in the **Amazon Resource Name (ARN)** field. The ARN is in the format:

    ```
    arn:aws:dynamodb:REGION:ACCOUNT_ID:table/UnicornSensorData
    ```

    For example, if you've deployed to US East (N. Virginia) and your account ID is 123456789012, your table ARN would be:

    ```
    arn:aws:dynamodb:us-east-1:123456789012:table/UnicornSensorData
    ```

    To find your AWS account ID in the AWS Management Console, click **Support** in the navigation bar in the upper right, then click **Support Center**. Your currently signed-in account ID appears in the upper-right corner below the Support menu.

    <kbd>![Policy generator screenshot](../images/file-processing-policy-generator.png)</kbd>

1. Click **Add Statement**.

    <kbd>![Policy screenshot](../images/file-processing-policy-result.png)</kbd>

1. Select **Amazon S3** from the **AWS Service** dropdown.

1. Select **GetObject** from the Actions list.

1. Type the ARN of the S3 bucket you created in the first section in the **Amazon Resource Name (ARN)** field. The ARN is in the format:

    ```
    arn:aws:s3:::YOUR_BUCKET_NAME_HERE/*
    ```

    For example, if you've named your bucket `wildrydes-uploads-johndoe`, your bucket ARN would be:

    ```
    arn:aws:s3:::wildrydes-uploads-johndoe/*
    ```

    <kbd>![Policy generator screenshot](../images/file-processing-policy-generator-s3.png)</kbd>

1. Click **Add Statement**.

    <kbd>![Policy screenshot](../images/file-processing-policy-result-full.png)</kbd>

1. Click **Next Step** then **Apply Policy**.

</p></details>

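Combined, the two statements grant just the access the function needs. A sketch of what the resulting inline policy document might look like, using the same placeholder region, account ID, and bucket name as above:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "dynamodb:BatchWriteItem",
      "Resource": "arn:aws:dynamodb:REGION:ACCOUNT_ID:table/UnicornSensorData"
    },
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME_HERE/*"
    }
  ]
}
```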
### 4. Create a Lambda function for processing

Use the console to create a new Lambda function called `WildRydesFileProcessor` that will be triggered whenever a new object is created in the bucket you created in the first section.

Use the provided [index.js](lambda/WildRydesFileProcessor/index.js) example implementation for your function code by copying and pasting the contents of that file into the Lambda console's editor. Ensure you create an environment variable with the key `TABLE_NAME` and the value `UnicornSensorData`.

Make sure you configure your function to use the `WildRydesFileProcessorRole` IAM role you created in the previous section.

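The provided index.js does the heavy lifting; the core idea is to read the uploaded object, split it into lines of JSON, and batch-write the parsed readings to DynamoDB, at most 25 items per `BatchWriteItem` call. A pure-JavaScript sketch of that core with the SDK calls left out; the function and field names here are illustrative, not taken from the provided file:

```javascript
// Turn a newline-delimited JSON file body into DynamoDB batch-write
// parameter objects, honoring the 25-item-per-call BatchWriteItem limit.
function buildBatchWriteParams(fileBody, tableName) {
  const items = fileBody
    .split('\n')
    .filter(function (line) { return line.trim().length > 0; })
    .map(function (line) { return JSON.parse(line); });

  const params = [];
  for (let i = 0; i < items.length; i += 25) {
    const batch = items.slice(i, i + 25).map(function (item) {
      return { PutRequest: { Item: item } };
    });
    params.push({ RequestItems: { [tableName]: batch } });
  }
  // In the real function each params object would be passed to the
  // AWS SDK's DynamoDB DocumentClient batchWrite call.
  return params;
}

// Two illustrative readings; fields beyond Name/StatusTime are assumptions.
const body = '{"Name":"Shadowfax","StatusTime":1455306060,"Distance":27}\n' +
             '{"Name":"Shadowfax","StatusTime":1455306120,"Distance":13}\n';
console.log(buildBatchWriteParams(body, process.env.TABLE_NAME || 'UnicornSensorData'));
```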
<details>
<summary><strong>Step-by-step instructions (expand for details)</strong></summary><p>

1. Click on **Services** then select **Lambda** in the Compute section.

1. Click **Create a Lambda function**.

1. Click the **Blank Function** blueprint card.

1. Click on the dotted outline and select **S3**. Select **wildrydes-uploads-yourname** from **Bucket**, **Object Created (All)** from **Event type**, and tick the **Enable trigger** checkbox.

    <kbd>![Create Lambda trigger screenshot](../images/file-processing-configure-trigger.png)</kbd>

1. Click **Next**.

1. Enter `WildRydesFileProcessor` in the **Name** field.

1. Optionally enter a description.

1. Select **Node.js 6.10** for the **Runtime**.

1. Copy and paste the code from [index.js](lambda/WildRydesFileProcessor/index.js) into the code entry area.

    <kbd>![Create Lambda function screenshot](../images/file-processing-lambda-create.png)</kbd>

1. In **Environment variables**, enter an environment variable with key `TABLE_NAME` and value `UnicornSensorData`.

    <kbd>![Lambda environment variable screenshot](../images/file-processing-lambda-env-var.png)</kbd>

1. Leave the default of `index.handler` for the **Handler** field.

1. Select `WildRydesFileProcessorRole` from the **Existing Role** dropdown.

    <kbd>![Define handler and role screenshot](../images/file-processing-lambda-role.png)</kbd>

1. Expand **Advanced settings** and set **Timeout** to **5** minutes to accommodate large files.

1. Click **Next** and then click **Create function** on the Review page.

    <kbd>![Lambda trigger status screenshot](../images/file-processing-trigger-status.png)</kbd>

</p></details>

## Implementation Validation

1. Using either the AWS Management Console or AWS Command Line Interface, copy the provided [data/shadowfax-2016-02-12.json][data/shadowfax-2016-02-12.json] data file to the Amazon S3 bucket created in the first section.

    You can either download the file via your web browser and upload it using the AWS Management Console, or use the AWS CLI to copy it directly:

    ```console
    aws s3 cp s3://wildrydes-data-processing/data/shadowfax-2016-02-12.json s3://YOUR_BUCKET_NAME_HERE
    ```

1. Click on **Services** then select **DynamoDB** in the Database section.

1. Click on **UnicornSensorData**.

1. Click on the **Items** tab and verify that the table has been populated with the items from the data file.

    <kbd>![DynamoDB items screenshot](../images/file-processing-dynamodb-items.png)</kbd>

When you see items from the JSON file in the table, you can move on to the next module: [Real-time Data Streaming][data-streaming-module].

## Extra Credit

- Enhance the implementation to gracefully handle lines with malformed JSON. Edit the file to include a malformed line and verify the function can still process the rest of the file. Consider how you would handle unprocessable lines in a production implementation.

- Inspect the Amazon CloudWatch Logs stream associated with the Lambda function and note how long the function executes. Increase the provisioned write throughput of the DynamoDB table and copy the file to the bucket again as a new object. Check the logs once more and note the lower duration.

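For the first extra-credit item, one way to tolerate malformed lines is to parse each line individually, keep the good readings, and collect the failures for later inspection (logging them, or writing them to a dead-letter location). A sketch under that assumption; the function name is illustrative:

```javascript
// Parse newline-delimited JSON leniently: bad lines are recorded with
// their line number and error instead of aborting the whole file.
function parseLines(fileBody) {
  const readings = [];
  const failures = [];
  fileBody.split('\n').forEach(function (line, index) {
    if (line.trim().length === 0) return; // skip blank lines
    try {
      readings.push(JSON.parse(line));
    } catch (err) {
      failures.push({ lineNumber: index + 1, line: line, error: err.message });
    }
  });
  return { readings: readings, failures: failures };
}

const result = parseLines('{"Name":"Shadowfax","StatusTime":1}\nnot json\n');
console.log(result.readings.length, result.failures.length); // 1 good reading, 1 failure
```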
[event-notifications]: http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
227+
[data/shadowfax-2016-02-12.json]: https://s3.amazonaws.com/wildrydes-data-processing/data/shadowfax-2016-02-12.json
228+
[data-streaming-module]: ../2_DataStreaming/README.md
