|
| 1 | +# Module 1: File Processing |
| 2 | + |
| 3 | +In this module you'll use Amazon Simple Storage Service (S3), AWS Lambda, and Amazon DynamoDB to process data from JSON files. Objects created in the Amazon S3 bucket will trigger an AWS Lambda function to process the new file. The Lambda function will read the data and populate records into an Amazon DynamoDB table. |
| 4 | + |
| 5 | +## Architecture Overview |
| 6 | + |
| 7 | +<kbd></kbd> |
| 8 | + |
| 9 | +Our producer is a sensor attached to a unicorn - Shadowfax - currently taking a passenger on a Wild Ryde. This sensor aggregates its readings every minute, including the distance the unicorn traveled and the maximum and minimum magic point and hit point readings from the previous minute. These readings are stored in [data files][data/shadowfax-2016-02-12.json] which are uploaded on a daily basis to Amazon S3. |
| 10 | + |
| 11 | +The Amazon S3 bucket has an [event notification][event-notifications] configured to trigger the AWS Lambda function that will retrieve the file, process it, and populate the Amazon DynamoDB table. |
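
To make the trigger concrete, here is a minimal sketch of how a function might pull the bucket name and object key out of an S3 event notification. The helper name `extractObjects` and the trimmed-down `sampleEvent` are illustrative assumptions, not part of the provided `index.js`; the `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event message structure.

```javascript
'use strict';

// Hypothetical helper: list the bucket and key for every record in an S3 event.
// Object keys arrive URL-encoded, with spaces encoded as '+', so decode them
// before using them to fetch the object.
function extractObjects(event) {
  return (event.Records || []).map((record) => ({
    bucket: record.s3.bucket.name,
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
  }));
}

// Trimmed-down sample event for illustration only; real events carry more fields.
const sampleEvent = {
  Records: [
    {
      eventName: 'ObjectCreated:Put',
      s3: {
        bucket: { name: 'wildrydes-uploads-yourname' },
        object: { key: 'shadowfax-2016-02-12.json' },
      },
    },
  ],
};

console.log(extractObjects(sampleEvent));
```

Each returned pair is what a handler would need in order to retrieve the uploaded file before processing it.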
| 12 | + |
| 13 | +## Implementation Instructions |
| 14 | + |
| 15 | +### 1. Create an Amazon S3 bucket |
| 16 | + |
| 17 | +Use the console or CLI to create an S3 bucket. Keep in mind, your bucket's name must be globally unique. We recommend using a name like `wildrydes-uploads-yourname`. |
| 18 | + |
| 19 | +<details> |
| 20 | +<summary><strong>Step-by-step instructions (expand for details)</strong></summary><p> |
| 21 | + |
| 22 | +1. From the AWS Console click **Services** then select **S3** under Storage. |
| 23 | + |
| 24 | +1. Click **+Create Bucket** |
| 25 | + |
| 26 | +1. Provide a globally unique name for your bucket such as `wildrydes-uploads-yourname`. |
| 27 | + |
| 28 | +1. Select a region for your bucket. |
| 29 | + |
| 30 | + <kbd></kbd> |
| 31 | + |
| 32 | +1. Use the default values and click **Next** through the rest of the sections and click **Create Bucket** on the review section. |
| 33 | + |
| 34 | +</p></details> |
| 35 | + |
| 36 | +### 2. Create an Amazon DynamoDB Table |
| 37 | + |
| 38 | +Use the Amazon DynamoDB console to create a new DynamoDB table. Call your table `UnicornSensorData` and give it a **Partition key** called `Name` of type **String** and a **Sort key** called `StatusTime` of type **Number**. Use the defaults for all other settings. |
| 39 | + |
| 40 | +After you've created the table, note the Amazon Resource Name (ARN) for use in the next section. |
| 41 | + |
| 42 | +<details> |
| 43 | +<summary><strong>Step-by-step instructions (expand for details)</strong></summary><p> |
| 44 | + |
| 45 | +1. From the AWS Management Console, choose **Services** then select **DynamoDB** under Databases. |
| 46 | + |
| 47 | +1. Choose **Create table**. |
| 48 | + |
| 49 | +1. Enter `UnicornSensorData` for the **Table name**. |
| 50 | + |
| 51 | +1. Enter `Name` for the **Partition key** and select **String** for the key type. |
| 52 | + |
| 53 | +1. Tick the **Add sort key** checkbox. Enter `StatusTime` for the **Sort key** and select **Number** for the key type. |
| 54 | + |
| 55 | +1. Check the **Use default settings** box and choose **Create**. |
| 56 | + |
| 57 | + <kbd></kbd> |
| 58 | + |
| 59 | +1. Scroll to the bottom of the Overview section of your new table and note the **ARN**. You will use this in the next section. |
| 60 | + |
| 61 | +</p></details> |
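
If you would rather script the table than click through the console, the key schema described above can be expressed as a parameter object in the shape the DynamoDB `CreateTable` API expects. This is only a sketch: the helper name `buildTableParams` is hypothetical, and the capacity values of 5 are an assumption about what the console's default settings provision.

```javascript
'use strict';

// Hypothetical helper: build CreateTable parameters for the workshop table.
// 'Name' (String) is the partition key and 'StatusTime' (Number) is the sort key.
function buildTableParams(tableName) {
  return {
    TableName: tableName,
    AttributeDefinitions: [
      { AttributeName: 'Name', AttributeType: 'S' },
      { AttributeName: 'StatusTime', AttributeType: 'N' },
    ],
    KeySchema: [
      { AttributeName: 'Name', KeyType: 'HASH' }, // partition key
      { AttributeName: 'StatusTime', KeyType: 'RANGE' }, // sort key
    ],
    // Assumed values; adjust to whatever throughput your table should have.
    ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
  };
}

console.log(JSON.stringify(buildTableParams('UnicornSensorData'), null, 2));
```

An object like this could be passed to the AWS SDK's `createTable` call; the sketch stops short of doing so to stay runnable without credentials.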
| 62 | + |
| 63 | +### 3. Create an IAM role for your Lambda function |
| 64 | + |
| 65 | +Use the IAM console to create a new role. Give it a name like `WildRydesFileProcessorRole` and select AWS Lambda for the role type. Attach the managed policy called `AWSLambdaBasicExecutionRole` to this role in order to grant permissions for your function to log to Amazon CloudWatch Logs. |
| 66 | + |
| 67 | +You'll need to grant this role permissions to access both the S3 bucket and the Amazon DynamoDB table created in the previous sections: |
| 68 | + |
| 69 | +- Create an inline policy allowing the role access to the `dynamodb:BatchWriteItem` action for the Amazon DynamoDB table you created in the previous section. |
| 70 | + |
| 71 | +- Create an inline policy allowing the role access to the `s3:GetObject` action for the S3 bucket you created in the first section. |
| 72 | + |
| 73 | +<details> |
| 74 | +<summary><strong>Step-by-step instructions (expand for details)</strong></summary><p> |
| 75 | + |
| 76 | +1. From the AWS Console, click on **Services** and then select **IAM** in the Security, Identity & Compliance section. |
| 77 | + |
| 78 | +1. Select **Roles** from the left navigation and then click **Create new role**. |
| 79 | + |
| 80 | +1. Select **AWS Lambda** for the role type from **AWS Service Role**. |
| 81 | + |
| 82 | + **Note:** Selecting a role type automatically creates a trust policy for your role that allows AWS services to assume this role on your behalf. If you were creating this role using the CLI, AWS CloudFormation or another mechanism, you would specify a trust policy directly. |
| 83 | + |
| 84 | +1. Begin typing `AWSLambdaBasicExecutionRole` in the **Filter** text box and check the box next to that role. |
| 85 | + |
| 86 | +1. Click **Next Step**. |
| 87 | + |
| 88 | +1. Enter `WildRydesFileProcessorRole` for the **Role Name**. |
| 89 | + |
| 90 | +1. Click **Create role**. |
| 91 | + |
| 92 | +1. Type `WildRydesFileProcessorRole` into the filter box on the Roles page and click the role you just created. |
| 93 | + |
| 94 | +1. On the Permissions tab, expand the **Inline Policies** section and click the link to create a new inline policy. |
| 95 | + |
| 96 | + <kbd></kbd> |
| 97 | + |
| 98 | +1. Ensure **Policy Generator** is selected and click **Select**. |
| 99 | + |
| 100 | +1. Select **Amazon DynamoDB** from the **AWS Service** dropdown. |
| 101 | + |
| 102 | +1. Select **BatchWriteItem** from the Actions list. |
| 103 | + |
| 104 | +1. Type the ARN of the DynamoDB table you created in the previous section in the **Amazon Resource Name (ARN)** field. The ARN is in the format of: |
| 105 | + |
| 106 | + ``` |
| 107 | + arn:aws:dynamodb:REGION:ACCOUNT_ID:table/UnicornSensorData |
| 108 | + ``` |
| 109 | + |
| 110 | + For example, if you've deployed to US East (N. Virginia) and your account ID is 123456789012, your table ARN would be: |
| 111 | + |
| 112 | + ``` |
| 113 | + arn:aws:dynamodb:us-east-1:123456789012:table/UnicornSensorData |
| 114 | + ``` |
| 115 | + |
| 116 | + To find your AWS account ID number in the AWS Management Console, click on **Support** in the navigation bar in the upper-right, and then click **Support Center**. Your currently signed in account ID appears in the upper-right corner below the Support menu. |
| 117 | + |
| 118 | + <kbd></kbd> |
| 119 | + |
| 120 | +1. Click **Add Statement**. |
| 121 | + |
| 122 | + <kbd></kbd> |
| 123 | + |
| 124 | +1. Select **Amazon S3** from the **AWS Service** dropdown. |
| 125 | + |
| 126 | +1. Select **GetObject** from the Actions list. |
| 127 | + |
| 128 | +1. Type the ARN of the S3 bucket you created in the first section in the **Amazon Resource Name (ARN)** field. The ARN is in the format of: |
| 129 | + |
| 130 | + ``` |
| 131 | + arn:aws:s3:::YOUR_BUCKET_NAME_HERE/* |
| 132 | + ``` |
| 133 | + |
| 134 | + For example, if you've named your bucket `wildrydes-uploads-johndoe`, your bucket ARN would be: |
| 135 | + |
| 136 | + ``` |
| 137 | + arn:aws:s3:::wildrydes-uploads-johndoe/* |
| 138 | + ``` |
| 139 | + |
| 140 | + <kbd></kbd> |
| 141 | + |
| 142 | +1. Click **Add Statement**. |
| 143 | + |
| 144 | + <kbd></kbd> |
| 145 | + |
| 146 | +1. Click **Next Step** then **Apply Policy**. |
| 147 | + |
| 148 | +</p></details> |
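
For reference, the two statements added in the policy generator correspond roughly to the policy document sketched below. The helper name `buildPolicy` is hypothetical, and the exact JSON the console generates (statement IDs, ordering) may differ; the ARN formats and the example account ID and bucket name are the ones used in the steps above.

```javascript
'use strict';

// Hypothetical helper: assemble an inline policy granting the Lambda role
// write access to the DynamoDB table and read access to objects in the bucket.
function buildPolicy(tableArn, bucketName) {
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Action: ['dynamodb:BatchWriteItem'],
        Resource: [tableArn],
      },
      {
        Effect: 'Allow',
        Action: ['s3:GetObject'],
        // The trailing /* scopes the statement to objects inside the bucket.
        Resource: ['arn:aws:s3:::' + bucketName + '/*'],
      },
    ],
  };
}

const policy = buildPolicy(
  'arn:aws:dynamodb:us-east-1:123456789012:table/UnicornSensorData',
  'wildrydes-uploads-johndoe'
);
console.log(JSON.stringify(policy, null, 2));
```

Scoping each statement to a single resource keeps the role limited to the table and bucket created for this module.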
| 149 | + |
| 150 | +### 4. Create a Lambda function for processing |
| 151 | + |
| 152 | +Use the console to create a new Lambda function called `WildRydesFileProcessor` that will be triggered whenever a new object is created in the bucket created in the first section. |
| 153 | + |
| 154 | +Use the provided [index.js](lambda/WildRydesFileProcessor/index.js) example implementation for your function code by copying and pasting the contents of that file into the Lambda console's editor. Ensure you create an environment variable with the key `TABLE_NAME` and the value `UnicornSensorData`. |
| 155 | + |
| 156 | +Make sure you configure your function to use the `WildRydesFileProcessorRole` IAM role you created in the previous section. |
| 157 | + |
| 158 | +<details> |
| 159 | +<summary><strong>Step-by-step instructions (expand for details)</strong></summary><p> |
| 160 | + |
| 161 | +1. Click on **Services** then select **Lambda** in the Compute section. |
| 162 | + |
| 163 | +1. Click **Create a Lambda function**. |
| 164 | + |
| 165 | +1. Click the **Blank Function** blueprint card. |
| 166 | + |
| 167 | +1. Click on the dotted outline and select **S3**. Select **wildrydes-uploads-yourname** from **Bucket**, **Object Created (All)** from **Event type**, and tick the **Enable trigger** checkbox. |
| 168 | + |
| 169 | + <kbd></kbd> |
| 170 | + |
| 171 | +1. Click **Next**. |
| 172 | + |
| 173 | +1. Enter `WildRydesFileProcessor` in the **Name** field. |
| 174 | + |
| 175 | +1. Optionally enter a description. |
| 176 | + |
| 177 | +1. Select **Node.js 6.10** for the **Runtime**. |
| 178 | + |
| 179 | +1. Copy and paste the code from [index.js](lambda/WildRydesFileProcessor/index.js) into the code entry area. |
| 180 | + |
| 181 | + <kbd></kbd> |
| 182 | + |
| 183 | +1. In **Environment variables**, enter an environment variable with key `TABLE_NAME` and value `UnicornSensorData`. |
| 184 | + |
| 185 | + <kbd></kbd> |
| 186 | + |
| 187 | +1. Leave the default of `index.handler` for the **Handler** field. |
| 188 | + |
| 189 | +1. Select `WildRydesFileProcessorRole` from the **Existing Role** dropdown. |
| 190 | + |
| 191 | + <kbd></kbd> |
| 192 | + |
| 193 | +1. Expand **Advanced settings** and set **Timeout** to **5** minutes to accommodate large files. |
| 194 | + |
| 195 | +1. Click **Next** and then click **Create function** on the Review page. |
| 196 | + |
| 197 | + <kbd></kbd> |
| 198 | + |
| 199 | +</p></details> |
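
The provided `index.js` is the implementation to use, but the core idea can be sketched independently: split the file into records and group them into DynamoDB batch writes. The sketch below assumes the data file holds one JSON object per line and uses hypothetical helper names (`parseRecords`, `toBatches`, `toBatchWriteParams`); the real function may be structured differently. The limit of 25 put requests per `BatchWriteItem` call is a DynamoDB constraint.

```javascript
'use strict';

// DynamoDB accepts at most 25 put or delete requests per BatchWriteItem call.
const MAX_BATCH_SIZE = 25;

// Assumes one JSON object per line; blank lines are skipped.
function parseRecords(body) {
  return body
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Split an array into consecutive chunks of at most `size` elements.
function toBatches(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build batch-write parameters in the shape used by the SDK's document client.
function toBatchWriteParams(tableName, batch) {
  const requestItems = {};
  requestItems[tableName] = batch.map((item) => ({ PutRequest: { Item: item } }));
  return { RequestItems: requestItems };
}

// Hypothetical sample data: 60 one-minute readings with only the key attributes.
const lines = [];
for (let i = 0; i < 60; i++) {
  lines.push(JSON.stringify({ Name: 'Shadowfax', StatusTime: 1455235200 + i * 60 }));
}

const batches = toBatches(parseRecords(lines.join('\n')), MAX_BATCH_SIZE);
const tableName = process.env.TABLE_NAME || 'UnicornSensorData';
console.log(batches.map((batch) => toBatchWriteParams(tableName, batch).RequestItems[tableName].length));
```

Reading the table name from the `TABLE_NAME` environment variable mirrors the configuration step above, so the same code can target a different table without edits.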
| 200 | + |
| 201 | +## Implementation Validation |
| 202 | + |
| 203 | +1. Using either the AWS Management Console or AWS Command Line Interface, copy the provided [data/shadowfax-2016-02-12.json][data/shadowfax-2016-02-12.json] data file to the Amazon S3 bucket created in the first section. |
| 204 | + |
| 205 | +    You can either download this file via your web browser and upload it using the AWS Management Console, or use the AWS CLI to copy it directly: |
| 206 | + |
| 207 | + ```console |
| 208 | + aws s3 cp s3://wildrydes-data-processing/data/shadowfax-2016-02-12.json s3://YOUR_BUCKET_NAME_HERE |
| 209 | + ``` |
| 210 | + |
| 211 | +1. Click on **Services** then select **DynamoDB** in the Database section. |
| 212 | + |
| 213 | +1. Click on **UnicornSensorData**. |
| 214 | + |
| 215 | +1. Click on the **Items** tab and verify that the table has been populated with the items from the data file. |
| 216 | + |
| 217 | + <kbd></kbd> |
| 218 | + |
| 219 | +When you see items from the JSON file in the table, you can move on to the next module: [Real-time Data Streaming][data-streaming-module]. |
| 220 | + |
| 221 | +## Extra Credit |
| 222 | + |
| 223 | +- Enhance the implementation to gracefully handle lines with malformed JSON. Edit the file to include a malformed line and verify the function is able to process the file. Consider how you would handle unprocessable lines in a production implementation. |
| 224 | +- Inspect the Amazon CloudWatch Logs stream associated with the Lambda function and note how long the function runs. Increase the provisioned write throughput of the DynamoDB table and copy the file to the bucket once again as a new object. Check the logs once more and note the shorter duration. |
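
As a starting point for the first exercise, one possible approach is to parse each line inside a `try`/`catch` and collect failures instead of aborting the whole file. This is a sketch under the assumption that the file holds one JSON object per line; the helper name `parseLinesSafely` and the shape of the failure records are hypothetical.

```javascript
'use strict';

// Hypothetical helper: parse line-delimited JSON, keeping good records and
// recording the line number and error message for each line that fails.
function parseLinesSafely(body) {
  const records = [];
  const failures = [];
  body.split('\n').forEach((line, index) => {
    if (line.trim() === '') {
      return; // skip blank lines
    }
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      failures.push({ lineNumber: index + 1, line: line, error: err.message });
    }
  });
  return { records, failures };
}

// Illustrative input: the second line is deliberately malformed.
const sample = [
  '{"Name":"Shadowfax","StatusTime":1455235200}',
  '{"Name":"Shadowfax","StatusTime":',
  '{"Name":"Shadowfax","StatusTime":1455235320}',
].join('\n');

const result = parseLinesSafely(sample);
console.log(result.records.length, 'parsed;', result.failures.length, 'failed');
```

In a production setting the collected failures might be logged or written somewhere for later inspection rather than silently dropped.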
| 225 | + |
| 226 | +[event-notifications]: http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html |
| 227 | +[data/shadowfax-2016-02-12.json]: https://s3.amazonaws.com/wildrydes-data-processing/data/shadowfax-2016-02-12.json |
| 228 | +[data-streaming-module]: ../2_DataStreaming/README.md |