Commit db9cbcb

Merge branch 'main' into main
2 parents 4c762e3 + a64439e commit db9cbcb

File tree: 81 files changed (+3025 −539 lines)


.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+<!--
+*Thank you for contributing to the Amazon Managed Service for Apache Flink examples - we are happy that you want to help us improve our examples. Please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.*
+
+## Contribution Checklist
+
+- Verify your PR follows the contribution recommendations of the repository
+- Each PR should add/modify a single example and address a single concern
+- Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled-out template below.
+-->
+
+## Purpose of the change
+
+*For example: modify the Java Kinesis Sink to provide the stream ARN*
+
+## Verifying this change
+
+Please test your changes both running locally, in the IDE, and in Managed Service for Apache Flink. All examples must run
+in both environments without code changes.
+
+Describe how you tested your application, and show the output of the running application with screenshots.
+
+## Significant changes
+
+*(Please check any boxes [x] if the answer is "yes". You can first publish the PR and check them afterward, for convenience.)*
+
+- [ ] Completely new example
+- [ ] Updated an existing example to a newer Flink version or newer dependency versions
+- [ ] Improved an existing example
+- [ ] Modified the runtime configuration of an existing example (i.e. added/removed/modified any runtime properties)
+- [ ] Modified the expected input or output of an existing example (e.g. modified the source or sink, modified the record schema)

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -12,4 +12,4 @@ env/
 venv/
 .java-version
 /pyflink/
-*-dev.json*
+

CONTRIBUTING.md

Lines changed: 73 additions & 10 deletions
@@ -6,6 +6,70 @@ documentation, we greatly value feedback and contributions from our community.
 Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
 information to effectively respond to your bug report or contribution.
 
+## Goal of this repository
+
+The goal of this repository is to provide working examples of common patterns for Apache Flink in general, and, in
+particular, for [Amazon Managed Service for Apache Flink](https://aws.amazon.com/managed-service-apache-flink).
+Each example illustrates a single, specific pattern, keeping it simple and removing external dependencies when possible.
+
+The examples also try to illustrate best practices and consistent approaches to common problems.
+
+The goal of this repository is not to illustrate complete *solutions* or end-to-end architectures.
+
+The AWS team managing the repository reserves the right to modify or reject new example proposals.
+
+### Guidelines for new examples
+
+* An example should focus on a single Flink API, unless the goal is to show the integration between different APIs.
+* Unless the goal of the example is showing a particular sink, the application should sink data to a Kinesis Data Stream,
+as JSON. This makes the output easier to verify, and simplifies the setup when running locally and testing the application.
+* Examples should all work both locally, in the IDE (e.g. IntelliJ, PyCharm), and on Managed Service for Apache Flink,
+without any code changes.
+
+#### Runtime configuration
+
+* Follow the pattern used in
+[this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/java/GettingStarted)
+to provide the configuration when running locally.
+* Use a separate `PropertyGroupID` for each component, for example for each source and sink. Name the groups and
+properties consistently. See
+[this configuration](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/blob/main/java/Windowing/src/main/resources/flink-application-properties-dev.json)
+as an example.
+* The committed `flink-application-properties-dev.json` file MUST NOT contain any real resource ARN, Account ID, URL,
+or any secrets. Use placeholders instead (e.g. `arn:aws:kinesis:<region>:<accountId>:stream/OutputStream`) or obviously
+fake resource names.
+
+#### External resources
+
+* Minimize the use of external resources.
+* If the application requires external resources, provide instructions in the README.
+* Providing an additional CloudFormation template is optional and does not exempt you from providing human-readable
+instructions in the README.
+
+#### External data generators
+
+* Avoid depending on external data generators when possible. Use the [DataGen connector](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/connectors/datastream/datagen/)
+to generate synthetic data.
+* If the example requires an external data generator, try to use [this one](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/data-generator),
+or include it in the example.
+
+#### README and documentation
+
+* Write an exhaustive README explaining the goal of the example, how the application works, the Flink and connector
+versions, external dependencies, permissions, and runtime configuration. Use [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/java/KafkaConfigProviders/Kafka-SASL_SSL-ConfigProviders)
+as a reference.
+* Make sure the example works as explained in the README, without any implicit dependency or configuration.
+
+#### AWS authentication and credentials
+
+* AWS credentials must never be explicitly passed to the application.
+* Any permissions must be provided through the IAM Role assigned to the Managed Service for Apache Flink application. When running locally, leverage the IDE AWS plugins.
+
+#### Dependencies in PyFlink examples
+* Use the pattern illustrated by [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/GettingStarted)
+to provide JAR dependencies and build the ZIP using Maven.
+* If the application also requires Python dependencies, use the pattern illustrated by [this example](https://github.com/aws-samples/amazon-managed-service-for-apache-flink-examples/tree/main/python/PythonDependencies)
+leveraging `requirements.txt`.
 
 ## Reporting Bugs/Feature Requests
 
@@ -21,6 +85,7 @@ reported the issue. Please try to include as much information as you can. Detail
 
 
 ## Contributing via Pull Requests
+
 Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
 
 1. You are working against the latest source on the *main* branch.
@@ -31,20 +96,18 @@ To send us a pull request, please:
 
 1. Fork the repository.
 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
-3. Ensure local tests pass.
-4. Commit to your fork using clear commit messages.
-5. Send us a pull request, answering any default questions in the pull request interface.
-6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
-
-GitHub provides additional document on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
-[creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
+3. Update the README to reflect any changes in the code.
+4. Ensure any local tests pass.
+5. **Manually test** your changes by running the application locally, in the IDE, AND in Managed Service for Apache Flink. The application must run in both environments without code changes.
+6. Commit to your fork using clear commit messages. Ensure you do not commit any private configuration. In particular, check the local configuration JSON file.
+7. Send us a pull request, describing your changes.
+8. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
 
-
-## Finding contributions to work on
-Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start.
+Refer to the GitHub documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and [creating a pull request](https://help.github.com/articles/creating-a-pull-request/) for further details.
 
 
 ## Code of Conduct
+
 This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
 For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
 opensource-codeofconduct@amazon.com with any additional questions or comments.
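
To make the guidelines above concrete, here is a minimal, hypothetical sketch (not a file in this commit) of the shape a new example is encouraged to take: runtime configuration read per `PropertyGroupID` via `KinesisAnalyticsRuntime`, synthetic input from the DataGen connector, and JSON output to a Kinesis Data Stream. The group name `OutputStream0` and the `stream.name` key are illustrative assumptions, not repository conventions.

```java
import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.connector.source.util.ratelimit.RateLimiterStrategy;
import org.apache.flink.connector.datagen.source.DataGeneratorSource;
import org.apache.flink.connector.datagen.source.GeneratorFunction;
import org.apache.flink.connector.kinesis.sink.KinesisStreamsSink;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import java.util.Map;
import java.util.Properties;

public class ExampleShapeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // One Properties object per PropertyGroupID, as the guidelines recommend.
        // "OutputStream0" and "stream.name" are illustrative names only.
        Map<String, Properties> applicationProperties = KinesisAnalyticsRuntime.getApplicationProperties();
        Properties outputProperties = applicationProperties.get("OutputStream0");

        // Synthetic JSON input from the DataGen connector: no external data generator needed.
        GeneratorFunction<Long, String> generator =
                index -> String.format("{\"id\": %d, \"value\": \"record-%d\"}", index, index);
        DataGeneratorSource<String> source = new DataGeneratorSource<>(
                generator,
                Long.MAX_VALUE,                    // effectively unbounded
                RateLimiterStrategy.perSecond(10), // 10 records per second
                Types.STRING);

        // Sink JSON records to a Kinesis Data Stream, so the output is easy to verify.
        KinesisStreamsSink<String> sink = KinesisStreamsSink.<String>builder()
                .setKinesisClientProperties(outputProperties)
                .setStreamName(outputProperties.getProperty("stream.name"))
                .setSerializationSchema(new SimpleStringSchema())
                .setPartitionKeyGenerator(element -> String.valueOf(element.hashCode()))
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "DataGen source")
                .sinkTo(sink);

        env.execute("Example shape sketch");
    }
}
```

Locally, the same property groups come from `flink-application-properties-dev.json`; on Managed Service for Apache Flink they come from the application's *Runtime properties*.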

java/AsyncIO/pom.xml

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@
         <maven.compiler.source>${target.java.version}</maven.compiler.source>
         <maven.compiler.target>${target.java.version}</maven.compiler.target>
         <flink.version>1.20.0</flink.version>
-        <flink.connector.version>4.3.0-1.19</flink.connector.version>
+        <flink.connector.version>5.0.0-1.20</flink.connector.version>
         <kda.runtime.version>1.2.0</kda.runtime.version>
         <log4j.version>2.23.1</log4j.version>
         <jackson.version>2.16.2</jackson.version>
@@ -67,7 +67,7 @@
         </dependency>
         <dependency>
             <groupId>org.apache.flink</groupId>
-            <artifactId>flink-connector-kinesis</artifactId>
+            <artifactId>flink-connector-aws-kinesis-streams</artifactId>
             <version>${flink.connector.version}</version>
         </dependency>
         <dependency>
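
The artifact swap above moves from the legacy `flink-connector-kinesis` to `flink-connector-aws-kinesis-streams`, which carries the current `KinesisStreamsSink` and, as of 5.0.0, the new `KinesisStreamsSource`. As a rough sketch, assuming the source builder methods are as described in the Flink AWS connector docs:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.connector.kinesis.source.KinesisStreamsSource;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KinesisSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The new source addresses the stream by ARN; this one is a placeholder,
        // following the repository guideline of never committing real ARNs.
        KinesisStreamsSource<String> source = KinesisStreamsSource.<String>builder()
                .setStreamArn("arn:aws:kinesis:<region>:<accountId>:stream/InputStream")
                .setSourceConfig(new Configuration())
                .setDeserializationSchema(new SimpleStringSchema())
                .build();

        DataStream<String> input =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kinesis source");
        input.print();

        env.execute("Kinesis Streams source sketch");
    }
}
```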

java/AsyncIO/src/main/java/com/amazonaws/services/msf/ProcessingFunction.java

Lines changed: 2 additions & 1 deletion
@@ -1,9 +1,10 @@
 package com.amazonaws.services.msf;
 
-import com.google.common.base.Preconditions;
 import org.apache.flink.configuration.Configuration;
 import org.apache.flink.streaming.api.functions.async.ResultFuture;
 import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
+import org.apache.flink.util.Preconditions;
+
 import org.asynchttpclient.AsyncHttpClient;
 import org.asynchttpclient.DefaultAsyncHttpClientConfig;
 import org.asynchttpclient.Dsl;
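
This change replaces Guava's `Preconditions` with Flink's bundled `org.apache.flink.util.Preconditions`, removing a direct Guava dependency. The two classes are used the same way; a small sketch with hypothetical field names:

```java
import org.apache.flink.util.Preconditions;

public class ApiClientConfig {
    // Hypothetical fields, for illustration only
    private final String apiUrl;
    private final int timeoutMillis;

    public ApiClientConfig(String apiUrl, int timeoutMillis) {
        // Same contract as Guava: checkNotNull returns its argument,
        // checkArgument throws IllegalArgumentException when the condition is false
        this.apiUrl = Preconditions.checkNotNull(apiUrl, "apiUrl must not be null");
        Preconditions.checkArgument(timeoutMillis > 0, "timeoutMillis must be positive");
        this.timeoutMillis = timeoutMillis;
    }
}
```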

java/AsyncIO/src/main/java/com/amazonaws/services/msf/RetriesFlinkJob.java

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,6 @@
 package com.amazonaws.services.msf;
 
 import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime;
-import com.google.common.base.Preconditions;
 import org.apache.flink.api.common.eventtime.WatermarkStrategy;
 import org.apache.flink.api.common.serialization.SimpleStringSchema;
 import org.apache.flink.api.common.typeinfo.TypeInformation;
@@ -17,6 +16,8 @@
 import org.apache.flink.streaming.api.functions.async.AsyncRetryStrategy;
 import org.apache.flink.streaming.util.retryable.AsyncRetryStrategies;
 import org.apache.flink.streaming.util.retryable.RetryPredicates;
+import org.apache.flink.util.Preconditions;
+
 import org.apache.logging.log4j.LogManager;
 import org.apache.logging.log4j.Logger;
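
The imports in this file wire up Flink's async retry support. A hedged sketch of how `AsyncRetryStrategies` and `RetryPredicates` typically combine with `AsyncDataStream`; the retry count, delay, timeout, capacity, and the `RichAsyncFunction` generics are illustrative assumptions, not values from this commit:

```java
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.async.AsyncRetryStrategy;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.flink.streaming.util.retryable.AsyncRetryStrategies;
import org.apache.flink.streaming.util.retryable.RetryPredicates;

import java.util.concurrent.TimeUnit;

public class AsyncRetryWiring {

    public static DataStream<String> withRetries(
            DataStream<String> input, RichAsyncFunction<String, String> asyncFunction) {

        // Retry up to 3 attempts with a fixed 1s delay, when the result is empty
        // or an exception is thrown. Raw types mirror the Flink docs, since the
        // shared predicates are not generic over the result type.
        @SuppressWarnings({"rawtypes", "unchecked"})
        AsyncRetryStrategy<String> retryStrategy =
                new AsyncRetryStrategies.FixedDelayRetryStrategyBuilder(3, 1000L)
                        .ifResult(RetryPredicates.EMPTY_RESULT_PREDICATE)
                        .ifException(RetryPredicates.HAS_EXCEPTION_PREDICATE)
                        .build();

        // Unordered async I/O: 10s timeout per element, at most 100 in-flight requests.
        return AsyncDataStream.unorderedWaitWithRetry(
                input, asyncFunction, 10_000L, TimeUnit.MILLISECONDS, 100, retryStrategy);
    }
}
```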

java/AvroGlueSchemaRegistryKafka/pom.xml

Lines changed: 2 additions & 1 deletion
@@ -53,7 +53,7 @@
             <scope>provided</scope>
         </dependency>
 
-        <!-- Amazon Managed Service for Apache Flink (formerly Kinesis Analytics) runtime-->
+        <!-- Library to retrieve runtime application properties in Managed Service for Apache Flink -->
         <dependency>
             <groupId>com.amazonaws</groupId>
             <artifactId>aws-kinesisanalytics-runtime</artifactId>
@@ -66,6 +66,7 @@
             <groupId>org.apache.flink</groupId>
             <artifactId>flink-connector-base</artifactId>
             <version>${flink.version}</version>
+            <scope>provided</scope>
         </dependency>
         <dependency>
             <groupId>org.apache.flink</groupId>

java/AvroGlueSchemaRegistryKinesis/pom.xml

Lines changed: 3 additions & 1 deletion
@@ -52,13 +52,14 @@
             <scope>provided</scope>
         </dependency>
 
-        <!-- Kinesis Data Analytics -->
+        <!-- Library to retrieve runtime application properties in Managed Service for Apache Flink -->
         <dependency>
             <groupId>com.amazonaws</groupId>
             <artifactId>aws-kinesisanalytics-runtime</artifactId>
             <version>${kda.runtime.version}</version>
             <scope>provided</scope>
         </dependency>
+
         <dependency>
             <groupId>com.amazonaws</groupId>
             <artifactId>aws-kinesisanalytics-flink</artifactId>
@@ -70,6 +71,7 @@
             <groupId>org.apache.flink</groupId>
             <artifactId>flink-connector-base</artifactId>
             <version>${flink.version}</version>
+            <scope>provided</scope>
         </dependency>
         <dependency>
             <groupId>org.apache.flink</groupId>

java/CustomMetrics/pom.xml

Lines changed: 5 additions & 3 deletions
@@ -17,7 +17,7 @@
         <maven.compiler.source>${target.java.version}</maven.compiler.source>
         <maven.compiler.target>${target.java.version}</maven.compiler.target>
         <flink.version>1.20.0</flink.version>
-        <flink.connector.version>4.3.0-1.19</flink.connector.version>
+        <flink.connector.version>5.0.0-1.20</flink.connector.version>
         <kda.runtime.version>1.2.0</kda.runtime.version>
         <log4j.version>2.23.1</log4j.version>
         <jackson.version>2.16.2</jackson.version>
@@ -37,11 +37,12 @@
     </dependencyManagement>
 
     <dependencies>
-        <!-- Amazon Managed Service for Apache Flink (formerly Amazon Kinesis Data Analytics) runtime-->
+        <!-- Library to retrieve runtime application properties in Managed Service for Apache Flink -->
        <dependency>
             <groupId>com.amazonaws</groupId>
             <artifactId>aws-kinesisanalytics-runtime</artifactId>
             <version>${kda.runtime.version}</version>
+            <scope>provided</scope>
         </dependency>
 
         <!-- Apache Flink dependencies -->
@@ -70,10 +71,11 @@
             <groupId>org.apache.flink</groupId>
             <artifactId>flink-connector-base</artifactId>
             <version>${flink.version}</version>
+            <scope>provided</scope>
         </dependency>
         <dependency>
             <groupId>org.apache.flink</groupId>
-            <artifactId>flink-connector-kinesis</artifactId>
+            <artifactId>flink-connector-aws-kinesis-streams</artifactId>
             <version>${flink.connector.version}</version>
         </dependency>
         <dependency>

java/DynamoDBStreamSource/README.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# DynamoDB Streams Source example
+
+* Flink version: 1.20
+* Flink API: DataStream API
+* Language: Java (11)
+* Flink connectors: DynamoDB Streams Source
+
+
+This example demonstrates how to use the Flink DynamoDB Streams source.
+
+This example uses the `DynamoDbStreamsSource` provided in Apache Flink's connector ecosystem.
+
+### Prerequisite setup
+
+To run this example, the following resources need to be created:
+1. A DynamoDB table - the example uses a table schema documented using `@DynamoDbBean`. See `DdbTableItem`.
+2. Set up a DynamoDB Stream on the created table. See the [DynamoDB Streams documentation](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html).
+3. Add items to the DynamoDB table, using the created schema, via the console.
+
+
+### Runtime configuration
+
+The application reads the runtime configuration from the Runtime Properties, when running on Amazon Managed Service for Apache Flink,
+or, when running locally, from the [`resources/flink-application-properties-dev.json`](resources/flink-application-properties-dev.json) file located in the resources folder.
+
+All parameters are case-sensitive.
+
+| Group ID       | Key          | Description              |
+|----------------|--------------|--------------------------|
+| `InputStream0` | `stream.arn` | ARN of the input stream. |
+
+Every parameter in the `InputStream0` group is passed to the DynamoDB Streams consumer, for example `flink.stream.initpos`.
+
+See the [Flink DynamoDB connector docs](https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/dynamodb/) for details about configuring the DynamoDB connector.
+
+To configure the application on Managed Service for Apache Flink, set these parameters in the *Runtime properties*.
+
+To configure the application for running locally, edit the [JSON file](resources/flink-application-properties-dev.json).
+
+### Running in IntelliJ
+
+You can run this example directly in IntelliJ, without any local Flink cluster or local Flink installation.
+
+See [Running examples locally](../running-examples-locally.md) for details.
+
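
To show how the pieces of this README fit together, here is a hedged sketch of building the `DynamoDbStreamsSource` from the `InputStream0` property group. This is not the example's actual code: the builder methods and the deserialization-schema interface follow the Flink DynamoDB connector docs as recalled, so treat package names and signatures as assumptions to verify.

```java
import com.amazonaws.services.kinesisanalytics.runtime.KinesisAnalyticsRuntime;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.connector.dynamodb.source.DynamoDbStreamsSource;
// Interface location and shape assumed from the connector docs; verify against the connector version in use.
import org.apache.flink.connector.dynamodb.source.serialization.DynamoDbStreamsDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

import software.amazon.awssdk.services.dynamodb.model.Record;

import java.util.Map;
import java.util.Properties;

public class DynamoDbStreamsSourceSketch {

    // Hypothetical schema: emit each change record's new image as a string.
    public static class NewImageAsString implements DynamoDbStreamsDeserializationSchema<String> {
        @Override
        public void deserialize(Record record, String stream, String shardId, Collector<String> out) {
            out.collect(String.valueOf(record.dynamodb().newImage()));
        }

        @Override
        public TypeInformation<String> getProducedType() {
            return Types.STRING;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read the "InputStream0" group described in the table above.
        Map<String, Properties> props = KinesisAnalyticsRuntime.getApplicationProperties();
        String streamArn = props.get("InputStream0").getProperty("stream.arn");

        DynamoDbStreamsSource<String> source = DynamoDbStreamsSource.<String>builder()
                .setStreamArn(streamArn)
                .setSourceConfig(new Configuration())
                .setDeserializationSchema(new NewImageAsString())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "DynamoDB Streams source")
                .print();

        env.execute("DynamoDB Streams source sketch");
    }
}
```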
