2. create an index mapping where the tracking field is of date nanosecond type and invokes the defined pipeline (a quick way to verify this setup is sketched after this list):
+
[source, json]
    PUT /_template/my_template
    {
      "index_patterns": ["test-*"],
      "settings": {
        "index.default_pipeline": "my-pipeline"
      },
      "mappings": {
        "properties": {
          "event": {
            "properties": {
              "ingested": {
                "type": "date_nanos",
                "format": "strict_date_optional_time_nanos"
              }
            }
          }
        }
      }
    }

3. define a query that selects all data from the indices, sorted by the tracking field, and with a range filter from the last value seen up to the present (an example with the placeholders substituted is shown after this list):
+
[source,json]
    {
      "query": {
        "range": {
          "event.ingested": {
            "gt": ":last_value",
            "lt": ":present"
          }
        }
      },
      "sort": [
        {
          "event.ingested": {
            "order": "asc",
            "format": "strict_date_optional_time_nanos",
            "numeric_type": "date_nanos"
          }
        }
      ]
    }

4. configure the Elasticsearch input to query the indices with the query defined above, every minute, and track the `event.ingested` field:
+
[source, ruby]
    input {
      elasticsearch {
        index => 'test-*'
        query => '{ "query": { "range": { "event.ingested": { "gt": ":last_value", "lt": ":present" }}}, "sort": [ { "event.ingested": { "order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type": "date_nanos" } } ] }'
        tracking_field => "[event][ingested]"
        slices => 5 # optional use of slices to speed data processing, should be less than number of primary shards
        schedule => '* * * * *' # every minute
        schedule_overlap => false # don't accumulate jobs if one takes longer than 1 minute
      }
    }
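
A quick way to verify the pipeline and mapping from steps 1 and 2 is to index a document into an index matching `test-*` and read it back; the index name `test-0001` below is only an example:

[source, json]
    POST /test-0001/_doc
    {
      "message": "a sample document"
    }

Searching the index afterwards (for example with `GET /test-0001/_search`) should show the stored document enriched with an `event.ingested` timestamp of nanosecond precision, such as `2024-01-01T10:00:00.123456789Z`.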
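
For illustration, assume the last observed value of the tracking field was `2024-01-01T10:00:00.123456789Z`. On the next scheduled run the plugin replaces `:last_value` with that value and `:present` with a timestamp close to the current time, so the range clause of the query above effectively becomes something like this (the timestamps are made-up examples):

[source, json]
    {
      "query": {
        "range": {
          "event.ingested": {
            "gt": "2024-01-01T10:00:00.123456789Z",
            "lt": "2024-01-01T10:05:00.000000000Z"
          }
        }
      }
    }

The `sort` clause stays the same as in the query defined in step 3.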
With this setup, as new documents are indexed into a `test-*` index, the next scheduled run will:

1. select all new documents since the last observed value of the tracking field;
2. use PIT+search_after to paginate through all the data (sketched below);
3. update the value of the field at the end of the pagination.
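
For readers unfamiliar with this Elasticsearch pagination pattern, the PIT+search_after step boils down to API calls roughly like the ones below; the size, keep-alive value and abbreviated PIT id are only illustrative. First a point in time is opened over the target indices:

[source, json]
    POST /test-*/_pit?keep_alive=1m

Each page is then fetched against that PIT, sorted by the tracking field, and every page after the first passes the sort values of the last hit of the previous page via `search_after` (the range placeholders are shown unsubstituted for brevity):

[source, json]
    POST /_search
    {
      "size": 1000,
      "pit": { "id": "<pit id>", "keep_alive": "1m" },
      "query": { "range": { "event.ingested": { "gt": ":last_value", "lt": ":present" } } },
      "sort": [ { "event.ingested": { "order": "asc", "format": "strict_date_optional_time_nanos", "numeric_type": "date_nanos" } } ],
      "search_after": [ "<sort value of the last hit of the previous page>" ]
    }

When a page returns fewer hits than requested, the PIT is closed (`DELETE /_pit`) and the tracking value is updated as described below.
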
[id="plugins-{type}s-{plugin}-scheduling"]
52
165
==== Scheduling
53
166
54
167
Input from this plugin can be scheduled to run periodically according to a specific

The value of the tracking field is injected into each query if the query uses the placeholder `:last_value`.
For the first query after a pipeline is started, the value used is either read from the <<last_run_metadata_path>> file,
or taken from the <<tracking_field_seed>> setting.

Note: The tracking value is updated only after the PIT+search_after run completes; it is not updated during the search_after pagination. This is what allows the use of slices.