feat(cubesql): Add `XIRR` aggregate function #9508

Merged

MazterQyou merged 1 commit into master from cubesql/xirr-udaf

Apr 24, 2025

Member

MazterQyou commented Apr 23, 2025

Check List

Tests have been run in packages where changes made if available
Linter has been run for changed code
Tests for the changes have been added if not covered yet
Docs have been added / updated if required

Description of Changes Made

This PR adds support for the XIRR aggregate function. A related test is included.


feat(cubesql): Add XIRR aggregate function

bb5876d

Signed-off-by: Alex Qyoun-ae <4062971+MazterQyou@users.noreply.github.com>

MazterQyou requested a review from a team as a code owner

April 23, 2025 17:24

codecov bot commented Apr 23, 2025

Codecov Report

Attention: Patch coverage is 75.78125% with 62 lines in your changes missing coverage. Please review.

Project coverage is 80.52%. Comparing base (1d15182) to head (bb5876d).
Report is 6 commits behind head on master.

Files with missing lines	Patch %	Lines
...l/cubesql/src/compile/engine/udf/extension/xirr.rs	72.56%	62 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #9508      +/-   ##
==========================================
- Coverage   83.89%   80.52%   -3.38%     
==========================================
  Files         229      383     +154     
  Lines       83569    96892   +13323     
  Branches        0     2223    +2223     
==========================================
+ Hits        70111    78018    +7907     
- Misses      13458    18559    +5101     
- Partials        0      315     +315

Flag	Coverage Δ
cube-backend	`59.02% <ø> (?)`
cubesql	`83.87% <75.78%> (-0.03%)`	⬇️

KSDaemon approved these changes

Member

KSDaemon left a comment

👍🏻 LGTM!

ovr requested review from srh and mcheshkov

April 24, 2025 11:30

MazterQyou merged commit c7fb71b into master

74 checks passed

MazterQyou deleted the cubesql/xirr-udaf branch

April 24, 2025 14:01

mcheshkov approved these changes

Member

mcheshkov left a comment

LGTM, left some notes, nothing critical

rust/cubesql/cubesql/src/compile/engine/udf/extension/xirr.rs

+                  };
+                  let signature = Signature::one_of(
+                      type_signatures,
+                      Volatility::Volatile, // due to the usage of [`f64::powf`]

Member

mcheshkov Apr 24, 2025

I think it should be Immutable: given the same inputs it should generate the same output, so it is OK to lift that calculation. It's safe to keep it Volatile; that should just deny some optimizations in DF.

Non-determinism in floating-point precision does not make a function volatile:
BuiltinScalarFunction::Log10 or BuiltinScalarFunction::Cos are Volatility::Immutable, but are implemented with regular f64::log10 and f64::cos, which are non-deterministic across host and target platforms, compiler versions, and so on. It's not like now or random, where results are expected to be different.

https://github.com/apache/datafusion/blob/5eb0968fb4b110c75cb560837807a2dad026bed3/datafusion/functions/src/math/mod.rs#L178-L185

https://github.com/apache/datafusion/blob/5eb0968fb4b110c75cb560837807a2dad026bed3/datafusion/functions/src/macros.rs#L183

https://github.com/apache/datafusion/blob/5eb0968fb4b110c75cb560837807a2dad026bed3/datafusion/functions/src/macros.rs#L228-L232

https://doc.rust-lang.org/std/primitive.f64.html#method.log10

rust/cubesql/cubesql/src/compile/engine/udf/extension/xirr.rs

+                                  date_type.clone(),
+                              ]));
+                              // Signatures with `initial_guess` argument; only [`DataType::Float64`] is accepted
+                              const INITIAL_GUESS_TYPE: DataType = DataType::Float64;

Member

mcheshkov Apr 24, 2025

Minor nit - I'd lift INITIAL_GUESS_TYPE closer to NUMERIC_TYPES

rust/cubesql/cubesql/src/compile/engine/udf/extension/xirr.rs

+                  }
+                  fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()> {
+                      let payments = cast(&values[0], &DataType::Float64)?;

Member

mcheshkov Apr 24, 2025

Given that the implementation always casts payments to Float64, wouldn't it be enough to just declare the first argument as Float64 in the signature and rely on DF to insert coercion in calls?

Same for dates and on_errors

Member Author

MazterQyou Apr 24, 2025

@mcheshkov DF would throw errors when using non-specified types in this case. If, say, on_error could only accept Float64, then specifying the argument as 0 instead of 0.0 would produce Int32 and would error on logical plan building. I wanted to avoid forcing the user to explicitly cast all the arguments, especially with dates, considering Cube's time type is Timestamp.
With a signature, there doesn't seem to be any automatic coercion, but I might be wrong.

rust/cubesql/cubesql/src/compile/engine/udf/extension/xirr.rs

+                          self.add_pair(payment, date)?;
+                      }
+                      let values_len = values.len();
+                      if values_len < 3 {

Member

mcheshkov Apr 24, 2025

Minor nit - feels like it could be a bit simpler with values.get(2).map(...)
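
As a rough illustration of this suggestion, the sketch below reads an optional third argument through `slice::get` instead of comparing the slice length against 3. The columns are modeled as plain `Vec<f64>` rather than Arrow arrays, and the helper name `optional_initial_guess` is hypothetical; it is not taken from the PR.

```rust
/// Hypothetical helper (not from the PR): reads an optional third argument
/// from a slice of argument columns. Columns are modeled as `Vec<f64>`
/// here instead of Arrow `ArrayRef`s to keep the sketch self-contained.
fn optional_initial_guess(values: &[Vec<f64>]) -> Option<f64> {
    // `get(2)` yields `None` when fewer than three columns were passed,
    // so no explicit `values.len() < 3` branch is needed.
    // `and_then` is used instead of `map` because reading the first
    // element of the column is itself fallible.
    values.get(2).and_then(|column| column.first().copied())
}
```

With two columns the helper returns `None`; with a third column it returns that column's first element.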

rust/cubesql/cubesql/src/compile/engine/udf/extension/xirr.rs

+                      Arc::new(|| Ok(Box::new(XirrAccumulator::new())));
+                  let state_type: StateTypeFunction = Arc::new(|_| {
+                      Ok(Arc::new(vec![
+                          DataType::List(Box::new(Field::new("item", DataType::Float64, true))),

Member

mcheshkov Apr 24, 2025

Let's use different field names here for different components of state

Member Author

MazterQyou Apr 24, 2025

The item field name seems to be static for DataType::List in DF. Looking through the code, there doesn't seem to be any List where the field name would differ; I believe different field names are for maps or similar structures.

rust/cubesql/cubesql/src/compile/engine/udf/extension/xirr.rs

+                      for (payment, date) in payments.into_iter().zip(dates) {
+                          self.add_pair(payment, date)?;
+                      }
+                      let states_len = states.len();

Member

mcheshkov Apr 24, 2025

No need to check different states length here, it should always be 4, as returned from Accumulator::state()

Member Author

MazterQyou Apr 24, 2025

Yes, you are correct. I also noticed this mistake when extending the function with optional arguments, and then forgot to fix it before publishing the PR. I'll get rid of those checks in some future chore.

rust/cubesql/cubesql/src/compile/engine/udf/extension/xirr.rs

+                      const MAX_ITERATIONS: usize = 100;
+                      const TOLERANCE: f64 = 1e-6;
+                      const DEFAULT_INITIAL_GUESS: f64 = 0.1;
+                      let Some(min_date) = self.pairs.iter().map(|(_, date)| *date).min() else {

Member

mcheshkov Apr 24, 2025

Minor nit - IMO min_by_key() looks nicer
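
A minimal sketch of what the `min_by_key()` variant might look like, assuming pairs are `(payment, date)` tuples with the date stored as an integer day count (an assumption; the actual field types are not visible in the quoted diff). The function name `min_date` is hypothetical.

```rust
/// Hypothetical stand-in for the accumulator's pair list: `(payment, date)`,
/// with the date modeled as days since some epoch (assumption).
fn min_date(pairs: &[(f64, i32)]) -> Option<i32> {
    pairs
        .iter()
        // Compare pairs by their date component only; the `f64` payment
        // does not need to implement `Ord`.
        .min_by_key(|(_, date)| *date)
        // `min_by_key` returns the whole pair, so project the date back out.
        .map(|(_, date)| *date)
}
```

Note that `min_by_key` returns the entire pair, so a final projection is still needed when only the date is wanted.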

rust/cubesql/cubesql/src/compile/engine/udf/extension/xirr.rs

+                              }
+                              let rate_positive = 1.0 + rate_of_return;
+                              let denominator = rate_positive.powf(*year_difference);
+                              net_present_value += *payment / denominator;

Member

mcheshkov Apr 24, 2025

I think this can be more accurate (and probably more performant) with FMA: net_present_value = payment.mul_add(denominator.recip(), net_present_value);
IDK if LLVM can do this for us, but I think it would not.

And the same for derivative_value, but I'm not sure how to handle multiple multiplications
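
For comparison, here is a small sketch of the two ways to accumulate one net-present-value term. The function names are hypothetical. Whether the fused form is actually more accurate is not established here: `recip()` rounds once and `mul_add` rounds once, while the plain form rounds once for the division and once for the addition, so the results may differ in the last bits either way.

```rust
/// Plain accumulation, as in the quoted diff: divide, then add.
fn npv_term_plain(acc: f64, payment: f64, denominator: f64) -> f64 {
    acc + payment / denominator
}

/// Accumulation as suggested in the review: multiply by the reciprocal and
/// add in a single fused multiply-add (`f64::mul_add`).
fn npv_term_fma(acc: f64, payment: f64, denominator: f64) -> f64 {
    payment.mul_add(denominator.recip(), acc)
}
```

On targets without a hardware FMA instruction, `mul_add` can be slower than a separate multiply and add, so any performance gain would need to be measured.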

rust/cubesql/cubesql/src/compile/engine/udf/extension/xirr.rs

+                              if *payment == 0.0 {
+                                  continue;
+                              }
+                              let rate_positive = 1.0 + rate_of_return;

Member

mcheshkov Apr 24, 2025

Minor nit - rate_positive is the same for all pairs, so it can be lifted out of the inner loop

rust/cubesql/cubesql/src/compile/engine/udf/extension/xirr.rs

+                              let rate_positive = 1.0 + rate_of_return;
+                              let denominator = rate_positive.powf(*year_difference);
+                              net_present_value += *payment / denominator;
+                              derivative_value -= *year_difference * *payment / denominator / rate_positive;

Member

mcheshkov Apr 24, 2025

Just an observation - *year_difference * *payment is the same for each pair and does not change between iterations, so we can trade memory for CPU here
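
The two loop-related notes can be sketched together as a standalone Newton iteration. This is not the PR's accumulator: the `CashFlow` struct, the `prepare` and `xirr_sketch` functions, and the convergence check on the net present value are all assumptions made for illustration. Only the constant names `MAX_ITERATIONS` and `TOLERANCE`, their values, and the update formulas come from the quoted diff.

```rust
/// Hypothetical cash-flow record: payment, offset in years from the
/// earliest date, and the cached product `years * payment`.
struct CashFlow {
    payment: f64,
    years: f64,
    /// Constant across Newton iterations, so it is computed once.
    weighted: f64,
}

/// Builds the cached records, skipping zero payments as the diff does.
fn prepare(pairs: &[(f64, f64)]) -> Vec<CashFlow> {
    pairs
        .iter()
        .filter(|(payment, _)| *payment != 0.0)
        .map(|&(payment, years)| CashFlow {
            payment,
            years,
            weighted: years * payment,
        })
        .collect()
}

/// Newton's method on the net present value. `pairs` holds
/// `(payment, years_from_first_date)`. Returns `None` when the
/// iteration does not converge.
fn xirr_sketch(pairs: &[(f64, f64)], initial_guess: f64) -> Option<f64> {
    const MAX_ITERATIONS: usize = 100;
    const TOLERANCE: f64 = 1e-6;
    let flows = prepare(pairs);
    let mut rate = initial_guess;
    for _ in 0..MAX_ITERATIONS {
        // Same for every pair within one iteration: hoisted out of the inner loop.
        let rate_positive = 1.0 + rate;
        let mut npv = 0.0;
        let mut derivative = 0.0;
        for flow in &flows {
            let denominator = rate_positive.powf(flow.years);
            npv += flow.payment / denominator;
            // Uses the cached `years * payment` product.
            derivative -= flow.weighted / denominator / rate_positive;
        }
        // Assumption: convergence is judged on the NPV magnitude.
        if npv.abs() < TOLERANCE {
            return Some(rate);
        }
        if derivative == 0.0 || !derivative.is_finite() {
            return None;
        }
        rate -= npv / derivative;
    }
    None
}
```

For example, paying 1000 at time zero and receiving 1100 one year later should converge to a rate close to 0.1. Caching `weighted` costs one extra `f64` per pair, which is the memory-for-CPU trade mentioned in the comment.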

srh mentioned this pull request

feat(cubestore): Add XIRR aggregate function to Cube Store #9520

Open

4 tasks

