The problem is a variant of subarray counting. Given an array of numbers, let's say, `1,2,2,3,2,1,2,2,2,2`

I look for subarrays and count the frequency of each. I start with looking from some `K`

length subarrays (example K = 3).

Count of subarray `1,2,2`

is `C1:2`

.

Count of subarray `2,2,3`

is `1`

.

Count of subarray `2,3,2`

is `1`

.

and so on.

Now, I look for subarrays of length 2.
Count of subarray `1,2`

is `C2: 2`

. But (1,2) is a subset of the subarray `1,2,2`

. So, I calculate its count by subtracting `C1`

from `C2`

which gives count of `1,2`

as 0. Similarly, count of `2,2`

is `1`

.
My problem is in handling cases where more than one parent subset exists. I don't consider the sub-arrays in my result set whose frequency comes out to be 1. Example:
`1,2,3,1,2,3,1,2,2,3`

Here, Count of `1,2,3`

is `2`

.

Count of `2,3,1`

is `2`

.

Now, when I look for count of `2,3`

, it should be 1 as all the greater length parents have covered the occurrences. How shall I handle these cases?

The approach I thought was to mark all the pattern occurrences of the parent. In the above case, mark all the occurrences of `1,2,3`

and `2,3,1`

. Array looks like this:

`1,2,3,1,2,3,1,2,2,3`

`X,X,X,X,X,X,X,2,2,3`

where X denotes the marked position. Now, frequency of `2,3`

we see is 1 as per the positions which are unmarked. So, basically, I mark all the pattern occurrences I find at the current step. For the next step, I start looking for patterns from the unmarked locations only to get the correct count.

I am dealing with the large data on which this seems a bit not-so-good thing to do. Also, I'm not sure if it's correct or not. Any other approaches or ideas can be of big help?