Search data from a data set without reading each element

0 votes
asked Sep 11, 2017 by user77108

I have just started learning algorithms and data structures and I came by an interesting problem.
I need some help in solving the problem.

There is a data set given to me. Within the data set are characters and a number associated with each of them. I have to evaluate the sum of the largest numbers associated with each of the present characters. The list is not sorted by characters however groups of each character are repeated with no further instance of that character in the data set.
Moreover, the largest number associated with each character in the data set always appears at the largest position of reference of that character in the data set. We know the length of the entire data set and we can get retrieve the data by specifying the line number associated with that data set.
For Eg.

C-7  
C-9  
C-12  
D-1  
D-8 
A-3  
M-67  
M-78  
M-90  
M-91  
M-92   
K-4  
K-7  
K-10  
L-13  
length=15  
get(3)= D-1(stores in class with character D and value 1)  

The answer for the above should be 13+10+92+3+8+12 as they are the highest numbers associated with L,K,M,A,D,C respectively.
The simplest solution is, of course, to go through all of the elements but what is the most efficient algorithm(reading the data set lesser than the length of the data set)?

2 Answers

0 votes
answered Sep 11, 2017 by bricky

You'll have to go through them each one by one, since you can't be certain what the key is.

Just for sake of easy manipulation, I would loop over the dataset and check if the key at index i is equal to the index at i+1, if it's not, that means you have a local max.

Then, store that value into a hash or dictionary if there's not already an existing key:value pair for that key, if there is, do a check to see if the existing value is less than the current value, and overwrite it if true.

0 votes
answered Sep 13, 2017 by anony-mousse

While you could use statistics to optimistically skip some entries - say you read A 1, you skip 5 entries you read A 10 - good. You skip 5 more, B 3, so you need to go back and also read what is inbetween.

But in reality it won't work. Not on text.

Because IO happens in blocks. Data is stored in chunks of usually around 8k. So that is the minimum read size (even if your programming language may provide you with other sized reads, they will eventually be translated to reading blocks and buffering them).

How do you find the next line? Well you read until you find a \n...

So you don't save anything on this kind of data. It would be different if you had much larger records (several KB, like files) and an index. But building that index will require reading all at least once.

So as presented, the fastest approach would likely be to linearly scan the entire data once.

Welcome to Q&A, where you can ask questions and receive answers from other members of the community.
Website Online Counter

...